7. 翻译SQL的WHERE语句

  1. # 读取employee数据集
  2. In[48]: employee = pd.read_csv('data/employee.csv')
  3. # 对各项做下了解
  4. In[49]: employee.DEPARTMENT.value_counts().head()
  5. Out[49]: Houston Police Department-HPD 638
  6. Houston Fire Department (HFD) 384
  7. Public Works & Engineering-PWE 343
  8. Health & Human Services 110
  9. Houston Airport System (HAS) 106
  10. Name: DEPARTMENT, dtype: int64
  11. In[50]: employee.GENDER.value_counts()
  12. Out[50]: Male 1397
  13. Female 603
  14. Name: GENDER, dtype: int64
  15. In[51]: employee.BASE_SALARY.describe().astype(int)
  16. Out[51]: count 1886
  17. mean 55767
  18. std 21693
  19. min 24960
  20. 25% 40170
  21. 50% 54461
  22. 75% 66614
  23. max 275000
  24. Name: BASE_SALARY, dtype: int64
  1. # 创建布尔条件,并从'UNIQUE_ID', 'DEPARTMENT', 'GENDER', 'BASE_SALARY'四列选取
  2. In[52]: depts = ['Houston Police Department-HPD',
  3. 'Houston Fire Department (HFD)']
  4. criteria_dept = employee.DEPARTMENT.isin(depts)
  5. criteria_gender = employee.GENDER == 'Female'
  6. criteria_sal = (employee.BASE_SALARY >= 80000) & \
  7. (employee.BASE_SALARY <= 120000)
  8. In[53]: criteria_final = criteria_dept & criteria_gender & criteria_sal
  9. In[54]: select_columns = ['UNIQUE_ID', 'DEPARTMENT', 'GENDER', 'BASE_SALARY']
  10. employee.loc[criteria_final, select_columns].head()
  11. Out[54]:

7. 翻译SQL的WHERE语句 - 图1

更多

  1. # 使用between选取80000到120000之间的薪水
  2. In[55]: criteria_sal = employee.BASE_SALARY.between(80000, 120000)
  3. # 排除最常出现的5家单位
  4. In[56]: top_5_depts = employee.DEPARTMENT.value_counts().index[:5]
  5. criteria = ~employee.DEPARTMENT.isin(top_5_depts)
  6. employee[criteria].head()
  7. Out[56]:

7. 翻译SQL的WHERE语句 - 图2

功能一样的SQL语句是:

  1. SELECT
  2. *
  3. FROM
  4. EMPLOYEE
  5. WHERE
  6. DEPARTMENT not in
  7. (
  8. SELECT
  9. DEPARTMENT
  10. FROM (
  11. SELECT
  12. DEPARTMENT,
  13. COUNT(1) as CT
  14. FROM
  15. EMPLOYEE
  16. GROUP BY
  17. DEPARTMENT
  18. ORDER BY
  19. CT DESC
  20. LIMIT 5
  21. )
  22. );