6. Differences between Seaborn and Pandas

```python
import pandas as pd

# Read in the employee dataset, parsing the two date columns as datetimes
employee = pd.read_csv('data/employee.csv',
                       parse_dates=['HIRE_DATE', 'JOB_DATE'])
employee.head()
```

[Figure 1: first five rows of the `employee` DataFrame]
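`parse_dates` is what turns `HIRE_DATE` and `JOB_DATE` into datetime columns instead of plain strings. A minimal sketch of the effect, assuming a tiny hypothetical CSV held in memory (the rows are made up, not taken from `data/employee.csv`):

```python
import io

import pandas as pd

# Hypothetical two-row CSV that reuses the employee data's date column names
csv_text = """HIRE_DATE,JOB_DATE
2006-06-12,2012-10-13
2000-07-19,2010-09-18
"""

raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['HIRE_DATE', 'JOB_DATE'])

# Without parse_dates the dates stay as strings
print(raw.dtypes)
# With parse_dates they become datetime64 columns, so the .dt accessor works
print(parsed.dtypes)
print(parsed['HIRE_DATE'].dt.year.tolist())
```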

```python
import seaborn as sns

# Use seaborn to draw a bar chart of the number of employees in each department
sns.countplot(y='DEPARTMENT', data=employee)
```

[Figure 2: seaborn count plot of employees per department]

```python
# With pandas, the counts must be aggregated first
employee['DEPARTMENT'].value_counts().plot(kind='barh')
```

[Figure 3: pandas horizontal bar chart of department counts]
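`sns.countplot` does the counting itself from the long-form rows, and the bar lengths it draws are the same tallies that `value_counts()` computes explicitly for pandas. A minimal sketch on a hypothetical toy frame (the department names are placeholders, not values from the employee data):

```python
import pandas as pd

# Hypothetical long-form data: one row per employee
toy = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Fire', 'Police', 'Library', 'Police', 'Fire'],
})

# What pandas needs before plotting: one number per category,
# sorted from most to least frequent
counts = toy['DEPARTMENT'].value_counts()

# The same tally written as a groupby; equivalent to what countplot draws
sizes = toy.groupby('DEPARTMENT').size()

print(counts)
```

The two libraries order the bars differently: `value_counts()` sorts by frequency, while seaborn keeps the order in which string categories first appear. `countplot` accepts an `order=` list if you want them to match, e.g. `order=employee['DEPARTMENT'].value_counts().index`.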

```python
# Use seaborn to compute and plot the mean salary for each race
ax = sns.barplot(x='RACE', y='BASE_SALARY', data=employee)
ax.figure.set_size_inches(16, 4)
```

[Figure 4: seaborn bar plot of mean `BASE_SALARY` by `RACE`]
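Besides the mean, `sns.barplot` draws an error bar on each bar. By default this is a 95% confidence interval estimated by bootstrapping (`ci=95` in older seaborn releases, `errorbar=('ci', 95)` from seaborn 0.12 on). The sketch below reproduces the idea with NumPy for a single group; `bootstrap_mean_ci` is a hypothetical helper, the salaries are made up, and seaborn's own resampling details (seed, percentile method) may differ.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, ci=95, seed=0):
    """Hypothetical helper: percentile-bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record each resample's mean
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central ci% of the bootstrap distribution
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


# Hypothetical salaries for one RACE group
salaries = [42000, 55000, 61000, 38000, 72000, 50000, 47000, 66000]
mean, low, high = bootstrap_mean_ci(salaries)
print(f'mean={mean:.0f}, 95% CI=({low:.0f}, {high:.0f})')
```

The pandas version in the next step plots only the group means, so it has no equivalent of these error bars unless you compute and pass them yourself.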

```python
# With pandas, first group by race and compute the mean salary
avg_sal = employee.groupby('RACE', sort=False)['BASE_SALARY'].mean()
ax = avg_sal.plot(kind='bar', rot=0, figsize=(16, 4), width=.8)
# Leave half a bar width of padding around the bars at x = 0..5
ax.set_xlim(-.5, 5.5)
ax.set_ylabel('Mean Salary')
```

[Figure 5: pandas bar chart of mean salary by race]
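The `sort=False` in the pandas version appears to be there so the bars come out in the same order as seaborn's: for string columns seaborn orders categories by first appearance, whereas `groupby` sorts its keys unless told otherwise. A quick check on hypothetical data:

```python
import pandas as pd

# Hypothetical rows; the race labels deliberately appear in non-alphabetical order
toy = pd.DataFrame({
    'RACE': ['White', 'Asian', 'White', 'Black', 'Asian'],
    'BASE_SALARY': [50000, 60000, 70000, 40000, 80000],
})

sorted_means = toy.groupby('RACE')['BASE_SALARY'].mean()
unsorted_means = toy.groupby('RACE', sort=False)['BASE_SALARY'].mean()

print(sorted_means.index.tolist())    # ['Asian', 'Black', 'White']
print(unsorted_means.index.tolist())  # ['White', 'Asian', 'Black']
```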

```python
# seaborn also accepts a third variable (hue) that splits each group further
ax = sns.barplot(x='RACE', y='BASE_SALARY', hue='GENDER',
                 data=employee, palette='Greys')
ax.figure.set_size_inches(16, 4)
```

[Figure 6: seaborn bar plot of mean salary by race, split by gender]

```python
# pandas must group by both race and gender, then unstack gender into columns
(employee.groupby(['RACE', 'GENDER'], sort=False)['BASE_SALARY']
         .mean()
         .unstack('GENDER')
         .plot(kind='bar', figsize=(16, 4), rot=0,
               width=.8, cmap='Greys'))
```

[Figure 7: pandas grouped bar chart of mean salary by race and gender]
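The `unstack('GENDER')` step turns the long result of the two-key `groupby` into a wide table with one column per gender, which is the shape `DataFrame.plot(kind='bar')` needs to draw one bar per column inside each race. `pivot_table` builds the same wide table in a single call. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical long-form rows
toy = pd.DataFrame({
    'RACE': ['White', 'White', 'Black', 'Black', 'White'],
    'GENDER': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'BASE_SALARY': [60000, 55000, 50000, 52000, 70000],
})

# Long result: a Series indexed by (RACE, GENDER)
long_means = toy.groupby(['RACE', 'GENDER'], sort=False)['BASE_SALARY'].mean()

# Wide result: rows are races, columns are genders
wide = long_means.unstack('GENDER')

# The same numbers via pivot_table (which sorts its labels by default)
pivoted = toy.pivot_table(index='RACE', columns='GENDER',
                          values='BASE_SALARY', aggfunc='mean')
print(wide)
```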

```python
# Use seaborn to draw salary box plots by gender, split by race
ax = sns.boxplot(x='GENDER', y='BASE_SALARY', data=employee,
                 hue='RACE', palette='Greys')
ax.figure.set_size_inches(14, 4)
```

[Figure 8: seaborn box plots of salary by gender, split by race]
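Both seaborn's `boxplot` and pandas' `DataFrame.boxplot` follow matplotlib's default box-plot convention: the box spans the first to third quartile with a line at the median, and the whiskers reach the most extreme observations within 1.5 × IQR of the box (`whis=1.5`); points beyond that are drawn individually as fliers. The sketch below computes those numbers per group with pandas. `box_stats` is a hypothetical helper and the salaries are made up.

```python
import pandas as pd

# Hypothetical salaries; each group contains one outlier
toy = pd.DataFrame({
    'GENDER': ['Female'] * 5 + ['Male'] * 5,
    'BASE_SALARY': [40000, 45000, 50000, 55000, 120000,
                    30000, 50000, 52000, 54000, 60000],
})


def box_stats(s, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and flier count for one group."""
    q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = s[(s >= lo_limit) & (s <= hi_limit)]
    return pd.Series({
        'q1': q1, 'median': med, 'q3': q3,
        # Whiskers stop at the most extreme observations still inside the limits
        'whisker_low': inside.min(), 'whisker_high': inside.max(),
        'n_fliers': int(((s < lo_limit) | (s > hi_limit)).sum()),
    })


# One row of statistics per gender
stats = pd.DataFrame({g: box_stats(s)
                      for g, s in toy.groupby('GENDER')['BASE_SALARY']}).T
print(stats)
```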

```python
import matplotlib.pyplot as plt

# pandas needs a separate Axes for each gender, then draws box plots by race
fig, ax_array = plt.subplots(1, 2, figsize=(14, 4), sharey=True)
for g, ax in zip(['Female', 'Male'], ax_array):
    (employee.query('GENDER == @g')
             .boxplot(by='RACE', column='BASE_SALARY', ax=ax, rot=20))
    ax.set_title(g + ' Salary')
    ax.set_xlabel('')
# Clear the "Boxplot grouped by RACE" title that pandas adds to the figure
fig.suptitle('')
```

[Figure 9: pandas box plots of salary by race, one panel per gender]
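Inside the string passed to `DataFrame.query`, a name prefixed with `@` refers to a Python variable in the surrounding scope, which is how the loop above selects one gender at a time. It selects the same rows as an ordinary boolean mask; a small check on hypothetical data:

```python
import pandas as pd

# Hypothetical rows
toy = pd.DataFrame({
    'GENDER': ['Female', 'Male', 'Female', 'Male'],
    'BASE_SALARY': [50000, 60000, 55000, 65000],
})

for g in ['Female', 'Male']:
    # '@g' pulls the loop variable g into the query expression
    via_query = toy.query('GENDER == @g')
    # The same rows selected with a plain boolean mask
    via_mask = toy[toy['GENDER'] == g]
    assert via_query.equals(via_mask)
    print(g, via_query['BASE_SALARY'].tolist())
```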

```python
# pandas can also group by a list of several variables,
# but the resulting plot is less elegant
ax = employee.boxplot(by=['GENDER', 'RACE'],
                      column='BASE_SALARY',
                      figsize=(16, 4), rot=15)
ax.figure.suptitle('')
```

[Figure 10: pandas box plots of salary for every gender and race combination on a single axis]
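Passing a list to `by=` makes pandas draw one box for every (GENDER, RACE) combination that occurs in the data, all side by side on one axis and labelled by the key pair, which is why the figure looks crowded next to seaborn's `hue` version. Counting the groups shows how many boxes such a call will draw; a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical rows; note there is no ('Female', 'Black') row
toy = pd.DataFrame({
    'GENDER': ['Female', 'Male', 'Female', 'Male', 'Male'],
    'RACE': ['Asian', 'Asian', 'White', 'White', 'Black'],
    'BASE_SALARY': [50000, 60000, 55000, 65000, 45000],
})

grouped = toy.groupby(['GENDER', 'RACE'])['BASE_SALARY']

# One box per (GENDER, RACE) pair that actually occurs in the data
print(grouped.ngroups)
print(list(grouped.groups.keys()))
```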