5. Stacked area charts for discovering trends

    # Read the meetup_groups dataset, parsing join_date and using it as the index
    In[66]: meetup = pd.read_csv('data/meetup_groups.csv',
                                 parse_dates=['join_date'],
                                 index_col='join_date')
            meetup.head()
    Out[66]:

Figure 1: first rows of meetup (Out[66])

    # Count the people who joined each group each week
    In[67]: group_count = meetup.groupby([pd.Grouper(freq='W'), 'group']).size()
            group_count.head()
    Out[67]: join_date   group
             2010-11-07  houstonr     5
             2010-11-14  houstonr    11
             2010-11-21  houstonr     2
             2010-12-05  houstonr     1
             2011-01-16  houstonr     2
             dtype: int64

    # Unstack the group level into columns, filling missing weeks with 0
    In[68]: gc2 = group_count.unstack('group', fill_value=0)
            gc2.tail()
    Out[68]:

Figure 2: last rows of gc2 (Out[68])
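The weekly grouping and unstacking steps are easier to follow on a tiny table. The following is a minimal sketch on made-up data, not the meetup dataset: the `toy` frame, its group names `a` and `b`, and its dates are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for meetup_groups.csv
toy = pd.DataFrame(
    {'group': ['a', 'a', 'b', 'a', 'b']},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                          '2020-01-08', '2020-01-15']),
)
toy.index.name = 'join_date'

# pd.Grouper(freq='W') bins the DatetimeIndex into weeks ending on Sunday;
# adding 'group' as a second key counts joins per week and per group.
weekly = toy.groupby([pd.Grouper(freq='W'), 'group']).size()

# Pivot the 'group' index level into columns; week/group pairs that
# never occurred become 0 instead of NaN.
wide = weekly.unstack('group', fill_value=0)
print(wide)
```

Only week and group combinations that actually occur appear in `weekly`, which is why `fill_value=0` matters when the result is widened.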

    # Take the cumulative sum to get the running membership of each group
    In[69]: group_total = gc2.cumsum()
            group_total.tail()
    Out[69]:

Figure 3: last rows of group_total (Out[69])

    # Divide each row by its row total to get each group's share of the total
    In[70]: row_total = group_total.sum(axis='columns')
            group_cum_pct = group_total.div(row_total, axis='index')
            group_cum_pct.tail()
    Out[70]:

Figure 4: last rows of group_cum_pct (Out[70])
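The cumulative sum and the row-wise normalization can be checked by hand on a small example. This is a sketch with invented numbers; the `counts` frame and its columns `a` and `b` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical weekly join counts for two groups
counts = pd.DataFrame(
    {'a': [2, 1, 0], 'b': [1, 0, 1]},
    index=pd.to_datetime(['2020-01-05', '2020-01-12', '2020-01-19']),
)

# Running membership per group
running = counts.cumsum()

# Total membership across all groups for each week
row_total = running.sum(axis='columns')

# Divide every row by its own total; axis='index' aligns row_total
# with the row labels rather than with the column labels.
share = running.div(row_total, axis='index')
print(share)
```

Each row of `share` sums to 1, which is what lets a stacked area chart fill the full height from 0 to 1.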

    # Plot the stacked area chart
    In[71]: ax = group_cum_pct.plot(kind='area', figsize=(18,4),
                                    cmap='Greys', xlim=('2013-6', None),
                                    ylim=(0, 1), legend=False)
            ax.figure.suptitle('Houston Meetup Groups', size=25)
            ax.set_xlabel('')
            ax.yaxis.tick_right()
            # Label each band; positions are fractions of the axes area.
            # The label text is passed positionally because the s keyword
            # was renamed to text in Matplotlib 3.3.
            plot_kwargs = dict(xycoords='axes fraction', size=15)
            ax.annotate('R Users', xy=(.1, .7), color='w', **plot_kwargs)
            ax.annotate('Data Visualization', xy=(.25, .16), color='k', **plot_kwargs)
            ax.annotate('Energy Data Science', xy=(.5, .55), color='k', **plot_kwargs)
            ax.annotate('Data Science', xy=(.83, .07), color='k', **plot_kwargs)
            ax.annotate('Machine Learning', xy=(.86, .78), color='w', **plot_kwargs)
    Out[71]: Text(0.86,0.78,'Machine Learning')

Figure 5: stacked area chart titled 'Houston Meetup Groups' (Out[71])
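The plotting call can be tried without the dataset. Below is a minimal sketch that assumes a small hand-made `share` frame whose rows sum to 1; the column names, the label text, and the use of the non-interactive `Agg` backend are illustrative choices, not part of the recipe.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical shares for two groups over three weeks (each row sums to 1)
share = pd.DataFrame(
    {'a': [0.7, 0.6, 0.5], 'b': [0.3, 0.4, 0.5]},
    index=pd.date_range('2020-01-05', periods=3, freq='W'),
)

# kind='area' stacks the columns on top of one another by default
ax = share.plot(kind='area', ylim=(0, 1), legend=False, cmap='Greys')

# With xycoords='axes fraction', (0, 0) is the lower-left corner of the
# axes and (1, 1) the upper-right, independent of the data units.
label = ax.annotate('group a', xy=(.1, .3),
                    xycoords='axes fraction', size=15, color='k')
```

Placing labels in axes fractions avoids having to compute date and share coordinates for every band, at the cost of repositioning them by hand if the data changes.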

More

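    # Use pie charts to see how the split between groups changes over time:
    # sample every third month start, keep the last six samples, and transpose
    In[72]: pie_data = group_cum_pct.asfreq('3MS', method='bfill') \
                                    .tail(6).to_period('M').T
            pie_data
    Out[72]: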

Figure 6: pie_data (Out[72])
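The `asfreq('3MS', method='bfill')` step samples weekly data at every third month start, and since those dates are usually not Sundays, each sample is back-filled from the next available week. A small sketch on an invented weekly series (the `weekly` name and its integer values are assumptions) shows the mechanics:

```python
import pandas as pd

# Hypothetical weekly series: 30 Sundays starting 2020-01-05, valued 0..29
weekly = pd.Series(
    range(30),
    index=pd.date_range('2020-01-05', periods=30, freq='W'),
)

# Re-sample onto a grid of month starts three months apart. Month starts
# that are absent from the weekly index take the value of the next
# available week (back-fill).
sampled = weekly.asfreq('3MS', method='bfill')

# Convert the timestamps to monthly periods for compact labels
by_period = sampled.to_period('M')
print(by_period)
```

Unlike `resample`, `asfreq` does no aggregation; it only selects or fills values at the new timestamps, which suits data that is already cumulative.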

    In[73]: import numpy as np
            from matplotlib.cm import Greys
            # Sample five shades from the Greys colormap, one per group
            greys = Greys(np.arange(50,250,40))
            ax_array = pie_data.plot(kind='pie', subplots=True,
                                     layout=(2,3), labels=None,
                                     autopct='%1.0f%%', pctdistance=1.22,
                                     colors=greys)
            ax1 = ax_array[0, 0]
            ax1.figure.legend(ax1.patches, pie_data.index, ncol=3)
            # Move each period label from the y-axis to the x-axis
            for ax in ax_array.flatten():
                ax.xaxis.label.set_visible(True)
                ax.set_xlabel(ax.get_ylabel())
                ax.set_ylabel('')
            ax1.figure.subplots_adjust(hspace=.3)
    Out[73]:

Figure 7: grid of pie charts, one per period (Out[73])
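The `greys` array in In[73] comes from calling a colormap with integers. As a rough sketch of what that call returns, the snippet below looks the colormap up by name with `plt.get_cmap`, which is an alternative to the direct `matplotlib.cm` import used in the recipe, and assumes the non-interactive `Agg` backend.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap('Greys')

# Integer inputs index directly into the colormap's lookup table,
# so these five values pick five evenly spaced shades.
indices = np.arange(50, 250, 40)
greys = cmap(indices)

print(indices)      # the five lookup positions
print(greys.shape)  # one RGBA row per position
```

Greys runs from white to black, so larger indices give darker shades. Starting at 50 and stopping before 250 skips the near-white and near-black ends, which keeps the percentage labels readable.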