Chapter 07: Grouping for Aggregation, Filtration, and Transformation - 6. Examining the groupby object - Pandas Cookbook Annotated Source Code

6. Examining the groupby object

#  Check the type of the groupby object
 In[42]: college = pd.read_csv('data/college.csv')
         grouped = college.groupby(['STABBR', 'RELAFFIL'])
         type(grouped)
Out[42]: pandas.core.groupby.DataFrameGroupBy

#  Use the dir function to list every public attribute of the object (column names, properties, and methods)
 In[43]: print([attr for attr in dir(grouped) if not attr.startswith('_')])
['CITY', 'CURROPER', 'DISTANCEONLY', 'GRAD_DEBT_MDN_SUPP', 'HBCU', 'INSTNM', 'MD_EARN_WNE_P10', 'MENONLY', 'PCTFLOAN', 'PCTPELL', 'PPTUG_EF', 'RELAFFIL', 'SATMTMID', 'SATVRMID', 'STABBR', 'UG25ABV', 'UGDS', 'UGDS_2MOR', 'UGDS_AIAN', 'UGDS_ASIAN', 'UGDS_BLACK', 'UGDS_HISP', 'UGDS_NHPI', 'UGDS_NRA', 'UGDS_UNKN', 'UGDS_WHITE', 'WOMENONLY', 'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov', 'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad', 'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', 'take', 'transform', 'tshift', 'var']

#  Use the ngroups attribute to get the number of groups
 In[44]: grouped.ngroups
Out[44]: 112
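
As a cross-check on what ngroups counts, the sketch below uses a small made-up DataFrame (a hypothetical stand-in for data/college.csv that reuses only the column names) and compares ngroups with two other ways of counting the distinct (STABBR, RELAFFIL) pairs. Only combinations that actually occur in the data are counted.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'STABBR': ['AK', 'AK', 'AL', 'AL', 'AL', 'FL', 'FL'],
    'RELAFFIL': [0, 1, 0, 0, 1, 1, 1],
})
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# Number of distinct (STABBR, RELAFFIL) pairs present in the data
n_groups = grouped.ngroups

# Two independent ways to arrive at the same count
n_unique_pairs = len(college[['STABBR', 'RELAFFIL']].drop_duplicates())
n_sizes = len(grouped.size())

print(n_groups, n_unique_pairs, n_sizes)
```

In this toy data the pair ('FL', 0) never occurs, so it does not contribute a group.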

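#  Look at the unique identifying label of each group; the groups attribute is a dictionary that maps each group's label to the index labels of its rows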
 In[45]: groups = list(grouped.groups.keys())
         groups[:6]
Out[45]: [('AK', 0), ('AK', 1), ('AL', 0), ('AL', 1), ('AR', 0), ('AR', 1)]
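
To make the "group label to row index labels" mapping concrete, here is a minimal sketch on made-up data (hypothetical values, with a deliberately non-default index starting at 100). It also reads the indices attribute, which appears in the dir listing above and holds integer row positions rather than index labels.

```python
import pandas as pd

# Hypothetical toy data; the index starts at 100 so that index labels
# and integer positions are easy to tell apart
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'STABBR': ['AK', 'AK', 'AL', 'AL', 'AL', 'FL', 'FL'],
    'RELAFFIL': [0, 1, 0, 0, 1, 1, 1],
}, index=range(100, 107))
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# Keys of the mapping are (STABBR, RELAFFIL) tuples
labels = sorted(grouped.groups)

# Values are the index labels of the rows that belong to the group
al_secular_labels = list(grouped.groups[('AL', 0)])

# indices holds integer positions of the same rows
al_secular_positions = list(grouped.indices[('AL', 0)])

print(labels)
print(al_secular_labels, al_secular_positions)
```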

#  Use get_group, passing the group label as a tuple. For example, get all religiously affiliated schools in Florida
 In[46]: grouped.get_group(('FL', 1)).head()
Out[46]:
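
One way to sanity-check what get_group returns is to compare it with an ordinary boolean filter on the two grouping columns. The sketch below does this on made-up data (a hypothetical stand-in for the college dataset), and both approaches should pick out the same rows.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'STABBR': ['AK', 'AK', 'AL', 'AL', 'AL', 'FL', 'FL'],
    'RELAFFIL': [0, 1, 0, 0, 1, 1, 1],
})
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# Select one group by its (STABBR, RELAFFIL) label
fl_religious = grouped.get_group(('FL', 1))

# The same rows selected with a plain boolean mask
mask = (college['STABBR'] == 'FL') & (college['RELAFFIL'] == 1)
same_rows = fl_religious.index.equals(college[mask].index)

print(list(fl_religious['INSTNM']), same_rows)
```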

#  A groupby object is iterable, so each group can be inspected one at a time
 In[47]: from IPython.display import display
 In[48]: i = 0
         for name, group in grouped:
             print(name)
             display(group.head(2))
             i += 1
             if i == 5:
                 break
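
The manual counter above can also be expressed with itertools.islice, which stops the iteration after a fixed number of groups. This is only an alternative sketch on made-up data (hypothetical values), not the notebook's own code. Each iteration yields the group label and the sub-DataFrame for that group.

```python
from itertools import islice

import pandas as pd

# Hypothetical toy data standing in for data/college.csv
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'STABBR': ['AK', 'AK', 'AL', 'AL', 'AL', 'FL', 'FL'],
    'RELAFFIL': [0, 1, 0, 0, 1, 1, 1],
})
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# Take only the first two (label, sub-DataFrame) pairs
first_two = [(name, len(group)) for name, group in islice(grouped, 2)]

for name, n_rows in first_two:
    print(name, n_rows)
```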

#  Calling head on a groupby object shows the first few rows of every group in a single DataFrame
 In[49]: grouped.head(2).head(6)
Out[49]:
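
The result of head on a groupby object keeps the original row order and the original index instead of being sorted by group. The sketch below illustrates this on made-up data (hypothetical values) in which the rows of two groups are interleaved.

```python
import pandas as pd

# Hypothetical toy data with the rows of two groups interleaved
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F'],
    'STABBR': ['FL', 'AK', 'FL', 'AK', 'FL', 'AK'],
    'RELAFFIL': [1, 0, 1, 0, 1, 0],
})
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# First two rows of each group, returned in original row order
top2 = grouped.head(2)

print(top2)
```

Here the 'FL' rows come before the 'AK' rows in the result because they appear first in the input, even though 'AK' sorts first as a group label.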

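#  The nth method selects the rows at the given integer positions within each group. Positions are zero-based, so the call below selects the second row (position 1) and the last row (position -1)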
 In[50]: grouped.nth([1, -1]).head(8)
Out[50]:
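
Because nth counts positions from zero, the behaviour is easiest to see on groups of different sizes. The sketch below uses made-up data (hypothetical values) with groups of three, two, and one row. The layout of the returned index differs between pandas versions, so the check only looks at which INSTNM values were selected.

```python
import pandas as pd

# Hypothetical toy data: ('AL', 0) has 3 rows, ('FL', 1) has 2, ('AK', 0) has 1
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F'],
    'STABBR': ['AL', 'AL', 'AL', 'FL', 'FL', 'AK'],
    'RELAFFIL': [0, 0, 0, 1, 1, 0],
})
grouped = college.groupby(['STABBR', 'RELAFFIL'])

# Position 1 (second row) and position -1 (last row) of each group
second_and_last = sorted(grouped.nth([1, -1])['INSTNM'])

# Position 0 is the first row of each group
first = sorted(grouped.nth(0)['INSTNM'])

print(second_and_last)
print(first)
```

In the two-row group, positions 1 and -1 refer to the same row, which is returned once. The single-row group has no position 1, so it contributes only its last row.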
