第08章数据清理 - 8. 当多个变量被存储为列名时进行清理 - 《Pandas Cookbook 带注释源码》

8. 当多个变量被存储为列名时进行清理

#  读取weightlifting数据集
 In[57]:weightlifting = pd.read_csv('data/weightlifting_men.csv')
        weightlifting
out[57]:

#  用melt方法，将sex_age放入一个单独的列
 In[58]:wl_melt = weightlifting.melt(id_vars='Weight Category', 
                                     var_name='sex_age', 
                                     value_name='Qual Total')
        wl_melt.head()
out[58]:

#  用split方法将sex_age列分为两列
 In[59]:sex_age = wl_melt['sex_age'].str.split(expand=True)
        sex_age.head()
out[59]:      0         1
      0     M35     35-39
      1     M35     35-39
      2     M35     35-39
      3     M35     35-39
      4     M35     35-39

#  给列起名
 In[60]:sex_age.columns = ['Sex', 'Age Group']
        sex_age.head()
out[60]:

#  只取出字符串中的M
 In[61]:sex_age['Sex'] = sex_age['Sex'].str[0]
        sex_age.head()
out[61]:

#  用concat方法，将sex_age,与wl_cat_total连接起来
 In[62]:wl_cat_total = wl_melt[['Weight Category', 'Qual Total']]
        wl_tidy = pd.concat([sex_age, wl_cat_total], axis='columns')
        wl_tidy.head()
out[62]:

#  上面的结果也可以如下实现
 In[63]:cols = ['Weight Category', 'Qual Total']
        sex_age[cols] = wl_melt[cols]

#  也可以通过assign的方法，动态加载新的列
 In[64]: age_group = wl_melt.sex_age.str.extract('(\d{2}[-+](?:\d{2})?)', expand=False)
         sex = wl_melt.sex_age.str[0]
         new_cols = {'Sex':sex, 
                     'Age Group': age_group}
 In[65]: wl_tidy2 = wl_melt.assign(**new_cols).drop('sex_age', axis='columns')
         wl_tidy2.head()
out[65]:

#  判断两种方法是否等效
 In[66]: wl_tidy2.sort_index(axis=1).equals(wl_tidy.sort_index(axis=1))
out[66]: True

8. 当多个变量被存储为列名时进行清理

8. 当多个变量被存储为列名时进行清理

更多