MultiIndexing

See the MultiIndexing docs.

Creating a multi-index from a labeled frame

    # Assumes: import pandas as pd
    In [53]: df = pd.DataFrame({'row' : [0,1,2],
       ....:                    'One_X' : [1.1,1.1,1.1],
       ....:                    'One_Y' : [1.2,1.2,1.2],
       ....:                    'Two_X' : [1.11,1.11,1.11],
       ....:                    'Two_Y' : [1.22,1.22,1.22]}); df
       ....:
    Out[53]:
       row  One_X  One_Y  Two_X  Two_Y
    0    0    1.1    1.2   1.11   1.22
    1    1    1.1    1.2   1.11   1.22
    2    2    1.1    1.2   1.11   1.22

    # As Labelled Index
    In [54]: df = df.set_index('row');df
    Out[54]:
         One_X  One_Y  Two_X  Two_Y
    row
    0      1.1    1.2   1.11   1.22
    1      1.1    1.2   1.11   1.22
    2      1.1    1.2   1.11   1.22

    # With Hierarchical Columns
    In [55]: df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns]);df
    Out[55]:
         One        Two
           X    Y     X     Y
    row
    0    1.1  1.2  1.11  1.22
    1    1.1  1.2  1.11  1.22
    2    1.1  1.2  1.11  1.22

    # Now stack & Reset
    In [56]: df = df.stack(0).reset_index(1);df
    Out[56]:
        level_1     X     Y
    row
    0       One  1.10  1.20
    0       Two  1.11  1.22
    1       One  1.10  1.20
    1       Two  1.11  1.22
    2       One  1.10  1.20
    2       Two  1.11  1.22

    # And fix the labels (Notice the label 'level_1' got added automatically)
    In [57]: df.columns = ['Sample','All_X','All_Y'];df
    Out[57]:
        Sample  All_X  All_Y
    row
    0      One   1.10   1.20
    0      Two   1.11   1.22
    1      One   1.10   1.20
    1      Two   1.11   1.22
    2      One   1.10   1.20
    2      Two   1.11   1.22
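
As a side note, the column split performed in In [55] can probably be written more compactly with the string accessor on the column index. The sketch below is an alternative, not the approach used in the session above, and it assumes every column name contains exactly one underscore.

```python
import pandas as pd

# Same starting frame as the session above, already indexed by 'row'.
df = pd.DataFrame({'row': [0, 1, 2],
                   'One_X': [1.1, 1.1, 1.1],
                   'One_Y': [1.2, 1.2, 1.2],
                   'Two_X': [1.11, 1.11, 1.11],
                   'Two_Y': [1.22, 1.22, 1.22]}).set_index('row')

# Index.str.split(..., expand=True) returns a MultiIndex with one level
# per piece of the split name (assumption: one underscore per name).
df.columns = df.columns.str.split('_', expand=True)

print(df.columns.tolist())
```

If some names had a different number of underscores, the resulting levels would be padded with missing values, so the explicit tuple construction shown in the session is the safer choice for irregular names.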

Arithmetic

Performing arithmetic with a multi-index that needs broadcasting

    # Assumes: import numpy as np; import pandas as pd
    In [58]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])
    In [59]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
    Out[59]:
              A                   B                   C
              O         I         O         I         O         I
    n  1.920906 -0.388231 -2.314394  0.665508  0.402562  0.399555
    m -1.765956  0.850423  0.388054  0.992312  0.744086 -0.739776

    In [60]: df = df.div(df['C'],level=1); df
    Out[60]:
              A                   B              C
              O         I         O         I    O    I
    n  4.771702 -0.971660 -5.749162  1.665625  1.0  1.0
    m -2.373321 -1.149568  0.521518 -1.341367  1.0  1.0
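
Because the session above uses random numbers, the broadcasting is hard to check by eye. The following sketch repeats the same `div(..., level=1)` call on deterministic data (the values 1 to 12 are an assumption chosen only for illustration), so each result can be verified by hand: every `O` column is divided by `C`'s `O` column, and every `I` column by `C`'s `I` column.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [(x, y) for x in ['A', 'B', 'C'] for y in ['O', 'I']])

# Deterministic stand-in for np.random.randn(2, 6): the values 1.0 .. 12.0.
df = pd.DataFrame(np.arange(1.0, 13.0).reshape(2, 6),
                  index=['n', 'm'], columns=cols)

# df['C'] has plain columns 'O' and 'I'; level=1 tells pandas to match
# them against the second column level of df and broadcast over the first.
ratio = df.div(df['C'], level=1)

print(ratio)
```

In row `n`, for example, column `('A', 'O')` holds 1.0 and `('C', 'O')` holds 5.0, so the ratio at `('A', 'O')` is 0.2.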

Slicing

Slicing a multi-index with xs

    In [61]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]
    In [62]: index = pd.MultiIndex.from_tuples(coords)
    In [63]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
    Out[63]:
            MyData
    AA one      11
       six      22
    BB one      33
       two      44
       six      55

To take the cross-section of the 1st level of the 1st axis (the index):

    In [64]: df.xs('BB',level=0,axis=0) # Note: level and axis are optional; axis defaults to 0, and with level omitted the key is matched from the first level
    Out[64]:
         MyData
    one      33
    two      44
    six      55

…and now the 2nd level of the 1st axis.

    In [65]: df.xs('six',level=1,axis=0)
    Out[65]:
        MyData
    AA      22
    BB      55

Slicing a multi-index with xs, method #2 (using .loc with slice objects)

    # Assumes: import itertools; import pandas as pd
    In [66]: index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))
    In [67]: headr = list(itertools.product(['Exams','Labs'],['I','II']))
    In [68]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])
    In [69]: cols = pd.MultiIndex.from_tuples(headr) #Notice these are un-named
    In [70]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]
    In [71]: df = pd.DataFrame(data,indx,cols); df
    Out[71]:
                   Exams     Labs
                       I  II    I  II
    Student Course
    Ada     Comp      70  71   72  73
            Math      71  73   75  74
            Sci       72  75   75  75
    Quinn   Comp      73  74   75  76
            Math      74  76   78  77
            Sci       75  78   78  78
    Violet  Comp      76  77   78  79
            Math      77  79   81  80
            Sci       78  81   81  81

    In [72]: All = slice(None)
    In [73]: df.loc['Violet']
    Out[73]:
           Exams     Labs
               I  II    I  II
    Course
    Comp      76  77   78  79
    Math      77  79   81  80
    Sci       78  81   81  81

    In [74]: df.loc[(All,'Math'),All]
    Out[74]:
                   Exams     Labs
                       I  II    I  II
    Student Course
    Ada     Math      71  73   75  74
    Quinn   Math      74  76   78  77
    Violet  Math      77  79   81  80

    In [75]: df.loc[(slice('Ada','Quinn'),'Math'),All]
    Out[75]:
                   Exams     Labs
                       I  II    I  II
    Student Course
    Ada     Math      71  73   75  74
    Quinn   Math      74  76   78  77

    In [76]: df.loc[(All,'Math'),('Exams')]
    Out[76]:
                     I  II
    Student Course
    Ada     Math    71  73
    Quinn   Math    74  76
    Violet  Math    77  79

    In [77]: df.loc[(All,'Math'),(All,'II')]
    Out[77]:
                   Exams Labs
                      II   II
    Student Course
    Ada     Math      73   74
    Quinn   Math      76   77
    Violet  Math      79   80

Setting portions of a multi-index with xs
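
No example accompanies this entry, so here is a minimal sketch of one way to approach it. The pandas documentation notes that `xs` cannot be used to set values, so the sketch reads a cross-section with `xs` and performs the assignment through `.loc` with `pd.IndexSlice` instead. The frame mirrors the student/course data used earlier; the choice to overwrite `('Labs', 'II')` with 0 is purely an illustrative assumption.

```python
import itertools
import pandas as pd

index = pd.MultiIndex.from_tuples(
    list(itertools.product(['Ada', 'Quinn', 'Violet'], ['Comp', 'Math', 'Sci'])),
    names=['Student', 'Course'])
cols = pd.MultiIndex.from_tuples(
    list(itertools.product(['Exams', 'Labs'], ['I', 'II'])))
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, index, cols)

# Reading: xs selects every 'Math' row across all students.
math_before = df.xs('Math', level='Course')[('Labs', 'II')].tolist()

# Writing: xs cannot assign, so address the same rows through .loc.
idx = pd.IndexSlice
df.loc[idx[:, 'Math'], ('Labs', 'II')] = 0

print(math_before)
print(df.loc[idx[:, 'Math'], :])
```

Slicing with `idx[:, 'Math']` relies on the row index being lexsorted, which holds here because `itertools.product` emits the labels in sorted order.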

Sorting

Sort by specific column or an ordered list of columns, with a multi-index

    In [78]: df.sort_values(by=('Labs', 'II'), ascending=False)
    Out[78]:
                   Exams     Labs
                       I  II    I  II
    Student Course
    Violet  Sci       78  81   81  81
            Math      77  79   81  80
            Comp      76  77   78  79
    Quinn   Sci       75  78   78  78
            Math      74  76   78  77
            Comp      73  74   75  76
    Ada     Sci       72  75   75  75
            Math      71  73   75  74
            Comp      70  71   72  73

Partial selection, the need for sortedness
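
This entry has no example of its own, so the sketch below illustrates the point under a few assumptions: the small frame and its labels are made up for illustration, and the failure shown is the one pandas raises when a tuple-to-tuple slice is attempted on a MultiIndex that is not lexsorted. Sorting the index first makes the same kind of slice work.

```python
import pandas as pd

# Deliberately unsorted index (hypothetical data for illustration).
index = pd.MultiIndex.from_tuples(
    [('BB', 'one'), ('AA', 'six'), ('BB', 'two'), ('AA', 'one')])
df = pd.DataFrame({'MyData': [33, 22, 44, 11]}, index=index)

print(df.index.is_monotonic_increasing)

try:
    df.loc[('AA', 'one'):('BB', 'one')]
    slicing_failed = False
except KeyError as err:
    # pandas raises UnsortedIndexError here, which subclasses KeyError.
    slicing_failed = True
    print(type(err).__name__, err)

# After sorting, range slices over partial or full keys are well defined.
sorted_df = df.sort_index()
subset = sorted_df.loc[('AA', 'six'):('BB', 'one')]
print(subset)
```

Checking `index.is_monotonic_increasing` before slicing is a cheap way to decide whether a `sort_index()` call is needed.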

Levels

Prepending a level to a multiindex
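
No code is given for this entry. One commonly used approach, sketched here as an assumption rather than as the method this entry refers to, is to pass the frame through `pd.concat` with a single key, which becomes the new outermost level. The key `'Term1'` and the level name `'Term'` are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Ada', 'Comp'), ('Ada', 'Math'), ('Quinn', 'Comp')],
    names=['Student', 'Course'])
df = pd.DataFrame({'Score': [70, 71, 73]}, index=index)

# Concatenating a one-element list with keys=[...] adds that key as a new
# outer index level; names=[...] labels the new level.
prepended = pd.concat([df], keys=['Term1'], names=['Term'])

print(prepended)
print(list(prepended.index.names))
```

The existing level names are kept and the new one is placed in front of them.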

Flatten Hierarchical columns
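
This entry also comes without code. A minimal sketch of one way to flatten two-level columns is to join each column tuple into a single string. The underscore separator and the two-row sample frame are assumptions made for illustration.

```python
import itertools
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    list(itertools.product(['Exams', 'Labs'], ['I', 'II'])))
df = pd.DataFrame([[70, 71, 72, 73], [71, 73, 75, 74]],
                  index=['Ada', 'Quinn'], columns=cols)

# Iterating over a MultiIndex yields one tuple per column, so each tuple
# can be joined into a flat label such as 'Exams_I'.
flat = df.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]

print(flat)
```

This works only when every level label is a string; numeric labels would need converting with `str` before joining.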