具有层次索引的高级索引

Syntactically integrating MultiIndex in advanced indexing with .loc is a bit challenging, but we’ve made every effort to do so. In general, MultiIndex keys take the form of tuples. For example, the following works as you would expect:

  1. In [36]: df = df.T
  2. In [37]: df
  3. Out[37]:
  4. A B C
  5. first second
  6. bar one 0.895717 0.410835 -1.413681
  7. two 0.805244 0.813850 1.607920
  8. baz one -1.206412 0.132003 1.024180
  9. two 2.565646 -0.827317 0.569605
  10. foo one 1.431256 -0.076467 0.875906
  11. two 1.340309 -1.187678 -2.211372
  12. qux one -1.170299 1.130127 0.974466
  13. two -0.226169 -1.436737 -2.006747
  14. In [38]: df.loc[('bar', 'two'),]
  15. Out[38]:
  16. A 0.805244
  17. B 0.813850
  18. C 1.607920
  19. Name: (bar, two), dtype: float64

Note that df.loc['bar', 'two'] would also work in this example, but this shorthand notation can lead to ambiguity in general.

If you also want to index a specific column with .loc, you must use a tuple like this:

  1. In [39]: df.loc[('bar', 'two'), 'A']
  2. Out[39]: 0.80524402538637851

You don’t have to specify all levels of the MultiIndex by passing only the first elements of the tuple. For example, you can use “partial” indexing to get all elements with bar in the first level as follows:

df.loc[‘bar’]

This is a shortcut for the slightly more verbose notation df.loc[('bar',),] (equivalent to df.loc['bar',] in this example).

“Partial” slicing also works quite nicely.

  1. In [40]: df.loc['baz':'foo']
  2. Out[40]:
  3. A B C
  4. first second
  5. baz one -1.206412 0.132003 1.024180
  6. two 2.565646 -0.827317 0.569605
  7. foo one 1.431256 -0.076467 0.875906
  8. two 1.340309 -1.187678 -2.211372

You can slice with a ‘range’ of values, by providing a slice of tuples.

  1. In [41]: df.loc[('baz', 'two'):('qux', 'one')]
  2. Out[41]:
  3. A B C
  4. first second
  5. baz two 2.565646 -0.827317 0.569605
  6. foo one 1.431256 -0.076467 0.875906
  7. two 1.340309 -1.187678 -2.211372
  8. qux one -1.170299 1.130127 0.974466
  9. In [42]: df.loc[('baz', 'two'):'foo']
  10. Out[42]:
  11. A B C
  12. first second
  13. baz two 2.565646 -0.827317 0.569605
  14. foo one 1.431256 -0.076467 0.875906
  15. two 1.340309 -1.187678 -2.211372

Passing a list of labels or tuples works similar to reindexing:

  1. In [43]: df.loc[[('bar', 'two'), ('qux', 'one')]]
  2. Out[43]:
  3. A B C
  4. first second
  5. bar two 0.805244 0.813850 1.607920
  6. qux one -1.170299 1.130127 0.974466

Note: It is important to note that tuples and lists are not treated identically in pandas when it comes to indexing. Whereas a tuple is interpreted as one multi-level key, a list is used to specify several keys. Or in other words, tuples go horizontally (traversing levels), lists go vertically (scanning levels).

Importantly, a list of tuples indexes several complete MultiIndex keys, whereas a tuple of lists refer to several values within a level:

  1. In [44]: s = pd.Series([1, 2, 3, 4, 5, 6],
  2. ....: index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]))
  3. ....:
  4. In [45]: s.loc[[("A", "c"), ("B", "d")]] # list of tuples
  5. Out[45]:
  6. A c 1
  7. B d 5
  8. dtype: int64
  9. In [46]: s.loc[(["A", "B"], ["c", "d"])] # tuple of lists
  10. Out[46]:
  11. A c 1
  12. d 2
  13. B c 4
  14. d 5
  15. dtype: int64

Using slicers

You can slice a MultiIndex by providing multiple indexers.

You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers.

You can use slice(None) to select all the contents of that level. You do not need to specify all the deeper levels, they will be implied as slice(None).

As usual, both sides of the slicers are included as this is label indexing.

警告

You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted as indexing both axes, rather than into say the MultiIndex for the rows.

You should do this:

  1. df.loc[(slice('A1','A3'),.....), :]

You should not do this:

  1. df.loc[(slice('A1','A3'),.....)]
  1. In [47]: def mklbl(prefix,n):
  2. ....: return ["%s%s" % (prefix,i) for i in range(n)]
  3. ....:
  4. In [48]: miindex = pd.MultiIndex.from_product([mklbl('A',4),
  5. ....: mklbl('B',2),
  6. ....: mklbl('C',4),
  7. ....: mklbl('D',2)])
  8. ....:
  9. In [49]: micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
  10. ....: ('b','foo'),('b','bah')],
  11. ....: names=['lvl0', 'lvl1'])
  12. ....:
  13. In [50]: dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
  14. ....: index=miindex,
  15. ....: columns=micolumns).sort_index().sort_index(axis=1)
  16. ....:
  17. In [51]: dfmi
  18. Out[51]:
  19. lvl0 a b
  20. lvl1 bar foo bah foo
  21. A0 B0 C0 D0 1 0 3 2
  22. D1 5 4 7 6
  23. C1 D0 9 8 11 10
  24. D1 13 12 15 14
  25. C2 D0 17 16 19 18
  26. D1 21 20 23 22
  27. C3 D0 25 24 27 26
  28. ... ... ... ... ...
  29. A3 B1 C0 D1 229 228 231 230
  30. C1 D0 233 232 235 234
  31. D1 237 236 239 238
  32. C2 D0 241 240 243 242
  33. D1 245 244 247 246
  34. C3 D0 249 248 251 250
  35. D1 253 252 255 254
  36. [64 rows x 4 columns]

Basic multi-index slicing using slices, lists, and labels.

  1. In [52]: dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
  2. Out[52]:
  3. lvl0 a b
  4. lvl1 bar foo bah foo
  5. A1 B0 C1 D0 73 72 75 74
  6. D1 77 76 79 78
  7. C3 D0 89 88 91 90
  8. D1 93 92 95 94
  9. B1 C1 D0 105 104 107 106
  10. D1 109 108 111 110
  11. C3 D0 121 120 123 122
  12. ... ... ... ... ...
  13. A3 B0 C1 D1 205 204 207 206
  14. C3 D0 217 216 219 218
  15. D1 221 220 223 222
  16. B1 C1 D0 233 232 235 234
  17. D1 237 236 239 238
  18. C3 D0 249 248 251 250
  19. D1 253 252 255 254
  20. [24 rows x 4 columns]

You can use pandas.IndexSlice to facilitate a more natural syntax using :, rather than using slice(None).

  1. In [53]: idx = pd.IndexSlice
  2. In [54]: dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
  3. Out[54]:
  4. lvl0 a b
  5. lvl1 foo foo
  6. A0 B0 C1 D0 8 10
  7. D1 12 14
  8. C3 D0 24 26
  9. D1 28 30
  10. B1 C1 D0 40 42
  11. D1 44 46
  12. C3 D0 56 58
  13. ... ... ...
  14. A3 B0 C1 D1 204 206
  15. C3 D0 216 218
  16. D1 220 222
  17. B1 C1 D0 232 234
  18. D1 236 238
  19. C3 D0 248 250
  20. D1 252 254
  21. [32 rows x 2 columns]

It is possible to perform quite complicated selections using this method on multiple axes at the same time.

  1. In [55]: dfmi.loc['A1', (slice(None), 'foo')]
  2. Out[55]:
  3. lvl0 a b
  4. lvl1 foo foo
  5. B0 C0 D0 64 66
  6. D1 68 70
  7. C1 D0 72 74
  8. D1 76 78
  9. C2 D0 80 82
  10. D1 84 86
  11. C3 D0 88 90
  12. ... ... ...
  13. B1 C0 D1 100 102
  14. C1 D0 104 106
  15. D1 108 110
  16. C2 D0 112 114
  17. D1 116 118
  18. C3 D0 120 122
  19. D1 124 126
  20. [16 rows x 2 columns]
  21. In [56]: dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
  22. Out[56]:
  23. lvl0 a b
  24. lvl1 foo foo
  25. A0 B0 C1 D0 8 10
  26. D1 12 14
  27. C3 D0 24 26
  28. D1 28 30
  29. B1 C1 D0 40 42
  30. D1 44 46
  31. C3 D0 56 58
  32. ... ... ...
  33. A3 B0 C1 D1 204 206
  34. C3 D0 216 218
  35. D1 220 222
  36. B1 C1 D0 232 234
  37. D1 236 238
  38. C3 D0 248 250
  39. D1 252 254
  40. [32 rows x 2 columns]

Using a boolean indexer you can provide selection related to the values.

  1. In [57]: mask = dfmi[('a', 'foo')] > 200
  2. In [58]: dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']]
  3. Out[58]:
  4. lvl0 a b
  5. lvl1 foo foo
  6. A3 B0 C1 D1 204 206
  7. C3 D0 216 218
  8. D1 220 222
  9. B1 C1 D0 232 234
  10. D1 236 238
  11. C3 D0 248 250
  12. D1 252 254

You can also specify the axis argument to .loc to interpret the passed slicers on a single axis.

  1. In [59]: dfmi.loc(axis=0)[:, :, ['C1', 'C3']]
  2. Out[59]:
  3. lvl0 a b
  4. lvl1 bar foo bah foo
  5. A0 B0 C1 D0 9 8 11 10
  6. D1 13 12 15 14
  7. C3 D0 25 24 27 26
  8. D1 29 28 31 30
  9. B1 C1 D0 41 40 43 42
  10. D1 45 44 47 46
  11. C3 D0 57 56 59 58
  12. ... ... ... ... ...
  13. A3 B0 C1 D1 205 204 207 206
  14. C3 D0 217 216 219 218
  15. D1 221 220 223 222
  16. B1 C1 D0 233 232 235 234
  17. D1 237 236 239 238
  18. C3 D0 249 248 251 250
  19. D1 253 252 255 254
  20. [32 rows x 4 columns]

Furthermore you can set the values using the following methods.

  1. In [60]: df2 = dfmi.copy()
  2. In [61]: df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
  3. In [62]: df2
  4. Out[62]:
  5. lvl0 a b
  6. lvl1 bar foo bah foo
  7. A0 B0 C0 D0 1 0 3 2
  8. D1 5 4 7 6
  9. C1 D0 -10 -10 -10 -10
  10. D1 -10 -10 -10 -10
  11. C2 D0 17 16 19 18
  12. D1 21 20 23 22
  13. C3 D0 -10 -10 -10 -10
  14. ... ... ... ... ...
  15. A3 B1 C0 D1 229 228 231 230
  16. C1 D0 -10 -10 -10 -10
  17. D1 -10 -10 -10 -10
  18. C2 D0 241 240 243 242
  19. D1 245 244 247 246
  20. C3 D0 -10 -10 -10 -10
  21. D1 -10 -10 -10 -10
  22. [64 rows x 4 columns]

You can use a right-hand-side of an alignable object as well.

  1. In [63]: df2 = dfmi.copy()
  2. In [64]: df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000
  3. In [65]: df2
  4. Out[65]:
  5. lvl0 a b
  6. lvl1 bar foo bah foo
  7. A0 B0 C0 D0 1 0 3 2
  8. D1 5 4 7 6
  9. C1 D0 9000 8000 11000 10000
  10. D1 13000 12000 15000 14000
  11. C2 D0 17 16 19 18
  12. D1 21 20 23 22
  13. C3 D0 25000 24000 27000 26000
  14. ... ... ... ... ...
  15. A3 B1 C0 D1 229 228 231 230
  16. C1 D0 233000 232000 235000 234000
  17. D1 237000 236000 239000 238000
  18. C2 D0 241 240 243 242
  19. D1 245 244 247 246
  20. C3 D0 249000 248000 251000 250000
  21. D1 253000 252000 255000 254000
  22. [64 rows x 4 columns]

Cross-section

The xs method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.

  1. In [66]: df
  2. Out[66]:
  3. A B C
  4. first second
  5. bar one 0.895717 0.410835 -1.413681
  6. two 0.805244 0.813850 1.607920
  7. baz one -1.206412 0.132003 1.024180
  8. two 2.565646 -0.827317 0.569605
  9. foo one 1.431256 -0.076467 0.875906
  10. two 1.340309 -1.187678 -2.211372
  11. qux one -1.170299 1.130127 0.974466
  12. two -0.226169 -1.436737 -2.006747
  13. In [67]: df.xs('one', level='second')
  14. Out[67]:
  15. A B C
  16. first
  17. bar 0.895717 0.410835 -1.413681
  18. baz -1.206412 0.132003 1.024180
  19. foo 1.431256 -0.076467 0.875906
  20. qux -1.170299 1.130127 0.974466
  1. # using the slicers
  2. In [68]: df.loc[(slice(None),'one'),:]
  3. Out[68]:
  4. A B C
  5. first second
  6. bar one 0.895717 0.410835 -1.413681
  7. baz one -1.206412 0.132003 1.024180
  8. foo one 1.431256 -0.076467 0.875906
  9. qux one -1.170299 1.130127 0.974466

You can also select on the columns with xs(), by providing the axis argument.

  1. In [69]: df = df.T
  2. In [70]: df.xs('one', level='second', axis=1)
  3. Out[70]:
  4. first bar baz foo qux
  5. A 0.895717 -1.206412 1.431256 -1.170299
  6. B 0.410835 0.132003 -0.076467 1.130127
  7. C -1.413681 1.024180 0.875906 0.974466
  1. # using the slicers
  2. In [71]: df.loc[:,(slice(None),'one')]
  3. Out[71]:
  4. first bar baz foo qux
  5. second one one one one
  6. A 0.895717 -1.206412 1.431256 -1.170299
  7. B 0.410835 0.132003 -0.076467 1.130127
  8. C -1.413681 1.024180 0.875906 0.974466

xs() also allows selection with multiple keys.

  1. In [72]: df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
  2. Out[72]:
  3. first bar
  4. second one
  5. A 0.895717
  6. B 0.410835
  7. C -1.413681
  8. # using the slicers
  9. In [73]: df.loc[:,('bar','one')]
  10. Out[73]:
  11. A 0.895717
  12. B 0.410835
  13. C -1.413681
  14. Name: (bar, one), dtype: float64

You can pass drop_level=False to xs() to retain the level that was selected.

  1. In [74]: df.xs('one', level='second', axis=1, drop_level=False)
  2. Out[74]:
  3. first bar baz foo qux
  4. second one one one one
  5. A 0.895717 -1.206412 1.431256 -1.170299
  6. B 0.410835 0.132003 -0.076467 1.130127
  7. C -1.413681 1.024180 0.875906 0.974466

Compare the above with the result using drop_level=True (the default value).

  1. In [75]: df.xs('one', level='second', axis=1, drop_level=True)
  2. Out[75]:
  3. first bar baz foo qux
  4. A 0.895717 -1.206412 1.431256 -1.170299
  5. B 0.410835 0.132003 -0.076467 1.130127
  6. C -1.413681 1.024180 0.875906 0.974466

Advanced reindexing and alignment

The parameter level has been added to the reindex and align methods of pandas objects. This is useful to broadcast values across a level. For instance:

  1. In [76]: midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
  2. ....: labels=[[1,1,0,0],[1,0,1,0]])
  3. ....:
  4. In [77]: df = pd.DataFrame(np.random.randn(4,2), index=midx)
  5. In [78]: df
  6. Out[78]:
  7. 0 1
  8. one y 1.519970 -0.493662
  9. x 0.600178 0.274230
  10. zero y 0.132885 -0.023688
  11. x 2.410179 1.450520
  12. In [79]: df2 = df.mean(level=0)
  13. In [80]: df2
  14. Out[80]:
  15. 0 1
  16. one 1.060074 -0.109716
  17. zero 1.271532 0.713416
  18. In [81]: df2.reindex(df.index, level=0)
  19. Out[81]:
  20. 0 1
  21. one y 1.060074 -0.109716
  22. x 1.060074 -0.109716
  23. zero y 1.271532 0.713416
  24. x 1.271532 0.713416
  25. # aligning
  26. In [82]: df_aligned, df2_aligned = df.align(df2, level=0)
  27. In [83]: df_aligned
  28. Out[83]:
  29. 0 1
  30. one y 1.519970 -0.493662
  31. x 0.600178 0.274230
  32. zero y 0.132885 -0.023688
  33. x 2.410179 1.450520
  34. In [84]: df2_aligned
  35. Out[84]:
  36. 0 1
  37. one y 1.060074 -0.109716
  38. x 1.060074 -0.109716
  39. zero y 1.271532 0.713416
  40. x 1.271532 0.713416

Swapping levels with swaplevel()

The swaplevel function can switch the order of two levels:

  1. In [85]: df[:5]
  2. Out[85]:
  3. 0 1
  4. one y 1.519970 -0.493662
  5. x 0.600178 0.274230
  6. zero y 0.132885 -0.023688
  7. x 2.410179 1.450520
  8. In [86]: df[:5].swaplevel(0, 1, axis=0)
  9. Out[86]:
  10. 0 1
  11. y one 1.519970 -0.493662
  12. x one 0.600178 0.274230
  13. y zero 0.132885 -0.023688
  14. x zero 2.410179 1.450520

Reordering levels with reorder_levels()

The reorder_levels function generalizes the swaplevel function, allowing you to permute the hierarchical index levels in one step:

  1. In [87]: df[:5].reorder_levels([1,0], axis=0)
  2. Out[87]:
  3. 0 1
  4. y one 1.519970 -0.493662
  5. x one 0.600178 0.274230
  6. y zero 0.132885 -0.023688
  7. x zero 2.410179 1.450520