Sorting a MultiIndex

For MultiIndex-ed objects to be indexed and sliced effectively, they need to be sorted. As with any index, you can use sort_index.

  In [88]: import random; random.shuffle(tuples)

  In [89]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))

  In [90]: s
  Out[90]:
  baz  one    0.206053
  foo  two   -0.251905
       one   -2.213588
  baz  two    1.063327
  qux  two    1.266143
  bar  two    0.299368
       one   -0.863838
  qux  one    0.408204
  dtype: float64

  In [91]: s.sort_index()
  Out[91]:
  bar  one   -0.863838
       two    0.299368
  baz  one    0.206053
       two    1.063327
  foo  one   -2.213588
       two   -0.251905
  qux  one    0.408204
       two    1.266143
  dtype: float64

  In [92]: s.sort_index(level=0)
  Out[92]:
  bar  one   -0.863838
       two    0.299368
  baz  one    0.206053
       two    1.063327
  foo  one   -2.213588
       two   -0.251905
  qux  one    0.408204
       two    1.266143
  dtype: float64

  In [93]: s.sort_index(level=1)
  Out[93]:
  bar  one   -0.863838
  baz  one    0.206053
  foo  one   -2.213588
  qux  one    0.408204
  bar  two    0.299368
  baz  two    1.063327
  foo  two   -0.251905
  qux  two    1.266143
  dtype: float64
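
The session above relies on a tuples list defined earlier in the document. As a minimal, self-contained sketch, the example below uses made-up tuples and fixed integer values (both assumptions for illustration, in place of np.random.randn) so the result is reproducible. It shows that sorting by level=1 orders the inner labels first and then, with the default sort_remaining=True, the other level.

```python
import pandas as pd

# Illustrative data: deliberately unsorted (outer, inner) label pairs.
tuples = [
    ("baz", "one"), ("foo", "two"), ("foo", "one"), ("baz", "two"),
    ("qux", "two"), ("bar", "two"), ("bar", "one"), ("qux", "one"),
]
s = pd.Series(range(8), index=pd.MultiIndex.from_tuples(tuples))

# Default: sort lexicographically by every level, outermost first.
by_all = s.sort_index()

# Sort by the inner level; the remaining level is sorted afterwards
# because sort_remaining defaults to True.
by_inner = s.sort_index(level=1)

print(by_all.index.tolist())
print(by_inner.index.tolist())
```

Sorting returns a new Series; the original s keeps its shuffled order unless you reassign it.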

You may also pass a level name to sort_index if the MultiIndex levels are named.

  In [94]: s.index.set_names(['L1', 'L2'], inplace=True)

  In [95]: s.sort_index(level='L1')
  Out[95]:
  L1   L2
  bar  one   -0.863838
       two    0.299368
  baz  one    0.206053
       two    1.063327
  foo  one   -2.213588
       two   -0.251905
  qux  one    0.408204
       two    1.266143
  dtype: float64

  In [96]: s.sort_index(level='L2')
  Out[96]:
  L1   L2
  bar  one   -0.863838
  baz  one    0.206053
  foo  one   -2.213588
  qux  one    0.408204
  bar  two    0.299368
  baz  two    1.063327
  foo  two   -0.251905
  qux  two    1.266143
  dtype: float64
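
A short sketch of sorting by level name, using a small made-up Series (the data and the choice to pass names= at construction time are assumptions for illustration). It also shows that a level can be given by name or by position, and that a list of names sets the sort priority explicitly.

```python
import pandas as pd

# Illustrative index with named levels, supplied at construction time.
index = pd.MultiIndex.from_tuples(
    [("foo", "two"), ("bar", "two"), ("foo", "one"), ("bar", "one")],
    names=["L1", "L2"],
)
s = pd.Series([1, 2, 3, 4], index=index)

# The same level can be referenced by name or by integer position.
by_name = s.sort_index(level="L2")
by_position = s.sort_index(level=1)

# A list of names makes the priority explicit: L2 first, then L1.
by_both = s.sort_index(level=["L2", "L1"])

print(by_name)
```

Referring to levels by name tends to be more robust than positions if levels are later reordered.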

On higher-dimensional objects, you can sort any of the other axes by level if they have a MultiIndex:

  In [97]: df.T.sort_index(level=1, axis=1)
  Out[97]:
          one      zero       one      zero
            x         x         y         y
  0  0.600178  2.410179  1.519970  0.132885
  1  0.274230  1.450520 -0.493662 -0.023688
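
The df used above is defined earlier in the document. As a self-contained sketch, the example below builds a small DataFrame with MultiIndex columns (the labels and values are made up for illustration) and sorts the columns by their second level.

```python
import pandas as pd

# Illustrative frame whose columns carry a two-level MultiIndex.
columns = pd.MultiIndex.from_tuples(
    [("zero", "y"), ("one", "x"), ("zero", "x"), ("one", "y")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# axis=1 targets the columns; level=1 sorts by the inner column labels
# first, then by the outer labels.
sorted_cols = df.sort_index(level=1, axis=1)

print(sorted_cols)
```

The data in each column moves with its label, so only the column order changes.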

Indexing will work even if the data are not sorted, but will be rather inefficient (and show a PerformanceWarning). It will also return a copy of the data rather than a view:

  In [98]: dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
     ....:                     'joe': ['x', 'x', 'z', 'y'],
     ....:                     'jolie': np.random.rand(4)})
     ....:

  In [99]: dfm = dfm.set_index(['jim', 'joe'])

  In [100]: dfm
  Out[100]:
              jolie
  jim joe
  0   x    0.490671
      x    0.120248
  1   z    0.537020
      y    0.110968

  In [4]: dfm.loc[(1, 'z')]
  PerformanceWarning: indexing past lexsort depth may impact performance.

  Out[4]:
             jolie
  jim joe
  1   z    0.64094
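
One way to observe this in a script is to record warnings around the lookup. The sketch below uses fixed values for jolie (an assumption, in place of np.random.rand) and the standard-library warnings module. Whether a PerformanceWarning is actually emitted may depend on the pandas version, so the code only prints whatever was recorded.

```python
import warnings

import pandas as pd

# Same layout as dfm above, with fixed illustrative values.
dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    unsorted_result = dfm.loc[(1, "z")]

# Names of the warning categories recorded during the lookup, if any.
warning_names = [w.category.__name__ for w in caught]
print(warning_names)

# The same lookup after sorting the index first.
sorted_result = dfm.sort_index().loc[(1, "z")]
print(sorted_result)
```

Both lookups select the same row; sorting first is what avoids the slow path described above.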

Furthermore, if you try to slice an index that is not fully lexsorted, this can raise an error:

  In [5]: dfm.loc[(0,'y'):(1, 'z')]
  UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
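
If sorting up front is not an option, the error can be handled where the slice happens. The sketch below is one possible approach, not a recommended pandas idiom: slice_rows is a hypothetical helper, and the jolie values are fixed for illustration. It catches pd.errors.UnsortedIndexError and retries on a sorted copy.

```python
import pandas as pd

# Same layout as dfm above, with fixed illustrative values.
dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])


def slice_rows(frame, start, stop):
    """Hypothetical helper: label-slice a frame, sorting its index if required."""
    try:
        return frame.loc[start:stop]
    except pd.errors.UnsortedIndexError:
        # The index is not lexsorted deeply enough for this key:
        # sort a copy and retry the same slice.
        return frame.sort_index().loc[start:stop]


result = slice_rows(dfm, (0, "y"), (1, "z"))
print(result)
```

Sorting inside the helper costs a full sort on every failing call, so sorting once and keeping the sorted frame is usually the better default.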

The is_lexsorted() method on a MultiIndex shows whether the index is sorted, and the lexsort_depth property returns the sort depth:

  In [101]: dfm.index.is_lexsorted()
  Out[101]: False

  In [102]: dfm.index.lexsort_depth
  Out[102]: 1

  In [103]: dfm = dfm.sort_index()

  In [104]: dfm
  Out[104]:
              jolie
  jim joe
  0   x    0.490671
      x    0.120248
  1   y    0.110968
      z    0.537020

  In [105]: dfm.index.is_lexsorted()
  Out[105]: True

  In [106]: dfm.index.lexsort_depth
  Out[106]: 2
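
Later pandas releases deprecate is_lexsorted() (check the exact version against your installation). As a sketch of a check that does not depend on it, the example below uses the is_monotonic_increasing property on the same index layout, with made-up values.

```python
import pandas as pd

# Same index layout as dfm above, attached to an illustrative Series.
index = pd.MultiIndex.from_tuples(
    [(0, "x"), (0, "x"), (1, "z"), (1, "y")], names=["jim", "joe"]
)
s = pd.Series([0.1, 0.2, 0.3, 0.4], index=index)

# False here: (1, "z") comes before (1, "y").
before = s.index.is_monotonic_increasing

# sort_index() returns a new object whose index is in lexicographic order.
s_sorted = s.sort_index()
after = s_sorted.index.is_monotonic_increasing

print(before, after)
```

Duplicate entries such as the two (0, "x") rows do not prevent the index from counting as monotonically increasing.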

And now selection works as expected.

  In [107]: dfm.loc[(0,'y'):(1, 'z')]
  Out[107]:
              jolie
  jim joe
  1   y    0.110968
      z    0.537020