对多索引进行排序

对多索引进行排序

For MultiIndex-ed objects to be indexed and sliced effectively, they need to be sorted. As with any index, you can use sort_index.

In [88]: import random; random.shuffle(tuples)
In [89]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
In [90]: s
Out[90]: 
baz  one    0.206053
foo  two   -0.251905
     one   -2.213588
baz  two    1.063327
qux  two    1.266143
bar  two    0.299368
     one   -0.863838
qux  one    0.408204
dtype: float64
In [91]: s.sort_index()
Out[91]: 
bar  one   -0.863838
     two    0.299368
baz  one    0.206053
     two    1.063327
foo  one   -2.213588
     two   -0.251905
qux  one    0.408204
     two    1.266143
dtype: float64
In [92]: s.sort_index(level=0)
Out[92]: 
bar  one   -0.863838
     two    0.299368
baz  one    0.206053
     two    1.063327
foo  one   -2.213588
     two   -0.251905
qux  one    0.408204
     two    1.266143
dtype: float64
In [93]: s.sort_index(level=1)
Out[93]: 
bar  one   -0.863838
baz  one    0.206053
foo  one   -2.213588
qux  one    0.408204
bar  two    0.299368
baz  two    1.063327
foo  two   -0.251905
qux  two    1.266143
dtype: float64

You may also pass a level name to sort_index if the MultiIndex levels are named.

In [94]: s.index.set_names(['L1', 'L2'], inplace=True)
In [95]: s.sort_index(level='L1')
Out[95]: 
L1   L2 
bar  one   -0.863838
     two    0.299368
baz  one    0.206053
     two    1.063327
foo  one   -2.213588
     two   -0.251905
qux  one    0.408204
     two    1.266143
dtype: float64
In [96]: s.sort_index(level='L2')
Out[96]: 
L1   L2 
bar  one   -0.863838
baz  one    0.206053
foo  one   -2.213588
qux  one    0.408204
bar  two    0.299368
baz  two    1.063327
foo  two   -0.251905
qux  two    1.266143
dtype: float64

On higher dimensional objects, you can sort any of the other axes by level if they have a MultiIndex:

In [97]: df.T.sort_index(level=1, axis=1)
Out[97]: 
        one      zero       one      zero
          x         x         y         y
0  0.600178  2.410179  1.519970  0.132885
1  0.274230  1.450520 -0.493662 -0.023688

Indexing will work even if the data are not sorted, but will be rather inefficient (and show a PerformanceWarning). It will also return a copy of the data rather than a view:

In [98]: dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
   ....:                     'joe': ['x', 'x', 'z', 'y'],
   ....:                     'jolie': np.random.rand(4)})
   ....: 
In [99]: dfm = dfm.set_index(['jim', 'joe'])
In [100]: dfm
Out[100]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   z    0.537020
    y    0.110968
In [4]: dfm.loc[(1, 'z')]

PerformanceWarning: indexing past lexsort depth may impact performance.
Out[4]:
           jolie
jim joe
1   z    0.64094

Furthermore if you try to index something that is not fully lexsorted, this can raise:

In [5]: dfm.loc[(0,'y'):(1, 'z')]
UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'

The is_lexsorted() method on an Index show if the index is sorted, and the lexsort_depth property returns the sort depth:

In [101]: dfm.index.is_lexsorted()
Out[101]: False
In [102]: dfm.index.lexsort_depth
Out[102]: 1

In [103]: dfm = dfm.sort_index()
In [104]: dfm
Out[104]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   y    0.110968
    z    0.537020
In [105]: dfm.index.is_lexsorted()
Out[105]: True
In [106]: dfm.index.lexsort_depth
Out[106]: 2

And now selection works as expected.

In [107]: dfm.loc[(0,'y'):(1, 'z')]
Out[107]: 
            jolie
jim joe          
1   y    0.110968
    z    0.537020