按标签选择
Warning
Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment
and should be avoided. See Returning a View versus Copy.
Warning
.loc
is strict when you present slicers that are not compatible (or convertible) with the index type. For example using integers in a DatetimeIndex
. These will raise a TypeError
.
In [39]: dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))
In [40]: dfl
Out[40]:
A B C D
2013-01-01 1.075770 -0.109050 1.643563 -1.469388
2013-01-02 0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524 0.413738 0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
2013-01-05 0.895717 0.805244 -1.206412 2.565646
In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on < class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of < type 'int'>
String likes in slicing can be convertible to the type of the index and lead to natural slicing.
In [41]: dfl.loc['20130102':'20130104']
Out[41]:
A B C D
2013-01-02 0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524 0.413738 0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
Warning
Starting in 0.21.0, pandas will show a FutureWarning
if indexing with a list with missing labels. In the future this will raise a KeyError
. See list-like Using loc with missing keys in a list is Deprecated.
pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. Every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound AND the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.
The .loc
attribute is the primary access method. The following are valid inputs:
- A single label, e.g. 5 or ‘a’ (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.).
- A list or array of labels [‘a’, ‘b’, ‘c’].
- A slice object with labels ‘a’:’f’ (Note that contrary to usual python slices, both the start and the stop are included, when present in the index! See Slicing with labels.).
- A boolean array.
- A callable, see Selection By Callable.
In [42]: s1 = pd.Series(np.random.randn(6),index=list('abcdef'))
In [43]: s1
Out[43]:
a 1.431256
b 1.340309
c -1.170299
d -0.226169
e 0.410835
f 0.813850
dtype: float64
In [44]: s1.loc['c':]
Out[44]:
c -1.170299
d -0.226169
e 0.410835
f 0.813850
dtype: float64
In [45]: s1.loc['b']
Out[45]: 1.3403088497993827
Note that setting works as well:
In [46]: s1.loc['c':] = 0
In [47]: s1
Out[47]:
a 1.431256
b 1.340309
c 0.000000
d 0.000000
e 0.000000
f 0.000000
dtype: float64
With a DataFrame:
In [48]: df1 = pd.DataFrame(np.random.randn(6,4),
....: index=list('abcdef'),
....: columns=list('ABCD'))
....:
In [49]: df1
Out[49]:
A B C D
a 0.132003 -0.827317 -0.076467 -1.187678
b 1.130127 -1.436737 -1.413681 1.607920
c 1.024180 0.569605 0.875906 -2.211372
d 0.974466 -2.006747 -0.410001 -0.078638
e 0.545952 -1.219217 -1.226825 0.769804
f -1.281247 -0.727707 -0.121306 -0.097883
In [50]: df1.loc[['a', 'b', 'd'], :]
Out[50]:
A B C D
a 0.132003 -0.827317 -0.076467 -1.187678
b 1.130127 -1.436737 -1.413681 1.607920
d 0.974466 -2.006747 -0.410001 -0.078638
Accessing via label slices:
In [51]: df1.loc['d':, 'A':'C']
Out[51]:
A B C
d 0.974466 -2.006747 -0.410001
e 0.545952 -1.219217 -1.226825
f -1.281247 -0.727707 -0.121306
For getting a cross section using a label (equivalent to df.xs('a')
):
In [52]: df1.loc['a']
Out[52]:
A 0.132003
B -0.827317
C -0.076467
D -1.187678
Name: a, dtype: float64
For getting values with a boolean array:
In [53]: df1.loc['a'] > 0
Out[53]:
A True
B False
C False
D False
Name: a, dtype: bool
In [54]: df1.loc[:, df1.loc['a'] > 0]
Out[54]:
A
a 0.132003
b 1.130127
c 1.024180
d 0.974466
e 0.545952
f -1.281247
For getting a value explicitly (equivalent to deprecated df.get_value('a','A')
):
# this is also equivalent to ``df1.at['a','A']``
In [55]: df1.loc['a', 'A']
Out[55]: 0.13200317033032932
Slicing with labels
When using .loc
with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned:
In [56]: s = pd.Series(list('abcde'), index=[0,3,2,5,4])
In [57]: s.loc[3:5]
Out[57]:
3 b
2 c
5 d
dtype: object
If at least one of the two is absent, but the index is sorted, and can be compared against start and stop labels, then slicing will still work as expected, by selecting labels which rank between the two:
In [58]: s.sort_index()
Out[58]:
0 a
2 c
3 b
4 e
5 d
dtype: object
In [59]: s.sort_index().loc[1:6]
Out[59]:
2 c
3 b
4 e
5 d
dtype: object
However, if at least one of the two is absent and the index is not sorted, an error will be raised (since doing otherwise would be computationally expensive, as well as potentially ambiguous for mixed type indexes). For instance, in the above example, s.loc[1:6]
would raise KeyError
.