Selection by Label

Warning

Whether a setting operation returns a copy or a reference may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy.

Warning

.loc is strict when you present slicers that are not compatible with (or convertible to) the index type, for example integers in a DatetimeIndex. These will raise a TypeError.

    In [39]: dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))

    In [40]: dfl
    Out[40]:
                       A         B         C         D
    2013-01-01  1.075770 -0.109050  1.643563 -1.469388
    2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
    2013-01-03 -1.294524  0.413738  0.276662 -0.472035
    2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
    2013-01-05  0.895717  0.805244 -1.206412  2.565646

    In [4]: dfl.loc[2:3]
    TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>

String-like values used in a slice can be converted to the type of the index, which leads to natural slicing.

    In [41]: dfl.loc['20130102':'20130104']
    Out[41]:
                       A         B         C         D
    2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
    2013-01-03 -1.294524  0.413738  0.276662 -0.472035
    2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
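
The two behaviors above can be checked side by side. The following is a minimal sketch, assuming a deterministic stand-in for dfl (the original uses random values); the names int_slice_error and by_string are illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for dfl; the values themselves do not matter here.
dfl = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=list("ABCD"),
    index=pd.date_range("20130101", periods=5),
)

# Integer slice bounds cannot be converted to timestamps, so .loc rejects them.
try:
    dfl.loc[2:3]
    int_slice_error = None
except TypeError as exc:
    int_slice_error = exc

# Date-like strings are parsed into timestamps; both endpoints are included.
by_string = dfl.loc["20130102":"20130104"]

print(type(int_slice_error).__name__)
print(by_string.index.strftime("%Y-%m-%d").tolist())
```

If positional slicing is what you actually want on a DatetimeIndex, .iloc is the accessor intended for that.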

Warning

Starting in 0.21.0, pandas will show a FutureWarning when indexing with a list that contains missing labels. In the future this will raise a KeyError. See Using loc with missing keys in a list is Deprecated.
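
One way to stay clear of this warning, whichever behavior your installed version has, is to avoid passing possibly-missing labels to .loc at all. The sketch below shows two common alternatives, reindex and Index.intersection; the Series and the names wanted, with_nan and present_only are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
wanted = ["a", "c", "z"]  # "z" is not in the index

# Alternative 1: reindex keeps every requested label and fills
# the missing ones with NaN.
with_nan = s.reindex(wanted)

# Alternative 2: restrict the request to labels that actually exist,
# then select with .loc as usual.
present_only = s.loc[s.index.intersection(wanted)]

print(with_nan)
print(present_only)
```

Which of the two fits depends on whether a missing label should show up as a NaN row or be dropped silently.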

pandas provides a suite of methods for purely label-based indexing. This is a strict inclusion-based protocol: every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound and the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.
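
A small sketch of these rules, assuming a made-up Series whose integer labels deliberately differ from its positions (the variable names are illustrative):

```python
import pandas as pd

# Integer labels 10, 20, 30 sit at positions 0, 1, 2.
s = pd.Series(["x", "y", "z"], index=[10, 20, 30])

first = s.loc[10]         # looked up by label 10, not by position
both_ends = s.loc[10:20]  # label slice: 10 and 20 are both included

# 0 is a valid position but not a label, so .loc raises KeyError.
try:
    s.loc[0]
    missing = None
except KeyError as exc:
    missing = exc

print(first)
print(both_ends.tolist())
print(type(missing).__name__)
```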

The .loc attribute is the primary access method. The following are valid inputs:

  • A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index).
  • A list or array of labels, e.g. ['a', 'b', 'c'].
  • A slice object with labels, e.g. 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included when present in the index; see Slicing with labels).
  • A boolean array.
  • A callable, see Selection By Callable.

    In [42]: s1 = pd.Series(np.random.randn(6),index=list('abcdef'))

    In [43]: s1
    Out[43]:
    a    1.431256
    b    1.340309
    c   -1.170299
    d   -0.226169
    e    0.410835
    f    0.813850
    dtype: float64

    In [44]: s1.loc['c':]
    Out[44]:
    c   -1.170299
    d   -0.226169
    e    0.410835
    f    0.813850
    dtype: float64

    In [45]: s1.loc['b']
    Out[45]: 1.3403088497993827

Note that setting works as well:

    In [46]: s1.loc['c':] = 0

    In [47]: s1
    Out[47]:
    a    1.431256
    b    1.340309
    c    0.000000
    d    0.000000
    e    0.000000
    f    0.000000
    dtype: float64
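
Setting through .loc also works on a DataFrame, and doing it in a single .loc call is the usual way to avoid the chained assignment problem mentioned in the first warning. A minimal sketch with made-up data (the frame and column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]}, index=list("xyz"))

# One .loc call selects the rows and the column together, so the
# assignment is applied to df itself rather than to a temporary object.
df.loc[df["A"] > 0, "B"] = 0

print(df)
```

By contrast, a two-step form such as df[df["A"] > 0]["B"] = 0 assigns into whatever the first indexing step returned, which may be a copy.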

With a DataFrame:

    In [48]: df1 = pd.DataFrame(np.random.randn(6,4),
       ....:                    index=list('abcdef'),
       ....:                    columns=list('ABCD'))
       ....:

    In [49]: df1
    Out[49]:
              A         B         C         D
    a  0.132003 -0.827317 -0.076467 -1.187678
    b  1.130127 -1.436737 -1.413681  1.607920
    c  1.024180  0.569605  0.875906 -2.211372
    d  0.974466 -2.006747 -0.410001 -0.078638
    e  0.545952 -1.219217 -1.226825  0.769804
    f -1.281247 -0.727707 -0.121306 -0.097883

    In [50]: df1.loc[['a', 'b', 'd'], :]
    Out[50]:
              A         B         C         D
    a  0.132003 -0.827317 -0.076467 -1.187678
    b  1.130127 -1.436737 -1.413681  1.607920
    d  0.974466 -2.006747 -0.410001 -0.078638

Accessing via label slices:

    In [51]: df1.loc['d':, 'A':'C']
    Out[51]:
              A         B         C
    d  0.974466 -2.006747 -0.410001
    e  0.545952 -1.219217 -1.226825
    f -1.281247 -0.727707 -0.121306

For getting a cross section using a label (equivalent to df.xs('a')):

    In [52]: df1.loc['a']
    Out[52]:
    A    0.132003
    B   -0.827317
    C   -0.076467
    D   -1.187678
    Name: a, dtype: float64

For getting values with a boolean array:

    In [53]: df1.loc['a'] > 0
    Out[53]:
    A     True
    B    False
    C    False
    D    False
    Name: a, dtype: bool

    In [54]: df1.loc[:, df1.loc['a'] > 0]
    Out[54]:
              A
    a  0.132003
    b  1.130127
    c  1.024180
    d  0.974466
    e  0.545952
    f -1.281247
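
The callable input from the list of valid inputs can express the same selection without naming the frame twice. The sketch below assumes a small deterministic stand-in for df1 (the original uses random values); .loc calls the function with the object being indexed and uses whatever it returns as the indexer.

```python
import pandas as pd

# Deterministic stand-in for df1; only column A is positive in row 'a'.
df1 = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [-1.0, 3.0], "C": [-2.0, 0.5]},
    index=["a", "b"],
)

# Boolean array, as in the example above.
mask_cols = df1.loc[:, df1.loc["a"] > 0]

# Callable: receives the DataFrame and returns the same boolean Series.
call_cols = df1.loc[:, lambda df: df.loc["a"] > 0]

# A callable on the row axis works the same way.
rows_by_callable = df1.loc[lambda df: df["A"] > 1]

print(call_cols.equals(mask_cols))
print(rows_by_callable.index.tolist())
```

The callable form is mostly useful in method chains, where the intermediate DataFrame has no name to refer to.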

For getting a value explicitly (equivalent to deprecated df.get_value('a','A')):

    # this is also equivalent to ``df1.at['a','A']``
    In [55]: df1.loc['a', 'A']
    Out[55]: 0.13200317033032932
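
As a quick check of that equivalence, here is a minimal sketch with a made-up two-by-two frame. Both accessors take a row label and a column label; .at accepts only scalar labels, while .loc also accepts lists, slices and masks.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]}, index=["a", "b"])

via_loc = df1.loc["a", "A"]  # general label-based accessor
via_at = df1.at["a", "A"]    # scalar-only accessor

# Both accessors can also set a single value.
df1.at["b", "B"] = 9.0

print(via_loc == via_at)
print(df1.loc["b", "B"])
```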

Slicing with labels

When using .loc with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned:

    In [56]: s = pd.Series(list('abcde'), index=[0,3,2,5,4])

    In [57]: s.loc[3:5]
    Out[57]:
    3    b
    2    c
    5    d
    dtype: object

If at least one of the two is absent, but the index is sorted, and can be compared against start and stop labels, then slicing will still work as expected, by selecting labels which rank between the two:

    In [58]: s.sort_index()
    Out[58]:
    0    a
    2    c
    3    b
    4    e
    5    d
    dtype: object

    In [59]: s.sort_index().loc[1:6]
    Out[59]:
    2    c
    3    b
    4    e
    5    d
    dtype: object

However, if at least one of the two is absent and the index is not sorted, an error will be raised (since doing otherwise would be computationally expensive, as well as potentially ambiguous for mixed-type indexes). For instance, in the above example, s.loc[1:6] would raise a KeyError.
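
The three cases in this section can be put side by side. This is a small sketch reusing the Series s defined above; the names present, ranked and unsorted_error are illustrative.

```python
import pandas as pd

s = pd.Series(list("abcde"), index=[0, 3, 2, 5, 4])

# Both bounds exist: everything located between label 3 and label 5,
# in index order, is returned.
present = s.loc[3:5]

# Bounds 1 and 6 are absent, but the sorted index can rank them.
ranked = s.sort_index().loc[1:6]

# Bounds absent and index unsorted: .loc refuses to guess.
try:
    s.loc[1:6]
    unsorted_error = None
except KeyError as exc:
    unsorted_error = exc

print(present.index.tolist())
print(ranked.index.tolist())
print(type(unsorted_error).__name__)
```

Calling sort_index() before slicing, as in the second case, is the simplest way to make slices with absent bounds well defined.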