Boolean indexing

Another common operation is filtering data with boolean vectors. The operators are | for or, & for and, and ~ for not. Each comparison must be wrapped in parentheses, because & and | bind more tightly than the comparison operators: by default Python evaluates an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, whereas the intended evaluation order is (df.A > 2) & (df.B < 3).
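The precedence issue can be seen in a minimal sketch. The frame below and its values are hypothetical, chosen only for illustration; the sketch assumes a recent pandas release, in which the unparenthesized form fails because Python has to take the truth value of a Series while evaluating the chained comparison.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"A": [1, 3, 5, 2], "B": [4, 2, 1, 0]})

# Each comparison is parenthesized, so & combines two boolean Series.
mask = (df.A > 2) & (df.B < 3)
selected = df[mask]

# Without parentheses, & is applied first and Python then evaluates the
# chained comparison df.A > (2 & df.B) < 3, which needs bool(Series).
try:
    df[df.A > 2 & df.B < 3]
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False
```

Here selected holds the rows labelled 1 and 2, while the unparenthesized expression raises a ValueError instead of returning a filtered frame.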

Using a boolean vector to index a Series works exactly as in a NumPy ndarray:

  In [147]: s = pd.Series(range(-3, 4))

  In [148]: s
  Out[148]:
  0   -3
  1   -2
  2   -1
  3    0
  4    1
  5    2
  6    3
  dtype: int64

  In [149]: s[s > 0]
  Out[149]:
  4    1
  5    2
  6    3
  dtype: int64

  In [150]: s[(s < -1) | (s > 0.5)]
  Out[150]:
  0   -3
  1   -2
  4    1
  5    2
  6    3
  dtype: int64

  In [151]: s[~(s < 0)]
  Out[151]:
  3    0
  4    1
  5    2
  6    3
  dtype: int64

You may select rows from a DataFrame using a boolean vector the same length as the DataFrame’s index (for example, something derived from one of the columns of the DataFrame):

  In [152]: df[df['A'] > 0]
  Out[152]:
                     A         B         C         D   E   0
  2000-01-01  0.469112 -0.282863 -1.509059 -1.135632 NaN NaN
  2000-01-02  1.212112 -0.173215  0.119209 -1.044236 NaN NaN
  2000-01-04  7.000000 -0.706771 -1.039575  0.271860 NaN NaN
  2000-01-07  0.404705  0.577046 -1.715002 -1.039268 NaN NaN
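Because the df used above is defined elsewhere in the guide, the following self-contained sketch may help. The frame and its values are hypothetical stand-ins, not the data shown in the output above.

```python
import pandas as pd

# Hypothetical data; the values are chosen for illustration only.
dates = pd.date_range("2000-01-01", periods=4)
df = pd.DataFrame(
    {"A": [0.5, -1.2, 7.0, -0.3], "B": [1.0, 2.0, 3.0, 4.0]},
    index=dates,
)

# The comparison yields a boolean Series with one entry per row of df.
mask = df["A"] > 0

# Rows whose mask entry is True are kept; the rest are dropped.
positive_rows = df[mask]
```

The mask has the same length and index as df, which is what allows it to be used directly as a row selector.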

List comprehensions and the map method of Series can also be used to produce more complex criteria:

  In [153]: df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
     .....:                     'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
     .....:                     'c': np.random.randn(7)})
     .....:

  # only want 'two' or 'three'
  In [154]: criterion = df2['a'].map(lambda x: x.startswith('t'))

  In [155]: df2[criterion]
  Out[155]:
         a  b         c
  2    two  y  0.041290
  3  three  x  0.361719
  4    two  y -0.238075

  # equivalent but slower
  In [156]: df2[[x.startswith('t') for x in df2['a']]]
  Out[156]:
         a  b         c
  2    two  y  0.041290
  3  three  x  0.361719
  4    two  y -0.238075

  # Multiple criteria
  In [157]: df2[criterion & (df2['b'] == 'x')]
  Out[157]:
         a  b         c
  3  three  x  0.361719
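As a hedged aside, the same criterion can also be built with the vectorized Series.str.startswith accessor or with Series.isin. The sketch below rebuilds df2 with fixed values in column c (an assumption made so the example is reproducible; the original uses random numbers) and checks that the three approaches agree.

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["one", "one", "two", "three", "two", "one", "six"],
    "b": ["x", "y", "y", "x", "y", "x", "x"],
    "c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],  # fixed values, not random
})

# Three ways to express "a is 'two' or 'three'" as a boolean Series.
by_map = df2["a"].map(lambda x: x.startswith("t"))
by_str = df2["a"].str.startswith("t")        # vectorized string method
by_isin = df2["a"].isin(["two", "three"])    # explicit membership test

# Combine with a second condition; each comparison is parenthesized.
result = df2[by_str & (df2["b"] == "x")]
```

Which form reads best depends on the criterion: isin suits a fixed set of values, the str accessor suits pattern-like tests, and map remains available for arbitrary Python logic.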

With the selection methods described in Selection by Label, Selection by Position, and Advanced Indexing, you may select along more than one axis using boolean vectors combined with other indexing expressions.

  In [158]: df2.loc[criterion & (df2['b'] == 'x'), 'b':'c']
  Out[158]:
     b         c
  3  x  0.361719
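A rough sketch of the same selection by label and by position follows. It again assumes fixed values for column c in place of the random ones, and it converts the mask to a NumPy array before passing it to .iloc, since positional indexing takes a plain boolean array rather than a labelled boolean Series.

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["one", "one", "two", "three", "two", "one", "six"],
    "b": ["x", "y", "y", "x", "y", "x", "x"],
    "c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],  # fixed values, not random
})

criterion = df2["a"].map(lambda x: x.startswith("t")) & (df2["b"] == "x")

# Label-based: boolean mask on the rows, label slice 'b':'c' on the columns.
by_label = df2.loc[criterion, "b":"c"]

# Position-based: the mask is passed as a NumPy array, with a positional
# slice 1:3 selecting the second and third columns.
by_position = df2.iloc[criterion.to_numpy(), 1:3]
```

Both selections return the single row labelled 3 with columns b and c. Note that the label slice 'b':'c' includes its endpoint, while the positional slice 1:3 excludes position 3.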