Indexing with isin

Consider the isin() method of Series, which returns a boolean vector that is true wherever the Series elements exist in the passed list. This allows you to select rows where one or more columns have values you want:

    In [159]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

    In [160]: s
    Out[160]:
    4    0
    3    1
    2    2
    1    3
    0    4
    dtype: int64

    In [161]: s.isin([2, 4, 6])
    Out[161]:
    4    False
    3    False
    2     True
    1    False
    0     True
    dtype: bool

    In [162]: s[s.isin([2, 4, 6])]
    Out[162]:
    2    2
    0    4
    dtype: int64
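As a minimal standalone sketch of the same idea, the mask returned by isin() can also be negated with `~` to keep the rows whose values are not in the list. The variable names `mask`, `selected` and `rejected` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same Series as in the session above: values 0..4, index 4..0.
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype="int64")

mask = s.isin([2, 4, 6])  # True where the value is 2, 4 or 6
selected = s[mask]        # rows whose value is in the list
rejected = s[~mask]       # rows whose value is not in the list

print(selected.tolist())  # [2, 4]
print(rejected.tolist())  # [0, 1, 3]
```

Values in the list that never occur in the Series (here 6) are simply ignored.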

The same method is available for Index objects and is useful for the cases when you don’t know which of the sought labels are in fact present:

    In [163]: s[s.index.isin([2, 4, 6])]
    Out[163]:
    4    0
    2    2
    dtype: int64

    # compare it to the following
    In [164]: s.reindex([2, 4, 6])
    Out[164]:
    2    2.0
    4    0.0
    6    NaN
    dtype: float64
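The sketch below contrasts three ways of asking for labels that may be missing. It assumes pandas 1.0 or later, where list-based `.loc` selection with a missing label raises `KeyError`. The names `wanted`, `present` and `raised` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype="int64")
wanted = [2, 4, 6]  # label 6 does not exist in s

# Index.isin keeps only the labels that are present, in the Series' own order.
present = s[s.index.isin(wanted)]

# Label-based selection with a missing label raises KeyError instead
# (assumption: pandas >= 1.0).
try:
    s.loc[wanted]
    raised = False
except KeyError:
    raised = True

# reindex keeps every requested label, in the requested order,
# and fills the missing one with NaN (which upcasts to float64).
reindexed = s.reindex(wanted)

print(present.index.tolist())  # [4, 2]
print(raised)                  # True
```

Filtering with `s.index.isin(...)` is therefore the option that neither fails nor introduces NaN rows.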

In addition to that, MultiIndex allows selecting a separate level to use in the membership check:

    In [165]: s_mi = pd.Series(np.arange(6),
       .....:                  index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
       .....:

    In [166]: s_mi
    Out[166]:
    0  a    0
       b    1
       c    2
    1  a    3
       b    4
       c    5
    dtype: int64

    In [167]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
    Out[167]:
    0  c    2
    1  a    3
    dtype: int64

    In [168]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
    Out[168]:
    0  a    0
       c    2
    1  a    3
       c    5
    dtype: int64
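When the levels of a MultiIndex are named, the `level` argument also accepts the level name. The following is a small sketch of that variant. The level names `num` and `letter` are hypothetical and do not appear in the session above.

```python
import numpy as np
import pandas as pd

# Same data as s_mi above, but with (hypothetical) level names attached.
idx = pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]],
                                 names=["num", "letter"])
s_mi = pd.Series(np.arange(6), index=idx)

# Membership check on the second level, by position and by name.
by_pos = s_mi.index.isin(["a", "c", "e"], level=1)
by_name = s_mi.index.isin(["a", "c", "e"], level="letter")

print(by_pos.tolist())           # [True, False, True, True, False, True]
print(s_mi[by_name].tolist())    # [0, 2, 3, 5]
```

Both calls return the same boolean array, so referring to the level by name is mainly a readability choice.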

DataFrame also has an isin() method. When calling isin, pass a set of values as either an array or dict. If values is an array, isin returns a DataFrame of booleans that is the same shape as the original DataFrame, with True wherever the element is in the sequence of values.

    In [169]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
       .....:                    'ids2': ['a', 'n', 'c', 'n']})
       .....:

    In [170]: values = ['a', 'b', 1, 3]

    In [171]: df.isin(values)
    Out[171]:
        vals    ids   ids2
    0   True   True   True
    1  False   True  False
    2   True  False  False
    3  False  False  False

Oftentimes you’ll want to match certain values with certain columns. Just make values a dict where the key is the column, and the value is a list of items you want to check for.

    In [172]: values = {'ids': ['a', 'b'], 'vals': [1, 3]}

    In [173]: df.isin(values)
    Out[173]:
        vals    ids   ids2
    0   True   True  False
    1  False   True  False
    2   True  False  False
    3  False  False  False
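The difference between the list form and the dict form can be seen side by side in the sketch below, which rebuilds the same `df` so that it runs on its own. The names `any_column` and `per_column` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"vals": [1, 2, 3, 4],
                   "ids": ["a", "b", "f", "n"],
                   "ids2": ["a", "n", "c", "n"]})

# A list is checked against every column.
any_column = df.isin(["a", "b", 1, 3])

# A dict is checked column by column; a column that is missing from
# the dict (here ids2) comes back entirely False.
per_column = df.isin({"ids": ["a", "b"], "vals": [1, 3]})

print(any_column["ids2"].tolist())  # [True, False, False, False]
print(per_column["ids2"].tolist())  # [False, False, False, False]
```

Note that with the dict form, the `'a'` in `ids2` no longer matches, because no values were supplied for that column.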

Combine DataFrame’s isin with the any() and all() methods to quickly select subsets of your data that meet given criteria. To select the rows where every column meets its own criterion:

    In [174]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}

    In [175]: row_mask = df.isin(values).all(axis=1)

    In [176]: df[row_mask]
    Out[176]:
       vals ids ids2
    0     1   a    a
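The any() counterpart is not shown in the session above. As a sketch, the following selects the rows where at least one column meets its criterion and compares the result with the all() case. The names `matches`, `all_rows` and `any_rows` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"vals": [1, 2, 3, 4],
                   "ids": ["a", "b", "f", "n"],
                   "ids2": ["a", "n", "c", "n"]})
values = {"ids": ["a", "b"], "ids2": ["a", "c"], "vals": [1, 3]}

matches = df.isin(values)  # boolean DataFrame, same shape as df

# Rows where every column matches its own list.
all_rows = df[matches.all(axis=1)]

# Rows where at least one column matches its own list.
any_rows = df[matches.any(axis=1)]

print(all_rows.index.tolist())  # [0]
print(any_rows.index.tolist())  # [0, 1, 2]
```

Passing `axis=1` by keyword makes it explicit that the reduction runs across the columns of each row.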