Selecting columns based on dtype

The select_dtypes() method implements subsetting of columns based on their dtype.

First, let’s create a DataFrame with a slew of different dtypes:

    In [435]: df = pd.DataFrame({'string': list('abc'),
       .....:                    'int64': list(range(1, 4)),
       .....:                    'uint8': np.arange(3, 6).astype('u1'),
       .....:                    'float64': np.arange(4.0, 7.0),
       .....:                    'bool1': [True, False, True],
       .....:                    'bool2': [False, True, False],
       .....:                    'dates': pd.date_range('now', periods=3).values,
       .....:                    'category': pd.Series(list("ABC")).astype('category')})
       .....:

    In [436]: df['tdeltas'] = df.dates.diff()

    In [437]: df['uint64'] = np.arange(3, 6).astype('u8')

    In [438]: df['other_dates'] = pd.date_range('20130101', periods=3).values

    In [439]: df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')

    In [440]: df
    Out[440]:
      string  int64  uint8  float64  bool1  bool2                      dates category tdeltas  uint64 other_dates            tz_aware_dates
    0      a      1      3      4.0   True  False 2018-08-05 11:57:39.507525        A     NaT       3  2013-01-01 2013-01-01 00:00:00-05:00
    1      b      2      4      5.0  False   True 2018-08-06 11:57:39.507525        B  1 days       4  2013-01-02 2013-01-02 00:00:00-05:00
    2      c      3      5      6.0   True  False 2018-08-07 11:57:39.507525        C  1 days       5  2013-01-03 2013-01-03 00:00:00-05:00

And the dtypes:

    In [441]: df.dtypes
    Out[441]:
    string                                object
    int64                                  int64
    uint8                                  uint8
    float64                              float64
    bool1                                   bool
    bool2                                   bool
    dates                         datetime64[ns]
    category                            category
    tdeltas                      timedelta64[ns]
    uint64                                uint64
    other_dates                   datetime64[ns]
    tz_aware_dates    datetime64[ns, US/Eastern]
    dtype: object

select_dtypes() has two parameters, include and exclude, that let you say “give me the columns with these dtypes” (include) and/or “give me the columns without these dtypes” (exclude).

For example, to select bool columns:

    In [442]: df.select_dtypes(include=[bool])
    Out[442]:
       bool1  bool2
    0   True  False
    1  False   True
    2   True  False

You can also pass the name of a dtype in the NumPy dtype hierarchy:

    In [443]: df.select_dtypes(include=['bool'])
    Out[443]:
       bool1  bool2
    0   True  False
    1  False   True
    2   True  False
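The include and exclude parameters described above can also be used on their own. The following is a minimal sketch on a small, hypothetical frame (`small` and its column names are illustrative and not part of the session above); it also shows that calling the method with neither parameter raises a `ValueError`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame, used only for illustration.
small = pd.DataFrame({
    "flag": [True, False, True],
    "count": np.array([1, 2, 3], dtype="int64"),
    "ratio": [0.5, 1.5, 2.5],
})

# Keep only the boolean columns.
only_bool = small.select_dtypes(include=["bool"])

# Keep everything except the boolean columns.
no_bool = small.select_dtypes(exclude=["bool"])

print(list(only_bool.columns))  # ['flag']
print(list(no_bool.columns))    # ['count', 'ratio']

# At least one of include / exclude must be non-empty.
raised = False
try:
    small.select_dtypes()
except ValueError:
    raised = True
print(raised)  # True
```

In both cases the result is a new DataFrame whose columns keep their original order.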

select_dtypes() also works with generic dtypes.

For example, to select all numeric and boolean columns while excluding unsigned integers:

    In [444]: df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])
    Out[444]:
       int64  float64  bool1  bool2 tdeltas
    0      1      4.0   True  False     NaT
    1      2      5.0  False   True  1 days
    2      3      6.0   True  False  1 days
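Generic dtypes can be given either by name or as the corresponding NumPy class. The sketch below assumes a small, hypothetical frame (`frame` and its columns are illustrative only) and compares the string `'number'` with `np.number`, then narrows the selection with `np.unsignedinteger`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing signed, unsigned, float and text columns.
frame = pd.DataFrame({
    "i": np.array([1, 2], dtype="int32"),
    "u": np.array([1, 2], dtype="uint16"),
    "f": np.array([1.0, 2.0], dtype="float32"),
    "s": ["x", "y"],
})

# The generic dtype can be named as a string or passed as a NumPy class.
by_name = frame.select_dtypes(include=["number"])
by_class = frame.select_dtypes(include=[np.number])

# Numeric columns, minus the unsigned integers.
signed_or_float = frame.select_dtypes(include=[np.number],
                                      exclude=[np.unsignedinteger])

print(list(by_name.columns))          # ['i', 'u', 'f']
print(list(by_class.columns))         # ['i', 'u', 'f']
print(list(signed_or_float.columns))  # ['i', 'f']
```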

To select string columns you must use the object dtype:

    In [445]: df.select_dtypes(include=['object'])
    Out[445]:
      string
    0      a
    1      b
    2      c
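Because the object dtype can hold any Python object, selecting on it may return columns that are not purely text. The following sketch, on a hypothetical frame with columns explicitly created as object dtype, shows one possible way to narrow the result to columns whose values are all `str`; the per-value check is an illustrative assumption, not something select_dtypes() does for you.

```python
import pandas as pd

# Hypothetical frame; dtype=object is set explicitly for the first two columns.
frame = pd.DataFrame({
    "text": pd.Series(["a", "b", "c"], dtype=object),
    "mixed": pd.Series(["a", 1, None], dtype=object),
    "n": [1, 2, 3],
})

# Both object columns are selected, regardless of what they contain.
obj_cols = frame.select_dtypes(include=["object"])
print(list(obj_cols.columns))  # ['text', 'mixed']

# Illustrative follow-up: keep only columns where every value is a str.
text_only = [
    name for name in obj_cols.columns
    if obj_cols[name].map(lambda value: isinstance(value, str)).all()
]
print(text_only)  # ['text']
```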

To see all the child dtypes of a generic dtype like numpy.number, you can define a function that returns a tree of child dtypes:

    In [446]: def subdtypes(dtype):
       .....:     subs = dtype.__subclasses__()
       .....:     if not subs:
       .....:         return dtype
       .....:     return [dtype, [subdtypes(dt) for dt in subs]]
       .....:

All NumPy dtypes are subclasses of numpy.generic:

    In [447]: subdtypes(np.generic)
    Out[447]:
    [numpy.generic,
     [[numpy.number,
       [[numpy.integer,
         [[numpy.signedinteger,
           [numpy.int8,
            numpy.int16,
            numpy.int32,
            numpy.int64,
            numpy.int64,
            numpy.timedelta64]],
          [numpy.unsignedinteger,
           [numpy.uint8,
            numpy.uint16,
            numpy.uint32,
            numpy.uint64,
            numpy.uint64]]]],
        [numpy.inexact,
         [[numpy.floating,
           [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
          [numpy.complexfloating,
           [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
      [numpy.flexible,
       [[numpy.character, [numpy.bytes_, numpy.str_]],
        [numpy.void, [numpy.record]]]],
      numpy.bool_,
      numpy.datetime64,
      numpy.object_]]
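The nested list above is easier to query once it is flattened. The sketch below repeats the `subdtypes` function so that it runs on its own and adds `flatten`, a hypothetical helper that is not part of NumPy or pandas. It also shows `np.issubdtype`, which answers the same "is this a child of that generic dtype" question directly. The exact set of leaf types can vary by platform and NumPy version, so only a few stable members are checked.

```python
import numpy as np

def subdtypes(dtype):
    """Return dtype, or [dtype, [children...]] if it has subclasses."""
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

def flatten(tree):
    """Hypothetical helper: yield every type found in a subdtypes() tree."""
    if isinstance(tree, list):
        for item in tree:
            yield from flatten(item)
    else:
        yield tree

# Every type at or below numpy.unsignedinteger.
unsigned = set(flatten(subdtypes(np.unsignedinteger)))
print(np.uint8 in unsigned)  # True
print(np.int8 in unsigned)   # False

# np.issubdtype performs the same hierarchy check without building a tree.
print(np.issubdtype(np.uint8, np.unsignedinteger))  # True
print(np.issubdtype(np.float64, np.number))         # True
print(np.issubdtype(np.bool_, np.number))           # False
```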

Note: pandas also defines the types category and datetime64[ns, tz], which are not integrated into the normal NumPy hierarchy and won’t show up with the above function.
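Although these pandas-specific types are outside the NumPy hierarchy, select_dtypes() accepts the names `'category'` and `'datetimetz'` for them. A minimal sketch follows, on a hypothetical frame that uses the UTC time zone rather than US/Eastern so it does not depend on a particular time zone entry being installed.

```python
import pandas as pd

# Hypothetical frame with one categorical, one tz-aware and one naive column.
frame = pd.DataFrame({
    "cat": pd.Series(list("ABC")).astype("category"),
    "aware": pd.date_range("20130101", periods=3, tz="UTC"),
    "naive": pd.date_range("20130101", periods=3),
    "n": [1, 2, 3],
})

# Categorical columns are selected by the name 'category'.
cats = frame.select_dtypes(include=["category"])

# Time-zone-aware datetime columns are selected by the name 'datetimetz'.
aware = frame.select_dtypes(include=["datetimetz"])

print(list(cats.columns))   # ['cat']
print(list(aware.columns))  # ['aware']
```

Note that the naive datetime column is not matched by `'datetimetz'`.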