Missing data casting rules and indexing

While pandas supports storing arrays of integer and boolean type, these types are not capable of storing missing data. Until we can switch to using a native NA type in NumPy, we’ve established some “casting rules”. When a reindexing operation introduces missing data, the Series will be cast according to the rules introduced in the table below.

data type    Cast to
integer      float
boolean      object
float        no cast
object       no cast

For example:

    In [127]: s = pd.Series(np.random.randn(5), index=[0, 2, 4, 6, 7])

    In [128]: s > 0
    Out[128]:
    0    True
    2    True
    4    True
    6    True
    7    True
    dtype: bool

    In [129]: (s > 0).dtype
    Out[129]: dtype('bool')

    In [130]: crit = (s > 0).reindex(list(range(8)))

    In [131]: crit
    Out[131]:
    0    True
    1     NaN
    2    True
    3     NaN
    4    True
    5     NaN
    6    True
    7    True
    dtype: object

    In [132]: crit.dtype
    Out[132]: dtype('O')
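
The integer row of the table can be checked in the same way as the boolean case above. The following is a minimal sketch, not part of the original session: the names `ints` and `widened` and the sample values are illustrative assumptions, and the dtype is set explicitly so the starting point does not depend on the platform.

```python
import numpy as np
import pandas as pd

# Hypothetical integer Series; labels 1 and 3 are deliberately absent.
ints = pd.Series([10, 20, 30], index=[0, 2, 4], dtype="int64")

# Reindexing onto labels 0..4 introduces missing entries at 1 and 3,
# so the integer data can no longer be stored as integers.
widened = ints.reindex(range(5))

print(ints.dtype)     # the original integer dtype
print(widened.dtype)  # float, per the "integer -> float" rule in the table
print(widened)
```

The existing values are preserved (as floats), and the two new labels hold NaN.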

Ordinarily, NumPy will complain if you use an object array (even one that contains only boolean values) instead of a boolean array to get or set values from an ndarray, for example when selecting values based on some criterion. If a boolean vector contains NAs, a ValueError is raised:

    In [133]: reindexed = s.reindex(list(range(8))).fillna(0)

    In [134]: reindexed[crit]
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-134-0dac417a4890> in <module>()
    ----> 1 reindexed[crit]

    /pandas/pandas/core/series.py in __getitem__(self, key)
        805             key = list(key)
        806
    --> 807         if com.is_bool_indexer(key):
        808             key = check_bool_indexer(self.index, key)
        809

    /pandas/pandas/core/common.py in is_bool_indexer(key)
        105             if not lib.is_bool_array(key):
        106                 if isna(key).any():
    --> 107                     raise ValueError('cannot index with vector containing '
        108                                      'NA / NaN values')
        109                 return False

    ValueError: cannot index with vector containing NA / NaN values

However, the missing values can be filled in with fillna(), after which indexing works:

    In [135]: reindexed[crit.fillna(False)]
    Out[135]:
    0    0.126504
    2    0.696198
    4    0.697416
    6    0.601516
    7    0.003659
    dtype: float64

    In [136]: reindexed[crit.fillna(True)]
    Out[136]:
    0    0.126504
    1    0.000000
    2    0.696198
    3    0.000000
    4    0.697416
    5    0.000000
    6    0.601516
    7    0.003659
    dtype: float64
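
If a mask with a true boolean dtype is wanted after filling, one option is to cast the filled Series explicitly. The sketch below is an illustration under stated assumptions rather than part of the original session: it uses fixed sample values instead of random data, and the name `mask` is hypothetical.

```python
import numpy as np
import pandas as pd

# Fixed sample data (an assumption, chosen so the result is predictable).
s = pd.Series([0.5, -1.0, 2.0], index=[0, 2, 4])

# Reindexing the boolean criterion introduces NaN at labels 1 and 3,
# which forces the Series to object dtype.
crit = (s > 0).reindex(range(5))
reindexed = s.reindex(range(5)).fillna(0)

# Treat missing entries as False, then cast so the mask is bool, not object.
mask = crit.fillna(False).astype(bool)

selected = reindexed[mask]
print(mask.dtype)
print(selected)
```

Filling with `False` excludes the labels that were missing from the original data; filling with `True` would include them instead, as in the session above.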