5. Operations

For details, see Basic Binary Operations.

Stats (operations in general exclude missing data)

1. Performing a descriptive statistic:

    In [61]: df.mean()
    Out[61]:
    A   -0.004474
    B   -0.383981
    C   -0.687758
    D    5.000000
    F    3.000000
    dtype: float64

2. The same operation on the other axis:

    In [62]: df.mean(1)
    Out[62]:
    2013-01-01    0.872735
    2013-01-02    1.431621
    2013-01-03    0.707731
    2013-01-04    1.395042
    2013-01-05    1.883656
    2013-01-06    1.592306
    Freq: D, dtype: float64

3. Operating with objects that have different dimensionality and need alignment. pandas automatically broadcasts along the specified dimension:

    In [63]: s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(2)

    In [64]: s
    Out[64]:
    2013-01-01    NaN
    2013-01-02    NaN
    2013-01-03    1.0
    2013-01-04    3.0
    2013-01-05    5.0
    2013-01-06    NaN
    Freq: D, dtype: float64

    In [65]: df.sub(s, axis='index')
    Out[65]:
                       A         B         C    D    F
    2013-01-01       NaN       NaN       NaN  NaN  NaN
    2013-01-02       NaN       NaN       NaN  NaN  NaN
    2013-01-03 -1.861849 -3.104569 -1.494929  4.0  1.0
    2013-01-04 -2.278445 -3.706771 -4.039575  2.0  0.0
    2013-01-05 -5.424972 -4.432980 -4.723768  0.0 -1.0
    2013-01-06       NaN       NaN       NaN  NaN  NaN
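
The three steps above can be reproduced on a small frame where the results are easy to check by hand. This is only a sketch: the names toy, mean_default, mean_strict, row_means and diff are made up for illustration, and the data is not the tutorial's df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the tutorial's df); column "B" has one missing value.
dates = pd.date_range("2013-01-01", periods=3)
toy = pd.DataFrame(
    {"A": [10.0, 20.0, 30.0], "B": [1.0, np.nan, 3.0]},
    index=dates,
)

# 1. Missing values are skipped by default; skipna=False propagates them instead.
mean_default = toy.mean()             # "B" averages only 1.0 and 3.0
mean_strict = toy.mean(skipna=False)  # "B" becomes NaN

# 2. axis=0 (the default) yields one value per column, axis=1 one value per row.
row_means = toy.mean(axis=1)

# 3. Subtract a Series aligned on the row index. shift(1) moves the values
#    down one position, so the first date has no value to subtract.
s = pd.Series([1.0, 2.0, 3.0], index=dates).shift(1)  # NaN, 1.0, 2.0
diff = toy.sub(s, axis="index")

print(mean_default, mean_strict, row_means, diff, sep="\n\n")
```

Any cell where either operand is missing comes out as NaN in diff, which is why whole rows of NaN appear in Out[65].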

Apply

1. Applying functions to the data:

    In [66]: df.apply(np.cumsum)
    Out[66]:
                       A         B         C   D     F
    2013-01-01  0.000000  0.000000 -1.509059   5   NaN
    2013-01-02  1.212112 -0.173215 -1.389850  10   1.0
    2013-01-03  0.350263 -2.277784 -1.884779  15   3.0
    2013-01-04  1.071818 -2.984555 -2.924354  20   6.0
    2013-01-05  0.646846 -2.417535 -2.648122  25  10.0
    2013-01-06 -0.026844 -2.303886 -4.126549  30  15.0

    In [67]: df.apply(lambda x: x.max() - x.min())
    Out[67]:
    A    2.073961
    B    2.671590
    C    1.785291
    D    0.000000
    F    4.000000
    dtype: float64
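
By default apply passes each column to the function as a Series; with axis=1 it passes each row instead. A minimal sketch on made-up data (toy and value_range are hypothetical names, not part of the tutorial):

```python
import pandas as pd

# Hypothetical toy frame, small enough to verify by hand.
toy = pd.DataFrame({"a": [1, 4, 2], "b": [10, 10, 13]})


def value_range(values):
    """Return the spread (max minus min) of a Series."""
    return values.max() - values.min()


# Column-wise (default): the function receives each column in turn.
col_range = toy.apply(value_range)

# Row-wise: the function receives each row in turn.
row_range = toy.apply(value_range, axis=1)

print(col_range)
print(row_range)
```

A named function is used here in place of the lambda in In [67] only so that the row-wise and column-wise calls can share it.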

Histogramming

For details, see Histogramming and Discretization.

    In [68]: s = pd.Series(np.random.randint(0, 7, size=10))

    In [69]: s
    Out[69]:
    0    4
    1    2
    2    1
    3    2
    4    6
    5    4
    6    4
    7    6
    8    4
    9    4
    dtype: int64

    In [70]: s.value_counts()
    Out[70]:
    4    5
    6    2
    2    2
    1    1
    dtype: int64
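
Because the session above draws random integers, its output changes from run to run. The sketch below hard-codes the ten values printed in Out[69] so the counts are reproducible, and also shows the normalize=True option, which returns relative frequencies rather than counts. The names counts and shares are illustrative.

```python
import pandas as pd

# The ten values shown in Out[69], fixed here so the result is reproducible.
s = pd.Series([4, 2, 1, 2, 6, 4, 4, 6, 4, 4])

# Number of occurrences of each distinct value, most frequent first.
counts = s.value_counts()

# Same tally expressed as a fraction of all entries.
shares = s.value_counts(normalize=True)

print(counts)
print(shares)
```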

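String Methods

Series is equipped with a set of string processing methods in its str attribute that make it easy to operate on each element of the array, as in the code snippet below. For more details, see Vectorized String Methods.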

    In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

    In [72]: s.str.lower()
    Out[72]:
    0       a
    1       b
    2       c
    3    aaba
    4    baca
    5     NaN
    6    caba
    7     dog
    8     cat
    dtype: object
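
Other methods on the str accessor follow the same element-wise pattern, and missing entries stay missing instead of raising an error. A short sketch on a shortened, made-up Series (the exact result dtypes may differ between pandas versions):

```python
import numpy as np
import pandas as pd

# Shortened illustrative Series with one missing entry.
s = pd.Series(["Aaba", np.nan, "dog"])

# Each method is applied per element; the missing entry is passed through.
lowered = s.str.lower()
lengths = s.str.len()

print(lowered)
print(lengths)
```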