Aggregation

Once a Rolling, Expanding, or EWM object has been created, several methods are available to perform multiple computations on the data. These operations work much like the aggregation API, the groupby API, and the resample API.

    In [85]: dfa = pd.DataFrame(np.random.randn(1000, 3),
       ....:                    index=pd.date_range('1/1/2000', periods=1000),
       ....:                    columns=['A', 'B', 'C'])
       ....:

    In [86]: r = dfa.rolling(window=60, min_periods=1)

    In [87]: r
    Out[87]: Rolling [window=60,min_periods=1,center=False,axis=0]
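As a rough illustration of the three kinds of window object named above, the sketch below builds each one on a small hand-made frame and calls the same `mean` method on all of them. The frame `df`, the window size, and the `span` value are assumptions chosen so the numbers are easy to check by hand; they do not come from this guide.

```python
import pandas as pd

# Small deterministic frame (illustrative values, not the guide's random data).
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2000-01-01", periods=4),
)

rolling = df.rolling(window=2, min_periods=1)  # fixed-size trailing window
expanding = df.expanding(min_periods=1)        # window grows from the first row
ewm = df.ewm(span=2)                           # exponentially weighted window

# The same method name is available on all three objects.
rolling_mean = rolling.mean()
expanding_mean = expanding.mean()
ewm_mean = ewm.mean()

print(rolling_mean["A"].tolist())    # means over at most the last 2 rows
print(expanding_mean["A"].tolist())  # means over all rows seen so far
```

Each result has the same shape and index as `df`; only the set of rows contributing to each value differs.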

We can aggregate by passing a function to the entire DataFrame, or select a Series (or multiple Series) via standard __getitem__.

    In [88]: r.aggregate(np.sum)
    Out[88]:
                       A          B         C
    2000-01-01 -0.289838  -0.370545 -1.284206
    2000-01-02 -0.216612  -1.675528 -1.169415
    2000-01-03  1.154661  -1.634017 -1.566620
    2000-01-04  2.969393  -4.003274 -1.816179
    2000-01-05  4.690630  -4.682017 -2.717209
    2000-01-06  3.880630  -4.447700 -1.078947
    2000-01-07  4.001957  -2.884072 -3.116903
    ...              ...        ...       ...
    2002-09-20  2.652493 -10.528875  9.867805
    2002-09-21  0.844497  -9.280944  9.522649
    2002-09-22  2.860036  -9.270337  6.415245
    2002-09-23  3.510163  -8.151439  5.177219
    2002-09-24  6.524983 -10.168078  5.792639
    2002-09-25  6.409626  -9.956226  5.704050
    2002-09-26  5.093787  -7.074515  6.905823

    [1000 rows x 3 columns]

    In [89]: r['A'].aggregate(np.sum)
    Out[89]:
    2000-01-01   -0.289838
    2000-01-02   -0.216612
    2000-01-03    1.154661
    2000-01-04    2.969393
    2000-01-05    4.690630
    2000-01-06    3.880630
    2000-01-07    4.001957
                    ...
    2002-09-20    2.652493
    2002-09-21    0.844497
    2002-09-22    2.860036
    2002-09-23    3.510163
    2002-09-24    6.524983
    2002-09-25    6.409626
    2002-09-26    5.093787
    Freq: D, Name: A, Length: 1000, dtype: float64

    In [90]: r[['A', 'B']].aggregate(np.sum)
    Out[90]:
                       A          B
    2000-01-01 -0.289838  -0.370545
    2000-01-02 -0.216612  -1.675528
    2000-01-03  1.154661  -1.634017
    2000-01-04  2.969393  -4.003274
    2000-01-05  4.690630  -4.682017
    2000-01-06  3.880630  -4.447700
    2000-01-07  4.001957  -2.884072
    ...              ...        ...
    2002-09-20  2.652493 -10.528875
    2002-09-21  0.844497  -9.280944
    2002-09-22  2.860036  -9.270337
    2002-09-23  3.510163  -8.151439
    2002-09-24  6.524983 -10.168078
    2002-09-25  6.409626  -9.956226
    2002-09-26  5.093787  -7.074515

    [1000 rows x 2 columns]

As you can see, the result of the aggregation will have the selected columns, or all columns if none are selected.
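The relationship between the selection and the shape of the result can be checked on a tiny frame. The following is a minimal sketch; the frame `df` and its values are made up for illustration, and the string name `"sum"` is used in place of `np.sum`.

```python
import pandas as pd

# Illustrative three-column frame (assumed values).
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)
r = df.rolling(window=2, min_periods=1)

all_cols = r.aggregate("sum")              # no selection: DataFrame with A, B, C
one_col = r["A"].aggregate("sum")          # single label: Series named "A"
two_cols = r[["A", "B"]].aggregate("sum")  # list of labels: DataFrame with A, B

print(type(one_col).__name__, list(two_cols.columns))
```

Selecting with a single label gives a Series, while selecting with a list of labels gives a DataFrame, even if the list has only one entry.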

Applying multiple functions

With a windowed Series, you can also pass a list of functions to aggregate with. The result is a DataFrame with one column per function:

    In [91]: r['A'].agg([np.sum, np.mean, np.std])
    Out[91]:
                     sum      mean       std
    2000-01-01 -0.289838 -0.289838       NaN
    2000-01-02 -0.216612 -0.108306  0.256725
    2000-01-03  1.154661  0.384887  0.873311
    2000-01-04  2.969393  0.742348  1.009734
    2000-01-05  4.690630  0.938126  0.977914
    2000-01-06  3.880630  0.646772  1.128883
    2000-01-07  4.001957  0.571708  1.049487
    ...              ...       ...       ...
    2002-09-20  2.652493  0.044208  1.164919
    2002-09-21  0.844497  0.014075  1.148231
    2002-09-22  2.860036  0.047667  1.132051
    2002-09-23  3.510163  0.058503  1.134296
    2002-09-24  6.524983  0.108750  1.144204
    2002-09-25  6.409626  0.106827  1.142913
    2002-09-26  5.093787  0.084896  1.151416

    [1000 rows x 3 columns]
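The `NaN` in the first row of the `std` column comes from `min_periods=1`: a one-element window has no sample standard deviation. A small sketch of the same pattern follows, assuming a hand-made Series `s` and using string names for the functions rather than the NumPy callables.

```python
import numpy as np
import pandas as pd

# Illustrative Series (assumed values).
s = pd.Series([1.0, 2.0, 4.0, 8.0], name="A")
r = s.rolling(window=2, min_periods=1)

# One output column per function, labelled with the function name.
result = r.agg(["sum", "mean", "std"])

print(list(result.columns))
print(np.isnan(result["std"].iloc[0]))  # first window holds a single value
```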

On a windowed DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:

    In [92]: r.agg([np.sum, np.mean])
    Out[92]:
                       A                    B                   C
                     sum      mean        sum      mean       sum      mean
    2000-01-01 -0.289838 -0.289838  -0.370545 -0.370545 -1.284206 -1.284206
    2000-01-02 -0.216612 -0.108306  -1.675528 -0.837764 -1.169415 -0.584708
    2000-01-03  1.154661  0.384887  -1.634017 -0.544672 -1.566620 -0.522207
    2000-01-04  2.969393  0.742348  -4.003274 -1.000819 -1.816179 -0.454045
    2000-01-05  4.690630  0.938126  -4.682017 -0.936403 -2.717209 -0.543442
    2000-01-06  3.880630  0.646772  -4.447700 -0.741283 -1.078947 -0.179825
    2000-01-07  4.001957  0.571708  -2.884072 -0.412010 -3.116903 -0.445272
    ...              ...       ...        ...       ...       ...       ...
    2002-09-20  2.652493  0.044208 -10.528875 -0.175481  9.867805  0.164463
    2002-09-21  0.844497  0.014075  -9.280944 -0.154682  9.522649  0.158711
    2002-09-22  2.860036  0.047667  -9.270337 -0.154506  6.415245  0.106921
    2002-09-23  3.510163  0.058503  -8.151439 -0.135857  5.177219  0.086287
    2002-09-24  6.524983  0.108750 -10.168078 -0.169468  5.792639  0.096544
    2002-09-25  6.409626  0.106827  -9.956226 -0.165937  5.704050  0.095068
    2002-09-26  5.093787  0.084896  -7.074515 -0.117909  6.905823  0.115097

    [1000 rows x 6 columns]
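The hierarchical column index can be navigated with ordinary MultiIndex selection. The sketch below is only one way to do it, on an assumed two-column frame; the flattened names such as `A_sum` are a naming choice made here, not a pandas convention.

```python
import pandas as pd

# Illustrative frame (assumed values).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
r = df.rolling(window=2, min_periods=1)

result = r.agg(["sum", "mean"])

# Columns are (original column, function name) pairs.
column_pairs = list(result.columns)

a_mean = result[("A", "mean")]  # one (column, function) pair -> Series
a_all = result["A"]             # every function for column A -> DataFrame

# Optionally collapse the two levels into flat string labels.
flat = result.copy()
flat.columns = ["_".join(pair) for pair in flat.columns]

print(column_pairs)
print(list(flat.columns))
```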

Passing a dict of functions behaves differently by default; see the next section.

Applying different functions to DataFrame columns

By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:

    In [93]: r.agg({'A': np.sum,
       ....:        'B': lambda x: np.std(x, ddof=1)})
       ....:
    Out[93]:
                       A         B
    2000-01-01 -0.289838       NaN
    2000-01-02 -0.216612  0.660747
    2000-01-03  1.154661  0.689929
    2000-01-04  2.969393  1.072199
    2000-01-05  4.690630  0.939657
    2000-01-06  3.880630  0.966848
    2000-01-07  4.001957  1.240137
    ...              ...       ...
    2002-09-20  2.652493  1.114814
    2002-09-21  0.844497  1.113220
    2002-09-22  2.860036  1.113208
    2002-09-23  3.510163  1.132381
    2002-09-24  6.524983  1.080963
    2002-09-25  6.409626  1.082911
    2002-09-26  5.093787  1.136199

    [1000 rows x 2 columns]
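The dict values do not have to be library functions. As a hedged sketch, the example below pairs a built-in sum on one column with a user-defined function on another; `window_range` is a hypothetical helper written for this illustration, and the frame values are assumed.

```python
import pandas as pd

# Illustrative frame (assumed values).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 6.0, 5.0, 9.0]})
r = df.rolling(window=2, min_periods=1)


def window_range(x):
    """Hypothetical helper: spread between the largest and smallest value in a window."""
    return x.max() - x.min()


# Column A gets a rolling sum, column B gets the custom spread.
result = r.agg({"A": "sum", "B": window_range})

print(result)
```

Only the columns named in the dict appear in the result, each computed with its own function.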

The function names can also be given as strings. For a string to be valid, it must name a method implemented on the windowed object:

    In [94]: r.agg({'A': 'sum', 'B': 'std'})
    Out[94]:
                       A         B
    2000-01-01 -0.289838       NaN
    2000-01-02 -0.216612  0.660747
    2000-01-03  1.154661  0.689929
    2000-01-04  2.969393  1.072199
    2000-01-05  4.690630  0.939657
    2000-01-06  3.880630  0.966848
    2000-01-07  4.001957  1.240137
    ...              ...       ...
    2002-09-20  2.652493  1.114814
    2002-09-21  0.844497  1.113220
    2002-09-22  2.860036  1.113208
    2002-09-23  3.510163  1.132381
    2002-09-24  6.524983  1.080963
    2002-09-25  6.409626  1.082911
    2002-09-26  5.093787  1.136199

    [1000 rows x 2 columns]
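One informal way to tell in advance whether a string is likely to be accepted is to look the name up on the window object and check that it resolves to something callable. The helper `is_window_method` below is hypothetical and not part of pandas; it is a convenience sketch, not a statement of how pandas validates names internally.

```python
import pandas as pd

# Illustrative frame (assumed values).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
r = df.rolling(window=2, min_periods=1)


def is_window_method(window, name):
    """Hypothetical check: True if `name` resolves to a callable on the window object."""
    # Column names also resolve as attributes on a window object, but to a
    # non-callable sub-window, so the callable() test filters them out.
    return callable(getattr(window, name, None))


for candidate in ["sum", "std", "not_a_method", "A"]:
    print(candidate, is_window_method(r, candidate))
```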

Furthermore, you can pass a dict whose values are lists of functions to apply several different aggregations to each column:

    In [95]: r.agg({'A': ['sum', 'std'], 'B': ['mean', 'std']})
    Out[95]:
                       A                   B
                     sum       std      mean       std
    2000-01-01 -0.289838       NaN -0.370545       NaN
    2000-01-02 -0.216612  0.256725 -0.837764  0.660747
    2000-01-03  1.154661  0.873311 -0.544672  0.689929
    2000-01-04  2.969393  1.009734 -1.000819  1.072199
    2000-01-05  4.690630  0.977914 -0.936403  0.939657
    2000-01-06  3.880630  1.128883 -0.741283  0.966848
    2000-01-07  4.001957  1.049487 -0.412010  1.240137
    ...              ...       ...       ...       ...
    2002-09-20  2.652493  1.164919 -0.175481  1.114814
    2002-09-21  0.844497  1.148231 -0.154682  1.113220
    2002-09-22  2.860036  1.132051 -0.154506  1.113208
    2002-09-23  3.510163  1.134296 -0.135857  1.132381
    2002-09-24  6.524983  1.144204 -0.169468  1.080963
    2002-09-25  6.409626  1.142913 -0.165937  1.082911
    2002-09-26  5.093787  1.151416 -0.117909  1.136199

    [1000 rows x 4 columns]
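To make the resulting column layout concrete, here is a minimal sketch on an assumed three-column frame. It also shows that a column left out of the dict (here `C`) does not appear in the output.

```python
import pandas as pd

# Illustrative frame (assumed values).
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
)
r = df.rolling(window=2, min_periods=1)

# Each key is a column; each value lists the aggregations for that column.
result = r.agg({"A": ["sum", "std"], "B": ["mean", "std"]})

print(list(result.columns))
print(result[("B", "mean")].tolist())
```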