Aggregation
Once the Rolling, Expanding or EWM objects have been created, several methods are available to perform multiple computations on the data. These operations are similar to the aggregating API, groupby API, and resample API.
In [85]: dfa = pd.DataFrame(np.random.randn(1000, 3),
....: index=pd.date_range('1/1/2000', periods=1000),
....: columns=['A', 'B', 'C'])
....:
In [86]: r = dfa.rolling(window=60,min_periods=1)
In [87]: r
Out[87]: Rolling [window=60,min_periods=1,center=False,axis=0]
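For comparison, the three kinds of window objects can be sketched on a tiny, deterministic Series. The data, the variable names, and the choice of `alpha=0.5` are assumptions made for illustration and are not part of the session above.

```python
import pandas as pd

# Toy data chosen so the results are easy to check by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Rolling: fixed-size window (here the current and previous observation).
rolling_sum = s.rolling(window=2, min_periods=1).sum()   # 1, 3, 5, 7

# Expanding: the window grows to include every observation so far.
expanding_sum = s.expanding(min_periods=1).sum()         # 1, 3, 6, 10

# EWM: exponentially weighted window; alpha=0.5 is an arbitrary choice.
ewm_mean = s.ewm(alpha=0.5).mean()

print(rolling_sum.tolist())
print(expanding_sum.tolist())
print(ewm_mean.round(4).tolist())
```

Each object is created first and computed on afterwards, which is the pattern the rest of this section relies on.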
We can aggregate by passing a function to the entire DataFrame, or select a Series (or multiple Series) via standard __getitem__.
In [88]: r.aggregate(np.sum)
Out[88]:
A B C
2000-01-01 -0.289838 -0.370545 -1.284206
2000-01-02 -0.216612 -1.675528 -1.169415
2000-01-03 1.154661 -1.634017 -1.566620
2000-01-04 2.969393 -4.003274 -1.816179
2000-01-05 4.690630 -4.682017 -2.717209
2000-01-06 3.880630 -4.447700 -1.078947
2000-01-07 4.001957 -2.884072 -3.116903
... ... ... ...
2002-09-20 2.652493 -10.528875 9.867805
2002-09-21 0.844497 -9.280944 9.522649
2002-09-22 2.860036 -9.270337 6.415245
2002-09-23 3.510163 -8.151439 5.177219
2002-09-24 6.524983 -10.168078 5.792639
2002-09-25 6.409626 -9.956226 5.704050
2002-09-26 5.093787 -7.074515 6.905823
[1000 rows x 3 columns]
In [89]: r['A'].aggregate(np.sum)
Out[89]:
2000-01-01 -0.289838
2000-01-02 -0.216612
2000-01-03 1.154661
2000-01-04 2.969393
2000-01-05 4.690630
2000-01-06 3.880630
2000-01-07 4.001957
...
2002-09-20 2.652493
2002-09-21 0.844497
2002-09-22 2.860036
2002-09-23 3.510163
2002-09-24 6.524983
2002-09-25 6.409626
2002-09-26 5.093787
Freq: D, Name: A, Length: 1000, dtype: float64
In [90]: r[['A','B']].aggregate(np.sum)
Out[90]:
A B
2000-01-01 -0.289838 -0.370545
2000-01-02 -0.216612 -1.675528
2000-01-03 1.154661 -1.634017
2000-01-04 2.969393 -4.003274
2000-01-05 4.690630 -4.682017
2000-01-06 3.880630 -4.447700
2000-01-07 4.001957 -2.884072
... ... ...
2002-09-20 2.652493 -10.528875
2002-09-21 0.844497 -9.280944
2002-09-22 2.860036 -9.270337
2002-09-23 3.510163 -8.151439
2002-09-24 6.524983 -10.168078
2002-09-25 6.409626 -9.956226
2002-09-26 5.093787 -7.074515
[1000 rows x 2 columns]
As you can see, the result of the aggregation will have the selected columns, or all columns if none are selected.
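The effect of column selection on the shape of the result can be checked on a small frame. This is a minimal sketch with made-up data; the frame and the names `all_cols`, `one_col`, and `two_cols` are illustrative, and the string `"sum"` is used in place of `np.sum`.

```python
import pandas as pd

# Small deterministic frame (illustrative data, not from the session above).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 20.0, 30.0, 40.0],
                   "C": [100.0, 200.0, 300.0, 400.0]})
r = df.rolling(window=2, min_periods=1)

all_cols = r.aggregate("sum")               # no selection: DataFrame with A, B, C
one_col = r["A"].aggregate("sum")           # single label: Series named "A"
two_cols = r[["A", "B"]].aggregate("sum")   # list of labels: DataFrame with A, B

print(list(all_cols.columns))
print(type(one_col).__name__, one_col.name)
print(list(two_cols.columns))
```

Selecting with a single label yields a Series, while selecting with a list of labels yields a DataFrame, mirroring ordinary DataFrame indexing.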
Applying multiple functions
With a windowed Series you can also pass a list of functions to aggregate with, which outputs a DataFrame with one column per function:
In [91]: r['A'].agg([np.sum, np.mean, np.std])
Out[91]:
sum mean std
2000-01-01 -0.289838 -0.289838 NaN
2000-01-02 -0.216612 -0.108306 0.256725
2000-01-03 1.154661 0.384887 0.873311
2000-01-04 2.969393 0.742348 1.009734
2000-01-05 4.690630 0.938126 0.977914
2000-01-06 3.880630 0.646772 1.128883
2000-01-07 4.001957 0.571708 1.049487
... ... ... ...
2002-09-20 2.652493 0.044208 1.164919
2002-09-21 0.844497 0.014075 1.148231
2002-09-22 2.860036 0.047667 1.132051
2002-09-23 3.510163 0.058503 1.134296
2002-09-24 6.524983 0.108750 1.144204
2002-09-25 6.409626 0.106827 1.142913
2002-09-26 5.093787 0.084896 1.151416
[1000 rows x 3 columns]
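The same pattern can be reproduced on data small enough to verify by hand. The sketch below assumes a four-element Series and uses string function names rather than NumPy callables; both choices are for illustration only.

```python
import pandas as pd

# Illustrative Series; the name "A" echoes the column used above.
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="A")

# One output column per function, labelled with the function name.
out = s.rolling(window=2, min_periods=1).agg(["sum", "mean", "std"])

print(list(out.columns))
print(out)
```

The first `std` value is NaN because a sample standard deviation needs at least two observations, which matches the NaN in the first row of the output above.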
On a windowed DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:
In [92]: r.agg([np.sum, np.mean])
Out[92]:
A B C
sum mean sum mean sum mean
2000-01-01 -0.289838 -0.289838 -0.370545 -0.370545 -1.284206 -1.284206
2000-01-02 -0.216612 -0.108306 -1.675528 -0.837764 -1.169415 -0.584708
2000-01-03 1.154661 0.384887 -1.634017 -0.544672 -1.566620 -0.522207
2000-01-04 2.969393 0.742348 -4.003274 -1.000819 -1.816179 -0.454045
2000-01-05 4.690630 0.938126 -4.682017 -0.936403 -2.717209 -0.543442
2000-01-06 3.880630 0.646772 -4.447700 -0.741283 -1.078947 -0.179825
2000-01-07 4.001957 0.571708 -2.884072 -0.412010 -3.116903 -0.445272
... ... ... ... ... ... ...
2002-09-20 2.652493 0.044208 -10.528875 -0.175481 9.867805 0.164463
2002-09-21 0.844497 0.014075 -9.280944 -0.154682 9.522649 0.158711
2002-09-22 2.860036 0.047667 -9.270337 -0.154506 6.415245 0.106921
2002-09-23 3.510163 0.058503 -8.151439 -0.135857 5.177219 0.086287
2002-09-24 6.524983 0.108750 -10.168078 -0.169468 5.792639 0.096544
2002-09-25 6.409626 0.106827 -9.956226 -0.165937 5.704050 0.095068
2002-09-26 5.093787 0.084896 -7.074515 -0.117909 6.905823 0.115097
[1000 rows x 6 columns]
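The hierarchical index is a column MultiIndex whose first level is the original column and whose second level is the function name. A minimal sketch of how such a result might be inspected and sliced, using made-up data and illustrative variable names:

```python
import pandas as pd

# Illustrative two-column frame.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
out = df.rolling(window=2, min_periods=1).agg(["sum", "mean"])

# Columns are (original column, function name) pairs.
print(list(out.columns))

# Select a single (column, function) pair with a tuple.
a_mean = out[("A", "mean")]

# Select one function across all columns by taking a cross-section
# of the second column level.
sums = out.xs("sum", axis=1, level=1)
print(sums)
```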
Passing a dict of functions behaves differently by default; see the next section.
Applying different functions to DataFrame columns
By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:
In [93]: r.agg({'A' : np.sum,
....: 'B' : lambda x: np.std(x, ddof=1)})
....:
Out[93]:
A B
2000-01-01 -0.289838 NaN
2000-01-02 -0.216612 0.660747
2000-01-03 1.154661 0.689929
2000-01-04 2.969393 1.072199
2000-01-05 4.690630 0.939657
2000-01-06 3.880630 0.966848
2000-01-07 4.001957 1.240137
... ... ...
2002-09-20 2.652493 1.114814
2002-09-21 0.844497 1.113220
2002-09-22 2.860036 1.113208
2002-09-23 3.510163 1.132381
2002-09-24 6.524983 1.080963
2002-09-25 6.409626 1.082911
2002-09-26 5.093787 1.136199
[1000 rows x 2 columns]
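A hand-checkable version of the dict form is sketched below. The data are made up, and the lambda computes a per-window range (max minus min) purely as an illustrative custom function; it is not the standard deviation used above.

```python
import pandas as pd

# Illustrative frame with values that make the window results easy to verify.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 30.0, 20.0, 50.0]})
r = df.rolling(window=2, min_periods=1)

# Keys are column labels; values are the function applied to that column.
out = r.agg({"A": "sum",
             "B": lambda x: x.max() - x.min()})  # hypothetical custom function

print(out)
```

Only the columns named in the dict appear in the result, and each one carries the output of its own function.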
The function names can also be strings. For a string to be valid, it must name a method implemented on the windowed object:
In [94]: r.agg({'A' : 'sum', 'B' : 'std'})
Out[94]:
A B
2000-01-01 -0.289838 NaN
2000-01-02 -0.216612 0.660747
2000-01-03 1.154661 0.689929
2000-01-04 2.969393 1.072199
2000-01-05 4.690630 0.939657
2000-01-06 3.880630 0.966848
2000-01-07 4.001957 1.240137
... ... ...
2002-09-20 2.652493 1.114814
2002-09-21 0.844497 1.113220
2002-09-22 2.860036 1.113208
2002-09-23 3.510163 1.132381
2002-09-24 6.524983 1.080963
2002-09-25 6.409626 1.082911
2002-09-26 5.093787 1.136199
[1000 rows x 2 columns]
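One way to see what "implemented on the windowed object" means is to compare the string form with calling the corresponding methods directly. This is a sketch on assumed toy data; the variable names are illustrative.

```python
import pandas as pd

# Illustrative frame.
df = pd.DataFrame({"A": [1.0, 2.0, 4.0, 7.0],
                   "B": [3.0, 1.0, 4.0, 1.0]})
r = df.rolling(window=3, min_periods=1)

# "sum" and "std" are valid strings because the window object has
# methods with those names.
print(hasattr(r, "sum"), hasattr(r, "std"))

by_name = r.agg({"A": "sum", "B": "std"})

# Calling the methods directly on the selected columns.
direct_sum = r["A"].sum()
direct_std = r["B"].std()

# These raise an AssertionError if the two routes disagree.
pd.testing.assert_series_equal(by_name["A"], direct_sum)
pd.testing.assert_series_equal(by_name["B"], direct_std)
```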
Furthermore, you can pass a dict whose values are lists of functions to apply several different aggregations to different columns:
In [95]: r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
Out[95]:
A B
sum std mean std
2000-01-01 -0.289838 NaN -0.370545 NaN
2000-01-02 -0.216612 0.256725 -0.837764 0.660747
2000-01-03 1.154661 0.873311 -0.544672 0.689929
2000-01-04 2.969393 1.009734 -1.000819 1.072199
2000-01-05 4.690630 0.977914 -0.936403 0.939657
2000-01-06 3.880630 1.128883 -0.741283 0.966848
2000-01-07 4.001957 1.049487 -0.412010 1.240137
... ... ... ... ...
2002-09-20 2.652493 1.164919 -0.175481 1.114814
2002-09-21 0.844497 1.148231 -0.154682 1.113220
2002-09-22 2.860036 1.132051 -0.154506 1.113208
2002-09-23 3.510163 1.134296 -0.135857 1.132381
2002-09-24 6.524983 1.144204 -0.169468 1.080963
2002-09-25 6.409626 1.142913 -0.165937 1.082911
2002-09-26 5.093787 1.151416 -0.117909 1.136199
[1000 rows x 4 columns]
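With a dict of lists, the result contains only the (column, function) pairs that were requested, again as a column MultiIndex. A minimal sketch on made-up data, with illustrative variable names:

```python
import pandas as pd

# Illustrative frame.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
r = df.rolling(window=2, min_periods=1)

# Each column gets its own list of aggregations.
out = r.agg({"A": ["sum", "std"], "B": ["mean", "std"]})

# Only the requested pairs appear: there is no ("A", "mean") or ("B", "sum").
print(list(out.columns))
print(out[("B", "mean")].tolist())
```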