Basics

As mentioned when introducing the data structures in the last section, the primary function of indexing with [] (a.k.a. __getitem__, for those familiar with implementing class behavior in Python) is to select lower-dimensional slices. The following table shows the return type when indexing pandas objects with []:

Object Type | Selection | Return Value Type
--- | --- | ---
Series | series[label] | scalar value
DataFrame | frame[colname] | Series corresponding to colname
Panel | panel[itemname] | DataFrame corresponding to the itemname
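As a quick check of the Series and DataFrame rows of the table, the following minimal sketch indexes two small objects with [] and inspects what comes back. The names series and frame and their values are illustrative assumptions, not part of the original example. Panel is left out because it was removed from pandas in later releases.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the return types.
series = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])
frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

scalar = series["y"]   # Series[label] -> scalar value
column = frame["A"]    # DataFrame[colname] -> Series for that column

print(type(scalar).__name__)  # a NumPy float scalar type
print(type(column).__name__)  # Series
```

In both cases the result has one dimension fewer than the object being indexed.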

Here we construct a simple time series data set to use for illustrating the indexing functionality:

    In [1]: dates = pd.date_range('1/1/2000', periods=8)
    In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    In [3]: df
    Out[3]:
                       A         B         C         D
    2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2000-01-02  1.212112 -0.173215  0.119209 -1.044236
    2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2000-01-04  0.721555 -0.706771 -1.039575  0.271860
    2000-01-05 -0.424972  0.567020  0.276232 -1.087401
    2000-01-06 -0.673690  0.113648 -1.478427  0.524988
    2000-01-07  0.404705  0.577046 -1.715002 -1.039268
    2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
    In [4]: panel = pd.Panel({'one': df, 'two': df - df.mean()})
    In [5]: panel
    Out[5]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
    Items axis: one to two
    Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
    Minor_axis axis: A to D

Note: None of the indexing functionality is time series specific unless specifically stated.

As the table above indicates, the most basic indexing with [] looks like this:

    In [6]: s = df['A']
    In [7]: s[dates[5]]
    Out[7]: -0.67368970808837059
    In [8]: panel['two']
    Out[8]:
                       A         B         C         D
    2000-01-01  0.409571  0.113086 -0.610826 -0.936507
    2000-01-02  1.152571  0.222735  1.017442 -0.845111
    2000-01-03 -0.921390 -1.708620  0.403304  1.270929
    2000-01-04  0.662014 -0.310822 -0.141342  0.470985
    2000-01-05 -0.484513  0.962970  1.174465 -0.888276
    2000-01-06 -0.733231  0.509598 -0.580194  0.724113
    2000-01-07  0.345164  0.972995 -0.816769 -0.840143
    2000-01-08 -0.430188 -0.761943 -0.446079  1.044010

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:

    In [9]: df
    Out[9]:
                       A         B         C         D
    2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2000-01-02  1.212112 -0.173215  0.119209 -1.044236
    2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2000-01-04  0.721555 -0.706771 -1.039575  0.271860
    2000-01-05 -0.424972  0.567020  0.276232 -1.087401
    2000-01-06 -0.673690  0.113648 -1.478427  0.524988
    2000-01-07  0.404705  0.577046 -1.715002 -1.039268
    2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
    In [10]: df[['B', 'A']] = df[['A', 'B']]
    In [11]: df
    Out[11]:
                       A         B         C         D
    2000-01-01 -0.282863  0.469112 -1.509059 -1.135632
    2000-01-02 -0.173215  1.212112  0.119209 -1.044236
    2000-01-03 -2.104569 -0.861849 -0.494929  1.071804
    2000-01-04 -0.706771  0.721555 -1.039575  0.271860
    2000-01-05  0.567020 -0.424972  0.276232 -1.087401
    2000-01-06  0.113648 -0.673690 -1.478427  0.524988
    2000-01-07  0.577046  0.404705 -1.715002 -1.039268
    2000-01-08 -1.157892 -0.370647 -1.344312  0.844885

You may find this useful for applying a transform (in-place) to a subset of the columns.
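The behaviors described above (columns returned in the requested order, an exception for a missing column, and an in-place transform of a column subset) can be sketched on a small frame. The frame, its values, the column name 'Z', and the factor of 10 are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy frame; the values are chosen only for readability.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

# A list of column names selects those columns in the order given.
subset = df[["C", "A"]]
print(list(subset.columns))

# Asking for a column that does not exist raises KeyError.
try:
    df[["A", "Z"]]
    raised = False
except KeyError:
    raised = True
print(raised)

# Transform a subset of the columns and write the result back.
df[["A", "B"]] = df[["A", "B"]] * 10
print(df)
```

Column C is left untouched because it is not named on the left-hand side of the assignment.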

Warning

pandas aligns all axes when setting a Series or DataFrame through .loc and .iloc. The assignment to df.loc below does not modify df, because the columns are aligned by label before the values are assigned.

    In [12]: df[['A', 'B']]
    Out[12]:
                       A         B
    2000-01-01 -0.282863  0.469112
    2000-01-02 -0.173215  1.212112
    2000-01-03 -2.104569 -0.861849
    2000-01-04 -0.706771  0.721555
    2000-01-05  0.567020 -0.424972
    2000-01-06  0.113648 -0.673690
    2000-01-07  0.577046  0.404705
    2000-01-08 -1.157892 -0.370647
    In [13]: df.loc[:, ['B', 'A']] = df[['A', 'B']]

The correct way to swap column values is by using raw values:

    In [15]: df.loc[:, ['B', 'A']] = df[['A', 'B']].values
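The contrast between label-aligned and positional assignment is easier to see on a tiny frame. The sketch below is a minimal illustration under stated assumptions: the frame and its integer values are hypothetical, only .loc is exercised, and to_numpy() stands in for .values as another way to obtain the raw array.

```python
import pandas as pd

# Hypothetical toy frame; integers make the swap easy to spot.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# The right-hand side is a DataFrame, so it is aligned on column labels
# first: 'A' is written back to 'A' and 'B' to 'B'. Nothing changes.
df.loc[:, ["B", "A"]] = df[["A", "B"]]
after_aligned = df.copy()

# The right-hand side is a bare array with no labels, so values are
# assigned by position: the old 'A' goes to 'B' and the old 'B' to 'A'.
df.loc[:, ["B", "A"]] = df[["A", "B"]].to_numpy()
after_raw = df.copy()

print(after_aligned)
print(after_raw)
```

Dropping the labels is what disables alignment, so any route to an unlabeled array should behave the same way here.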