选择器(Selection)

DataFrames

The indexing docs.

Using both row labels and value conditionals

  1. In [27]: df = pd.DataFrame(
  2. ....: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ....:
  4. Out[27]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [28]: df[(df.AAA <= 6) & (df.index.isin([0,2,4]))]
  11. Out[28]:
  12. AAA BBB CCC
  13. 0 4 10 100
  14. 2 6 30 -30

Use loc for label-oriented slicing and iloc positional slicing

  1. In [29]: data = {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}
  2. In [30]: df = pd.DataFrame(data=data,index=['foo','bar','boo','kar']); df
  3. Out[30]:
  4. AAA BBB CCC
  5. foo 4 10 100
  6. bar 5 20 50
  7. boo 6 30 -30
  8. kar 7 40 -50

There are 2 explicit slicing methods, with a third general case

  1. Positional-oriented (Python slicing style : exclusive of end)
  2. Label-oriented (Non-Python slicing style : inclusive of end)
  3. General (Either slicing style : depends on if the slice contains labels or positions)
  1. In [31]: df.loc['bar':'kar'] #Label
  2. Out[31]:
  3. AAA BBB CCC
  4. bar 5 20 50
  5. boo 6 30 -30
  6. kar 7 40 -50
  7. # Generic
  8. In [32]: df.iloc[0:3]
  9. Out[32]:
  10. AAA BBB CCC
  11. foo 4 10 100
  12. bar 5 20 50
  13. boo 6 30 -30
  14. In [33]: df.loc['bar':'kar']
  15. Out[33]:
  16. AAA BBB CCC
  17. bar 5 20 50
  18. boo 6 30 -30
  19. kar 7 40 -50

Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.

  1. In [34]: df2 = pd.DataFrame(data=data,index=[1,2,3,4]); #Note index starts at 1.
  2. In [35]: df2.iloc[1:3] #Position-oriented
  3. Out[35]:
  4. AAA BBB CCC
  5. 2 5 20 50
  6. 3 6 30 -30
  7. In [36]: df2.loc[1:3] #Label-oriented
  8. Out[36]:
  9. AAA BBB CCC
  10. 1 4 10 100
  11. 2 5 20 50
  12. 3 6 30 -30

Using inverse operator (~) to take the complement of a mask

  1. In [37]: df = pd.DataFrame(
  2. ....: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df
  3. ....:
  4. Out[37]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [38]: df[~((df.AAA <= 6) & (df.index.isin([0,2,4])))]
  11. Out[38]:
  12. AAA BBB CCC
  13. 1 5 20 50
  14. 3 7 40 -50

Panels

Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions

  1. In [39]: rng = pd.date_range('1/1/2013',periods=100,freq='D')
  2. In [40]: data = np.random.randn(100, 4)
  3. In [41]: cols = ['A','B','C','D']
  4. In [42]: df1, df2, df3 = pd.DataFrame(data, rng, cols), pd.DataFrame(data, rng, cols), pd.DataFrame(data, rng, cols)
  5. In [43]: pf = pd.Panel({'df1':df1,'df2':df2,'df3':df3});pf
  6. Out[43]:
  7. <class 'pandas.core.panel.Panel'>
  8. Dimensions: 3 (items) x 100 (major_axis) x 4 (minor_axis)
  9. Items axis: df1 to df3
  10. Major_axis axis: 2013-01-01 00:00:00 to 2013-04-10 00:00:00
  11. Minor_axis axis: A to D
  12. In [44]: pf.loc[:,:,'F'] = pd.DataFrame(data, rng, cols);pf
  13. Out[44]:
  14. <class 'pandas.core.panel.Panel'>
  15. Dimensions: 3 (items) x 100 (major_axis) x 5 (minor_axis)
  16. Items axis: df1 to df3
  17. Major_axis axis: 2013-01-01 00:00:00 to 2013-04-10 00:00:00
  18. Minor_axis axis: A to F

Mask a panel by using np.where and then reconstructing the panel with the new masked values

New Columns

Efficiently and dynamically creating new columns using applymap

  1. In [45]: df = pd.DataFrame(
  2. ....: {'AAA' : [1,2,1,3], 'BBB' : [1,1,2,2], 'CCC' : [2,1,3,1]}); df
  3. ....:
  4. Out[45]:
  5. AAA BBB CCC
  6. 0 1 1 2
  7. 1 2 1 1
  8. 2 1 2 3
  9. 3 3 2 1
  10. In [46]: source_cols = df.columns # or some subset would work too.
  11. In [47]: new_cols = [str(x) + "_cat" for x in source_cols]
  12. In [48]: categories = {1 : 'Alpha', 2 : 'Beta', 3 : 'Charlie' }
  13. In [49]: df[new_cols] = df[source_cols].applymap(categories.get);df
  14. Out[49]:
  15. AAA BBB CCC AAA_cat BBB_cat CCC_cat
  16. 0 1 1 2 Alpha Alpha Beta
  17. 1 2 1 1 Beta Alpha Alpha
  18. 2 1 2 3 Alpha Beta Charlie
  19. 3 3 2 1 Charlie Beta Alpha

Keep other columns when using min() with groupby

  1. In [50]: df = pd.DataFrame(
  2. ....: {'AAA' : [1,1,1,2,2,2,3,3], 'BBB' : [2,1,3,4,5,1,2,3]}); df
  3. ....:
  4. Out[50]:
  5. AAA BBB
  6. 0 1 2
  7. 1 1 1
  8. 2 1 3
  9. 3 2 4
  10. 4 2 5
  11. 5 2 1
  12. 6 3 2
  13. 7 3 3

Method 1 : idxmin() to get the index of the mins

  1. In [51]: df.loc[df.groupby("AAA")["BBB"].idxmin()]
  2. Out[51]:
  3. AAA BBB
  4. 1 1 1
  5. 5 2 1
  6. 6 3 2

Method 2 : sort then take first of each

  1. In [52]: df.sort_values(by="BBB").groupby("AAA", as_index=False).first()
  2. Out[52]:
  3. AAA BBB
  4. 0 1 1
  5. 1 2 1
  6. 2 3 2

Notice the same results, with the exception of the index.