风格(Idioms)

These are some neat pandas idioms

if-then/if-then-else on one column, and assignment to another one or more columns:

  1. In [1]: df = pd.DataFrame(
  2. ...: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ...:
  4. Out[1]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50

if-then…

An if-then on one column

  1. In [2]: df.loc[df.AAA >= 5,'BBB'] = -1; df
  2. Out[2]:
  3. AAA BBB CCC
  4. 0 4 10 100
  5. 1 5 -1 50
  6. 2 6 -1 -30
  7. 3 7 -1 -50

An if-then with assignment to 2 columns:

  1. In [3]: df.loc[df.AAA >= 5,['BBB','CCC']] = 555; df
  2. Out[3]:
  3. AAA BBB CCC
  4. 0 4 10 100
  5. 1 5 555 555
  6. 2 6 555 555
  7. 3 7 555 555

Add another line with different logic, to do the -else

  1. In [4]: df.loc[df.AAA < 5,['BBB','CCC']] = 2000; df
  2. Out[4]:
  3. AAA BBB CCC
  4. 0 4 2000 2000
  5. 1 5 555 555
  6. 2 6 555 555
  7. 3 7 555 555

Or use pandas where after you’ve set up a mask

  1. In [5]: df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4,'CCC' : [True,False] * 2})
  2. In [6]: df.where(df_mask,-1000)
  3. Out[6]:
  4. AAA BBB CCC
  5. 0 4 -1000 2000
  6. 1 5 -1000 -1000
  7. 2 6 -1000 555
  8. 3 7 -1000 -1000

if-then-else using numpy’s where()

  1. In [7]: df = pd.DataFrame(
  2. ...: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ...:
  4. Out[7]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [8]: df['logic'] = np.where(df['AAA'] > 5,'high','low'); df
  11. Out[8]:
  12. AAA BBB CCC logic
  13. 0 4 10 100 low
  14. 1 5 20 50 low
  15. 2 6 30 -30 high
  16. 3 7 40 -50 high

Splitting

Split a frame with a boolean criterion

  1. In [9]: df = pd.DataFrame(
  2. ...: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ...:
  4. Out[9]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [10]: dflow = df[df.AAA <= 5]; dflow
  11. Out[10]:
  12. AAA BBB CCC
  13. 0 4 10 100
  14. 1 5 20 50
  15. In [11]: dfhigh = df[df.AAA > 5]; dfhigh
  16. Out[11]:
  17. AAA BBB CCC
  18. 2 6 30 -30
  19. 3 7 40 -50

Building Criteria

Select with multi-column criteria

  1. In [12]: df = pd.DataFrame(
  2. ....: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ....:
  4. Out[12]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50

…and (without assignment returns a Series)

  1. In [13]: newseries = df.loc[(df['BBB'] < 25) & (df['CCC'] >= -40), 'AAA']; newseries
  2. Out[13]:
  3. 0 4
  4. 1 5
  5. Name: AAA, dtype: int64

…or (without assignment returns a Series)

  1. In [14]: newseries = df.loc[(df['BBB'] > 25) | (df['CCC'] >= -40), 'AAA']; newseries;

…or (with assignment modifies the DataFrame.)

  1. In [15]: df.loc[(df['BBB'] > 25) | (df['CCC'] >= 75), 'AAA'] = 0.1; df
  2. Out[15]:
  3. AAA BBB CCC
  4. 0 0.1 10 100
  5. 1 5.0 20 50
  6. 2 0.1 30 -30
  7. 3 0.1 40 -50

Select rows with data closest to certain value using argsort

  1. In [16]: df = pd.DataFrame(
  2. ....: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ....:
  4. Out[16]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [17]: aValue = 43.0
  11. In [18]: df.loc[(df.CCC-aValue).abs().argsort()]
  12. Out[18]:
  13. AAA BBB CCC
  14. 1 5 20 50
  15. 0 4 10 100
  16. 2 6 30 -30
  17. 3 7 40 -50

Dynamically reduce a list of criteria using a binary operators

  1. In [19]: df = pd.DataFrame(
  2. ....: {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
  3. ....:
  4. Out[19]:
  5. AAA BBB CCC
  6. 0 4 10 100
  7. 1 5 20 50
  8. 2 6 30 -30
  9. 3 7 40 -50
  10. In [20]: Crit1 = df.AAA <= 5.5
  11. In [21]: Crit2 = df.BBB == 10.0
  12. In [22]: Crit3 = df.CCC > -40.0

One could hard code:

  1. In [23]: AllCrit = Crit1 & Crit2 & Crit3

…Or it can be done with a list of dynamically built criteria

  1. In [24]: CritList = [Crit1,Crit2,Crit3]
  2. In [25]: AllCrit = functools.reduce(lambda x,y: x & y, CritList)
  3. In [26]: df[AllCrit]
  4. Out[26]:
  5. AAA BBB CCC
  6. 0 4 10 100