Merge

The Concat docs. The Join docs.

Append two dataframes with overlapping index (emulate R rbind)

  In [149]: rng = pd.date_range('2000-01-01', periods=6)
  In [150]: df1 = pd.DataFrame(np.random.randn(6, 3), index=rng, columns=['A', 'B', 'C'])
  In [151]: df2 = df1.copy()

Depending on how the DataFrames were constructed, ignore_index may be needed to avoid duplicate index labels in the result

  In [152]: df = pd.concat([df1, df2], ignore_index=True); df  # DataFrame.append was removed in pandas 2.0
  Out[152]:
             A         B         C
  0  -0.480676 -1.305282 -0.212846
  1   1.979901  0.363112 -0.275732
  2  -1.433852  0.580237 -0.013672
  3   1.776623 -0.803467  0.521517
  4  -0.302508 -0.442948 -0.395768
  5  -0.249024 -0.031510  2.413751
  6  -0.480676 -1.305282 -0.212846
  7   1.979901  0.363112 -0.275732
  8  -1.433852  0.580237 -0.013672
  9   1.776623 -0.803467  0.521517
  10 -0.302508 -0.442948 -0.395768
  11 -0.249024 -0.031510  2.413751
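To make the effect of ignore_index concrete, here is a minimal sketch on small hand-made frames. It is not part of the original recipe, and the names left, right, kept and renumbered are illustrative only.

```python
import pandas as pd

# Two small frames that share the same date index (illustrative data).
rng = pd.date_range("2000-01-01", periods=3)
left = pd.DataFrame({"A": [1, 2, 3]}, index=rng)
right = left.copy()

# Default behaviour: the original labels are kept, so every date appears twice.
kept = pd.concat([left, right])

# ignore_index=True discards the labels and renumbers the rows 0..n-1.
renumbered = pd.concat([left, right], ignore_index=True)

print(kept.index.is_unique)      # False
print(list(renumbered.index))    # [0, 1, 2, 3, 4, 5]
```

If the duplicated labels would be a bug rather than something to discard, passing verify_integrity=True to pd.concat raises an error instead of silently keeping them.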

Self Join of a DataFrame

  In [153]: df = pd.DataFrame(data={'Area': ['A'] * 5 + ['C'] * 2,
     .....:                         'Bins': [110] * 2 + [160] * 3 + [40] * 2,
     .....:                         'Test_0': [0, 1, 0, 1, 2, 0, 1],
     .....:                         'Data': np.random.randn(7)}); df
     .....:
  Out[153]:
    Area  Bins  Test_0      Data
  0    A   110       0 -0.378914
  1    A   110       1 -1.032527
  2    A   160       0 -1.402816
  3    A   160       1  0.715333
  4    A   160       2 -0.091438
  5    C    40       0  1.608418
  6    C    40       1  0.753207

  In [154]: df['Test_1'] = df['Test_0'] - 1

  In [155]: pd.merge(df, df, left_on=['Bins', 'Area', 'Test_0'], right_on=['Bins', 'Area', 'Test_1'], suffixes=('_L', '_R'))
  Out[155]:
    Area  Bins  Test_0_L    Data_L  Test_1_L  Test_0_R    Data_R  Test_1_R
  0    A   110         0 -0.378914        -1         1 -1.032527         0
  1    A   160         0 -1.402816        -1         1  0.715333         0
  2    A   160         1  0.715333         0         2 -0.091438         1
  3    C    40         0  1.608418        -1         1  0.753207         0

How to set the index and join
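A minimal sketch of this pattern, assuming two hypothetical frames (prices and volumes) that share a key column named ticker. The data and names are invented for illustration and do not come from the linked recipe.

```python
import pandas as pd

# Hypothetical input frames sharing a "ticker" key column.
prices = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "price": [10.0, 20.0, 30.0]})
volumes = pd.DataFrame({"ticker": ["BBB", "CCC", "DDD"], "volume": [200, 300, 400]})

# Move the key column into the index on both sides, then join on that index.
# how="inner" keeps only the tickers present in both frames.
joined = prices.set_index("ticker").join(volumes.set_index("ticker"), how="inner")

print(joined)
```

DataFrame.join aligns on the index by default, which is why both sides call set_index first; pd.merge with on='ticker' would be an alternative that leaves the key as a column.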

KDB like asof join
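In current pandas, one way to get an as-of join is pd.merge_asof; the linked recipe may take a different route. The sketch below uses invented trades and quotes frames and matches each trade to the most recent quote at or before its timestamp.

```python
import pandas as pd

# Illustrative data: both frames must be sorted by the join key.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-01 09:00:01",
                            "2000-01-01 09:00:03",
                            "2000-01-01 09:00:06"]),
    "qty": [100, 50, 75],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-01 09:00:00",
                            "2000-01-01 09:00:02",
                            "2000-01-01 09:00:05"]),
    "bid": [9.9, 10.0, 10.1],
})

# direction="backward" (the default) picks the last quote whose time
# is less than or equal to the trade time.
asof = pd.merge_asof(trades, quotes, on="time")

print(asof)
```

The tolerance and by parameters of pd.merge_asof can limit how stale a match may be and restrict matching to rows within the same group.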

Join with a criteria based on the values
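One simple way to join on a value-based condition is to build every pairing with a cross merge and then filter. This is a sketch under assumptions, not the linked recipe: the events and limits frames are invented, and how="cross" requires pandas 1.2 or later.

```python
import pandas as pd

# Illustrative frames with no shared key column.
events = pd.DataFrame({"event": ["e1", "e2", "e3"], "value": [5, 15, 25]})
limits = pd.DataFrame({"label": ["low", "high"], "threshold": [10, 20]})

# Pair every event with every limit (a Cartesian product).
pairs = events.merge(limits, how="cross")

# Keep only the pairs that satisfy the value-based criterion.
matched = pairs[pairs["value"] > pairs["threshold"]].reset_index(drop=True)

print(matched)
```

The intermediate frame has len(events) * len(limits) rows, so this approach suits small inputs; for large, sorted data a searchsorted or merge_asof based approach is usually cheaper.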

Using searchsorted to merge based on values inside a range
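A minimal sketch of the searchsorted idea, assuming the ranges are sorted by their start value and do not overlap. The ranges and points frames are hypothetical and the linked recipe may differ in detail.

```python
import numpy as np
import pandas as pd

# Illustrative, sorted, non-overlapping ranges (inclusive on both ends).
ranges = pd.DataFrame({"start": [0, 10, 20],
                       "end": [9, 19, 29],
                       "label": ["low", "mid", "high"]})
points = pd.DataFrame({"x": [3, 12, 25, 40]})

x = points["x"].to_numpy()

# Position of the last range whose start is <= x (-1 if there is none).
pos = np.searchsorted(ranges["start"].to_numpy(), x, side="right") - 1

# Guard against x values that fall before the first range or past the range end.
valid = pos >= 0
safe_pos = np.where(valid, pos, 0)
inside = valid & (x <= ranges["end"].to_numpy()[safe_pos])

# Attach the matching label; rows outside every range become missing.
candidate = pd.Series(ranges["label"].to_numpy()[safe_pos], index=points.index)
points["label"] = candidate.where(inside)

print(points)
```

Because searchsorted is a binary search, each lookup costs O(log n) in the number of ranges, which avoids building the full Cartesian product of points and ranges.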