Set / reset index

Occasionally you will load a data set into a DataFrame, or create one, and only afterwards want to add an index. There are a couple of different ways to do this.

Set an index

DataFrame has a set_index() method which takes a column name (for a regular Index) or a list of column names (for a MultiIndex). To create a new, re-indexed DataFrame:

    In [324]: data
    Out[324]:
         a    b  c    d
    0  bar  one  z  1.0
    1  bar  two  y  2.0
    2  foo  one  x  3.0
    3  foo  two  w  4.0

    In [325]: indexed1 = data.set_index('c')

    In [326]: indexed1
    Out[326]:
         a    b    d
    c
    z  bar  one  1.0
    y  bar  two  2.0
    x  foo  one  3.0
    w  foo  two  4.0

    In [327]: indexed2 = data.set_index(['a', 'b'])

    In [328]: indexed2
    Out[328]:
             c    d
    a   b
    bar one  z  1.0
        two  y  2.0
    foo one  x  3.0
        two  w  4.0
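
The session above does not show how data was built. The following is a minimal, self-contained sketch that assumes the frame is constructed from a plain dict (the construction is an assumption; the values are copied from the output above). It checks what kind of index each call produces.

```python
import pandas as pd

# Assumed construction of the example frame; values match the output shown above.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# A single column name gives a regular Index named after that column.
indexed1 = data.set_index("c")
print(indexed1.index.name)                         # c
print(list(indexed1.columns))                      # ['a', 'b', 'd']

# A list of column names gives a MultiIndex with one level per column.
indexed2 = data.set_index(["a", "b"])
print(isinstance(indexed2.index, pd.MultiIndex))   # True
print(list(indexed2.index.names))                  # ['a', 'b']

# set_index returns a new object; the original frame is unchanged.
print(list(data.columns))                          # ['a', 'b', 'c', 'd']
```

Because the default call returns a new DataFrame, the result has to be assigned to a name (as with indexed1 and indexed2) to be kept.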

The append keyword option allows you to keep the existing index and append the given columns to it, producing a MultiIndex:

    In [329]: frame = data.set_index('c', drop=False)

    In [330]: frame = frame.set_index(['a', 'b'], append=True)

    In [331]: frame
    Out[331]:
               c    d
    c a   b
    z bar one  z  1.0
    y bar two  y  2.0
    x foo one  x  3.0
    w foo two  w  4.0
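
To make the effect of append easier to see, here is a small sketch that compares the index levels with and without it. The dict-based construction of data is an assumption; the values are those of the example frame.

```python
import pandas as pd

# Assumed construction of the example frame.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# With append=True the existing index level 'c' is kept and 'a', 'b' are added.
frame = data.set_index("c", drop=False)
frame = frame.set_index(["a", "b"], append=True)
print(list(frame.index.names))      # ['c', 'a', 'b']
print(frame.index.nlevels)          # 3

# Without append, the second call replaces the existing index,
# and the old index values ('c') are discarded.
replaced = data.set_index("c").set_index(["a", "b"])
print(list(replaced.index.names))   # ['a', 'b']
print(list(replaced.columns))       # ['d']
```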

Other options in set_index allow you to keep the index columns in the frame (drop=False) or to set the index in place, without creating a new object (inplace=True):

    In [332]: data.set_index('c', drop=False)
    Out[332]:
         a    b  c    d
    c
    z  bar  one  z  1.0
    y  bar  two  y  2.0
    x  foo  one  x  3.0
    w  foo  two  w  4.0

    In [333]: data.set_index(['a', 'b'], inplace=True)

    In [334]: data
    Out[334]:
             c    d
    a   b
    bar one  z  1.0
        two  y  2.0
    foo one  x  3.0
        two  w  4.0
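
The difference between the two options can be checked directly. This is a sketch under the same assumed dict-based construction of data: drop=False leaves the column in place, while inplace=True mutates the frame and returns None.

```python
import pandas as pd

# Assumed construction of the example frame.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# drop=False: 'c' becomes the index and also stays as a column.
kept = data.set_index("c", drop=False)
print(kept.index.name)            # c
print("c" in kept.columns)        # True

# inplace=True: the frame itself is modified and the call returns None.
result = data.set_index(["a", "b"], inplace=True)
print(result is None)             # True
print(list(data.columns))         # ['c', 'd']
print(list(data.index.names))     # ['a', 'b']
```

Since the in-place call returns None, writing data = data.set_index(..., inplace=True) would lose the frame; use either the in-place form or the assignment form, not both.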

Reset the index

As a convenience, DataFrame provides a reset_index() method, which transfers the index values into the DataFrame’s columns and sets a simple integer index. This is the inverse operation of set_index().

    In [335]: data
    Out[335]:
             c    d
    a   b
    bar one  z  1.0
        two  y  2.0
    foo one  x  3.0
        two  w  4.0

    In [336]: data.reset_index()
    Out[336]:
         a    b  c    d
    0  bar  one  z  1.0
    1  bar  two  y  2.0
    2  foo  one  x  3.0
    3  foo  two  w  4.0

The output resembles a SQL table or a record array. The columns derived from the index take their names from the index’s names attribute.
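
A short sketch of how the column names are chosen. The small frames here are hypothetical illustrations, not part of the example above. The fallback labels used for unnamed indexes ('index', 'level_0', ...) are pandas defaults that the text above does not mention.

```python
import pandas as pd

# Named index: the new column takes the index name.
named = pd.DataFrame({"d": [1.0, 2.0]}, index=pd.Index(["z", "y"], name="c"))
print(list(named.reset_index().columns))        # ['c', 'd']

# Unnamed index: pandas falls back to the default label 'index'.
unnamed = pd.DataFrame({"d": [1.0, 2.0]}, index=["z", "y"])
print(list(unnamed.reset_index().columns))      # ['index', 'd']

# Unnamed MultiIndex levels: the defaults are 'level_0', 'level_1', ...
mi = pd.MultiIndex.from_tuples([("bar", "one"), ("bar", "two")])
unnamed_mi = pd.DataFrame({"d": [1.0, 2.0]}, index=mi)
print(list(unnamed_mi.reset_index().columns))   # ['level_0', 'level_1', 'd']
```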

You can use the level keyword to remove only a portion of the index:

    In [337]: frame
    Out[337]:
               c    d
    c a   b
    z bar one  z  1.0
    y bar two  y  2.0
    x foo one  x  3.0
    w foo two  w  4.0

    In [338]: frame.reset_index(level=1)
    Out[338]:
             a  c    d
    c b
    z one  bar  z  1.0
    y two  bar  y  2.0
    x one  foo  x  3.0
    w two  foo  w  4.0

reset_index takes an optional parameter drop which, if true, simply discards the index instead of putting the index values in the DataFrame’s columns.
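
A minimal sketch of drop=True, assuming the same example values as above (the dict-based construction is an assumption). The last call combines drop with level, referring to the level by name rather than by position.

```python
import pandas as pd

# Assumed construction of the example frame, indexed by 'a' and 'b'.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
}).set_index(["a", "b"])

# drop=True discards the index values; no 'a' or 'b' columns are created.
dropped = data.reset_index(drop=True)
print(list(dropped.columns))        # ['c', 'd']
print(list(dropped.index))          # [0, 1, 2, 3]

# Combined with level: only level 'a' is discarded, 'b' stays as the index.
partial = data.reset_index(level="a", drop=True)
print(list(partial.index.names))    # ['b']
print(list(partial.columns))        # ['c', 'd']
```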

Adding an ad hoc index

If you create an index yourself, you can just assign it to the index attribute:

    data.index = index
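
As a hedged illustration of the assignment above: the row labels and the index name below are hypothetical, chosen only for this sketch. The new index must have the same length as the frame; otherwise pandas raises a ValueError.

```python
import pandas as pd

data = pd.DataFrame({"c": ["z", "y", "x", "w"], "d": [1.0, 2.0, 3.0, 4.0]})

# Hypothetical labels; any index of matching length can be assigned.
index = pd.Index(["r1", "r2", "r3", "r4"], name="row")
data.index = index
print(data.index.name)          # row
print(data.loc["r3", "d"])      # 3.0

# An index of the wrong length is rejected and the frame keeps its old index.
mismatch_rejected = False
try:
    data.index = ["only", "three", "labels"]
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)        # True
```

Unlike set_index, this assignment does not take values from a column; it replaces the row labels directly and modifies the frame in place.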