Selection by Position

Warning

Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a View versus Copy.
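
As a minimal sketch of the advice above, pass both positions to a single .iloc call instead of chaining two indexing operations. The frame df is a small made-up example and is not taken from this section.

```python
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Chained form to avoid: df.iloc[0]["B"] = 10
# It may or may not modify df, depending on whether df.iloc[0] returns a view
# or a copy (and on the pandas version in use).

# One indexing call with (row position 0, column position 1) sets the value in place.
df.iloc[0, 1] = 10
print(df)
```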

Pandas provides a suite of methods for purely integer-based indexing. The semantics closely follow Python and NumPy slicing: indexing is 0-based, and when slicing, the start bound is included while the upper bound is excluded. Trying to use a non-integer key, even a valid label, raises an error.
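
The rules above can be checked on a small Series whose labels are strings, so positions and labels cannot be confused. This is a sketch with hypothetical data. It also shows negative positions, which follow ordinary Python semantics. The exact exception class for a label key is not assumed here, so the sketch catches several types.

```python
import pandas as pd

# String labels make it obvious when a position, not a label, is used.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

first = s.iloc[0]        # 0-based: the first element
middle = s.iloc[1:3]     # positions 1 and 2; the stop position 3 is excluded
last = s.iloc[-1]        # negative positions count from the end, as in Python

raised = None
try:
    s.iloc["a"]          # a valid label, but not an integer position
except (TypeError, ValueError, IndexError) as exc:
    raised = type(exc).__name__

print(first, list(middle.index), last, raised)
```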

The .iloc attribute is the primary access method. The following are valid inputs:

  • An integer, e.g. 5.
  • A list or array of integers, e.g. [4, 3, 0].
  • A slice object with ints, e.g. 1:7.
  • A boolean array.
  • A callable; see Selection By Callable.

    # assumes numpy and pandas are imported as np and pd
    In [60]: s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))

    In [61]: s1
    Out[61]:
    0    0.695775
    2    0.341734
    4    0.959726
    6   -1.110336
    8   -0.619976
    dtype: float64

    In [62]: s1.iloc[:3]
    Out[62]:
    0    0.695775
    2    0.341734
    4    0.959726
    dtype: float64

    In [63]: s1.iloc[3]
    Out[63]: -1.1103361028911669
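
To exercise each of the five input types listed above in one place, here is a sketch on a Series with deterministic values. The values are hypothetical; only the index labels 0, 2, 4, 6, 8 mirror s1.

```python
import numpy as np
import pandas as pd

# Predictable values with the same labels as s1 (0, 2, 4, 6, 8).
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list(range(0, 10, 2)))

by_int = s.iloc[3]                                    # single integer -> scalar
by_list = s.iloc[[4, 3, 0]]                           # list of positions, order kept
by_slice = s.iloc[1:3]                                # slice of positions
by_mask = s.iloc[np.array([True, False, True, False, True])]  # boolean array, same length
by_callable = s.iloc[lambda x: [0, len(x) - 1]]       # callable returning a list of positions

for result in (by_int, by_list, by_slice, by_mask, by_callable):
    print(result)
```

Note that the results keep the original labels (for example, by_list is labelled 8, 6, 0) even though the selection was made by position.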

Note that setting works as well:

    In [64]: s1.iloc[:3] = 0

    In [65]: s1
    Out[65]:
    0    0.000000
    2    0.000000
    4    0.000000
    6   -1.110336
    8   -0.619976
    dtype: float64
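
Setting also accepts the other positional indexers. The sketch below uses hypothetical data. It assigns a scalar through a list of positions and a list of values through a slice. The number of values must match the number of selected positions.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list(range(0, 10, 2)))

# A scalar is broadcast to every selected position (positions 0 and 2, labels 0 and 4).
s.iloc[[0, 2]] = 0.0

# A list of values is assigned element-wise to positions 3 and 4.
s.iloc[3:] = [40.0, 50.0]

print(s)
```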

With a DataFrame:

    In [66]: df1 = pd.DataFrame(np.random.randn(6, 4),
       ....:                    index=list(range(0, 12, 2)),
       ....:                    columns=list(range(0, 8, 2)))
       ....:

    In [67]: df1
    Out[67]:
               0         2         4         6
    0   0.149748 -0.732339  0.687738  0.176444
    2   0.403310 -0.154951  0.301624 -2.179861
    4  -1.369849 -0.954208  1.462696 -1.743161
    6  -0.826591 -0.345352  1.314232  0.690579
    8   0.995761  2.396780  0.014871  3.357427
    10 -0.317441 -1.236269  0.896171 -0.487602

Select via integer slicing:

    In [68]: df1.iloc[:3]
    Out[68]:
              0         2         4         6
    0  0.149748 -0.732339  0.687738  0.176444
    2  0.403310 -0.154951  0.301624 -2.179861
    4 -1.369849 -0.954208  1.462696 -1.743161

    In [69]: df1.iloc[1:5, 2:4]
    Out[69]:
              4         6
    2  0.301624 -2.179861
    4  1.462696 -1.743161
    6  1.314232  0.690579
    8  0.014871  3.357427
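
Random values make it hard to see which cells were picked. This sketch therefore builds a frame with df1's shape and labels but filled with 0 to 23 (hypothetical data). The slices still count positions, while the result keeps the labels.

```python
import numpy as np
import pandas as pd

# Same shape and labels as df1, but with predictable values 0..23.
df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list(range(0, 12, 2)),
                  columns=list(range(0, 8, 2)))

rows = df.iloc[:3]          # row positions 0, 1, 2 -> labels 0, 2, 4
block = df.iloc[1:5, 2:4]   # row positions 1..4 and column positions 2..3

print(rows)
print(block)
```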

Select via integer list:

    In [70]: df1.iloc[[1, 3, 5], [1, 3]]
    Out[70]:
               2         6
    2  -0.154951 -2.179861
    6  -0.345352  0.690579
    10 -1.236269 -0.487602
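
Integer lists are applied in the order given, so they can also reorder or repeat rows and columns. Here is a sketch on a frame with df1's labels and the hypothetical values 0 to 23.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list(range(0, 12, 2)),
                  columns=list(range(0, 8, 2)))

# Rows at positions 5, 1, 1 (a repeat) and columns at positions 3, 0 (reversed order).
picked = df.iloc[[5, 1, 1], [3, 0]]
print(picked)
```
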

For slicing rows explicitly:

    In [71]: df1.iloc[1:3, :]
    Out[71]:
              0         2         4         6
    2  0.403310 -0.154951  0.301624 -2.179861
    4 -1.369849 -0.954208  1.462696 -1.743161

For slicing columns explicitly:

    In [72]: df1.iloc[:, 1:3]
    Out[72]:
               2         4
    0  -0.732339  0.687738
    2  -0.154951  0.301624
    4  -0.954208  1.462696
    6  -0.345352  1.314232
    8   2.396780  0.014871
    10 -1.236269  0.896171

For getting a value explicitly:

    # this is also equivalent to df1.iat[1, 1]
    In [73]: df1.iloc[1, 1]
    Out[73]: -0.15495077442490321
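
Scalar access can be compared across .iloc, .iat and label-based .loc in a short sketch. The frame uses df1's labels and the hypothetical values 0 to 23. The .loc call needs the labels found at those positions (row label 2, column label 2), not the positions themselves.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list(range(0, 12, 2)),
                  columns=list(range(0, 8, 2)))

a = df.iloc[1, 1]   # row position 1, column position 1
b = df.iat[1, 1]    # fast scalar-only positional access
c = df.loc[2, 2]    # same cell addressed by its labels
print(a, b, c)
```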

For getting a cross section using an integer position (for df1 this is equivalent to df1.xs(2), because the row at position 1 has label 2):

    In [74]: df1.iloc[1]
    Out[74]:
    0    0.403310
    2   -0.154951
    4    0.301624
    6   -2.179861
    Name: 2, dtype: float64
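
The Name: 2 in the output shows that the row at position 1 carries label 2. This sketch checks that the positional and label-based cross sections agree, using a frame with df1's labels and hypothetical values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list(range(0, 12, 2)),
                  columns=list(range(0, 8, 2)))

by_position = df.iloc[1]   # second row, whatever its label is
by_label = df.xs(2)        # the row whose label is 2

print(by_position.equals(by_label), by_position.name)
```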

Out-of-range slice indexes are handled gracefully, just as in Python and NumPy.

    # These are allowed in Python/NumPy.
    In [75]: x = list('abcdef')

    In [76]: x
    Out[76]: ['a', 'b', 'c', 'd', 'e', 'f']

    In [77]: x[4:10]
    Out[77]: ['e', 'f']

    In [78]: x[8:10]
    Out[78]: []

    In [79]: s = pd.Series(x)

    In [80]: s
    Out[80]:
    0    a
    1    b
    2    c
    3    d
    4    e
    5    f
    dtype: object

    In [81]: s.iloc[4:10]
    Out[81]:
    4    e
    5    f
    dtype: object

    In [82]: s.iloc[8:10]
    Out[82]: Series([], dtype: object)
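
The parallel with list slicing can be checked in a loop. This sketch compares several out-of-range slices, including a negative start, on the list x and on the Series built from it. The particular (start, stop) pairs are arbitrary choices for illustration.

```python
import pandas as pd

x = list("abcdef")
s = pd.Series(x)

# For each slice, record (list result, Series.iloc result) side by side.
results = {}
for start, stop in [(4, 10), (8, 10), (-10, 2), (-3, 100)]:
    results[(start, stop)] = (x[start:stop], s.iloc[start:stop].tolist())

for bounds, (from_list, from_series) in results.items():
    print(bounds, from_list, from_series)
```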

Note that using slices that go out of bounds can result in an empty axis (e.g. an empty DataFrame being returned).

    In [83]: dfl = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

    In [84]: dfl
    Out[84]:
              A         B
    0 -0.082240 -2.182937
    1  0.380396  0.084844
    2  0.432390  1.519970
    3 -0.493662  0.600178
    4  0.274230  0.132885

    In [85]: dfl.iloc[:, 2:3]
    Out[85]:
    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3, 4]

    In [86]: dfl.iloc[:, 1:3]
    Out[86]:
              B
    0 -2.182937
    1  0.084844
    2  1.519970
    3  0.600178
    4  0.132885

    In [87]: dfl.iloc[4:6]
    Out[87]:
             A         B
    4  0.27423  0.132885
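
Checking shapes rather than printed values makes the truncation explicit. Only the axis that is sliced out of range shrinks; the other axis keeps its full length. The frame below has the same layout as dfl, with random values.

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(np.random.randn(5, 2), columns=list("AB"))

no_columns = dfl.iloc[:, 2:3]   # starts past the last column -> zero columns
last_column = dfl.iloc[:, 1:3]  # truncated to the one column that exists
tail = dfl.iloc[4:6]            # truncated to the last row

print(no_columns.shape, no_columns.empty)
print(last_column.shape, tail.shape)
```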

A single indexer that is out of bounds raises an IndexError, as does a list of indexers in which any element is out of bounds.

    dfl.iloc[[4, 5, 6]]
    IndexError: positional indexers are out-of-bounds

    dfl.iloc[:, 4]
    IndexError: single positional indexer is out-of-bounds
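
Code that builds positions dynamically can catch these errors explicitly. This sketch runs both failing lookups from above on a frame shaped like dfl and collects the IndexError messages. The labels "list" and "scalar column" are hypothetical names for the two cases.

```python
import numpy as np
import pandas as pd

dfl = pd.DataFrame(np.random.randn(5, 2), columns=list("AB"))

errors = []
attempts = [
    ("list", lambda: dfl.iloc[[4, 5, 6]]),   # positions 5 and 6 do not exist
    ("scalar column", lambda: dfl.iloc[:, 4]),  # column position 4 does not exist
]
for label, lookup in attempts:
    try:
        lookup()
    except IndexError as exc:
        errors.append((label, str(exc)))

for label, message in errors:
    print(label, "->", message)
```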