Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

  >>> s = pd.Series(data, index=index)

Here, data can be many different things:

  • a Python dict
  • an ndarray
  • a scalar value (like 5)

The passed index is a list of axis labels. How the index is used depends on what data is:

From ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, …, len(data) - 1].

  In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

  In [4]: s
  Out[4]:
  a    0.4691
  b   -0.2829
  c   -1.5091
  d   -1.1356
  e    1.2121
  dtype: float64

  In [5]: s.index
  Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

  In [6]: pd.Series(np.random.randn(5))
  Out[6]:
  0   -0.1732
  1    0.1192
  2   -1.0442
  3   -0.8618
  4   -2.1046
  dtype: float64
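The length rule above can be checked directly. The following is a minimal sketch with made-up values (the names values, default_indexed and mismatch_error are illustrative only): it shows the default integer index and the error raised when the index length does not match the data.

```python
import numpy as np
import pandas as pd

values = np.arange(3.0)  # three values: 0.0, 1.0, 2.0

# No index passed: pandas builds the default integer index 0..len(values)-1.
default_indexed = pd.Series(values)
print(list(default_indexed.index))  # [0, 1, 2]

# An index whose length differs from the data is rejected.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError
```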

Note: pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time. The check is deferred mainly for performance reasons (in many computations, such as parts of GroupBy, the index is not used).
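As a small illustration of this lazy behavior, the sketch below builds a Series with a repeated label. The data and variable names are invented for the example, and reindex is used here as one operation that is known to reject duplicate labels.

```python
import pandas as pd

# Constructing a Series with a repeated label succeeds.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(dup.index.is_unique)  # False

# Label lookup returns every row that carries the label.
both_a = dup["a"]
print(len(both_a))  # 2

# The error only appears once an operation needs unique labels.
try:
    dup.reindex(["a", "b"])
    reindex_error = None
except ValueError as exc:
    reindex_error = exc

print(type(reindex_error).__name__)  # ValueError
```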

From dict

Series can be instantiated from dicts:

  In [7]: d = {'b' : 1, 'a' : 0, 'c' : 2}

  In [8]: pd.Series(d)
  Out[8]:
  b    1
  a    0
  c    2
  dtype: int64

Note: When the data is a dict and an index is not passed, the Series index follows the dict's insertion order if you are using Python >= 3.6 and pandas >= 0.23. With Python < 3.6 or pandas < 0.23, the Series index is instead the lexically ordered list of dict keys.

In the example above, on a Python version lower than 3.6 or a pandas version lower than 0.23, the Series would be ordered by the lexical order of the dict keys (i.e. ['a', 'b', 'c'] rather than ['b', 'a', 'c']).

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.

  In [9]: d = {'a' : 0., 'b' : 1., 'c' : 2.}

  In [10]: pd.Series(d)
  Out[10]:
  a    0.0
  b    1.0
  c    2.0
  dtype: float64

  In [11]: pd.Series(d, index=['b', 'c', 'd', 'a'])
  Out[11]:
  b    1.0
  c    2.0
  d    NaN
  a    0.0
  dtype: float64

Note: NaN (not a number) is the standard missing data marker used in pandas.
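To locate such missing entries programmatically, one option is the isna method. The sketch below reuses the dict from the example above; the names picked and missing_mask are illustrative only.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}

# 'd' is not a key of the dict, so its value is filled with NaN.
picked = pd.Series(d, index=["b", "c", "d", "a"])

# isna() returns a boolean Series that is True where a value is missing.
missing_mask = picked.isna()
print(bool(missing_mask["d"]))   # True
print(int(missing_mask.sum()))   # 1
```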

From scalar value

If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.

  In [12]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
  Out[12]:
  a    5.0
  b    5.0
  c    5.0
  d    5.0
  e    5.0
  dtype: float64

Series is ndarray-like

Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.

  In [13]: s[0]
  Out[13]: 0.46911229990718628

  In [14]: s[:3]
  Out[14]:
  a    0.4691
  b   -0.2829
  c   -1.5091
  dtype: float64

  In [15]: s[s > s.median()]
  Out[15]:
  a    0.4691
  e    1.2121
  dtype: float64

  In [16]: s[[4, 3, 1]]
  Out[16]:
  e    1.2121
  d   -1.1356
  b   -0.2829
  dtype: float64

  In [17]: np.exp(s)
  Out[17]:
  a    1.5986
  b    0.7536
  c    0.2211
  d    0.3212
  e    3.3606
  dtype: float64

We will address array-based indexing in a separate section.
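Newer pandas releases warn when an integer key such as s[0] is used positionally on a Series with a non-integer index. A hedged alternative is to state the intent explicitly with .iloc (position) and .loc (label). The sketch below uses fixed, made-up values rather than the random data above.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, -0.25, -1.5, -1.0, 1.25], index=list("abcde"))

first = s.iloc[0]            # positional access to the first element
head = s.iloc[:3]            # positional slice; the index is sliced too
picked = s.iloc[[4, 3, 1]]   # positional selection in the given order
by_label = s.loc["a"]        # label-based access

# NumPy functions accept a Series and return a Series with the same index.
exp_values = np.exp(s)

print(first, by_label)        # 0.5 0.5
print(list(picked.index))     # ['e', 'd', 'b']
```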

Series is dict-like

A Series is like a fixed-size dict in that you can get and set values by index label:

  In [18]: s['a']
  Out[18]: 0.46911229990718628

  In [19]: s['e'] = 12.

  In [20]: s
  Out[20]:
  a     0.4691
  b    -0.2829
  c    -1.5091
  d    -1.1356
  e    12.0000
  dtype: float64

  In [21]: 'e' in s
  Out[21]: True

  In [22]: 'f' in s
  Out[22]: False

If a label is not contained, an exception is raised:

  >>> s['f']
  KeyError: 'f'

With the get method, a missing label returns None or a specified default:

  In [23]: s.get('f')

  In [24]: s.get('f', np.nan)
  Out[24]: nan
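The three lookup styles can be compared side by side. This is a minimal sketch with invented data; the fallback value -1.0 is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=["a", "b"])

# Membership test checks the index labels, like keys of a dict.
has_f = "f" in s

# get() never raises for a missing label.
missing_default = s.get("f")          # None
missing_fallback = s.get("f", -1.0)   # the supplied default

# Bracket lookup raises KeyError for a missing label.
try:
    s["f"]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

print(has_f, missing_default, missing_fallback)  # False None -1.0
```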

See also the section on attribute access.

Vectorized operations and label alignment with Series

When working with raw NumPy arrays, looping through value-by-value is usually not necessary. The same is true when working with Series in pandas. Series can also be passed into most NumPy functions expecting an ndarray.

  In [25]: s + s
  Out[25]:
  a     0.9382
  b    -0.5657
  c    -3.0181
  d    -2.2713
  e    24.0000
  dtype: float64

  In [26]: s * 2
  Out[26]:
  a     0.9382
  b    -0.5657
  c    -3.0181
  d    -2.2713
  e    24.0000
  dtype: float64

  In [27]: np.exp(s)
  Out[27]:
  a         1.5986
  b         0.7536
  c         0.2211
  d         0.3212
  e    162754.7914
  dtype: float64

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.

  In [28]: s[1:] + s[:-1]
  Out[28]:
  a       NaN
  b   -0.5657
  c   -3.0181
  d   -2.2713
  e       NaN
  dtype: float64

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN). Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.

Note: In general, we chose to make the default result of operations between differently indexed objects yield the union of the indexes in order to avoid loss of information. Having an index label, even though the data is missing, is typically important information as part of a computation. You can, of course, drop labels with missing data via the dropna method.
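The union behavior and the dropna option can be seen with two partially overlapping Series. The sketch below uses made-up values; the add method with fill_value is shown as one additional way to treat absent labels as zero, which goes beyond what the text above describes.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Labels are aligned first; 'a' and 'd' exist on one side only, so they are NaN.
total = left + right
print(list(total.index))  # ['a', 'b', 'c', 'd']

# dropna() keeps only the labels present in both operands.
overlap_only = total.dropna()
print(overlap_only.tolist())  # [12.0, 23.0]

# add() with fill_value substitutes 0 for a label missing on one side.
filled = left.add(right, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```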

Name attribute

Series can also have a name attribute:

  In [29]: s = pd.Series(np.random.randn(5), name='something')

  In [30]: s
  Out[30]:
  0   -0.4949
  1    1.0718
  2    0.7216
  3   -0.7068
  4   -1.0396
  Name: something, dtype: float64

  In [31]: s.name
  Out[31]: 'something'

The Series name will be assigned automatically in many cases, in particular when taking 1D slices of DataFrame as you will see below.

New in version 0.18.0.

You can rename a Series with the pandas.Series.rename() method.

  In [32]: s2 = s.rename("different")

  In [33]: s2.name
  Out[33]: 'different'

Note that s and s2 refer to different objects.
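A short check of both points about names, using invented data: rename leaves the original Series untouched, and a column selected from a DataFrame carries the column label as its name (the column name "price" is hypothetical).

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="something")
s2 = s.rename("different")

# rename returned a new object; the original keeps its name.
print(s.name, s2.name)  # something different
print(s2 is s)          # False

# Selecting a single column of a DataFrame yields a Series named after the column.
frame = pd.DataFrame({"price": [1, 2]})
column = frame["price"]
print(column.name)      # price
```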