pandas

What is it good for?

Analyze tabular data.

pandas is a powerful library for analyzing, combining and manipulating data in many thinkable (and some unthinkable) ways. Its tables, called DataFrames, have many similarities to data frames in R. Each DataFrame has an index (a set of row labels), and methods for plotting and for reading CSV or Excel files are built in, although plotting relies on matplotlib and Excel files need an additional reader package. pandas uses NumPy under the hood.
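As a rough illustration of the index and the NumPy backing mentioned above, here is a minimal sketch. The city and temperature values are made up for the example and do not come from this page.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0]})

# The index holds the row labels; by default they are 0, 1, 2, ...
index_labels = list(df.index)

# A numeric column can be handed to NumPy as a plain array
temps = df["temp_c"].to_numpy()
mean_temp = float(np.mean(temps))

print(index_labels)
print(type(temps).__name__, mean_temp)
```

The default integer index can be replaced with more meaningful labels, for example with df.set_index("city").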

Installed with Python by default

no

Installed with Anaconda

yes

How to install it?

    pip install pandas

Example

Create a table with character names and numbers:

    import pandas as pd

    hamlet = [['Hamlet', 1.76], ['Polonius', 1.52], ['Ophelia', 1.83], ['Claudius', 1.95]]
    df = pd.DataFrame(data=hamlet, columns=['name', 'size'])
    print(df)

Output:

           name  size
    0    Hamlet  1.76
    1  Polonius  1.52
    2   Ophelia  1.83
    3  Claudius  1.95

Sort the rows by name (in descending order), filter by minimum size, print the first two rows and write a CSV file:

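    # Sort by name in descending order (Z to A); avoid shadowing the built-in sorted()
    by_name = df.sort_values(by='name', ascending=False)
    # Keep only the rows with a size above 1.70
    tall = by_name[by_name['size'] > 1.70]
    print(tall.head(2))
    # Write the original, unfiltered table to a CSV file
    df.to_csv('hamlet.csv', index=False, header=True)

Output of print:

          name  size
    2  Ophelia  1.83
    0   Hamlet  1.76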
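To see what to_csv writes and how the data can be read back, the following sketch may help. It rebuilds the same table and writes to an in-memory buffer instead of 'hamlet.csv'. That swap is an assumption made here so the example leaves no file behind; passing a file name works the same way.

```python
import io

import pandas as pd

hamlet = [['Hamlet', 1.76], ['Polonius', 1.52], ['Ophelia', 1.83], ['Claudius', 1.95]]
df = pd.DataFrame(data=hamlet, columns=['name', 'size'])

# Write the CSV text to an in-memory buffer rather than to disk
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=True)
csv_text = buffer.getvalue()

# Rewind the buffer and read the same text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(csv_text)
print(restored.shape)
```

With index=False the row labels are left out of the file, so read_csv assigns a fresh default index when the data is loaded again.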

Where to learn more?

http://pandas.pydata.org/