CSV 文件和 csv 模块¶

标准库中有自带的 csv (逗号分隔值) 模块处理 csv 格式的文件:

In [1]:

  1. import csv

读 csv 文件¶

假设我们有这样的一个文件:

In [2]:

  1. %%file data.csv
  2. "alpha 1", 100, -1.443
  3. "beat 3", 12, -0.0934
  4. "gamma 3a", 192, -0.6621
  5. "delta 2a", 15, -4.515
  1. Writing data.csv

打开这个文件,并产生一个文件 reader:

In [3]:

  1. fp = open("data.csv")
  2. r = csv.reader(fp)

可以按行迭代数据:

In [4]:

  1. for row in r:
  2. print row
  3.  
  4. fp.close()
  1. ['alpha 1', ' 100', ' -1.443']
  2. ['beat 3', ' 12', ' -0.0934']
  3. ['gamma 3a', ' 192', ' -0.6621']
  4. ['delta 2a', ' 15', ' -4.515']

默认数据内容都被当作字符串处理,不过可以自己进行处理:

In [5]:

  1. data = []
  2.  
  3. with open('data.csv') as fp:
  4. r = csv.reader(fp)
  5. for row in r:
  6. data.append([row[0], int(row[1]), float(row[2])])
  7.  
  8. data

Out[5]:

  1. [['alpha 1', 100, -1.443],
  2. ['beat 3', 12, -0.0934],
  3. ['gamma 3a', 192, -0.6621],
  4. ['delta 2a', 15, -4.515]]

In [6]:

  1. import os
  2. os.remove('data.csv')

写 csv 文件¶

可以使用 csv.writer 写入文件,不过相应地,传入的应该是以写方式打开的文件,不过一般要用 'wb' 即二进制写入方式,防止出现换行不正确的问题:

In [7]:

  1. data = [('one', 1, 1.5), ('two', 2, 8.0)]
  2. with open('out.csv', 'wb') as fp:
  3. w = csv.writer(fp)
  4. w.writerows(data)

显示结果:

In [8]:

  1. !cat 'out.csv'
  1. one,1,1.5
  2. two,2,8.0

更换分隔符¶

默认情况下,csv 模块默认 csv 文件都是由 excel 产生的,实际中可能会遇到这样的问题:

In [9]:

  1. data = [('one, \"real\" string', 1, 1.5), ('two', 2, 8.0)]
  2. with open('out.csv', 'wb') as fp:
  3. w = csv.writer(fp)
  4. w.writerows(data)

In [10]:

  1. !cat 'out.csv'
  1. "one, ""real"" string",1,1.5
  2. two,2,8.0

可以修改分隔符来处理这组数据:

In [11]:

  1. data = [('one, \"real\" string', 1, 1.5), ('two', 2, 8.0)]
  2. with open('out.psv', 'wb') as fp:
  3. w = csv.writer(fp, delimiter="|")
  4. w.writerows(data)

In [12]:

  1. !cat 'out.psv'
  1. "one, ""real"" string"|1|1.5
  2. two|2|8.0

In [13]:

  1. import os
  2. os.remove('out.psv')
  3. os.remove('out.csv')

其他选项¶

numpy.loadtxt()pandas.read_csv() 可以用来读写包含很多数值数据的 csv 文件:

In [14]:

  1. %%file trades.csv
  2. Order,Date,Stock,Quantity,Price
  3. A0001,2013-12-01,AAPL,1000,203.4
  4. A0002,2013-12-01,MSFT,1500,167.5
  5. A0003,2013-12-02,GOOG,1500,167.5
  1. Writing trades.csv

使用 pandas 进行处理,生成一个 DataFrame 对象:

In [15]:

  1. import pandas
  2. df = pandas.read_csv('trades.csv', index_col=0)
  3. print df
  1. Date Stock Quantity Price
  2. Order
  3. A0001 2013-12-01 AAPL 1000 203.4
  4. A0002 2013-12-01 MSFT 1500 167.5
  5. A0003 2013-12-02 GOOG 1500 167.5

通过名字进行索引:

In [16]:

  1. df['Quantity'] * df['Price']

Out[16]:

  1. Order
  2. A0001 203400
  3. A0002 251250
  4. A0003 251250
  5. dtype: float64

In [17]:

  1. import os
  2. os.remove('trades.csv')

原文: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/05-advanced-python/05.03-comma-separated-values.ipynb