6.3 Interacting with Web APIs

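Many websites have public APIs that provide data feeds via JSON or some other format. There are a number of ways to access these APIs from Python; one easy-to-use method that I recommend is the requests package (http://docs.python-requests.org).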

To fetch the 30 most recent GitHub issues for pandas, we can make an HTTP GET request using the add-on requests library:

  In [113]: import requests
  In [114]: url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
  In [115]: resp = requests.get(url)
  In [116]: resp
  Out[116]: <Response [200]>
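
A status of 200 means the request succeeded, but real calls can fail or hang. The following is a minimal sketch, not part of the original session, of a slightly more defensive request. The helper name fetch and its session parameter are hypothetical; params, timeout, and raise_for_status are standard requests features, and state and per_page are query parameters of the GitHub issues endpoint (a page holds 30 items by default).

```python
import requests

# Same URL as in the session above.
ISSUES_URL = 'https://api.github.com/repos/pandas-dev/pandas/issues'


def fetch(url, params=None, timeout=10.0, session=None):
    """GET url and return the Response, raising on 4xx/5xx status codes.

    `session` is a hypothetical hook: pass a requests.Session to reuse
    connections, or leave it as None to use the module-level requests.get.
    """
    http = session if session is not None else requests
    # params is encoded into the query string; timeout is in seconds.
    resp = http.get(url, params=params, timeout=timeout)
    # Turn an error status (for example 403 or 404) into an exception.
    resp.raise_for_status()
    return resp
```

Calling fetch(ISSUES_URL, params={'state': 'open', 'per_page': 30}) would then return a Response like the one above, or raise requests.HTTPError on an error status.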

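The Response object's json method parses the JSON body into native Python objects. For this endpoint the result is a list of dictionaries, one per issue: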

  In [117]: data = resp.json()
  In [118]: data[0]['title']
  Out[118]: 'Period does not round down for frequencies less that 1 hour'
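
To see what this parsing step does without touching the network, here is a small sketch using the standard library's json module, which performs roughly the same conversion that resp.json() applies to the response body. The payload string is made up for illustration and only mimics the shape of an issues response.

```python
import json

# Made-up payload: a JSON array containing two JSON objects.
payload = (
    '[{"number": 1, "title": "first", "labels": []},'
    ' {"number": 2, "title": "second", "labels": [{"name": "Bug"}]}]'
)

# JSON arrays become Python lists and JSON objects become dicts.
data = json.loads(payload)

print(type(data).__name__)           # list
print(data[0]['title'])              # first
print(data[1]['labels'][0]['name'])  # Bug
```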

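Each element in data is a dictionary containing all of the data found on a GitHub issue page (except for the comments). With pandas imported as pd, we can pass data directly to DataFrame and extract the fields of interest: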

  In [119]: issues = pd.DataFrame(data, columns=['number', 'title',
     .....:                                      'labels', 'state'])
  In [120]: issues
  Out[120]:
      number                                              title  \
  0    17666  Period does not round down for frequencies les...
  1    17665           DOC: improve docstring of function where
  2    17664               COMPAT: skip 32-bit test on int repr
  3    17662                          implement Delegator class
  4    17654  BUG: Fix series rename called with str alterin...
  ..     ...                                                ...
  25   17603  BUG: Correctly localize naive datetime strings...
  26   17599                     core.dtypes.generic --> cython
  27   17596   Merge cdate_range functionality into bdate_range
  28   17587  Time Grouper bug fix when applied for list gro...
  29   17583  BUG: fix tz-aware DatetimeIndex + TimedeltaInd...
                                                 labels  state
  0                                                  []   open
  1   [{'id': 134699, 'url': 'https://api.github.com...   open
  2   [{'id': 563047854, 'url': 'https://api.github....   open
  3                                                  []   open
  4   [{'id': 76811, 'url': 'https://api.github.com/...   open
  ..                                                ...    ...
  25  [{'id': 76811, 'url': 'https://api.github.com/...   open
  26  [{'id': 49094459, 'url': 'https://api.github.c...   open
  27  [{'id': 35818298, 'url': 'https://api.github.c...   open
  28  [{'id': 233160, 'url': 'https://api.github.com...   open
  29  [{'id': 76811, 'url': 'https://api.github.com/...   open

  [30 rows x 4 columns]
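
As the output shows, each cell of the labels column holds a list of dictionaries, which is awkward to filter on. One possible follow-up step, sketched here on made-up rows rather than the live data, reduces each list to the label names. The label_names column is a hypothetical addition; the name key is a field that GitHub includes in each label object.

```python
import pandas as pd

# Made-up rows that mimic the shape of the issues data above.
sample = [
    {'number': 1, 'title': 'first', 'labels': [], 'state': 'open'},
    {'number': 2, 'title': 'second', 'state': 'open',
     'labels': [{'id': 10, 'name': 'Bug'}, {'id': 11, 'name': 'Docs'}]},
]

issues = pd.DataFrame(sample, columns=['number', 'title', 'labels', 'state'])

# Keep only the 'name' of every label dict; an empty list stays empty.
issues['label_names'] = issues['labels'].apply(
    lambda labels: [label['name'] for label in labels]
)

print(issues[['number', 'label_names']])
```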

With a bit of elbow grease, you can create some higher-level interfaces to common web APIs that return DataFrame objects for easy analysis.
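
As one illustration of such an interface, the sketch below wraps the request, the JSON parsing, and the DataFrame construction from this section in a single function. The function name issues_frame and its get parameter are hypothetical; the get hook exists only so that the HTTP call can be swapped out, for instance in tests. Treat it as a starting point under these assumptions rather than a complete client, since it ignores pagination, authentication, and rate limits.

```python
import pandas as pd
import requests


def issues_frame(owner, repo,
                 columns=('number', 'title', 'labels', 'state'),
                 get=requests.get):
    """Return one page of a repository's issues as a DataFrame.

    `get` is a hypothetical injection point; by default it is requests.get.
    """
    # Same URL pattern as the pandas-dev/pandas example in this section.
    url = f'https://api.github.com/repos/{owner}/{repo}/issues'
    resp = get(url, timeout=10)
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    # resp.json() yields a list of dicts; keep only the requested fields.
    return pd.DataFrame(resp.json(), columns=list(columns))
```

With this in place, issues_frame('pandas-dev', 'pandas') would produce a frame shaped like the issues object built earlier.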