3.7 Calling a Web API

In the previous section we explained how to download individual files from the Internet. Another way data can come from the Internet is through a web API (Application Programming Interface). The number of APIs offered by organizations is growing at an increasing rate, which means a lot of interesting data for us data scientists.

Unlike websites, web APIs are not meant to be presented in a nice layout. Instead, most web APIs return data in a structured format, such as JSON or XML. Having data in a structured form has the advantage that it can easily be processed by other tools, such as jq. For example, the API from https://randomuser.me returns data in the following JSON structure:

  $ curl -s https://randomuser.me/api/1.2/ | jq .
  {
    "results": [
      {
        "gender": "male",
        "name": {
          "title": "mr",
          "first": "jeffrey",
          "last": "lawson"
        },
        "location": {
          "street": "838 miller ave",
          "city": "washington",
          "state": "maryland",
          "postcode": 81831,
          "coordinates": {
            "latitude": "81.9488",
            "longitude": "-67.8247"
          },
          "timezone": {
            "offset": "+4:00",
            "description": "Abu Dhabi, Muscat, Baku, Tbilisi"
          }
        },
        "email": "jeffrey.lawson@example.com",
        "login": {
          "uuid": "78918f6c-2658-4915-bebf-bfaa61a1624c",
          "username": "silverzebra774",
          "password": "treble",
          "salt": "iAtIKhvB",
          "md5": "4c02abeca4d6ca4dbfc0ddb33dcef29f",
          "sha1": "36e109513abf73df460cead89b78c749abe908fa",
          "sha256": "0155d9e6cabedfc3ad0f21d18b3ca3e738a8f17811dd57dc3b4dd386cd021963"
        },
        "dob": {
          "date": "1996-07-04T02:49:46Z",
          "age": 22
        },
        "registered": {
          "date": "2013-01-13T13:37:21Z",
          "age": 5
        },
        "phone": "(406)-041-2792",
        "cell": "(831)-085-8264",
        "id": {
          "name": "SSN",
          "value": "629-40-9671"
        },
        "picture": {
          "large": "https://randomuser.me/api/portraits/men/62.jpg",
          "medium": "https://randomuser.me/api/portraits/med/men/62.jpg",
          "thumbnail": "https://randomuser.me/api/portraits/thumb/men/62.jpg"
        },
        "nat": "US"
      }
    ],
    "info": {
      "seed": "4bd9f66fd83a6ec7",
      "results": 1,
      "page": 1,
      "version": "1.2"
    }
  }

The data is piped to the command-line tool jq in order to display it in a readable way. jq has many more capabilities, which we will explore in Chapter 5.
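As a small taste of what jq can do beyond pretty-printing, the sketch below turns every user in a randomuser.me response into one CSV line with first name, last name, and email address. The function name `users_to_csv` is our own, hypothetical choice; the field paths are taken from the JSON output shown above.

```shell
# Hypothetical helper: read a randomuser.me response (JSON) on stdin and
# print one CSV line per user with the first name, last name, and email.
users_to_csv() {
  # .results[] iterates over the users, [...] collects the three fields
  # into an array, and @csv renders that array as a quoted CSV row.
  jq -r '.results[] | [.name.first, .name.last, .email] | @csv'
}
```

You would use it as `curl -s https://randomuser.me/api/1.2/ | users_to_csv`. Applied to the response shown above, it would print `"jeffrey","lawson","jeffrey.lawson@example.com"`; because the API returns a random user on every call, your output will differ.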

Some web APIs return data in a streaming manner. This means that once you connect to one, the data keeps pouring in indefinitely. A well-known example is the Twitter “firehose”, which constantly streams all the tweets being sent around the world. Luckily, most command-line tools that we use also operate in a streaming manner, so we can work with this kind of data as well.
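To illustrate what processing a stream looks like, here is a minimal sketch that assumes the API delivers one JSON object per line (newline-delimited JSON) and that the interesting field is called `text`; both the field name and the function name `stream_texts` are assumptions for this example. jq handles each object as soon as it arrives, so the pipeline never waits for an end of input that will not come.

```shell
# Hypothetical helper: read a stream of JSON objects (one per line) and
# print the "text" field of each object as soon as it arrives.
# --unbuffered makes jq flush its output after every object.
# '// empty' skips objects that have no text field (e.g. control messages).
stream_texts() {
  jq --unbuffered -r '.text // empty'
}
```

A usage sketch, with a made-up URL: `curl -sN 'https://stream.example.com/statuses' | stream_texts | head -n 100`. The `-N` option turns off curl's output buffering, and `head -n 100` stops after 100 records; once head exits, the upstream commands terminate on their next write, which ends the otherwise endless stream.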

Some APIs require you to log in using the OAuth protocol. There is a handy command-line tool called curlicue (Foster 2014) that assists in performing the so-called “OAuth dance”. Once this has been set up, curlicue calls curl with the correct headers. First, you set things up once for a particular API with curlicue-setup, and then you can call that API using curlicue. For example, to use curlicue with the Twitter API you would run:

  $ curlicue-setup \
  > 'https://api.twitter.com/oauth/request_token' \
  > 'https://api.twitter.com/oauth/authorize?oauth_token=$oauth_token' \
  > 'https://api.twitter.com/oauth/access_token' \
  > credentials
  $ curlicue -f credentials \
  > 'https://api.twitter.com/1/statuses/home_timeline.xml'
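In the end, an authenticated call is still an ordinary curl request with an extra header. Many APIs nowadays use OAuth 2.0, where the dance ends with an access token that is sent as a bearer token in the `Authorization` header. The following is a minimal sketch for that case, not a replacement for curlicue; it assumes you have already obtained a token and stored it in an environment variable named `API_TOKEN`, a name we chose for this example, and the function name `api_get` is likewise hypothetical.

```shell
# Hypothetical helper for APIs that accept an OAuth 2.0 bearer token.
# Assumes the token was obtained beforehand and exported as API_TOKEN
# (a variable name chosen for this example).
api_get() {
  if [ -z "${API_TOKEN:-}" ]; then
    echo "api_get: please set API_TOKEN first" >&2
    return 1
  fi
  # -sS: hide the progress meter but still report errors;
  # -f: exit with a non-zero status on HTTP errors such as 401.
  curl -sSf -H "Authorization: Bearer ${API_TOKEN}" "$@"
}
```

After `export API_TOKEN='...'`, a call such as `api_get 'https://api.example.com/v1/items' | jq .` (a made-up endpoint) sends the token along with the request. Checking for the token first gives a clear error message instead of a confusing HTTP 401 from the server.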

For popular APIs, specialized command-line tools are often available. These are wrappers that provide a convenient way to connect to one particular API. In Chapter 9, for example, we’ll use bigmler, a command-line tool that connects only to BigML’s prediction API.
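When no such tool exists, you can write a small wrapper yourself. The sketch below wraps the randomuser.me API from the beginning of this section in two shell functions; the function names are hypothetical, and we assume the API's `results` query parameter, which requests several users in one call.

```shell
# Hypothetical wrapper around the randomuser.me API.
# randomuser_url [N]: build the request URL for N users (default 1),
# using the API's "results" query parameter.
randomuser_url() {
  printf 'https://randomuser.me/api/1.2/?results=%d\n' "${1:-1}"
}

# randomuser [N]: fetch N users and print one email address per line.
randomuser() {
  curl -sSf "$(randomuser_url "${1:-1}")" | jq -r '.results[].email'
}
```

Running `randomuser 5` would print five (random) email addresses. Keeping the URL construction in its own function makes it easy to inspect or change without touching the network call.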