Tutorial: Analyze Cryptocurrency Market Data

This tutorial is a step-by-step guide on how to analyze a time-series cryptocurrency dataset using TimescaleDB. The instructions in this tutorial were used to create this analysis of 4100+ cryptocurrencies.

This tutorial will cover the following four steps:

  1. Design our database schema
  2. Create a dataset using publicly available cryptocurrency pricing data
  3. Load the dataset into TimescaleDB
  4. Query the data in TimescaleDB

You can skip ahead to the TimescaleDB portion if you would prefer not to run through the scripts to create your database schema or your dataset.

You can also download the resources for this tutorial:

Pre-requisites

To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). The tutorial will walk you through each SQL command, but it will be helpful if you’ve seen SQL before.

To start, install TimescaleDB. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.

Finally, this tutorial leads directly into a second tutorial that covers how Timescale can be used with Tableau to visualize time-series data.

Step 1: Design the database schema

Now that our database is up and running we need some data to insert into it. Before we get data for analysis, we first need to define what kind of data we want to perform queries on.

In our analysis, we have two main goals.

  • We want to explore the price of Bitcoin and Ethereum, expressed in different fiat currencies, over time.
  • We want to explore the price of different cryptocurrencies, expressed in Bitcoin, over time.

Examples of questions we might want to ask are:

  • How has Bitcoin’s price in USD varied over time?
  • How has Ethereum’s price in ZAR varied over time?
  • How has Bitcoin’s trading volume in KRW increased or decreased over time?
  • Which crypto has highest trading volume in last two weeks?
  • Which day was Bitcoin most profitable?
  • Which are the most profitable new coins from the past 3 months?

Understanding the questions required of the data informs our schema definition.

Our requirements lead us to four tables, specifically, three TimescaleDB hypertables, btc_prices, crypto_prices, and eth_prices, and one relational table, currency_info.

The btc_prices and eth_prices hypertables contain data about Bitcoin prices in 17 different fiat currencies since 2010. The Bitcoin table is below and the Ethereum table is very similar:

FieldDescription
timeThe day-specific timestamp of the price records, with time given as the default 00:00:00+00
opening_priceThe first price at which the coin was exchanged that day
highest_priceThe highest price at which the coin was exchanged that day
lowest_priceThe lowest price at which the coin was exchanged that day
closing_priceThe last price at which the coin was exchanged that day
volume_btcThe volume exchanged in the cryptocurrency value that day, in BTC
volume_currencyThe volume exchanged in its converted value for that day, quoted in the corresponding fiat currency
currency_codeCorresponds to the fiat currency used for non-btc prices/volumes

Lastly, we have the currency_info table, which maps the currency’s code to its English-language name:

FieldDescription
currency_code2-7 character abbreviation for currency. Used in other hypertables
currencyEnglish name of currency

Once we’ve established the schema for the tables in our database, we can formulate create_table SQL statements to actually create the tables we need:

  1. --Schema for cryptocurrency analysis
  2. DROP TABLE IF EXISTS "currency_info";
  3. CREATE TABLE "currency_info"(
  4. currency_code VARCHAR (10),
  5. currency TEXT
  6. );
  7. --Schema for btc_prices table
  8. DROP TABLE IF EXISTS "btc_prices";
  9. CREATE TABLE "btc_prices"(
  10. time TIMESTAMP WITH TIME ZONE NOT NULL,
  11. opening_price DOUBLE PRECISION,
  12. highest_price DOUBLE PRECISION,
  13. lowest_price DOUBLE PRECISION,
  14. closing_price DOUBLE PRECISION,
  15. volume_btc DOUBLE PRECISION,
  16. volume_currency DOUBLE PRECISION,
  17. currency_code VARCHAR (10)
  18. );
  19. --Schema for crypto_prices table
  20. DROP TABLE IF EXISTS "crypto_prices";
  21. CREATE TABLE "crypto_prices"(
  22. time TIMESTAMP WITH TIME ZONE NOT NULL,
  23. opening_price DOUBLE PRECISION,
  24. highest_price DOUBLE PRECISION,
  25. lowest_price DOUBLE PRECISION,
  26. closing_price DOUBLE PRECISION,
  27. volume_crypto DOUBLE PRECISION,
  28. volume_btc DOUBLE PRECISION,
  29. currency_code VARCHAR (10)
  30. );
  31. --Schema for eth_prices table
  32. DROP TABLE IF EXISTS "eth_prices";
  33. CREATE TABLE "eth_prices"(
  34. time TIMESTAMP WITH TIME ZONE NOT NULL,
  35. opening_price DOUBLE PRECISION,
  36. highest_price DOUBLE PRECISION,
  37. lowest_price DOUBLE PRECISION,
  38. closing_price DOUBLE PRECISION,
  39. volume_eth DOUBLE PRECISION,
  40. volume_currency DOUBLE PRECISION,
  41. currency_code VARCHAR (10)
  42. );
  43. --Timescale specific statements to create hypertables for better performance
  44. SELECT create_hypertable('btc_prices', 'time');
  45. SELECT create_hypertable('eth_prices', 'time');
  46. SELECT create_hypertable('crypto_prices', 'time');

Note that we include three create_hypertable statements which are special TimescaleDB statements. A hypertable is an abstraction of a single continuous table across time intervals, such that one can query it via vanilla SQL. For more on hypertables, see the Timescale docs and this blog post.

Step 2: Create a dataset to analyze

Now that we’ve defined the data we want, it’s time to construct a dataset containing that data. To do this, we’ll write a small Python script for extracting data from CryptoCompare into four CSV files (coin_names.csv, crypto_prices.csv, btc_prices.csv, and eth_prices.csv).

In order to get data from CryptoCompare, you’ll need to obtain an API key. For this analysis, the free key should be plenty.

The script consists of five parts:

  • Importing the necessary Python libraries in order to complete the data extraction
  • Populate the currency_info table with a list of coin names
  • Get the historical Bitcoin (BTC) prices in 4198 other cryptocurrencies and populate the crypto_prices table
  • Get historical Bitcoin prices in different fiat currencies to populate btc_prices
  • Get historical Ethereum prices in different fiat currencies to populate eth_prices

Here’s the full Python script, which you can also downloadAnalyzing cryptocurrency data - 图4:

  1. #####################################################################
  2. #1. Import library and setup API key
  3. #####################################################################
  4. import requests
  5. import json
  6. import csv
  7. from datetime import datetime
  8. apikey = 'YOUR_CRYPTO_COMPARE_API_KEY'
  9. #attach to end of URLstring
  10. url_api_part = '&api_key=' + apikey
  11. #####################################################################
  12. #2. Populate list of all coin names
  13. #####################################################################
  14. #URL to get a list of coins from cryptocompare API
  15. URLcoinslist = 'https://min-api.cryptocompare.com/data/all/coinlist'
  16. #Get list of cryptos with their symbols
  17. res1 = requests.get(URLcoinslist)
  18. res1_json = res1.json()
  19. data1 = res1_json['Data']
  20. symbol_array = []
  21. cryptoDict = dict(data1)
  22. #write to CSV
  23. with open('coin_names.csv', mode = 'w') as test_file:
  24. test_file_writer = csv.writer(test_file,
  25. delimiter = ',',
  26. quotechar = '"',
  27. quoting=csv.QUOTE_MINIMAL)
  28. for coin in cryptoDict.values():
  29. name = coin['Name']
  30. symbol = coin['Symbol']
  31. symbol_array.append(symbol)
  32. coin_name = coin['CoinName']
  33. full_name = coin['FullName']
  34. entry = [symbol, coin_name]
  35. test_file_writer.writerow(entry)
  36. print('Done getting crypto names and symbols. See coin_names.csv for result')
  37. #####################################################################
  38. #3. Populate historical price for each crypto in BTC
  39. #####################################################################
  40. #Note: this part might take a while to run since we're populating data for 4k+ coins
  41. #counter variable for progress made
  42. progress = 0
  43. num_cryptos = str(len(symbol_array))
  44. for symbol in symbol_array:
  45. # get data for that currency
  46. URL = 'https://min-api.cryptocompare.com/data/histoday?fsym=' +
  47. symbol +
  48. '&tsym=BTC&allData=true' +
  49. url_api_part
  50. res = requests.get(URL)
  51. res_json = res.json()
  52. data = res_json['Data']
  53. # write required fields into csv
  54. with open('crypto_prices.csv', mode = 'a') as test_file:
  55. test_file_writer = csv.writer(test_file,
  56. delimiter = ',',
  57. quotechar = '"',
  58. quoting=csv.QUOTE_MINIMAL)
  59. for day in data:
  60. rawts = day['time']
  61. ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')
  62. o = day['open']
  63. h = day['high']
  64. l = day['low']
  65. c = day['close']
  66. vfrom = day['volumefrom']
  67. vto = day['volumeto']
  68. entry = [ts, o, h, l, c, vfrom, vto, symbol]
  69. test_file_writer.writerow(entry)
  70. progress = progress + 1
  71. print('Processed ' + str(symbol))
  72. print(str(progress) + ' currencies out of ' + num_cryptos + ' written to csv')
  73. print('Done getting price data for all coins. See crypto_prices.csv for result')
  74. #####################################################################
  75. #4. Populate BTC prices in different fiat currencies
  76. #####################################################################
  77. # List of fiat currencies we want to query
  78. # You can expand this list, but CryptoCompare does not have
  79. # a comprehensive fiat list on their site
  80. fiatList = ['AUD', 'CAD', 'CNY', 'EUR', 'GBP', 'GOLD', 'HKD',
  81. 'ILS', 'INR', 'JPY', 'KRW', 'PLN', 'RUB', 'SGD', 'UAH', 'USD', 'ZAR']
  82. #counter variable for progress made
  83. progress2 = 0
  84. for fiat in fiatList:
  85. # get data for bitcoin price in that fiat
  86. URL = 'https://min-api.cryptocompare.com/data/histoday?fsym=BTC&tsym=' +
  87. fiat +
  88. '&allData=true' +
  89. url_api_part
  90. res = requests.get(URL)
  91. res_json = res.json()
  92. data = res_json['Data']
  93. # write required fields into csv
  94. with open('btc_prices.csv', mode = 'a') as test_file:
  95. test_file_writer = csv.writer(test_file,
  96. delimiter = ',',
  97. quotechar = '"',
  98. quoting=csv.QUOTE_MINIMAL)
  99. for day in data:
  100. rawts = day['time']
  101. ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')
  102. o = day['open']
  103. h = day['high']
  104. l = day['low']
  105. c = day['close']
  106. vfrom = day['volumefrom']
  107. vto = day['volumeto']
  108. entry = [ts, o, h, l, c, vfrom, vto, fiat]
  109. test_file_writer.writerow(entry)
  110. progress2 = progress2 + 1
  111. print('processed ' + str(fiat))
  112. print(str(progress2) + ' currencies out of 17 written')
  113. print('Done getting price data for btc. See btc_prices.csv for result')
  114. #####################################################################
  115. #5. Populate ETH prices in different fiat currencies
  116. #####################################################################
  117. #counter variable for progress made
  118. progress3 = 0
  119. for fiat in fiatList:
  120. # get data for bitcoin price in that fiat
  121. URL = 'https://min-api.cryptocompare.com/data/histoday?fsym=ETH&tsym=' +
  122. fiat +
  123. '&allData=true' +
  124. url_api_part
  125. res = requests.get(URL)
  126. res_json = res.json()
  127. data = res_json['Data']
  128. # write required fields into csv
  129. with open('eth_prices.csv', mode = 'a') as test_file:
  130. test_file_writer = csv.writer(test_file,
  131. delimiter = ',',
  132. quotechar = '"',
  133. quoting=csv.QUOTE_MINIMAL)
  134. for day in data:
  135. rawts = day['time']
  136. ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')
  137. o = day['open']
  138. h = day['high']
  139. l = day['low']
  140. c = day['close']
  141. vfrom = day['volumefrom']
  142. vto = day['volumeto']
  143. entry = [ts, o, h, l, c, vfrom, vto, fiat]
  144. test_file_writer.writerow(entry)
  145. progress3 = progress3 + 1
  146. print('processed ' + str(fiat))
  147. print(str(progress3) + ' currencies out of 17 written')
  148. print('Done getting price data for eth. See eth_prices.csv for result')

After running the script, you will receive four CSV files:

  1. python crypto_data_extraction.py

Step 3: Load the dataset into TimescaleDB

To proceed, be sure you have a working installation of TimescaleDB.

Setup our schema

Now all our hard work in Step 1 comes in handy! We will use the SQL script we created to setup our instance of TimescaleDB. If you don’t want to enter the SQL script by yourself, you can always download schema.sqlAnalyzing cryptocurrency data - 图5.

Let’s first login to our TimescaleDB instance. Locate your host, port, and password and then connect to the database:

  1. psql -x "postgres://tsdbadmin:{YOUR_PASSWORD_HERE}@{YOUR_HOSTNAME_HERE}:{YOUR_PORT_HERE}/defaultdb?sslmode=require"

From the psql command line, we need to first create a database. Let’s call it crypto_data:

  1. CREATE DATABASE crypto_data;
  2. \c crypto_data
  3. CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

From your shell prompt, we will now apply our schema creation script to the database like this:

  1. psql -x "postgres://tsdbadmin:{YOUR_PASSWORD_HERE}@{|YOUR_HOSTNAME_HERE}:{YOUR_PORT_HERE}/crypto_data?sslmode=require" < schema.sql

Your output should look something like this:

  1. NOTICE: 00000: table "currency_info" does not exist, skipping
  2. LOCATION: DropErrorMsgNonExistent, tablecmds.c:1057
  3. DROP TABLE
  4. Time: 78.384 ms
  5. CREATE TABLE
  6. Time: 87.011 ms
  7. NOTICE: 00000: table "btc_prices" does not exist, skipping
  8. LOCATION: DropErrorMsgNonExistent, tablecmds.c:1057
  9. DROP TABLE
  10. Time: 77.094 ms
  11. CREATE TABLE
  12. Time: 79.815 ms
  13. NOTICE: 00000: table "crypto_prices" does not exist, skipping
  14. LOCATION: DropErrorMsgNonExistent, tablecmds.c:1057
  15. DROP TABLE
  16. Time: 78.430 ms
  17. CREATE TABLE
  18. Time: 78.430 ms
  19. NOTICE: 00000: table "eth_prices" does not exist, skipping
  20. LOCATION: DropErrorMsgNonExistent, tablecmds.c:1057
  21. DROP TABLE
  22. Time: 77.410 ms
  23. CREATE TABLE
  24. Time: 80.883 ms
  25. create_hypertable
  26. -------------------------
  27. (1,public,btc_prices,t)
  28. (1 row)
  29. Time: 83.154 ms
  30. create_hypertable
  31. -------------------------
  32. (2,public,eth_prices,t)
  33. (1 row)
  34. Time: 84.650 ms
  35. create_hypertable
  36. ----------------------------
  37. (3,public,crypto_prices,t)
  38. (1 row)
  39. Time: 81.864 ms

Now when we log back into our TimescaleDB instance using psql, we can run the \dt command and see that our tables have been created properly:

  1. List of relations
  2. Schema | Name | Type | Owner
  3. --------+---------------+-------+-----------
  4. public | btc_prices | table | tsdbadmin
  5. public | crypto_prices | table | tsdbadmin
  6. public | currency_info | table | tsdbadmin
  7. public | eth_prices | table | tsdbadmin
  8. (4 rows)

Ingest our data

Now that we’ve created the tables with our desired schema, all that’s left is to insert the data from the CSV files we’ve created into the tables.

Make sure you are logged into TimescaleDB using psql so that you can run each of the following commands successively:

  1. \COPY btc_prices FROM btc_prices.csv CSV;
  2. \COPY eth_prices FROM eth_prices.csv CSV;
  3. \COPY crypto_prices FROM crypto_prices.csv CSV;
  4. \COPY currency_info FROM coin_names.csv CSV;

WARNING:This data ingestion may take a while, depending on the speed of your Internet connection.

We can test that the ingestion worked by running a simple SQL command, such as:

  1. SELECT * FROM btc_prices LIMIT 5;

You should get something like the following output:

  1. -[ RECORD 1 ]---+-----------------------
  2. time | 2013-03-11 00:00:00+00
  3. opening_price | 60.56
  4. highest_price | 60.56
  5. lowest_price | 60.56
  6. closing_price | 60.56
  7. volume_btc | 0.1981
  8. volume_currency | 12
  9. currency_code | AUD
  10. -[ RECORD 2 ]---+-----------------------
  11. time | 2013-03-12 00:00:00+00
  12. opening_price | 60.56
  13. highest_price | 60.56
  14. lowest_price | 41.38
  15. closing_price | 47.78
  16. volume_btc | 47.11
  17. volume_currency | 2297.5
  18. currency_code | AUD
  19. -[ RECORD 3 ]---+-----------------------
  20. time | 2013-03-07 00:00:00+00
  21. opening_price | 181.15
  22. highest_price | 273.5
  23. lowest_price | 237.4
  24. closing_price | 262.87
  25. volume_btc | 33.04
  26. volume_currency | 8974.45
  27. currency_code | CNY
  28. -[ RECORD 4 ]---+-----------------------
  29. time | 2013-03-07 00:00:00+00
  30. opening_price | 32.31
  31. highest_price | 35.03
  32. lowest_price | 26
  33. closing_price | 31.57
  34. volume_btc | 13321.61
  35. volume_currency | 425824.38
  36. currency_code | EUR
  37. -[ RECORD 5 ]---+-----------------------
  38. time | 2013-03-11 00:00:00+00
  39. opening_price | 35.7
  40. highest_price | 37.35
  41. lowest_price | 35.4
  42. closing_price | 37.15
  43. volume_btc | 3316.09
  44. volume_currency | 121750.98
  45. currency_code | EUR
  46. Time: 224.741 ms

Step 4: Query and analyze our data

When we started the tutorial, we laid out a series of questions that we would like to answer. Naturally, each of those questions has an answer in the form of a SQL query. Now that our database is setup properly, our data is captured, and our data is ingested, we are able to proceed and answer our questions.

For example, How did Bitcoin price in USD vary over time?

  1. SELECT time_bucket('7 days', time) AS period,
  2. last(closing_price, time) AS last_closing_price
  3. FROM btc_prices
  4. WHERE currency_code = 'USD'
  5. GROUP BY period
  6. ORDER BY period

How did BTC daily returns vary over time? Which days had the worst and best returns?

  1. SELECT time,
  2. closing_price / lead(closing_price) over prices AS daily_factor
  3. FROM (
  4. SELECT time,
  5. closing_price
  6. FROM btc_prices
  7. WHERE currency_code = 'USD'
  8. GROUP BY 1,2
  9. ) sub window prices AS (ORDER BY time DESC)

How did the trading volume of Bitcoin vary over time in different fiat currencies?

  1. SELECT time_bucket('7 days', time) AS period,
  2. currency_code,
  3. sum(volume_btc)
  4. FROM btc_prices
  5. GROUP BY currency_code, period
  6. ORDER BY period

How did Ethereum (ETH) price in BTC vary over time?

  1. SELECT
  2. time_bucket('7 days', time) AS time_period,
  3. last(closing_price, time) AS closing_price_btc
  4. FROM crypto_prices
  5. WHERE currency_code='ETH'
  6. GROUP BY time_period
  7. ORDER BY time_period

How did ETH prices, in different fiat currencies, vary over time?

  1. SELECT time_bucket('7 days', c.time) AS time_period,
  2. last(c.closing_price, c.time) AS last_closing_price_in_btc,
  3. last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'USD') AS last_closing_price_in_usd,
  4. last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'EUR') AS last_closing_price_in_eur,
  5. last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'CNY') AS last_closing_price_in_cny,
  6. last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'JPY') AS last_closing_price_in_jpy,
  7. last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'KRW') AS last_closing_price_in_krw
  8. FROM crypto_prices c
  9. JOIN btc_prices b
  10. ON time_bucket('1 day', c.time) = time_bucket('1 day', b.time)
  11. WHERE c.currency_code = 'ETH'
  12. GROUP BY time_period
  13. ORDER BY time_period

Which cryptocurrencies had the most transaction volume in the past 14 days?

  1. SELECT 'BTC' AS currency_code,
  2. sum(b.volume_currency) AS total_volume_in_usd
  3. FROM btc_prices b
  4. WHERE b.currency_code = 'USD'
  5. AND now() - date(b.time) < INTERVAL '14 day'
  6. GROUP BY b.currency_code
  7. UNION
  8. SELECT c.currency_code AS currency_code,
  9. sum(c.volume_btc) * avg(b.closing_price) AS total_volume_in_usd
  10. FROM crypto_prices c JOIN btc_prices b ON date(c.time) = date(b.time)
  11. WHERE c.volume_btc > 0
  12. AND b.currency_code = 'USD'
  13. AND now() - date(b.time) < INTERVAL '14 day'
  14. AND now() - date(c.time) < INTERVAL '14 day'
  15. GROUP BY c.currency_code
  16. ORDER BY total_volume_in_usd DESC

Which cryptocurrencies had the top daily return?

  1. WITH
  2. prev_day_closing AS (
  3. SELECT
  4. currency_code,
  5. time,
  6. closing_price,
  7. LEAD(closing_price) OVER (PARTITION BY currency_code ORDER BY TIME DESC) AS prev_day_closing_price
  8. FROM
  9. crypto_prices
  10. )
  11. , daily_factor AS (
  12. SELECT
  13. currency_code,
  14. time,
  15. CASE WHEN prev_day_closing_price = 0 THEN 0 ELSE closing_price/prev_day_closing_price END AS daily_factor
  16. FROM
  17. prev_day_closing
  18. )
  19. SELECT
  20. time,
  21. LAST(currency_code, daily_factor) AS currency_code,
  22. MAX(daily_factor) AS max_daily_factor
  23. FROM
  24. daily_factor
  25. GROUP BY
  26. TIME

Next steps

While it’s fun to run SQL queries in the command line, the real magic is when you’re able to visualize it. Follow the companion tutorial to this piece and learn how to use TimescaleDB and Tableau together to visualize your time-series data.

Ready for even more learning? Here’s a few suggestions: