NFT schema design and ingestion

A properly designed database schema is essential to efficiently store and analyze data. This tutorial uses NFT time-series data with multiple supporting relational tables.

To help you get familiar with NFT data, here are some of the questions that could be answered with this dataset:

  • Which collections have the highest trading volume?
  • What’s the number of daily transactions of a given collection or asset?
  • Which collections have the most trading volume in Ether (ETH)?
  • Which account made the most NFT trades?
  • How are the mean and median sale prices correlated?

One theme across all these questions is that most of the insights are about the sale itself, or the aggregation of sales. So you need to create a schema that focuses on the time-series aspect of the data. It’s also important to make sure that you can JOIN supporting tables, so you can more easily write queries that touch both the time-series and the relational tables. TimescaleDB’s PostgreSQL foundation and full-SQL support allow you to easily combine time-series and relational tables during your analysis.
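
For example, a query like the following joins the nft_sales hypertable with the collections relational table to rank collections by number of sales. This is only a sketch: it assumes the tables defined later in this section, and the LIMIT of 10 is arbitrary.

  -- Collections ranked by number of sales
  SELECT c.name, count(*) AS sale_count
  FROM nft_sales s
  JOIN collections c ON c.id = s.collection_id
  GROUP BY c.name
  ORDER BY sale_count DESC
  LIMIT 10;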

Tables and field descriptions

You need these tables:

TimescaleDB hypertable:

  • nft_sales: successful NFT transactions

Relational tables (regular PostgreSQL tables):

  • assets: unique NFT items
  • collections: NFT collections
  • accounts: NFT trading accounts/users

The nft_sales table

The nft_sales table contains information about successful sale transactions in time-series form. One row represents one successful sale event on the OpenSea platform.

  • The id field is a unique identifier provided by the OpenSea API.
  • The total_price field is the price paid for the NFTs, in ETH or another cryptocurrency payment symbol available on OpenSea.
  • The quantity field indicates how many NFTs were sold in the transaction (can be more than 1).
  • The auction_type field is NULL by default, unless the transaction happened as part of an auction.
  • The asset_id and collection_id fields can be used to JOIN the supporting relational tables.

Data field          Description
id                  OpenSea ID (unique)
time                Time of the sale
asset_id            ID of the NFT, FK: assets(id)
collection_id       ID of the collection this NFT belongs to, FK: collections(id)
auction_type        Auction type (‘dutch’, ‘english’, ‘min_price’)
contract_address    Address of the smart contract
quantity            NFT quantity sold
payment_symbol      Payment symbol (usually ETH, depends on the blockchain where the NFT is minted)
total_price         Total price paid for the NFT
seller_account      Seller’s account, FK: accounts(id)
from_account        Account used to transfer from, FK: accounts(id)
to_account          Account used to transfer to, FK: accounts(id)
winner_account      Buyer’s account, FK: accounts(id)
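
As a sketch of how this hypertable is typically queried, the following counts daily sales for a single collection using TimescaleDB’s time_bucket function. The slug 'cryptokitties' is only an illustrative placeholder; substitute any slug that exists in your collections table.

  -- Daily number of sales for one collection (slug is a placeholder)
  SELECT time_bucket('1 day', time) AS bucket, count(*) AS sale_count
  FROM nft_sales
  WHERE collection_id = (SELECT id FROM collections WHERE slug = 'cryptokitties')
  GROUP BY bucket
  ORDER BY bucket;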

The assets table

The assets table contains information about the assets (NFTs) that are in the transactions. One row represents a unique NFT asset on the OpenSea platform.

  • The name field is the name of the NFT, and is not unique.
  • The id field is the primary key, provided by the OpenSea API.
  • One asset can be referenced from multiple transactions (traded multiple times).

Data field       Description
id               OpenSea ID (PK)
name             Name of the NFT
description      Description of the NFT
contract_date    Creation date of the smart contract
url              OpenSea URL of the NFT
owner_id         ID of the NFT owner account, FK: accounts(id)
details          Other extra data fields (JSONB)
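
For example, here is a sketch of a query that joins nft_sales to assets to list the most frequently traded NFTs (it groups by id as well as name, because names are not unique):

  -- Most frequently traded NFTs
  SELECT a.id, a.name, count(*) AS times_sold
  FROM nft_sales s
  JOIN assets a ON a.id = s.asset_id
  GROUP BY a.id, a.name
  ORDER BY times_sold DESC
  LIMIT 10;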

The collections table

The collections table holds information about the NFT collections. One row represents a unique NFT collection. One collection includes multiple unique NFTs (that are in the assets table).

  • The slug field is a unique identifier of the collection.

Data field    Description
id            Auto-increment (PK)
slug          Slug of the collection (unique)
name          Name of the collection
url           OpenSea URL of the collection
details       Other extra data fields (JSONB)
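
As another sketch, this query joins nft_sales with collections to rank collections by ETH trading volume, filtering on the payment_symbol field:

  -- Collections with the highest trading volume in ETH
  SELECT c.slug, sum(s.total_price) AS eth_volume
  FROM nft_sales s
  JOIN collections c ON c.id = s.collection_id
  WHERE s.payment_symbol = 'ETH'
  GROUP BY c.slug
  ORDER BY eth_volume DESC
  LIMIT 10;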

The accounts table

The accounts table includes the accounts that have participated in at least one transaction from the nft_sales table. One row represents one unique account on the OpenSea platform.

  • The address field is never NULL and is unique.
  • The user_name field is NULL unless the user has set a user name on their OpenSea profile.

Data field    Description
id            Auto-increment (PK)
user_name     OpenSea user name
address       Account address (unique)
details       Other extra data fields (JSONB)
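
To sketch how the accounts table joins back to the sales, this query counts purchases per buyer using the winner_account foreign key (you could count sales per seller the same way with seller_account):

  -- Accounts that bought the most NFTs
  SELECT a.address, a.user_name, count(*) AS purchases
  FROM nft_sales s
  JOIN accounts a ON a.id = s.winner_account
  GROUP BY a.id, a.address, a.user_name
  ORDER BY purchases DESC
  LIMIT 10;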

Database schema

The data types used in the schema for this tutorial were chosen based on our research and hands-on experience working with the OpenSea API and the data it returns. Start by running these SQL commands to create the schema. Alternatively, you can download and run the schema.sql file from our NFT Starter Kit GitHub repository.

  CREATE TABLE collections (
      id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
      slug TEXT UNIQUE,
      name TEXT,
      url TEXT,
      details JSONB
  );

  CREATE TABLE accounts (
      id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
      user_name TEXT,
      address TEXT UNIQUE NOT NULL,
      details JSONB
  );

  CREATE TABLE assets (
      id BIGINT PRIMARY KEY,
      name TEXT,
      collection_id BIGINT REFERENCES collections (id), -- collection
      description TEXT,
      contract_date TIMESTAMP WITH TIME ZONE,
      url TEXT UNIQUE,
      img_url TEXT,
      owner_id BIGINT REFERENCES accounts (id), -- account
      details JSONB
  );

  CREATE TYPE auction AS ENUM ('dutch', 'english', 'min_price');

  CREATE TABLE nft_sales (
      id BIGINT,
      "time" TIMESTAMP WITH TIME ZONE,
      asset_id BIGINT REFERENCES assets (id), -- asset
      collection_id BIGINT REFERENCES collections (id), -- collection
      auction_type auction,
      contract_address TEXT,
      quantity NUMERIC,
      payment_symbol TEXT,
      total_price DOUBLE PRECISION,
      seller_account BIGINT REFERENCES accounts (id), -- account
      from_account BIGINT REFERENCES accounts (id), -- account
      to_account BIGINT REFERENCES accounts (id), -- account
      winner_account BIGINT REFERENCES accounts (id), -- account
      CONSTRAINT id_time_unique UNIQUE (id, time)
  );

  SELECT create_hypertable('nft_sales', 'time');

  CREATE INDEX idx_asset_id ON nft_sales (asset_id);
  CREATE INDEX idx_collection_id ON nft_sales (collection_id);
  CREATE INDEX idx_payment_symbol ON nft_sales (payment_symbol);

Schema design

The id field in each table is BIGINT because its storage size is 8 bytes in PostgreSQL (as opposed to INT’s 4 bytes), which is needed to make sure this value doesn’t overflow.

For the quantity field, we suggest using NUMERIC or DECIMAL (which work the same way in PostgreSQL) as the data type, because in some edge cases we encountered transactions where the quantity was too large even for BIGINT.

total_price needs to be DOUBLE PRECISION because NFT prices often include many decimal places, especially in the case of Ether (ETH) and similar cryptocurrencies, which are, functionally, infinitely divisible.

We created an ENUM for auction_type as this value can only be ‘dutch’, ‘english’, or ‘min_price’, representing the different types of auctions used to sell an NFT.

We decided not to store all the data fields that are available from the OpenSea API, only those that we deemed interesting or useful for future analysis. But we still wanted to keep all of the unused data fields somewhere close, so we added a details JSONB column to each relational table. This column contains additional information about the record. For example, for assets it includes fields such as background_color.

Note: In our sample dataset, we chose not to include the JSONB data to keep the size of the dataset easily manageable. If you want a dataset with the full JSON data included, you need to fetch the data directly from the OpenSea API (see below for steps).
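
If you do ingest the full data from the API, here is a sketch of how you might read a key from the details column. The background_color key is just the example field mentioned above, and rows without it are skipped:

  -- Read a field from the JSONB details column (only populated
  -- when the data was fetched directly from the OpenSea API)
  SELECT name, details->>'background_color' AS background_color
  FROM assets
  WHERE details ? 'background_color'
  LIMIT 10;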

Ingest NFT data

When you have your database and schema created, you can ingest some data to play with! You have two options to ingest NFT data for this tutorial:

  • Fetch data directly from the OpenSea API
  • Download sample data and import it

Fetch data directly from the OpenSea API

To ingest data from the OpenSea API, you can use the opensea_ingest.py script included in the starter kit repository on GitHub. The script connects to the OpenSea API’s /events endpoint and fetches data from the specified time period.

note

You need an OpenSea API key to fetch data from the OpenSea API. To request your key, see the OpenSea API documentation.

warning

This procedure relies on the OpenSea API. The OpenSea API is provided and maintained by OpenSea. Recently, the API has stopped functioning for extended periods of time. If the API has changed or is not accessible when you attempt to run the opensea_ingest.py script, try following the procedure to download a historical data file and import it. You can use this data file to complete the tutorial.

Fetching data directly from the OpenSea API

  1. Clone the nft-starter-kit repository on GitHub:

    git clone https://github.com/timescale/nft-starter-kit.git
    cd nft-starter-kit
  2. Create a new Python virtual environment and install the requirements:

    virtualenv env && source env/bin/activate
    pip install -r requirements.txt
  3. Replace the parameters in the config.py file:

    DB_NAME="tsdb"
    HOST="YOUR_HOST_URL"
    USER="tsdbadmin"
    PASS="YOUR_PASSWORD_HERE"
    PORT="PORT_NUMBER"
    OPENSEA_START_DATE="2021-10-01T00:00:00" # example start date (UTC)
    OPENSEA_END_DATE="2021-10-06T23:59:59" # example end date (UTC)
    OPENSEA_APIKEY="YOUR_OPENSEA_APIKEY" # need to request from OpenSea's docs
  4. Run the Python script:

    python opensea_ingest.py

    This starts ingesting data in batches, 300 rows at a time:

    Start ingesting data between 2021-10-01 00:00:00+00:00 and 2021-10-06 23:59:59+00:00
    ---
    Fetching transactions from OpenSea...
    Data loaded into temp table!
    Data ingested!
    Data has been backfilled until this time: 2021-10-06 23:51:31.140126+00:00
    ---

    You can stop the ingestion process at any time (Ctrl+C); otherwise, the script runs until all the transactions from the given time period have been ingested.

Download sample NFT data

You can download and insert sample CSV files that contain NFT sales data from October 1, 2021 to October 7, 2021.

Downloading sample NFT data

  1. Download the sample CSV files, which contain one week of data.
  2. Uncompress the ZIP file:

    unzip nft_sample.zip
  3. Connect to your database:

    psql -x "postgres://host:port/tsdb?sslmode=require"

    If you’re using Timescale Cloud, the instructions under How to Connect provide a customized command you can run to connect directly to your database.

  4. Import the CSV files in this order (it can take a few minutes in total):

    \copy accounts FROM 001_accounts.csv CSV HEADER;
    \copy collections FROM 002_collections.csv CSV HEADER;
    \copy assets FROM 003_assets.csv CSV HEADER;
    \copy nft_sales FROM 004_nft_sales.csv CSV HEADER;

After ingesting NFT data, you can try running some queries on your database:

  SELECT count(*), MIN(time) AS min_date, MAX(time) AS max_date FROM nft_sales;
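
For a slightly more involved example that maps to one of the questions at the start of this tutorial, the following sketch compares the daily mean and median sale price of ETH-denominated sales, using PostgreSQL’s percentile_cont aggregate:

  -- Daily mean and median sale price for ETH sales
  SELECT time_bucket('1 day', time) AS bucket,
         avg(total_price) AS mean_price,
         percentile_cont(0.5) WITHIN GROUP (ORDER BY total_price) AS median_price
  FROM nft_sales
  WHERE payment_symbol = 'ETH'
  GROUP BY bucket
  ORDER BY bucket;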