SAMPLE BY is used on time series data to summarise large datasets into aggregates of homogeneous time chunks as part of a SELECT statement.

Users performing SAMPLE BY queries on datasets with missing data may make use of the FILL keyword to specify a fill behavior.

:::note

To use SAMPLE BY, one column needs to be designated as timestamp. Find out more in the designated timestamp section.

:::

Syntax

Flow chart showing the syntax of the SAMPLE BY keyword

Where SAMPLE_SIZE is the unit of time by which you wish to aggregate your results, and n is the number of time chunks that will be summarised together.

Examples

Assume the following table

tsbuysellquantityprice
ts1Bq1p1
ts2Sq2p2
ts3Sq3p3
tsnBqnpn

The following will return the number of trades per hour:

  1. SELECT ts, count()
  2. FROM TRADES
  3. SAMPLE BY 1h;

The following will return the trade volume in 30 minute intervals

  1. SELECT ts, sum(quantity*price)
  2. FROM TRADES
  3. SAMPLE BY 30m;

The following will return the average trade notional (where notional is = q * p) by day:

  1. SELECT ts, avg(quantity*price)
  2. FROM TRADES
  3. SAMPLE BY 1d;