tdigest() functions

Introduction

Estimate the value at a given percentile, or the percentile rank of a given value, using the t-digest algorithm. This estimation is more memory- and CPU-efficient than an exact calculation using PostgreSQL’s percentile_cont and percentile_disc functions.

tdigest is one of two advanced percentile approximation aggregates provided in TimescaleDB Toolkit. It is a space-efficient aggregation, and it provides more accurate estimates at extreme quantiles than traditional methods.

tdigest is somewhat dependent on input order. If tdigest is run on the same data arranged in different order, the results should be nearly equal, but they are unlikely to be exact.

The other advanced percentile approximation aggregate is uddsketch, which produces stable estimates within a guaranteed relative error. If you aren’t sure which to use, try the default percentile estimation method, percentile_agg. It uses the uddsketch algorithm with some sensible defaults.

For more information about percentile approximation algorithms, see the algorithms overview.

Two-step aggregation

Hide content

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Functions in this group

Aggregate

tdigest

Aggregate data in a tdigest for further calculation of percentile estimates

Accessor

approx_percentile

Estimate the value at a given percentile from a tdigest

approx_percentile_rank

Estimate the percentile of a given value from a tdigest

mean

Calculate the exact mean from values in a tdigest

num_vals

Get the number of values contained in a tdigest

Rollup

rollup

Roll up multiple tdigests

Function details

tdigest()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. tdigest(

`

  1. buckets INTEGER,
  2. value DOUBLE PRECISION

`

  1. ) RETURNS TDigest

`

This is the first step for calculating approximate percentiles with the tdigest algorithm. Use tdigest to create an intermediate aggregate from your raw data. This intermediate form can then be used by one or more accessors in this group to compute final results.

Optionally, multiple such intermediate aggregate objects can be combined using rollup() before an accessor is applied.

Required arguments

NameTypeDescription
bucketsINTEGERNumber of buckets in the digest. Increasing this provides more accurate quantile estimates, but requires more memory.
valueDOUBLE PRECISIONColumn of values to aggregate for the tdigest object.

Returns

ColumnTypeDescription
tdigestTDigestA percentile estimator object created to calculate percentiles using the tdigest algorithm

Examples

Given a table called samples, with a column called data, build a tdigest using the data column. Use 100 buckets for the approximation:

  1. SELECT tdigest(100, data) FROM samples;

approx_percentile()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. approx_percentile(

`

  1. percentile DOUBLE PRECISION,
  2. tdigest TDigest

`

  1. ) RETURNS DOUBLE PRECISION

`

Estimate the approximate value at a percentile from a tdigest aggregate.

Required arguments

NameTypeDescription
percentileDOUBLE PRECISIONThe percentile to compute. Must be within the range [0.0, 1.0].
tdigestTDigestThe tdigest aggregate.

Returns

ColumnTypeDescription
approx_percentileDOUBLE PRECISIONThe estimated value at the requested percentile.

Examples

Estimate the value at the first percentile, given a sample containing the numbers from 0 to 100:

  1. SELECT
  2. approx_percentile(0.01, tdigest(data))
  3. FROM generate_series(0, 100) data;
  4. approx_percentile
  5. -------------------
  6. 0.999

approx_percentile_rank()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. approx_percentile_rank(

`

  1. value DOUBLE PRECISION,
  2. digest TDigest

`

  1. ) RETURNS DOUBLE PRECISION

`

Estimate the the percentile at which a given value would be located.

Required arguments

NameTypeDescription
valueDOUBLE PRECISIONThe value to estimate the percentile of.
digestTDigestThe tdigest aggregate.

Returns

ColumnTypeDescription
approx_percentile_rankDOUBLE PRECISIONThe estimated percentile associated with the provided value.

Examples

Estimate the percentile rank of the value 99, given a sample containing the numbers from 0 to 100:

  1. SELECT
  2. approx_percentile_rank(99, tdigest(data))
  3. FROM generate_series(0, 100) data;
  4. approx_percentile_rank
  5. ----------------------------
  6. 0.9851485148514851

mean()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. mean(

`

  1. digest TDigest

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate the exact mean of the values in a tdigest aggregate. Unlike percentile calculations, the mean calculation is exact. This accessor allows you to calculate the mean alongside percentiles, without needing to create two separate aggregates from the same raw data.

Required arguments

NameTypeDescription
digestTDigestThe tdigest aggregate to extract the mean from.

Returns

ColumnTypeDescription
meanDOUBLE PRECISIONThe mean of the values in the tdigest aggregate.

Examples

Calculate the mean of the integers from 0 to 100:

  1. SELECT mean(tdigest(data))
  2. FROM generate_series(0, 100) data;
  3. mean
  4. ------
  5. 50

num_vals()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. num_vals(

`

  1. digest TDigest

`

  1. ) RETURNS DOUBLE PRECISION

`

Get the number of values contained in a tdigest aggregate. This accessor allows you to calculate a count alongside percentiles, without needing to create two separate aggregates from the same raw data.

Required arguments

NameTypeDescription
digestTDigestThe tdigest aggregate to extract the number of values from.

Returns

ColumnTypeDescription
num_valsDOUBLE PRECISIONThe number of values in the tdigest aggregate.

Examples

Count the number of integers from 0 to 100:

  1. SELECT num_vals(tdigest(data))
  2. FROM generate_series(0, 100) data;
  3. num_vals
  4. -----------
  5. 101

rollup()

Stabilized in Toolkit v1.0.0

Hide content

`

  1. rollup(

`

  1. digest TDigest

`

  1. ) RETURNS TDigest

`

Combine multiple intermediate tdigest aggregates, produced by tdigest, into a single intermediate tdigest aggregate. For example, you can use rollup to combine tdigests from 15-minute buckets into daily buckets.

Required arguments

NameTypeDescription
digestTDigestThe tdigests to roll up.

Returns

ColumnTypeDescription
rollupTDigestA new tdigest created by combining the input tdigests.

Extended examples

Aggregate and roll up percentile data to calculate daily percentiles

Create an hourly continuous aggregate that contains a percentile aggregate:

  1. CREATE MATERIALIZED VIEW foo_hourly
  2. WITH (timescaledb.continuous)
  3. AS SELECT
  4. time_bucket('1 h'::interval, ts) as bucket,
  5. tdigest(value) as tdigest
  6. FROM foo
  7. GROUP BY 1;

You can use accessors to query directly from the continuous aggregate for hourly data. You can also roll the hourly data up into daily buckets, then calculate approximate percentiles:

  1. SELECT
  2. time_bucket('1 day'::interval, bucket) as bucket,
  3. approx_percentile(0.95, rollup(tdigest)) as p95,
  4. approx_percentile(0.99, rollup(tdigest)) as p99
  5. FROM foo_hourly
  6. GROUP BY 1;