stats_agg() (one variable) functions

Introduction

Perform common statistical analyses, such as calculating averages and standard deviations, using this group of functions. These functions are similar to the PostgreSQL statistical aggregates, but they include more features and are easier to use in continuous aggregates and window functions.

These functions work on one-dimensional data. To work with two-dimensional data, for example to perform linear regression, see the two-dimensional stats_agg functions.

Two-step aggregation

Hide content

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Functions in this group

Aggregate

stats_agg (one variable)

Aggregate data into an intermediate statistical aggregate form for further calculation

Accessor

average

Calculate the average from a one-dimensional statistical aggregate

kurtosis

Calculate the kurtosis from a one-dimensional statistical aggregate

num_vals

Calculate the number of values in a one-dimensional statistical aggregate

skewness

Calculate the skewness from a one-dimensional statistical aggregate

stddev

Calculate the standard deviation from a one-dimensional statistical aggregate

sum

Calculate the sum from a one-dimensional statistical aggregate

variance

Calculate the variance from a one-dimensional statistical aggregate

Rollup

rolling

Combine multiple one-dimensional statistical aggregates to calculate rolling window aggregates

rollup

Combine multiple one-dimensional statistical aggregates

Function details

stats_agg() (one variable)

Stabilized in Toolkit v1.3.0

Hide content

`

  1. stats_agg(

`

  1. value DOUBLE PRECISION

`

  1. ) RETURNS StatsSummary1D

`

This is the first step for performing any statistical aggregate calculations on one-dimensional data. Use stats_agg to create an intermediate aggregate (StatsSummary1D) from your data. This intermediate form can then be used by one or more accessors in this group to compute final results. Optionally, multiple such intermediate aggregate objects can be combined using rollup() or rolling() before an accessor is applied.

stats_agg is well suited for creating a continuous aggregate that can serve multiple purposes later. For example, you can create a continuous aggregate using stats_agg to calculate average and sum. Later, you can reuse the same StatsSummary1D objects to calculate standard deviation from the same continuous aggregate.

Required arguments

NameTypeDescription
valueDOUBLE PRECISIONThe variable to use for the statistical aggregate.

Returns

ColumnTypeDescription
stats_aggStatsSummary1DThe statistical aggregate, containing data about the variables in an intermediate form. Pass the aggregate to accessor functions in the statistical aggregates API to perform final calculations. Or, pass the aggregate to rollup functions to combine multiple statistical aggregates into larger aggregates.

Examples

Create a one-dimensional statistical aggregate from column x in table foo:

  1. SELECT stats_agg(x) FROM foo;

average()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. average(

`

  1. summary StatsSummary1D

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate a simple average (or mean) from the values in a statistical aggregate.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Returns

ColumnTypeDescription
averageDOUBLE PRECISIONThe average of the values in the statistical aggregate

Examples

Calculate the average of column x in table foo:

  1. SELECT average(
  2. stats_agg(x)
  3. ) FROM foo;

Calculate the average of the integers from 0 to 100:

  1. SELECT average(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. average
  4. -----------
  5. 50

kurtosis()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. kurtosis(

`

  1. summary StatsSummary1D,
  2. [ method TEXT ]

`

  1. ) DOUBLE PRECISION

`

Calculate the kurtosis from the values in a statistical aggregate. The kurtosis is the fourth statistical moment. It is a measure of “tailedness” of a data distribution compared to a normal distribution.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Optional arguments

NameTypeDescription
methodTEXTThe method used for calculating the kurtosis. The two options are population and sample, which can be abbreviated to pop or samp. Defaults to sample.

Returns

ColumnTypeDescription
kurtosisDOUBLE PRECISIONThe kurtosis of the values in the statistical aggregate

Examples

Calculate the kurtosis of a sample containing the integers from 0 to 100:

  1. SELECT kurtosis(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. kurtosis
  4. ----------
  5. 1.78195

num_vals()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. num_vals(

`

  1. summary StatsSummary1D

`

  1. ) RETURNS BIGINT

`

Calculate the number of values contained in a statistical aggregate.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Returns

ColumnTypeDescription
num_valsDOUBLE PRECISIONThe number of values in the statistical aggregate

Examples

Calculate the number of values from 0 to 100, inclusive:

  1. SELECT num_vals(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. num_vals
  4. --------
  5. 101

skewness()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. skewness(

`

  1. summary StatsSummary1D,
  2. [ method TEXT ]

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate the skewness from the values in a statistical aggregate. The skewness is the third statistical moment. It is a measure of asymmetry in a data distribution.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Optional arguments

NameTypeDescription
methodTEXTThe method used for calculating the skewness. The two options are population and sample, which can be abbreviated to pop or samp. Defaults to sample.

Returns

ColumnTypeDescription
skewnessDOUBLE PRECISIONThe skewness of the values in the statistical aggregate

Examples

Calculate the skewness of a sample containing the integers from 0 to 100:

  1. SELECT skewness(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. skewness_x
  4. ----------
  5. 0

stddev()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. stddev(

`

  1. summary StatsSummary1D,
  2. [ method TEXT ]

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate the standard deviation from the values in a statistical aggregate.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Optional arguments

NameTypeDescription
methodTEXTThe method used for calculating the standard deviation. The two options are population and sample, which can be abbreviated to pop or samp. Defaults to sample.

Returns

ColumnTypeDescription
stddevDOUBLE PRECISIONThe standard deviation of the values in the statistical aggregate

Examples

Calculate the standard deviation of a sample containing the integers from 0 to 100:

  1. SELECT stddev(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. stddev_y
  4. --------
  5. 29.3002

sum()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. sum(

`

  1. summary StatsSummary1D

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate the sum of the values contained in a statistical aggregate.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Returns

ColumnTypeDescription
sumDOUBLE PRECISIONThe sum of the values in the statistical aggregate

Examples

Calculate the sum of the integers from 0 to 100:

  1. SELECT sum(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. sum
  4. -----
  5. 5050

variance()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. variance(

`

  1. summary StatsSummary1D,
  2. [ method TEXT ]

`

  1. ) RETURNS DOUBLE PRECISION

`

Calculate the variance from the values in a statistical aggregate.

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Optional arguments

NameTypeDescription
methodTEXTThe method used for calculating the standard deviation. The two options are population and sample, which can be abbreviated to pop or samp. Defaults to sample.

Returns

ColumnTypeDescription
varianceDOUBLE PRECISIONThe variance of the values in the statistical aggregate

Examples

Calculate the variance of a sample containing the integers from 0 to 100:

  1. SELECT variance(stats_agg(data))
  2. FROM generate_series(0, 100) data;
  3. variance
  4. ----------
  5. 858.5

rolling()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. rolling(

`

  1. ss StatsSummary1D

`

  1. ) RETURNS StatsSummary1D

`

Combine multiple intermediate statistical aggregate (StatsSummary1D) objects into a single StatsSummary1D object. It is optimized for use in a window function context for computing tumbling window statistical aggregates.

note

This is especially useful for computing tumbling window aggregates from a continuous aggregate. It can be orders of magnitude faster because it uses inverse transition and combine functions, with the possibility that bigger floating point errors can occur in unusual scenarios. For re-aggregation in a non-window function context, such as combining hourly buckets into daily buckets, see rollup().

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Returns

ColumnTypeDescription
rollingStatsSummary1DA new statistical aggregate produced by combining the input statistical aggregates

Examples

Combine hourly continuous aggregates to create a tumbling window daily aggregate. Calculate the average and standard deviation using the appropriate accessors:

  1. CREATE MATERIALIZED VIEW foo_hourly
  2. WITH (timescaledb.continuous)
  3. AS SELECT
  4. time_bucket('1h'::interval, ts) AS bucket,
  5. stats_agg(value) as stats
  6. FROM foo
  7. GROUP BY 1;
  8. SELECT
  9. bucket,
  10. average(rolling(stats) OVER (ORDER BY bucket RANGE '1 day' PRECEDING)),
  11. stddev(rolling(stats) OVER (ORDER BY bucket RANGE '1 day' PRECEDING)),
  12. FROM foo_hourly;

rollup()

Stabilized in Toolkit v1.3.0

Hide content

`

  1. rollup(

`

  1. ss StatsSummary1D

`

  1. ) RETURNS StatsSummary1D

`

Combine multiple intermediate statistical aggregate (StatsSummary1D) objects produced by stats_agg (one variable) into a single intermediate StatsSummary1D object. For example, you can use rollup to combine statistical aggregates from 15-minute buckets into daily buckets. For use in window functions, see rolling().

Required arguments

NameTypeDescription
summaryStatsSummary1DThe statistical aggregate produced by a stats_agg call

Returns

ColumnTypeDescription
rollupStatsSummary1DA new statistical aggregate produced by combining the input statistical aggregates

Extended examples

Create a statistical aggregate to summarize daily statistical data about the variable val1. Use the statistical aggregate to calculate average, standard deviation, and skewness of the variable:

  1. WITH t as (
  2. SELECT
  3. time_bucket('1 day'::interval, ts) as dt,
  4. stats_agg(val1) AS stats1D
  5. FROM foo
  6. WHERE id = 'bar'
  7. GROUP BY time_bucket('1 day'::interval, ts)
  8. )
  9. SELECT
  10. average(stats1D),
  11. stddev(stats1D),
  12. skewness(stats1D)
  13. FROM t;