time_weight() functions

Introduction

Calculate time-weighted summary statistics, such as averages (means) and integrals. Time weighting is used when data is unevenly sampled over time. In that case, a straight average gives misleading results, as it biases towards more frequently sampled values.

For example, a sensor might silently spend long periods of time in a steady state, and send data only when a significant change occurs. The regular mean counts the steady-state reading as only a single point, whereas a time-weighted mean accounts for the long period of time spent in the steady state. In essence, the time-weighted mean takes an integral over time, then divides by the elapsed time.
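As a hypothetical worked example, suppose a sensor reports a value of 10 at minute 0 and a value of 100 at minute 9, and the observation window closes at minute 10. Under the LOCF method, the integral is 10 × 9 + 100 × 1 = 190 value-minutes, so the time-weighted mean over the 10-minute window is 190 / 10 = 19. A plain average of the two readings gives 55, greatly overstating the brief excursion to 100.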

Two-step aggregation


This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
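As a minimal sketch of the pattern, assuming a hypothetical table conditions with columns ts and val:

```sql
-- Step 1: create intermediate TimeWeightSummary aggregates, one per day
WITH daily AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS day,
        time_weight('LOCF', ts, val) AS tws
    FROM conditions
    GROUP BY time_bucket('1 day'::interval, ts)
)
-- Step 2: roll the daily aggregates up and apply an accessor
SELECT average(rollup(tws)) FROM daily;
```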

The two-step aggregation pattern has several advantages:

  1. More efficient because multiple accessors can reuse the same aggregate
  2. Easier to reason about performance, because aggregation is separate from final computation
  3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
  4. Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Functions in this group

Warning

This function group includes some experimental functions. Experimental functions might change or be removed in future releases. We do not recommend using them in production. Experimental functions are marked with an Experimental tag.

Aggregate

| Function | Description |
|----------|-------------|
| time_weight | Aggregate data into an intermediate time-weighted aggregate form for further calculation |

Accessor

| Function | Description |
|----------|-------------|
| average | Calculate the time-weighted average of values in a TimeWeightSummary |
| first_time | Get the first timestamp from a TimeWeightSummary aggregate |
| first_val | Get the first value from a TimeWeightSummary aggregate |
| integral | (Experimental) Calculate the integral from a TimeWeightSummary |
| interpolated_average | (Experimental) Calculate the time-weighted average over an interval, while interpolating the interval bounds |
| interpolated_integral | (Experimental) Calculate the integral over an interval, while interpolating the interval bounds |
| last_time | Get the last timestamp from a TimeWeightSummary aggregate |
| last_val | Get the last value from a TimeWeightSummary aggregate |

Rollup

| Function | Description |
|----------|-------------|
| rollup | Combine multiple TimeWeightSummaries into a single TimeWeightSummary |

Function details

time_weight()

Stabilized in Toolkit v1.0.0


```
time_weight(
    method TEXT,
    ts TIMESTAMPTZ,
    value DOUBLE PRECISION
) RETURNS TimeWeightSummary
```

This is the first step for performing any time-weighted calculations. Use time_weight to create an intermediate aggregate (TimeWeightSummary) from your data. This intermediate form can then be used by one or more accessors in this group to compute final results. Optionally, multiple such intermediate aggregate objects can be combined using rollup() before an accessor is applied.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| method | TEXT | The weighting method to use. The available methods are linear (or its alias trapezoidal, for those familiar with numeric integration methods) and LOCF, which stands for ‘last observation carried forward’. linear fills in missing data by interpolating linearly between the start and end points of the gap. LOCF fills in the gap by assuming that the value remains constant until the next value is seen, so it is most useful when a measurement is taken only when the value changes. linear is most useful if there are no such guarantees on the measurement. The method names are case-insensitive. |
| ts | TIMESTAMPTZ | The time at each point. Null values are ignored. An aggregate evaluated on only null values returns null. |
| value | DOUBLE PRECISION | The value at each point to use for the time-weighted aggregate. Null values are ignored. An aggregate evaluated on only null values returns null. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| time_weight | TimeWeightSummary | A TimeWeightSummary object that can be passed to other functions within the time-weighting API |

Examples

Aggregate data from column val into daily time-weighted aggregates, using the linear interpolation method:

```sql
SELECT
    time_bucket('1 day'::interval, ts) AS dt,
    time_weight('Linear', ts, val) AS tw
FROM foo
GROUP BY time_bucket('1 day'::interval, ts);
```

average()

Stabilized in Toolkit v1.0.0


```
average(
    tws TimeWeightSummary
) RETURNS DOUBLE PRECISION
```

Calculate the time-weighted average. Equal to the integral divided by the elapsed time.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tws | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| average | DOUBLE PRECISION | The time-weighted average. |

Examples

Calculate the time-weighted average of the column val, using the ‘last observation carried forward’ interpolation method:

```sql
SELECT
    id,
    average(tws)
FROM (
    SELECT
        id,
        time_weight('LOCF', ts, val) AS tws
    FROM foo
    GROUP BY id
) t;
```

first_time()

Stabilized in Toolkit v1.11.0


```
first_time(
    tw TimeWeightSummary
) RETURNS TIMESTAMPTZ
```

Get the timestamp of the first point in a TimeWeightSummary aggregate.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tw | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| first_time | TIMESTAMPTZ | The time of the first point in the TimeWeightSummary |

Examples

Produce a linear TimeWeightSummary over the column val and get the first timestamp:

```sql
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        time_weight('Linear', ts, val) AS tw
    FROM foo
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    dt,
    first_time(tw)
FROM t;
```

first_val()

Stabilized in Toolkit v1.11.0


```
first_val(
    tw TimeWeightSummary
) RETURNS DOUBLE PRECISION
```

Get the value of the first point in a TimeWeightSummary aggregate.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tw | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| first_val | DOUBLE PRECISION | The value of the first point in the TimeWeightSummary |

Examples

Produce a linear TimeWeightSummary over the column val and get the first value:

```sql
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        time_weight('Linear', ts, val) AS tw
    FROM foo
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    dt,
    first_val(tw)
FROM t;
```

integral()

Experimental TimescaleDB Toolkit functions are not suitable for production environments. They may have bugs and may cause data loss.

Introduced in Toolkit v1.11.0


```
integral(
    tws TimeWeightSummary
    [, unit TEXT]
) RETURNS DOUBLE PRECISION
```

Calculate the integral, or the area under the curve formed by the data points. Equal to the average multiplied by the elapsed time.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tws | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Optional arguments

| Name | Type | Description |
|------|------|-------------|
| unit | TEXT | The unit of time to express the integral in. Can be microsecond, millisecond, second, minute, hour, or any alias for those units supported by PostgreSQL. Defaults to second. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| integral | DOUBLE PRECISION | The time-weighted integral. |

Examples

Create a table to track irregularly sampled storage usage in bytes, and get the total storage used in byte-hours. Use the ‘last observation carried forward’ interpolation method:

```sql
-- Create a table to track irregularly sampled storage usage
CREATE TABLE user_storage_usage(ts TIMESTAMP, storage_bytes BIGINT);
INSERT INTO user_storage_usage(ts, storage_bytes) VALUES
    ('01-01-2022 00:00', 0),
    ('01-01-2022 00:30', 100),
    ('01-01-2022 03:00', 300),
    ('01-01-2022 03:10', 1000),
    ('01-01-2022 03:25', 817);

-- Get the total byte-hours used
SELECT
    toolkit_experimental.integral(time_weight('LOCF', ts, storage_bytes), 'hours')
FROM
    user_storage_usage;
```

interpolated_average()

Experimental TimescaleDB Toolkit functions are not suitable for production environments. They may have bugs and may cause data loss.

Introduced in Toolkit v1.8.0


```
interpolated_average(
    tws TimeWeightSummary,
    start TIMESTAMPTZ,
    interval INTERVAL
    [, prev TimeWeightSummary]
    [, next TimeWeightSummary]
) RETURNS DOUBLE PRECISION
```

Calculate the time-weighted average over an interval, while interpolating the interval bounds.

Similar to average, but allows an accurate calculation across interval bounds when data has been bucketed into separate time intervals, and there is no data point precisely at the interval bound. For example, this is useful in a window function.

Values from the previous and next buckets are used to interpolate the values at the bounds, using the same interpolation method used within the TimeWeightSummary itself.

Equal to interpolated_integral divided by the elapsed time.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tws | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |
| start | TIMESTAMPTZ | The start of the interval which the time-weighted average should cover (if there is a preceding point). |
| interval | INTERVAL | The length of the interval which the time-weighted average should cover. |

Optional arguments

| Name | Type | Description |
|------|------|-------------|
| prev | TimeWeightSummary | The TimeWeightSummary from the prior interval, used to interpolate the value at start. If NULL, the first timestamp in tws is used as the start of the interval. The prior interval can be determined from the PostgreSQL lag() function. |
| next | TimeWeightSummary | The TimeWeightSummary from the next interval, used to interpolate the value at start + interval. If NULL, the last timestamp in tws is used as the end of the interval. The next interval can be determined from the PostgreSQL lead() function. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| average | DOUBLE PRECISION | The time-weighted average for the interval (start, start + interval), computed from the TimeWeightSummary plus end points interpolated from prev and next |

Examples

Calculate the time-weighted daily average of the column val, interpolating over bucket bounds using the ‘last observation carried forward’ method:

```sql
SELECT
    id,
    time,
    interpolated_average(
        tws,
        time,
        '1 day',
        LAG(tws) OVER (PARTITION BY id ORDER BY time),
        LEAD(tws) OVER (PARTITION BY id ORDER BY time)
    )
FROM (
    SELECT
        id,
        time_bucket('1 day', ts) AS time,
        time_weight('LOCF', ts, val) AS tws
    FROM foo
    GROUP BY id, time
) t;
```

interpolated_integral()

Experimental TimescaleDB Toolkit functions are not suitable for production environments. They may have bugs and may cause data loss.

Introduced in Toolkit v1.11.0


```
interpolated_integral(
    tws TimeWeightSummary,
    start TIMESTAMPTZ,
    interval INTERVAL
    [, prev TimeWeightSummary]
    [, next TimeWeightSummary]
    [, unit TEXT]
) RETURNS DOUBLE PRECISION
```

Calculate the integral over an interval, while interpolating the interval bounds.

Similar to integral, but allows an accurate calculation across interval bounds when data has been bucketed into separate time intervals, and there is no data point precisely at the interval bound. For example, this is useful in a window function.

Values from the previous and next buckets are used to interpolate the values at the bounds, using the same interpolation method used within the TimeWeightSummary itself.

Equal to interpolated_average multiplied by the elapsed time.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tws | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |
| start | TIMESTAMPTZ | The start of the interval which the time-weighted integral should cover (if there is a preceding point). |
| interval | INTERVAL | The length of the interval which the time-weighted integral should cover. |

Optional arguments

| Name | Type | Description |
|------|------|-------------|
| prev | TimeWeightSummary | The TimeWeightSummary from the prior interval, used to interpolate the value at start. If NULL, the first timestamp in tws is used as the start of the interval. The prior interval can be determined from the PostgreSQL lag() function. |
| next | TimeWeightSummary | The TimeWeightSummary from the next interval, used to interpolate the value at start + interval. If NULL, the last timestamp in tws is used as the end of the interval. The next interval can be determined from the PostgreSQL lead() function. |
| unit | TEXT | The unit of time to express the integral in. Can be microsecond, millisecond, second, minute, hour, or any alias for those units supported by PostgreSQL. Defaults to second. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| integral | DOUBLE PRECISION | The time-weighted integral for the interval (start, start + interval), computed from the TimeWeightSummary plus end points interpolated from prev and next |

Examples

Create a table to track irregularly sampled storage usage in bytes, and get the total storage used in byte-hours between January 1 and January 6. Use the ‘last observation carried forward’ interpolation method:

```sql
-- Create a table to track irregularly sampled storage usage
CREATE TABLE user_storage_usage(ts TIMESTAMP, storage_bytes BIGINT);
INSERT INTO user_storage_usage(ts, storage_bytes) VALUES
    ('01-01-2022 20:55', 27),
    ('01-02-2022 18:33', 100),
    ('01-03-2022 03:05', 300),
    ('01-04-2022 12:13', 1000),
    ('01-05-2022 07:26', 817);

-- Get the total byte-hours used between Jan. 1 and Jan. 6
SELECT
    toolkit_experimental.interpolated_integral(
        time_weight('LOCF', ts, storage_bytes),
        '01-01-2022',
        '5 days',
        NULL,
        NULL,
        'hours'
    )
FROM
    user_storage_usage;
```

last_time()

Stabilized in Toolkit v1.11.0


```
last_time(
    tw TimeWeightSummary
) RETURNS TIMESTAMPTZ
```

Get the timestamp of the last point in a TimeWeightSummary aggregate.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tw | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| last_time | TIMESTAMPTZ | The time of the last point in the TimeWeightSummary |

Examples

Produce a linear TimeWeightSummary over the column val and get the last timestamp:

```sql
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        time_weight('Linear', ts, val) AS tw
    FROM foo
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    dt,
    last_time(tw)
FROM t;
```

last_val()

Stabilized in Toolkit v1.11.0


```
last_val(
    tw TimeWeightSummary
) RETURNS DOUBLE PRECISION
```

Get the value of the last point in a TimeWeightSummary aggregate.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tw | TimeWeightSummary | The input TimeWeightSummary from a time_weight() call. |

Returns

| Column | Type | Description |
|--------|------|-------------|
| last_val | DOUBLE PRECISION | The value of the last point in the TimeWeightSummary |

Examples

Produce a linear TimeWeightSummary over the column val and get the last value:

```sql
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        time_weight('Linear', ts, val) AS tw
    FROM foo
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    dt,
    last_val(tw)
FROM t;
```

rollup()

Stabilized in Toolkit v1.0.0


```
rollup(
    tws TimeWeightSummary
) RETURNS TimeWeightSummary
```

Combine multiple intermediate time-weighted aggregate (TimeWeightSummary) objects produced by time_weight() into a single intermediate TimeWeightSummary object. For example, you can use rollup to combine time-weighted aggregates from 15-minute buckets into daily buckets.

Required arguments

| Name | Type | Description |
|------|------|-------------|
| tws | TimeWeightSummary | The TimeWeightSummary aggregates produced by previous time_weight() calls |

Returns

| Column | Type | Description |
|--------|------|-------------|
| rollup | TimeWeightSummary | A new TimeWeightSummary aggregate produced by combining the input TimeWeightSummary aggregates |
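
Examples

Roll 15-minute time-weighted aggregates up into daily buckets and calculate the daily time-weighted average. This is a minimal sketch that reuses the foo table from the earlier examples:

```sql
-- Aggregate into 15-minute TimeWeightSummary objects,
-- then combine them into daily buckets with rollup()
WITH fifteen AS (
    SELECT
        time_bucket('15 minutes'::interval, ts) AS bucket,
        time_weight('LOCF', ts, val) AS tws
    FROM foo
    GROUP BY time_bucket('15 minutes'::interval, ts)
)
SELECT
    time_bucket('1 day'::interval, bucket) AS day,
    average(rollup(tws))
FROM fifteen
GROUP BY time_bucket('1 day'::interval, bucket);
```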

Extended examples

Aggregate data into a TimeWeightSummary and calculate the average

Given a table foo with data in a column val, aggregate data into a daily TimeWeightSummary. Use that to calculate the average for column val:

```sql
WITH t AS (
    SELECT
        time_bucket('1 day'::interval, ts) AS dt,
        time_weight('Linear', ts, val) AS tw
    FROM foo
    WHERE measure_id = 10
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    dt,
    average(tw)
FROM t;
```

Advanced usage

Parallelism and ordering

Time-weighted average calculations are not strictly parallelizable, as defined by PostgreSQL. These calculations require inputs to be strictly ordered, but in general, PostgreSQL parallelizes by assigning rows randomly to workers.

However, the algorithm can be parallelized if it is guaranteed that all rows within some time range go to the same worker. This is the case for both continuous aggregates and distributed hypertables. (Note that the partitioning keys of the distributed hypertable must be within the GROUP BY clause, but this is usually the case.)
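As a minimal sketch of the continuous-aggregate case, a view can store the intermediate TimeWeightSummary per bucket, with accessors applied at query time. The view name daily_tw is an assumption for illustration, and foo is assumed to be a hypertable with columns ts and val:

```sql
-- Store the intermediate aggregate in a continuous aggregate
CREATE MATERIALIZED VIEW daily_tw
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day'::interval, ts) AS day,
    time_weight('LOCF', ts, val) AS tws
FROM foo
GROUP BY 1;

-- Apply the accessor when querying the continuous aggregate
SELECT day, average(tws) FROM daily_tw;
```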

Combining aggregates across measurement series

If you try to combine overlapping TimeWeightSummaries, an error is thrown. For example, you might create a TimeWeightSummary for device_1 and a separate TimeWeightSummary for device_2, both covering the same period of time. You can’t combine these because the interpolation techniques only make sense when restricted to a single measurement series.

If you want to calculate a single summary statistic across all devices, use a simple average, like this:

```sql
WITH t AS (
    SELECT
        measure_id,
        average(
            time_weight('LOCF', ts, val)
        ) AS time_weighted_average
    FROM foo
    GROUP BY measure_id
)
-- Use the normal avg function to average the time-weighted averages
SELECT avg(time_weighted_average)
FROM t;
```

Parallelism in multi-node

The time-weighted average functions are not strictly parallelizable in the PostgreSQL sense. PostgreSQL requires that parallelizable functions accept potentially overlapping input. As explained above, the time-weighted functions do not. However, they do support partial aggregation and partition-wise aggregation in multi-node setups.

Reducing memory usage

Because the time-weighted aggregates require ordered sets, they build up a buffer of input data, sort it, and then perform the aggregation steps. If the buffer of points does not fit in the available memory, you might see out-of-memory failures or other issues. In these cases, try using a multi-level aggregate. For example:

```sql
WITH t AS (
    SELECT
        measure_id,
        time_bucket('1 day'::interval, ts),
        time_weight('LOCF', ts, val)
    FROM foo
    GROUP BY measure_id, time_bucket('1 day'::interval, ts)
)
SELECT
    measure_id,
    average(
        rollup(time_weight)
    )
FROM t
GROUP BY measure_id;
```