Continuous aggregates

Aggregate queries (min(), max(), avg()…) which touch large swathes of time-series data can take a long time to compute because the system needs to scan large amounts of data on every query execution. To make such queries faster, a continuous aggregate allows materializing the computed aggregates, while also providing means to continuously, and with low overhead, keep them up-to-date as the underlying source data changes.

Continuous aggregates are somewhat similar to PostgreSQL’s materialized views, but, unlike a materialized view, a continuous aggregate can be continuously and incrementally refreshed. The refreshing can be done either manually or via a policy that runs in the background, and can cover the entire continuous aggregate or just a specific time range. In either case, the refresh only recomputes the aggregate buckets that have changed since the last refresh.

An introductory example

As a quick introductory example, let’s create a hypertable conditions containing temperature data for devices and a continuous aggregate to compute the hourly average, minimum, and maximum temperature. Start off by creating the hypertable and populate it with some data:

  1. CREATE TABLE conditions (
  2. time TIMESTAMPTZ NOT NULL,
  3. device INTEGER NOT NULL,
  4. temperature FLOAT NOT NULL,
  5. PRIMARY KEY(time, device)
  6. );
  7. SELECT * FROM create_hypertable('conditions', 'time', 'device', 3);
  8. INSERT INTO conditions
  9. SELECT time, (random()*30)::int, random()*80 - 40
  10. FROM generate_series(TIMESTAMP '2020-01-01 00:00:00',
  11. TIMESTAMP '2020-06-01 00:00:00',
  12. INTERVAL '10 min') AS time;

You can then create a continuous aggregate view to compute the hourly average, minimum, and maximum temperature:

  1. CREATE MATERIALIZED VIEW conditions_summary_hourly
  2. WITH (timescaledb.continuous) AS
  3. SELECT device,
  4. time_bucket(INTERVAL '1 hour', time) AS bucket,
  5. AVG(temperature),
  6. MAX(temperature),
  7. MIN(temperature)
  8. FROM conditions
  9. GROUP BY device, bucket;

Lastly, you should add a policy to ensure that the continuous aggregate is refreshed on a regular basis.

  1. SELECT add_continuous_aggregate_policy('conditions_summary_hourly',
  2. start_offset => INTERVAL '1 month',
  3. end_offset => INTERVAL '1 h',
  4. schedule_interval => INTERVAL '1 h');

In this case, the continuous aggregate is refreshed every hour and refresh the last month’s data.

You can now run a normal SELECT on the continuous aggregate and it gives you the aggregated data, for example, to select the hourly averages for device 1 during the first three months:

  1. SELECT bucket, avg
  2. FROM conditions_summary_hourly
  3. WHERE device = 1 AND bucket BETWEEN '2020-01-01' AND '2020-03-31'
  4. ORDER BY bucket;
important

Continuous aggregates support many PostgreSQL aggregate functions and features. Support depends on your TimescaleDB version. For a table of supported features, see the how-to guide on continuous aggregates.

Real-time aggregation

A query on a continuous aggregate, by default, uses real-time aggregation (first introduced in TimescaleDB 1.7) to combine materialized aggregates with recent data from the source hypertable. By combining raw and materialized data in this way, real-time aggregation produces accurate and up-to-date results while still benefiting from pre-computed aggregates for a large portion of the result.

Real-time aggregation is the default behavior for any new continuous aggregates. To disable real-time aggregation and show only materialized data, add the parameter timescaledb.materialized_only=true when creating the continuous aggregate view or set it on an existing continuous aggregate using ALTER MATERIALIZED VIEW.

tip

To use real-time aggregation on a continuous aggregate created in a version earlier than TimescaleDB 1.7, alter the view to set timescaledb.materialized_only=false.

Automatic refresh with a continuous aggregate policy

Continuous aggregates can be kept up-to-date through the last bucket width of time by using continuous aggregate policies. Policies allow you to keep a specified window of time within the continuous aggregate updated on a schedule. This provides the ability to do things like:

  • have the continuous aggregate and the hypertable be in sync, even when data is removed from the hypertable, or
  • keep the aggregate data in the continuous aggregate when removing source data from the hypertable.

Manually refreshing continuous aggregate ranges

It is also possible, starting with TimescaleDB 2.0, to manually refresh a specific window of time in a continuous aggregate using refresh_continuous_aggregate. Using this TimescaleDB function gives users the ability to have the best of both worlds: automatic refresh of recent data and targeted updates to time ranges that may occur further in history through a backfill process. Using both tools to keep continuous aggregates up-to-date provides great control and flexibility!