gp_lttb()

Gap preserving LTTB is a specialization of the LTTB algorithm that preserves gaps in the underlying data. It uses a minimum gap size, which can be provided by the caller, or that defaults to the time span of the incoming data divided by the desired target resolution. It then scans through the input and breaks it into ranges of points separated by gaps of at least the minimum size. It then performs a LTTB on each range, with resolution proportional to the percentage of the total points with that range, and concatenates the results.

If the minimum gap size is set too low, the algorithm might produce more points than the target resolution. This occurs if there are more ranges than half the target resolution, because each range returns at least two points.

Required arguments

NameTypeDescription
timeTIMESTAMPTZTime (x) value for the data point
valueDOUBLE PRECISIONData (y) value for the data point
resolutionINTEGERNumber of points the output should have

Optional arguments

NameTypeDescription
gapsizeINTERVALMinimum gap size to divide input on

If the gapsize is not provided it will be computed as the interval between the first and last times, divided by the resolution.

Returns

ColumnTypeDescription
timevectorTimevectorA timevector object containing the downsampled points. It can be unpacked with unnest.

Sample usage

This example uses a table with raw data generated as a sine wave, and removes a day from the middle of the data:

  1. SET TIME ZONE 'UTC';
  2. CREATE TABLE metrics(date TIMESTAMPTZ, reading DOUBLE PRECISION);
  3. INSERT INTO metrics
  4. SELECT
  5. '2020-1-1 UTC'::timestamptz + make_interval(hours=>foo),
  6. (5 + 5 * sin(foo / 24.0 * PI()))
  7. FROM generate_series(1,168) foo;
  8. DELETE FROM metrics WHERE date BETWEEN '2020-1-4 UTC' AND '2020-1-5 UTC';

You can use gap preserving LTTB to downsample the data while keeping the bounds of the missing region:

  1. SELECT time, value
  2. FROM unnest((
  3. SELECT toolkit_experimental.gp_lttb(date, reading, 8)
  4. FROM metrics))

Which provides these results:

  1. time | value
  2. ------------------------+-------------------
  3. 2020-01-01 01:00:00+00 | 5.652630961100257
  4. 2020-01-02 12:00:00+00 | 0
  5. 2020-01-03 23:00:00+00 | 5.652630961100255
  6. 2020-01-05 01:00:00+00 | 5.652630961100259
  7. 2020-01-05 13:00:00+00 | 9.957224306869051
  8. 2020-01-06 12:00:00+00 | 0
  9. 2020-01-07 10:00:00+00 | 9.82962913144534
  10. 2020-01-08 00:00:00+00 | 5.000000000000004