titlesidebar_labeldescriptionimage
Configuring commit lag of out-of-order (O3) data
Out-of-order commit lag
This document describes server configuration parameters for out-of-order commit-lag along with details when and why these settings should be applied
/img/guides/out-of-order-commit-lag/o3-data.jpeg

Server configuration may be applied when ingesting data over InfluxDB Line Protocol (ILP) to allow user control on how the system processes and commits late-arriving data for optimum throughput.

Background

As of software version 6.0, QuestDB adds support for out-of-order (O3) data ingestion. The skew and latency of out-of-order data are likely to be relatively constant, so users may configure ingestion based on the characteristics of the data.

Most real-time out-of-order data patterns are caused by the delivery mechanism and hardware jitter, and therefore the timestamp distribution will be contained by some boundary.

import Screenshot from “@theme/Screenshot”

If any new timestamp value has a high probability to arrive within 10 seconds of the previously received value, the boundary for this data is 10 seconds and we name this lag.

When the order of timestamp values follow this pattern, it will be recognized by our out-of-order algorithm and prioritized using an optimized processing path. A commit of this data re-orders uncommitted rows and then commits all rows up to the boundary; the remaining rows stay in memory to be committed later.

Out-of-order (O3) commit parameters

Commit parameters allow for specifying that commits of out-of-order data should occur when:

  • they are outside a window of time for which they are expected to be out-of-order or
  • when the row-count passes a certain threshold.

:::info

Commit parameters are user-configurable for ingestion using InfluxDB line protocol only. This is the case as commits over Postgres wire protocol are invoked client-side and commits via REST API occur either row-by-row or after a CSV import is complete.

:::

The following server configuration parameters are user-configurable:

  1. # the maximum number of uncommitted rows
  2. cairo.max.uncommitted.rows=X
  3. # the maximum time between jobs that commit uncommitted rows
  4. cairo.commit.lag=X
  5. # the maximum time between ILP jobs that commit uncommitted rows
  6. line.tcp.maintenance.job.interval=X

These parameters are enforced so that commits occur if any one of these conditions are met, therefore out-of-order commits occur based on the age of out-of-order records or by record count.

An out-of-order commit will occur:

  • every cairo.max.uncommitted.rows or
  • if records haven’t been committed for line.tcp.maintenance.job.interval

If a commit occurs due to cairo.max.uncommitted.rows being reached, then cairo.commit.lag will be applied.

When to change out-of-order commit configuration

The defaults for the out-of-order algorithm are optimized for real-world usage and should cover most patterns for timestamp arrival. The default configuration is as follows:

  1. cairo.commit.lag=300000
  2. cairo.max.uncommitted.rows=500000
  3. line.tcp.maintenance.job.interval=30000

Users should modify out-of-order parameters if there is a known or expected pattern for:

  1. The length of time by which most records are late
  2. The frequency of incoming records and the expected throughput

For optimal ingestion performance, the number of commits of out-of-order data should be minimized. For this reason, if throughput is low and timestamps are expected to be consistently delayed up to thirty seconds, the following configuration settings can be applied

  1. cairo.commit.lag=30000
  2. cairo.max.uncommitted.rows=500

For high-throughput scenarios, lower commit timer and larger number of uncommitted rows may be more appropriate. The following settings would assume a throughput of ten thousand records per second with a likely maximum of 1 second lateness for timestamp values:

  1. cairo.commit.lag=1000
  2. cairo.max.uncommitted.rows=10000

How to configure out-of-order ingestion

Server-wide configuration

These settings may be applied via server configuration file:

  1. cairo.max.uncommitted.rows=500
  2. cairo.commit.lag=10000
  3. line.tcp.maintenance.job.interval=1000

As with other server configuration parameters, these settings may be passed as environment variables:

  • QDB_LINE_TCP_MAINTENANCE_JOB_INTERVAL
  • QDB_CAIRO_MAX_UNCOMMITTED_ROWS
  • QDB_CAIRO_COMMIT_LAG

To set this configuration for the current shell:

  1. export QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1000
  2. export QDB_CAIRO_COMMIT_LAG=20000
  3. questdb start

Passing the environment variables via Docker is done using the -e flag:

  1. docker run -p 9000:9000 \
  2. -p 9009:9009 \
  3. -p 8812:8812 \
  4. -p 9003:9003 \
  5. -e QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1000 \
  6. -e QDB_CAIRO_COMMIT_LAG=20000 \
  7. questdb/questdb

Per-table lag and maximum uncommitted rows

It’s possible to set out-of-order values per table when creating a new table as part of the PARTITION BY clause. Configuration is passed using the WITH keyword with the following two parameters:

  • maxUncommittedRows - equivalent to cairo.max.uncommitted.rows
  • commitLag - equivalent to cairo.commit.lag
  1. CREATE TABLE my_table (timestamp TIMESTAMP) timestamp(timestamp)
  2. PARTITION BY DAY WITH maxUncommittedRows=250000, commitLag=240s

Checking the values per-table may be done using the tables() function:

  1. select id, name, maxUncommittedRows, commitLag from tables();
idnamemaxUncommittedRowscommitLag
1my_table250000240000000
2device_data1000030000000

The values can changed per each table with:

  1. ALTER TABLE my_table SET PARAM maxUncommittedRows = 10000

and

  1. ALTER TABLE my_table SET PARAM commitLag = 20s

For more information on checking table metadata, see the meta functions documentation page.

INSERT lag and batch size

The INSERT keyword may be passed parameters for handling the expected lag of out-of-order records and a batch size for the number of rows to process and insert at once. The following query shows an INSERT AS SELECT operation with lag and batch size applied:

  1. INSERT batch 100000 lag 180000000 INTO trades
  2. SELECT ts, instrument, quantity, price
  3. FROM unordered_trades

:::info

Using the lag and batch size parameters during INSERT AS SELECT statements is a convenient strategy to load and order large datasets from CSV in bulk. This strategy along with an example workflow is described in the importing data guide.

:::