Handle duplicate data points


InfluxDB identifies unique data points by their measurement, tag set, and timestamp (each part of the line protocol used to write data to InfluxDB).

  web,host=host2,region=us_west firstByte=15.0 1559260800000000000
  --- -------------------------                -------------------
   |              |                                     |
  Measurement   Tag set                             Timestamp
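To make the identity rule concrete, here is a minimal Python sketch (illustrative only, not InfluxDB's internals) that models a point's identity as its measurement, sorted tag set, and timestamp; the field set plays no part:

  # Illustrative model of point identity; not InfluxDB's actual implementation.
  def point_identity(measurement: str, tags: dict, timestamp_ns: int) -> tuple:
      # Tag order in line protocol is irrelevant, so sort tags into a canonical key.
      return (measurement, tuple(sorted(tags.items())), timestamp_ns)

  a = point_identity("web", {"host": "host2", "region": "us_west"}, 1559260800000000000)
  b = point_identity("web", {"region": "us_west", "host": "host2"}, 1559260800000000000)
  assert a == b  # Same measurement, tag set, and timestamp: the same point.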

Duplicate data points

For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:

  # Existing data point
  web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

  # New data point
  web,host=host2,region=us_west firstByte=15.0 1559260800000000000

After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves the dnsLookup field unchanged:

  # Resulting data point
  web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000

Querying this time range returns a single point with the merged field set:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region]
  _time                 _measurement  host   region   dnsLookup  firstByte
  --------------------  ------------  -----  -------  ---------  ---------
  2019-05-31T00:00:00Z  web           host2  us_west          7         15
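The merge behaves like a dictionary union in which the new point's values win on matching field keys. A minimal Python sketch of the semantics (illustrative only):

  # Illustrative model of the field-set union for duplicate points.
  existing_fields = {"firstByte": 24.0, "dnsLookup": 7.0}
  new_fields = {"firstByte": 15.0}

  # Union of old and new field sets; the new point's values take precedence.
  merged = {**existing_fields, **new_fields}
  print(merged)  # {'firstByte': 15.0, 'dnsLookup': 7.0}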

Preserve duplicate points

To preserve both old and new field values in duplicate points, use one of the following strategies:

Add an arbitrary tag

Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.

For example, add a uniq tag to each data point:

  # Existing point
  web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000

  # New point
  web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000

You don't need to retroactively add the uniq tag to the existing data point. Tag sets are evaluated as a whole, so the arbitrary uniq tag on the new point is enough for InfluxDB to recognize it as unique. However, this causes the schemas of the two points to differ and may lead to challenges when querying the data.

After writing the new point to InfluxDB:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region, uniq]
  _time                 _measurement  host   region   uniq  firstByte  dnsLookup
  --------------------  ------------  -----  -------  ----  ---------  ---------
  2019-05-31T00:00:00Z  web           host2  us_west     1         24          7

  Table: keys: [_measurement, host, region, uniq]
  _time                 _measurement  host   region   uniq  firstByte
  --------------------  ------------  -----  -------  ----  ---------
  2019-05-31T00:00:00Z  web           host2  us_west     2         15
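At write time, one way to generate the unique tag values is to append an incrementing counter (or a UUID) to each point before submitting it. The following Python sketch builds line protocol strings with a hypothetical add_uniq_tag helper; it assumes no escaped spaces in the tag set:

  import itertools

  _counter = itertools.count(1)

  # Hypothetical helper: appends uniq=<n> to the tag set so points that share a
  # measurement, tag set, and timestamp remain distinct. Assumes the first
  # space in the line separates the tag set from the field set.
  def add_uniq_tag(line: str) -> str:
      series, rest = line.split(" ", 1)
      return f"{series},uniq={next(_counter)} {rest}"

  print(add_uniq_tag("web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000"))
  print(add_uniq_tag("web,host=host2,region=us_west firstByte=15.0 1559260800000000000"))
  # web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000
  # web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000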

Increment the timestamp

Increment the timestamp by a nanosecond to enforce the uniqueness of each point.

  # Old data point
  web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

  # New data point
  web,host=host2,region=us_west firstByte=15.0 1559260800000000001

After writing the new point to InfluxDB:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region]
  _time                           _measurement  host   region   firstByte  dnsLookup
  ------------------------------  ------------  -----  -------  ---------  ---------
  2019-05-31T00:00:00.000000000Z  web           host2  us_west         24          7
  2019-05-31T00:00:00.000000001Z  web           host2  us_west         15
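The same idea works as a write-time helper: track the timestamps already written for each series and nudge any collision forward by one nanosecond. A Python sketch with a hypothetical dedupe_timestamp helper, under the same no-escaped-spaces assumption:

  _seen = set()

  # Hypothetical helper: bumps a colliding timestamp by 1 ns at a time until
  # the (series, timestamp) pair is unique.
  def dedupe_timestamp(line: str) -> str:
      series, fields, ts = line.rsplit(" ", 2)
      timestamp = int(ts)
      while (series, timestamp) in _seen:
          timestamp += 1  # one nanosecond later
      _seen.add((series, timestamp))
      return f"{series} {fields} {timestamp}"

  print(dedupe_timestamp("web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000"))
  print(dedupe_timestamp("web,host=host2,region=us_west firstByte=15.0 1559260800000000000"))
  # web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
  # web,host=host2,region=us_west firstByte=15.0 1559260800000000001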

The output of the example queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.
