Handle duplicate data points

InfluxDB identifies unique data points by their measurement, tag set, and timestamp (each a part of the line protocol used to write data to InfluxDB).

  web,host=host2,region=us_west firstByte=15.0 1559260800000000000
  --- -------------------------                -------------------
   |              |                                     |
  Measurement  Tag set                             Timestamp
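
As a quick sketch of how these parts map onto a programmatic write, the snippet below builds the same point with the influxdb-client Python library and prints its line protocol. The client library choice is illustrative; any tool that emits line protocol works.

  from influxdb_client import Point, WritePrecision

  # The measurement ("web"), the tag set (host, region), and the
  # timestamp together identify this point; fields (firstByte) carry
  # the values.
  point = (
      Point("web")
      .tag("host", "host2")
      .tag("region", "us_west")
      .field("firstByte", 15.0)
      .time(1559260800000000000, WritePrecision.NS)
  )

  # Serialize to line protocol for inspection before writing.
  print(point.to_line_protocol())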

Duplicate data points

For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:

  # Existing data point
  web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
  # New data point
  web,host=host2,region=us_west firstByte=15.0 1559260800000000000

After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves the field dnsLookup alone:

  # Resulting data point
  web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000

Querying the data returns the merged point:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region]
  _time                 _measurement  host   region   dnsLookup  firstByte
  --------------------  ------------  -----  -------  ---------  ---------
  2019-05-31T00:00:00Z  web           host2  us_west  7          15
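
To reproduce the merge end to end, here is a minimal sketch using the influxdb-client Python library; the URL, token, org, and bucket values are placeholders.

  from influxdb_client import InfluxDBClient
  from influxdb_client.client.write_api import SYNCHRONOUS

  # Placeholder connection details -- substitute your own.
  client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
  write_api = client.write_api(write_options=SYNCHRONOUS)

  # Write the existing point, then the duplicate. Both share the same
  # measurement, tag set, and timestamp, so InfluxDB unions the field
  # sets and keeps the newer firstByte value.
  write_api.write(
      bucket="example-bucket",
      record="web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000",
  )
  write_api.write(
      bucket="example-bucket",
      record="web,host=host2,region=us_west firstByte=15.0 1559260800000000000",
  )
  client.close()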

Preserve duplicate points

To preserve both old and new field values in duplicate points, use one of the following strategies:

Add an arbitrary tag

Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.

For example, add a uniq tag to each data point:

  # Existing point
  web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000
  # New point
  web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000

You don't need to retroactively add the uniq tag to the existing data point. Tag sets are evaluated as a whole, so the arbitrary uniq tag on the new point is enough for InfluxDB to recognize it as unique. However, this causes the schemas of the two points to differ, which may complicate queries against the data.

After writing the new point to InfluxDB:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region, uniq]
  _time                 _measurement  host   region   uniq  firstByte  dnsLookup
  --------------------  ------------  -----  -------  ----  ---------  ---------
  2019-05-31T00:00:00Z  web           host2  us_west  1     24         7

  Table: keys: [_measurement, host, region, uniq]
  _time                 _measurement  host   region   uniq  firstByte
  --------------------  ------------  -----  -------  ----  ---------
  2019-05-31T00:00:00Z  web           host2  us_west  2     15
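
If points arrive through code, you can generate the uniq values automatically. Below is a minimal Python sketch assuming the influxdb-client library; the make_unique_point helper and its in-process counter are illustrative, not part of any InfluxDB API.

  from itertools import count

  from influxdb_client import Point, WritePrecision

  # Illustrative counter: tag values are strings in line protocol, so
  # the counter value is converted with str().
  _uniq = count(1)

  def make_unique_point(first_byte, timestamp_ns):
      return (
          Point("web")
          .tag("host", "host2")
          .tag("region", "us_west")
          .tag("uniq", str(next(_uniq)))
          .field("firstByte", first_byte)
          .time(timestamp_ns, WritePrecision.NS)
      )

  # Two points with identical timestamps now differ in their tag sets.
  old = make_unique_point(24.0, 1559260800000000000)
  new = make_unique_point(15.0, 1559260800000000000)

Keep in mind that every distinct uniq value creates a new series, so unbounded uniq values can increase series cardinality.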

Increment the timestamp

Increment the timestamp by a nanosecond to enforce the uniqueness of each point.

  # Old data point
  web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
  # New data point
  web,host=host2,region=us_west firstByte=15.0 1559260800000000001

After writing the new point to InfluxDB:

  from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")

  Table: keys: [_measurement, host, region]
  _time                           _measurement  host   region   firstByte  dnsLookup
  ------------------------------  ------------  -----  -------  ---------  ---------
  2019-05-31T00:00:00.000000000Z  web           host2  us_west  24         7
  2019-05-31T00:00:00.000000001Z  web           host2  us_west  15
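
If writes flow through code, the bump can be automated. The helper below is a minimal Python sketch (unique_timestamp is illustrative, not an InfluxDB API) and assumes points for a given series arrive in timestamp order.

  _last_ts = 0

  def unique_timestamp(timestamp_ns):
      """Return timestamp_ns, bumped by as many nanoseconds as needed
      to stay strictly greater than the previously returned value."""
      global _last_ts
      if timestamp_ns <= _last_ts:
          timestamp_ns = _last_ts + 1
      _last_ts = timestamp_ns
      return timestamp_ns

  # First write keeps its timestamp; the duplicate is bumped by 1 ns.
  print(unique_timestamp(1559260800000000000))  # 1559260800000000000
  print(unique_timestamp(1559260800000000000))  # 1559260800000000001

The trade-off is that bumped timestamps no longer match the original event time exactly, which may matter for time-sensitive analysis.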

The output of the example queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.
