InfluxDB design insights and tradeoffs

InfluxDB is a time series database. Optimizing for this use case entails some tradeoffs, primarily to increase performance at the cost of functionality. Below is a list of some of those design insights that lead to tradeoffs:

  • For the time series use case, we assume that if the same data is sent multiple times, it is the exact same data, resent by a client.

Pro: Simplified conflict resolution increases write performance. Con: Cannot store duplicate data; may overwrite data in rare circumstances.
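As a rough illustration of this tradeoff, the sketch below models a store in which a point is identified only by its series and timestamp. It is a toy model, not InfluxDB's storage engine: the `store` dictionary, the `write_point` function, the series string, and the choice to merge field values are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical toy store: (series, timestamp) -> field values.
PointKey = Tuple[str, int]
store: Dict[PointKey, Dict[str, float]] = {}


def write_point(series: str, timestamp: int, fields: Dict[str, float]) -> None:
    """Write a point; a repeated (series, timestamp) pair is never stored twice."""
    # No conflict check is needed: the newest field values simply win.
    store.setdefault((series, timestamp), {}).update(fields)


write_point("cpu,host=a", 1000, {"usage": 0.5})
write_point("cpu,host=a", 1000, {"usage": 0.5})  # resend: no duplicate is created
write_point("cpu,host=a", 1000, {"usage": 0.9})  # same key, new value: overwrites
```

After these three writes the store holds a single point whose `usage` is 0.9. The resend costs nothing, but the earlier value is gone, which is the "may overwrite data" case named above.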

  • Deletes are a rare occurrence. When they do occur, it is almost always against large ranges of old data that are cold for writes.

Pro: Restricting access to deletes allows for increased query and write performance. Con: Delete functionality is significantly restricted.
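One way to see why range deletes over old data can be cheap is to group points into fixed time buckets and drop whole buckets at once. The sketch below assumes such a scheme purely for illustration; the text above does not describe how InfluxDB implements deletes, and `BUCKET_SECONDS`, `write`, and `delete_before` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

BUCKET_SECONDS = 3600  # hypothetical bucket width: one hour

# Bucket index -> points written into that time window.
buckets: DefaultDict[int, List[Tuple[int, float]]] = defaultdict(list)


def write(timestamp: int, value: float) -> None:
    """Store a point in the bucket that covers its timestamp."""
    buckets[timestamp // BUCKET_SECONDS].append((timestamp, value))


def delete_before(cutoff: int) -> int:
    """Drop every bucket that ends at or before cutoff; return how many were dropped."""
    cutoff_bucket = cutoff // BUCKET_SECONDS
    old = [index for index in buckets if index < cutoff_bucket]
    for index in old:
        del buckets[index]  # one cheap operation per bucket, not per point
    return len(old)


write(10, 1.0)
write(3700, 2.0)
write(7300, 3.0)
dropped = delete_before(7200)  # removes the two oldest hourly buckets
```

In this sketch a delete never touches individual points, so it stays fast, but it also cannot remove a single point or part of a bucket. That mirrors the restricted delete functionality listed as the cost.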

  • Updates to existing data are a rare occurrence and contentious updates never happen. Time series data is predominantly new data that is never updated.

Pro: Restricting access to updates allows for increased query and write performance. Con: Update functionality is significantly restricted.

  • The vast majority of writes are for data with very recent timestamps and the data is added in time ascending order.

Pro: Adding data in time ascending order is significantly more performant. Con: Writing points with random times or with time not in ascending order is significantly less performant.
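The cost difference between in-order and out-of-order writes can be sketched with a time-sorted list. This is a minimal illustration under assumed names (`insert_point` is hypothetical), not a description of InfluxDB's actual write path: an in-order point is appended at the end, while an out-of-order point must be placed in the middle, which shifts later entries.

```python
import bisect
from typing import List, Tuple


def insert_point(points: List[Tuple[int, float]], timestamp: int, value: float) -> str:
    """Keep points sorted by timestamp; return which path the write took."""
    if not points or timestamp >= points[-1][0]:
        points.append((timestamp, value))  # fast path: amortized O(1) append
        return "append"
    # Slow path: find the slot, then shift every later point (O(n)).
    index = bisect.bisect_right(points, (timestamp, float("inf")))
    points.insert(index, (timestamp, value))
    return "insert"


series: List[Tuple[int, float]] = []
paths = [insert_point(series, t, 1.0) for t in (1, 2, 5, 3)]
```

Here the first three writes take the append path, and only the late point with timestamp 3 takes the slower insert path.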

  • Scale is critical. The database must be able to handle a high volume of reads and writes.

Pro: The database can handle a high volume of reads and writes. Con: The InfluxDB development team was forced to make tradeoffs to increase performance.

  • Being able to write and query the data is more important than having a strongly consistent view.

Pro: Writing and querying the database can be done by multiple clients and at high loads. Con: Query returns may not include the most recent points if the database is under heavy load.

  • Many time series are ephemeral. There are often time series that appear only for a few hours and then go away, e.g. a new host that gets started, reports for a while, and then gets shut down.

Pro: InfluxDB is good at managing discontinuous data. Con: Schema-less design means that some database functions are not supported, e.g. there are no cross-table joins.

  • No one point is too important.

Pro: InfluxDB has very powerful tools to deal with aggregate data and large data sets. Con: Points don’t have IDs in the traditional sense; they are differentiated by timestamp and series.