Remote write tuning

Prometheus implements sane defaults for remote write, but many users have different requirements and would like to optimize their remote settings.

This page describes the tuning parameters available via the remote write configuration.

Remote write characteristics

Each remote write destination starts a queue which reads from the write-ahead log (WAL), writes the samples into an in-memory queue owned by a shard, which then sends a request to the configured endpoint. The flow of data looks like:

        |--> queue (shard_1)   --> remote endpoint
  WAL --|--> queue (shard_...) --> remote endpoint
        |--> queue (shard_n)   --> remote endpoint

When one shard backs up and fills its queue, Prometheus will block reading from the WAL into any shards. Failures will be retried without loss of data unless the remote endpoint remains down for more than 2 hours. After 2 hours, the WAL will be compacted and data that has not been sent will be lost.

During operation, Prometheus will continuously calculate the optimal number of shards to use based on the incoming sample rate, the number of outstanding samples not yet sent, and the time taken to send each sample.

Memory usage

Using remote write increases the memory footprint of Prometheus. Most users report ~25% increased memory usage, but that number is dependent on the shape of the data. For each series in the WAL, the remote write code caches a mapping of series ID to label values, so large amounts of series churn can significantly increase memory usage.

In addition to the series cache, each shard and its queue increases memory usage. Shard memory is proportional to the number of shards * (capacity + max_samples_per_send). When tuning, consider reducing max_shards alongside increases to capacity and max_samples_per_send to avoid inadvertently running out of memory. The default values for capacity and max_samples_per_send will constrain shard memory usage to less than 100 kB per shard.
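As a back-of-the-envelope sketch of that relationship, assuming roughly 32 bytes of queue overhead per buffered sample (an illustrative figure, not a measured constant) and illustrative parameter values:

```yaml
# Worst-case queue memory is roughly:
#   max_shards * (capacity + max_samples_per_send) * bytes_per_sample
#
# With illustrative values capacity=2500 and max_samples_per_send=500:
#   per shard: (2500 + 500) * ~32 B ≈ 96 kB   (under the ~100 kB noted above)
#   with max_shards=200: 200 * ~96 kB ≈ 19 MB in the worst case
```

In practice shards rarely all fill their queues at once, so this is an upper bound rather than an expected steady-state figure.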

Parameters

All the relevant parameters are found under the queue_config section of the remote write configuration.
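For reference, a minimal sketch of where these parameters live in the configuration file; the URL and values below are placeholders for illustration, not recommendations (check the configuration reference for the defaults shipped with your Prometheus version):

```yaml
remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"  # hypothetical endpoint
    queue_config:
      capacity: 2500
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
```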

capacity

Capacity controls how many samples are queued in memory per shard before blocking reading from the WAL. Once the WAL is blocked, samples cannot be appended to any shards and all throughput will cease.

Capacity should be high enough to avoid blocking other shards in most cases, but too much capacity can cause excess memory consumption and longer times to clear queues during resharding. It is recommended to set capacity to 3-10 times max_samples_per_send.
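Following that 3-10x guidance, a hypothetical queue sending batches of 500 samples might be configured as:

```yaml
queue_config:
  max_samples_per_send: 500   # illustrative batch size
  capacity: 2500              # 5x max_samples_per_send, within the 3-10x guidance
```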

max_shards

Max shards configures the maximum number of shards, or parallelism, Prometheus will use for each remote write queue. Prometheus will try not to use too many shards, but if the queue falls behind the remote write component will increase the number of shards up to max shards to increase throughput. Unless remote writing to a very slow endpoint, it is unlikely that max_shards should be increased beyond the default. However, it may be necessary to reduce max shards if there is potential to overwhelm the remote endpoint, or to reduce memory usage when data is backed up.

min_shards

Min shards configures the minimum number of shards used by Prometheus, and is the number of shards used when remote write starts. If remote write falls behind, Prometheus will automatically scale up the number of shards, so most users do not have to adjust this parameter. However, increasing min shards will allow Prometheus to avoid falling behind at the beginning while calculating the required number of shards.

max_samples_per_send

Max samples per send can be adjusted depending on the backend in use. Many systems work very well by sending more samples per batch without a significant increase in latency. Other backends will have issues if trying to send a large number of samples in each request. The default value is small enough to work for most systems.

batch_send_deadline

Batch send deadline sets the maximum amount of time between sends for a single shard. Even if a shard's queue has not reached max_samples_per_send, a request will be sent. Batch send deadline can be increased for low volume systems that are not latency sensitive in order to increase request efficiency.

min_backoff

Min backoff controls the minimum amount of time to wait before retrying a failed request. Increasing the backoff spreads out requests when a remote endpoint comes back online. The backoff interval is doubled for each failed request, up to max_backoff.
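With illustrative values, the doubling described above produces a retry schedule like this:

```yaml
queue_config:
  min_backoff: 30ms   # illustrative values, not necessarily the defaults
  max_backoff: 5s
# Consecutive failures wait: 30ms, 60ms, 120ms, 240ms, ... capped at 5s
```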

max_backoff

Max backoff controls the maximum amount of time to wait before retrying a failed request.