Troubleshooting

This section contains some ideas for troubleshooting common problems experienced with Promscale.

Backing up or upgrading Promscale gives an out of shared memory error

Promscale creates a table for each metric. When you run pg_dump or pg_restore, each table is locked while it is processed. This can require a large number of locks in a single transaction, and if the number of locks required exceeds the maximum number allowed, the backup or restore fails with an out of shared memory error and the operation is aborted.

To fix this error, increase the max_locks_per_transaction setting in the PostgreSQL configuration file, so that pg_dump and pg_restore can create the required number of locks, and the backup or restore can continue.

important

You need to restart PostgreSQL to pick up changes to the max_locks_per_transaction parameter.

You can use this query to calculate the new minimum value to use for max_locks_per_transaction:

    WITH max_objects_touched AS (
        SELECT 400 + count(*) as max_objects_touched
        FROM pg_class
        JOIN pg_namespace n ON n.oid = pg_class.relnamespace
        WHERE n.nspname <> 'pg_catalog'
        AND n.nspname !~ '^pg_toast'
        AND n.nspname <> 'information_schema'
    ),
    max_conns AS (
        SELECT SUM(setting::INTEGER) as max_conns FROM pg_settings WHERE name IN ('max_connections', 'max_prepared_transactions')
    )
    SELECT max_objects_touched/max_conns AS max_locks_per_transaction FROM max_objects_touched, max_conns;
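
After you calculate a value, raise the setting and restart PostgreSQL. The following is a minimal sketch, assuming superuser access; replace 1024 with the value returned by the query above:

    -- Check the current setting
    SHOW max_locks_per_transaction;

    -- Raise the setting (example value); it only takes effect after a server restart
    ALTER SYSTEM SET max_locks_per_transaction = 1024;

ALTER SYSTEM writes the change to postgresql.auto.conf; you can also edit postgresql.conf directly. In either case, restart PostgreSQL afterwards.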

Data takes up too much disk space

Promscale keeps metric data in chunks. The most recent chunk is kept uncompressed to act as a cache for faster querying, and chunks are compressed as they age. If your data is taking up more disk space than expected, compression might not be working correctly: older chunks are not being compressed, so they take up too much room on disk. To check that compression is working correctly, query the prom_info.metric view and make sure that total_chunks - number_compressed_chunks is not larger than 2 for each metric.
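
For example, this query lists metrics that have more than two uncompressed chunks. It is a sketch that assumes the total_chunks and number_compressed_chunks columns referenced above, plus a metric_name column; check the view definition in your Promscale version:

    SELECT metric_name,
           total_chunks,
           number_compressed_chunks,
           total_chunks - number_compressed_chunks AS uncompressed_chunks
    FROM prom_info.metric
    WHERE total_chunks - number_compressed_chunks > 2
    ORDER BY uncompressed_chunks DESC;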

If compression is not working correctly, make sure that you have enough background workers to run scheduled jobs such as compression and retention. To calculate the number of background workers you need, take the number of databases in your environment and add 2. If you are using TimescaleDB version 2.0.0 or earlier, make sure that the maintenance cron jobs are running and returning success. When compression is working correctly, your data is compressed as it ages.
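
As a starting point, this sketch, which assumes TimescaleDB 2.x, checks the configured background workers and whether the scheduled maintenance jobs are succeeding:

    -- Number of background workers available to TimescaleDB
    SHOW timescaledb.max_background_workers;

    -- Status of scheduled jobs such as compression and retention (TimescaleDB 2.x information views)
    SELECT j.job_id, j.proc_name, js.last_run_status, js.last_successful_finish, js.total_failures
    FROM timescaledb_information.jobs j
    JOIN timescaledb_information.job_stats js ON js.job_id = j.job_id;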

If compression is working properly, then do the following:

  • Check that you are using the most recent version of Promscale, so you have the latest features.
  • Consider reducing the length of time that you retain data for (see the sketch after this list).
  • Reduce the chunk interval to 4 hours or 2 hours. Note that this can slow down query performance.
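
Both the retention period and the chunk interval can be adjusted from SQL. This is a sketch that assumes the prom_api helper functions are available in your Promscale version; the values are examples only:

    -- Shorten the default retention period for metric data (example value)
    SELECT prom_api.set_default_retention_period(INTERVAL '30 days');

    -- Reduce the default chunk interval (example value; smaller chunks can slow down queries)
    SELECT prom_api.set_default_chunk_interval(INTERVAL '4 hours');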

PromQL dashboard doesn’t show Promscale data

If you backfill data that has not been ingested into Prometheus, the PromQL dashboard might not show the Promscale data.

To fix this error, enable the read_recent option for Promscale in the Prometheus configuration’s remote_read section. This ensures that backfilled data is shown in the PromQL dashboard:

    remote_read:
      - url: "http://<CONNECTOR-ADDRESS>:9201/read"
        read_recent: true

For more information, see the Prometheus remote read documentation.

Tracing span data is missing

If OpenTelemetry tracing is not recording all Promscale spans, the cause is usually an error in the batch processor: the trace could be attempting to send batches of spans that are larger than allowed, so the spans are not sent.

To diagnose this error, check the OpenTelemetry logs for errors like these:

    2021-10-08T12:34:00.360Z warn batchprocessor/batch_processor.go:184 Sender failed {"kind": "processor", "name": "batch", "error": "sending_queue is full"}

    2021-10-10T18:49:23.304Z info exporterhelper/queued_retry.go:325 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "otlp", "error": "failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.872756134s"}

If you see these errors, reduce the send_batch_size and send_batch_max_size values in the OpenTelemetry configuration file. This keeps each batch within the allowed size so that it can be sent, and the spans are shown as expected in the trace.

For more information, see the batch processor documentation.