When to use the Pushgateway

The Pushgateway is an intermediary service which allows you to push metrics from jobs which cannot be scraped. For details, see Pushing metrics.

Should I be using the Pushgateway?

We only recommend using the Pushgateway in certain limited cases. There are several pitfalls when blindly using the Pushgateway instead of Prometheus's usual pull model for general metrics collection:

  • When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
  • You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape).
  • The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless those series are manually deleted via the Pushgateway's API.

The latter point is especially relevant when multiple instances of a job differentiate their metrics in the Pushgateway via an instance label or similar. Metrics for an instance will then remain in the Pushgateway even if the originating instance is renamed or removed. This is because the lifecycle of the Pushgateway as a metrics cache is fundamentally separate from the lifecycle of the processes that push metrics to it. Contrast this with Prometheus's usual pull-style monitoring: when an instance disappears (intentionally or not), its metrics automatically disappear along with it. When using the Pushgateway, this is not the case, and you have to delete any stale metrics manually or automate this lifecycle synchronization yourself.
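To make the manual cleanup concrete, here is a minimal Python sketch that deletes one metric group through the Pushgateway's HTTP API, where the grouping key is encoded in the URL path as /metrics/job/<job>/<label>/<value>. The host name, job name, and label values are hypothetical placeholders, and the helper names grouping_url and delete_group are invented for this illustration. Label values containing a slash need special encoding and are rejected here rather than handled.

```python
import urllib.parse
import urllib.request


def grouping_url(base_url, job, labels=None):
    """Build the Pushgateway URL that identifies one metric group.

    The grouping key is encoded in the path as
    /metrics/job/<job>/<label>/<value>/...
    """
    parts = ["metrics", "job", job]
    for name, value in (labels or {}).items():
        parts += [name, value]
    for part in parts:
        # Values containing "/" (or empty values) need a special encoding
        # that this sketch deliberately does not implement.
        if not part or "/" in part:
            raise ValueError(f"unsupported path segment: {part!r}")
    path = "/".join(urllib.parse.quote(part, safe="") for part in parts)
    return base_url.rstrip("/") + "/" + path


def delete_group(base_url, job, labels=None, timeout=10):
    """Send an HTTP DELETE for one group and return the status code."""
    request = urllib.request.Request(
        grouping_url(base_url, job, labels), method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical Pushgateway address, job, and instance label.
    delete_group(
        "http://pushgateway.example.org:9091",
        "some_job",
        {"instance": "host-1"},
    )
```

Something has to decide when an instance is gone and call a function like this one, which is exactly the lifecycle synchronization the pull model gives you for free.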

Usually, the only valid use case for the Pushgateway is capturing the outcome of a service-level batch job. A "service-level" batch job is one which is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label, so that the lifecycle of specific machines or instances is decoupled from the pushed metrics. This reduces the burden of managing stale metrics in the Pushgateway. See also the best practices for monitoring batch jobs.
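As a sketch of what such a push could look like, the Python below renders two gauges in the Prometheus text exposition format and sends them with an HTTP PUT to a group identified by the job name alone. The metric names, the job name user_cleanup, the host name, and the helper names are all hypothetical and chosen only for illustration. Official client libraries offer their own push helpers, so treat this as one possible shape rather than the recommended implementation.

```python
import time
import urllib.parse
import urllib.request


def render_batch_metrics(success_time, deleted_users):
    """Render the batch job's outcome in the text exposition format."""
    lines = [
        "# TYPE user_cleanup_last_success_timestamp_seconds gauge",
        f"user_cleanup_last_success_timestamp_seconds {success_time}",
        "# TYPE user_cleanup_deleted_users gauge",
        f"user_cleanup_deleted_users {deleted_users}",
    ]
    # The text format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def push_batch_outcome(base_url, job, body, timeout=10):
    """PUT the metrics into the group keyed by the job name only."""
    # No machine or instance label in the path: the group belongs to the
    # service-level job, not to whichever host happened to run it.
    url = base_url.rstrip("/") + "/metrics/job/" + urllib.parse.quote(job, safe="")
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values for a finished run of the cleanup job.
    payload = render_batch_metrics(int(time.time()), deleted_users=42)
    push_batch_outcome("http://pushgateway.example.org:9091", "user_cleanup", payload)
```

Because every run pushes to the same group, each successful run overwrites the previous result and nothing accumulates per machine.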

Alternative strategies

If an inbound firewall or NAT is preventing you from pulling metrics from targets, consider moving the Prometheus server behind the network barrier as well. We generally recommend running Prometheus servers on the same network as the monitored instances.

For batch jobs that are related to a machine (such as automatic security update cronjobs or configuration management client runs), expose the resulting metrics using the Node Exporter's textfile module instead of the Pushgateway.
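The textfile approach amounts to the job writing a file ending in .prom into the directory the Node Exporter is configured to read (its --collector.textfile.directory flag). A minimal Python sketch follows, assuming a hypothetical directory path and metric name. It writes to a temporary file first and renames it, so a scrape should never observe a half-written file.

```python
import os
import tempfile
import time


def write_textfile_metrics(directory, name, lines):
    """Atomically write <name>.prom into the textfile directory."""
    final_path = os.path.join(directory, name + ".prom")
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem; the .tmp suffix keeps it out of the scrape.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        # mkstemp creates the file owner-only; make it readable for the
        # Node Exporter, which may run as a different user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    # Hypothetical directory and metric name for a security update cronjob.
    write_textfile_metrics(
        "/var/lib/node_exporter/textfile",
        "security_updates",
        [
            "# TYPE security_updates_last_run_timestamp_seconds gauge",
            f"security_updates_last_run_timestamp_seconds {int(time.time())}",
        ],
    )
```

The resulting metrics are scraped together with the machine's other Node Exporter metrics, so they disappear with the machine instead of lingering in a separate cache.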