Comparison to alternatives

Prometheus vs. Graphite

Scope

Graphite focuses on being a passive time series database with a query language and graphing features. Any other concerns are addressed by external components.

Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data. It has knowledge about what the world should look like (which endpoints should exist, what time series patterns mean trouble, etc.), and actively tries to find faults.

Data model

Graphite stores numeric samples for named time series, much like Prometheus does. However, Prometheus's metadata model is richer: while Graphite metric names consist of dot-separated components which implicitly encode dimensions, Prometheus encodes dimensions explicitly as key-value pairs, called labels, attached to a metric name. This allows easy filtering, grouping, and matching by these labels via the query language.

Further, especially when Graphite is used in combination with StatsD, it is common to store only aggregated data over all monitored instances, rather than preserving the instance as a dimension and being able to drill down into individual problematic instances.

For example, storing the number of HTTP requests to API servers with the response code 500 and the method POST to the /tracks endpoint would commonly be encoded like this in Graphite/StatsD:

  stats.api-server.tracks.post.500 -> 93

In Prometheus the same data could be encoded like this (assuming three api-server instances):

  api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample1>"} -> 34
  api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample2>"} -> 28
  api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample3>"} -> 31
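
To make the difference concrete, here is a minimal Python sketch that models the three labeled samples above as plain dictionaries. It is not Prometheus or Graphite code: the helpers `select` and `sum_by` are hypothetical and only illustrate how explicit labels allow filtering and grouping. The aggregate that a Graphite/StatsD setup would store can be derived from the per-instance series, whereas the per-instance values cannot be recovered from the aggregate.

```python
from collections import defaultdict

# Per-instance samples as in the Prometheus example above.
# The instance values are the placeholders from the text, not real addresses.
samples = [
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample1>"}, 34),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample2>"}, 28),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample3>"}, 31),
]


def select(samples, **matchers):
    """Keep samples whose labels equal every given matcher (hypothetical helper)."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(samples, *keep):
    """Sum values, grouping by the label names in `keep` (hypothetical helper)."""
    totals = defaultdict(int)
    for labels, value in samples:
        key = tuple((name, labels.get(name)) for name in keep)
        totals[key] += value
    return dict(totals)


# Drill down to a single instance.
one_instance = select(samples, instance="<sample2>")

# Aggregate away the instance dimension, as a StatsD-style setup would store it.
aggregated = sum_by(samples, "method", "handler", "status")
```

Here `aggregated` holds a single value of 93, matching the Graphite example, while `one_instance` still exposes the 28 requests served by the second instance.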

Storage

Graphite stores time series data on local disk in the Whisper format, an RRD-style database that expects samples to arrive at regular intervals. Every time series is stored in a separate file, and new samples overwrite old ones after a certain amount of time.

Prometheus also creates one local file per time series, but allows storing samples at arbitrary intervals as scrapes or rule evaluations occur. Since new samples are simply appended, old data may be kept arbitrarily long. Prometheus also works well for many short-lived, frequently changing sets of time series.
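
The contrast between the two storage approaches described above can be sketched in a few lines of Python. This is a toy model only, not the Whisper file format or Prometheus's actual storage: `RoundRobinSeries` and `AppendOnlySeries` are hypothetical classes, and the step and slot counts are made-up values.

```python
class RoundRobinSeries:
    """Toy RRD-style series: a fixed number of slots at a fixed step."""

    def __init__(self, step_seconds, slots):
        self.step = step_seconds
        self.slots = [None] * slots

    def write(self, timestamp, value):
        # The timestamp is rounded down to the step, and each slot is reused
        # once the ring wraps around, overwriting the older sample.
        aligned = timestamp - (timestamp % self.step)
        index = (aligned // self.step) % len(self.slots)
        self.slots[index] = (aligned, value)

    def samples(self):
        return sorted(s for s in self.slots if s is not None)


class AppendOnlySeries:
    """Toy append-only series: samples keep their arbitrary timestamps."""

    def __init__(self):
        self._samples = []

    def write(self, timestamp, value):
        self._samples.append((timestamp, value))

    def samples(self):
        return list(self._samples)


rr = RoundRobinSeries(step_seconds=10, slots=3)
ao = AppendOnlySeries()
for ts, value in [(0, 1.0), (13, 2.0), (20, 3.0), (31, 4.0)]:
    rr.write(ts, value)
    ao.write(ts, value)
```

After the four writes, the round-robin series has dropped the oldest sample and snapped the remaining timestamps to the 10-second grid, while the append-only series retains all four samples with their original timestamps.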

Summary

Prometheus offers a richer data model and query language, in addition to being easier to run and integrate into your environment. If you want a clustered solution that can hold historical data long term, Graphite may be a better choice.

Prometheus vs. InfluxDB

InfluxDB is an open-source time series database, with a commercial option for scaling and clustering. The InfluxDB project was released almost a year after Prometheus development began, so we were unable to consider it as an alternative at the time. Still, there are significant differences between Prometheus and InfluxDB, and both systems are geared towards slightly different use cases.

Scope

For a fair comparison, we must also consider Kapacitor together with InfluxDB, as in combination they address the same problem space as Prometheus and the Alertmanager.

The same scope differences as in the case of Graphite apply here for InfluxDB itself. In addition, InfluxDB offers continuous queries, which are equivalent to Prometheus recording rules.

Kapacitor's scope is a combination of Prometheus recording rules, alerting rules, and the Alertmanager's notification functionality. Prometheus offers a more powerful query language for graphing and alerting. The Prometheus Alertmanager additionally offers grouping, deduplication, and silencing functionality.
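
As a rough illustration of what grouping, deduplication, and silencing mean for a stream of alerts, consider the following Python sketch. It is a toy model under stated assumptions, not the Alertmanager's implementation: the alert labels, the silence, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical alerts; two redundant Prometheus replicas send the same alert.
alerts = [
    {"alertname": "HighErrorRate", "service": "api", "instance": "a"},
    {"alertname": "HighErrorRate", "service": "api", "instance": "a"},  # duplicate
    {"alertname": "HighErrorRate", "service": "api", "instance": "b"},
    {"alertname": "DiskFull", "service": "db", "instance": "c"},
]
silences = [{"service": "db"}]  # hypothetical silence: mute everything for db


def deduplicate(alerts):
    """Treat alerts with identical label sets as the same alert."""
    unique = {frozenset(a.items()): a for a in alerts}
    return list(unique.values())


def is_silenced(alert, silences):
    """An alert is silenced if every label of any one silence matches it."""
    return any(all(alert.get(k) == v for k, v in s.items()) for s in silences)


def group(alerts, by):
    """Bundle alerts sharing the `by` labels into one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(k) for k in by)].append(alert)
    return dict(groups)


active = [a for a in deduplicate(alerts) if not is_silenced(a, silences)]
notifications = group(active, by=["alertname", "service"])
```

In this sketch the four incoming alerts collapse into a single notification for `HighErrorRate` on the `api` service that covers two instances; the `db` alert is suppressed by the silence.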

Data model / storage

Like Prometheus, the InfluxDB data model has key-value pairs as labels, which are called tags. In addition, InfluxDB has a second level of labels called fields, which are more limited in use. InfluxDB supports timestamps with up to nanosecond resolution, and float64, int64, bool, and string data types. Prometheus, by contrast, supports the float64 data type with limited support for strings, and millisecond resolution timestamps.
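
The practical effect of these type and resolution differences can be sketched as follows. The `InfluxPoint` and `PromSample` classes are simplified, hypothetical models of the two data shapes, and `to_prom_samples` is just one possible mapping chosen for illustration, not an official converter.

```python
from dataclasses import dataclass


@dataclass
class InfluxPoint:
    """Toy model of an InfluxDB point: tags plus typed fields, ns timestamp."""
    measurement: str
    tags: dict
    fields: dict          # values may be float, int, bool, or str
    timestamp_ns: int


@dataclass
class PromSample:
    """Toy model of a Prometheus sample: labels, float64 value, ms timestamp."""
    name: str
    labels: dict
    value: float
    timestamp_ms: int


def to_prom_samples(point):
    """One possible mapping (an assumption made for this sketch)."""
    out = []
    for field, value in point.fields.items():
        if isinstance(value, str):
            # Sample values are float64 in this model, so string fields are skipped.
            continue
        out.append(PromSample(
            name=f"{point.measurement}_{field}",
            labels=dict(point.tags),
            value=float(value),  # int and bool values become float64
            timestamp_ms=point.timestamp_ns // 1_000_000,  # sub-ms precision is lost
        ))
    return out


point = InfluxPoint(
    measurement="http",
    tags={"host": "web1"},
    fields={"requests": 93, "healthy": True, "version": "1.2.3"},
    timestamp_ns=1_500_000_000_123_456_789,
)
converted = to_prom_samples(point)
```

Each numeric or boolean field becomes its own float64 series, the string field has no direct equivalent as a sample value, and the timestamp is truncated from nanoseconds to milliseconds.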

InfluxDB uses a variant of a log-structured merge tree for storage with a write ahead log, sharded by time. This is much more suitable to event logging than Prometheus's append-only file per time series approach.

"Logs and Metrics and Graphs, Oh My!" describes the differences between event logging and metrics recording.

Architecture

Prometheus servers run independently of each other and only rely on their local storage for their core functionality: scraping, rule processing, and alerting. The open source version of InfluxDB is similar.

The commercial InfluxDB offering is, by design, a distributed storage cluster with storage and queries being handled by many nodes at once.

This means that the commercial InfluxDB will be easier to scale horizontally, but it also means that you have to manage the complexity of a distributed storage system from the beginning. Prometheus will be simpler to run, but at some point you will need to shard servers explicitly along scalability boundaries like products, services, datacenters, or similar aspects. Independent servers (which can be run redundantly in parallel) may also give you better reliability and failure isolation.
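
Explicit sharding along such a boundary amounts to deciding, per target, which independent server is responsible for it. The Python sketch below illustrates the idea with made-up target addresses and label values; `shard_by` is a hypothetical helper, not part of Prometheus.

```python
from collections import defaultdict

# Hypothetical scrape targets; the names and label values are made up.
targets = [
    {"address": "api-1:9100", "datacenter": "eu-west", "service": "api"},
    {"address": "api-2:9100", "datacenter": "us-east", "service": "api"},
    {"address": "db-1:9100", "datacenter": "eu-west", "service": "db"},
]


def shard_by(targets, boundary):
    """Group targets so that each group can be scraped by its own server."""
    shards = defaultdict(list)
    for target in targets:
        shards[target[boundary]].append(target["address"])
    return dict(shards)


by_datacenter = shard_by(targets, "datacenter")
by_service = shard_by(targets, "service")
```

Choosing `datacenter` as the boundary yields one server per datacenter, while choosing `service` yields one per service; which split is appropriate depends on where the load and the failure domains lie.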

Kapacitor's open-source release has no built-in distributed/redundant options for rules, alerting, or notifications. The open-source release of Kapacitor can be scaled via manual sharding by the user, similar to Prometheus itself. Influx offers Enterprise Kapacitor, which supports an HA/redundant alerting system.

Prometheus and the Alertmanager by contrast offer a fully open-source redundant option via running redundant replicas of Prometheus and using the Alertmanager's High Availability mode.

Summary

There are many similarities between the systems. Both have labels (called tags in InfluxDB) to efficiently support multi-dimensional metrics. Both use basically the same data compression algorithms. Both have extensive integrations, including with each other. Both have hooks allowing you to extend them further, such as analyzing data in statistical tools or performing automated actions.

Where InfluxDB is better:

  • If you're doing event logging.
  • Commercial option offers clustering for InfluxDB, which is also better for long term data storage.
  • Eventually consistent view of data between replicas.

Where Prometheus is better:

  • If you're primarily doing metrics.
  • More powerful query language, alerting, and notification functionality.
  • Higher availability and uptime for graphing and alerting.

InfluxDB is maintained by a single commercial company following the open-core model, offering premium features like closed-source clustering, hosting, and support. Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support.

Prometheus vs. OpenTSDB

OpenTSDB is a distributed time series database based on Hadoop and HBase.

Scope

The same scope differences as in the case of Graphite apply here.

Data model

OpenTSDB's data model is almost identical to Prometheus's: time series are identified by a set of arbitrary key-value pairs (OpenTSDB tags are Prometheus labels). All data for a metric is stored together, limiting the cardinality of metrics. There are minor differences though: Prometheus allows arbitrary characters in label values, while OpenTSDB is more restrictive. OpenTSDB also lacks a full query language, only allowing simple aggregation and math via its API.

Storage

OpenTSDB's storage is implemented on top of Hadoop and HBase. This means that it is easy to scale OpenTSDB horizontally, but you have to accept the overall complexity of running a Hadoop/HBase cluster from the beginning.

Prometheus will be simpler to run initially, but will require explicit sharding once the capacity of a single node is exceeded.

Summary

Prometheus offers a much richer query language, can handle higher cardinality metrics, and forms part of a complete monitoring system. If you're already running Hadoop and value long term storage over these benefits, OpenTSDB is a good choice.

Prometheus vs. Nagios

Nagios is a monitoring system that originated in the 1990s as NetSaint.

Scope

Nagios is primarily about alerting based on the exit codes of scripts, which are called “checks”. Individual alerts can be silenced, but there is no grouping, routing, or deduplication.
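
A check of this kind can be sketched in Python as follows. The exit codes follow the usual Nagios plugin convention (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN); the `check_disk` function, its thresholds, and the message format are hypothetical examples rather than an existing plugin.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(used_percent, warn=80.0, crit=90.0):
    """Return (exit_code, message) for a disk usage percentage.

    The thresholds are arbitrary example values.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, f"DISK UNKNOWN - invalid reading {used_percent}"
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.0f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.0f}% used"
    return OK, f"DISK OK - {used_percent:.0f}% used"


# A real check script would measure the value itself, print the message,
# and exit with the code, e.g. via sys.exit(code).
code, message = check_disk(93.0)
```

The only information the monitoring server acts on is the exit code plus a short status line, which is why there are no labels to filter or aggregate on afterwards.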

There are a variety of plugins. For example, the few kilobytes of perfData that plugins are allowed to return can be piped to a time series database such as Graphite, and NRPE can be used to run checks on remote machines.

Data model

Nagios is host-based. Each host can have one or more services and each service can perform one check.

There is no notion of labels or a query language.

Storage

Nagios has no storage per se, beyond the current check state. There are plugins which can store data, for example for visualisation.

Architecture

Nagios servers are standalone. All configuration of checks is via file.

Summary

Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient.

If you want to do whitebox monitoring, or have a dynamic or cloud based environment, then Prometheus is a good choice.

Prometheus vs. Sensu

Sensu is a composable monitoring pipeline that can reuse existing Nagios checks.

Scope

The same general scope differences as in the case of Nagios apply here.

There is also a client socket permitting ad-hoc check results to be pushed into Sensu.

Data model

Sensu has the same rough data model as Nagios.

Storage

Sensu uses Redis to persist monitoring data, including the Sensu client registry, check results, check execution history, and current event data.

Architecture

Sensu has a number of components. It uses RabbitMQ as a transport, Redis for current state, and a separate server for processing and API access.

All components of a Sensu deployment (RabbitMQ, Redis, and Sensu Server/API) can be clustered for highly available and redundant configurations.

Summary

If you have an existing Nagios setup that you wish to scale as-is, or want to take advantage of the automatic registration feature of Sensu, then Sensu is a good choice.

If you want to do whitebox monitoring, or have a very dynamic or cloud based environment, then Prometheus is a good choice.