Querying Prometheus

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
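To illustrate the HTTP API route, here is a minimal Python sketch that builds the URL for an instant query. It assumes a server at the hypothetical address http://localhost:9090 and uses the /api/v1/query endpoint with its query and time parameters. The helper name instant_query_url is made up for this example, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumption: a Prometheus server reachable at this hypothetical address.
BASE_URL = "http://localhost:9090"


def instant_query_url(expr, time=None, base_url=BASE_URL):
    """Build the URL for an instant query against /api/v1/query."""
    params = {"query": expr}
    if time is not None:
        # Evaluation timestamp, given here as Unix seconds.
        params["time"] = str(time)
    # urlencode percent-encodes braces, quotes and other reserved characters.
    return f"{base_url}/api/v1/query?{urlencode(params)}"


print(instant_query_url('http_requests_total{job="prometheus"}', time=1700000000))
```

The resulting URL can be fetched with any HTTP client; the response body is JSON.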

Examples

This document is meant as a reference. For learning, it might be easier to start with a couple of examples.

Expression language data types

In Prometheus's expression language, an expression or sub-expression can evaluate to one of four types:

  • Instant vector - a set of time series containing a single sample for each time series, all sharing the same timestamp
  • Range vector - a set of time series containing a range of data points over time for each time series
  • Scalar - a simple numeric floating point value
  • String - a simple string value; currently unused

Depending on the use-case (e.g. when graphing vs. displaying the output of an expression), only some of these types are legal as the result from a user-specified expression. For example, an expression that returns an instant vector is the only type that can be directly graphed.
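The four types can be pictured with a small Python model. This is an illustrative sketch only: the Sample class, the classify function, and the choice to represent vectors as dictionaries keyed by a series identifier are assumptions made for this example, not Prometheus data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float  # Unix seconds
    value: float


def classify(result):
    """Name the expression type that a Python value models in this sketch."""
    if isinstance(result, str):
        return "string"
    if isinstance(result, (int, float)):
        return "scalar"
    # Vectors are modelled as dicts: series identifier -> sample(s).
    values = list(result.values())
    if all(isinstance(v, Sample) for v in values):
        # Instant vector: one sample per series, all at the same timestamp.
        if len({v.timestamp for v in values}) > 1:
            raise ValueError("instant vector samples must share one timestamp")
        return "instant vector"
    if all(isinstance(v, list) for v in values):
        # Range vector: a list of samples over time for each series.
        return "range vector"
    raise TypeError("unrecognised shape")


def can_graph_directly(result):
    """Per the text above, only instant vectors can be graphed directly."""
    return classify(result) == "instant vector"
```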

Literals

String literals

Strings may be specified as literals in single quotes, double quotes or backticks.

PromQL follows the same escaping rules as Go. In single or double quotes a backslash begins an escape sequence, which may be followed by a, b, f, n, r, t, v or \. Specific characters can be provided using octal (\nnn) or hexadecimal (\xnn, \unnnn and \Unnnnnnnn).

No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.

Example:

    "this is a string"
    'these are unescaped: \n \\ \t'
    `these are not unescaped: \n ' " \t`
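The escaping rules can be sketched as a small Python function. This is not the Prometheus implementation. The helper name unquote is hypothetical, the handling of an escaped enclosing quote is an assumption, and \xnn is treated as a code point rather than a raw byte for simplicity.

```python
# Single-character escapes listed in the text, mapped to their values.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v", "\\": "\\",
}
# Hexadecimal escapes: letter -> number of hex digits that must follow.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}


def unquote(literal):
    """Return the value of a quoted literal (hypothetical helper)."""
    if len(literal) < 2 or literal[0] not in "'\"`" or literal[-1] != literal[0]:
        raise ValueError("literal must be wrapped in matching quotes")
    quote, body = literal[0], literal[1:-1]
    if quote == "`":
        return body  # backticks: no escape processing, newlines are kept
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        esc = body[i + 1 : i + 2]
        if not esc:
            raise ValueError("dangling backslash")
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == quote:
            # Assumption: the enclosing quote character may be escaped.
            out.append(quote)
            i += 2
        elif esc in HEX_ESCAPES:
            width = HEX_ESCAPES[esc]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width:
                raise ValueError("truncated hexadecimal escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        elif esc in "01234567":
            digits = body[i + 1 : i + 4]  # octal escapes use three digits
            if len(digits) != 3:
                raise ValueError("truncated octal escape")
            out.append(chr(int(digits, 8)))
            i += 4
        else:
            raise ValueError(f"unknown escape sequence \\{esc}")
    return "".join(out)
```

Applied to the three example literals, the first two yield strings with escapes resolved, while the backtick literal comes back unchanged apart from the removed delimiters.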

Float literals

Scalar float values can be literally written as numbers of the form [-](digits)[.(digits)].

    -2.43
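As a rough check of the form [-](digits)[.(digits)], the following Python sketch validates a literal with a regular expression before converting it. The function name is hypothetical, and the sketch models only the form described here; any further number formats the real parser may accept are out of scope.

```python
import re

# Optional minus sign, digits, then an optional fractional part.
FLOAT_LITERAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def parse_float_literal(text):
    """Convert a literal of the form [-](digits)[.(digits)] to a float."""
    if FLOAT_LITERAL.fullmatch(text) is None:
        raise ValueError(f"not a float literal of the described form: {text!r}")
    return float(text)
```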

Time series selectors

Instant vector selectors

Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant): in the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.

This example selects all time series that have the http_requests_total metric name:

    http_requests_total

It is possible to filter these time series further by appending a set of labels to match in curly braces ({}).

This example selects only those time series with the http_requests_total metric name that also have the job label set to prometheus and their group label set to canary:

    http_requests_total{job="prometheus",group="canary"}

It is also possible to negatively match a label value, or to match label values against regular expressions. The following label matching operators exist:

  • =: Select labels that are exactly equal to the provided string.
  • !=: Select labels that are not equal to the provided string.
  • =~: Select labels that regex-match the provided string.
  • !~: Select labels that do not regex-match the provided string.

For example, this selects all http_requests_total time series for staging, testing, and development environments and HTTP methods other than GET.

    http_requests_total{environment=~"staging|testing|development",method!="GET"}

Label matchers that match empty label values also select all time series that do not have the specific label set at all. Regex-matches are fully anchored. It is possible to have multiple matchers for the same label name.
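The matcher semantics described so far can be sketched in Python. This is a simplified model, not Prometheus code: series are plain dictionaries of labels, the function names are hypothetical, and Python's re module stands in for RE2, which is an assumption that holds only for patterns both engines support.

```python
import re


def matches(labels, name, op, pattern):
    """Apply one label matcher to a series given as a dict of labels."""
    # A label that is not set behaves like a label with an empty value.
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op not in ("=~", "!~"):
        raise ValueError(f"unknown matcher operator: {op}")
    # fullmatch models full anchoring: the pattern must cover the whole value.
    hit = re.fullmatch(pattern, value) is not None
    return hit if op == "=~" else not hit


def select(series, matchers):
    """Keep the series that satisfy every (name, op, pattern) matcher."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]


series = [
    {"environment": "staging", "method": "POST"},
    {"environment": "staging", "method": "GET"},
    {"environment": "production", "method": "POST"},
    {"environment": "staging-eu", "method": "POST"},
]
matchers = [
    ("environment", "=~", "staging|testing|development"),
    ("method", "!=", "GET"),
]
selected = select(series, matchers)
```

Only the first series is selected here. The "staging-eu" series is rejected because the regular expression must match the entire label value.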

Vector selectors must either specify a name or at least one label matcher that does not match the empty string. The following expression is illegal:

    {job=~".*"} # Bad!

In contrast, these expressions are valid as they both have a selector that does not match empty label values.

    {job=~".+"} # Good!
    {job=~".*",method="get"} # Good!
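The validity rule (a name, or at least one matcher that does not match the empty string) can be checked with a short Python sketch. The function names are hypothetical, and Python's re module is used as a stand-in for RE2.

```python
import re


def matches_empty(op, pattern):
    """Report whether a single matcher would accept the empty string."""
    if op == "=":
        return pattern == ""
    if op == "!=":
        return pattern != ""
    if op not in ("=~", "!~"):
        raise ValueError(f"unknown matcher operator: {op}")
    hit = re.fullmatch(pattern, "") is not None
    return hit if op == "=~" else not hit


def is_valid_selector(metric_name, matchers):
    """A selector needs a name or a matcher that rejects the empty string."""
    if metric_name:
        return True
    return any(not matches_empty(op, pattern) for _, op, pattern in matchers)
```

With this model, a selector containing only job=~".*" is rejected, while job=~".+" alone, or job=~".*" combined with method="get", is accepted.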

Label matchers can also be applied to metric names by matching against the internal __name__ label. For example, the expression http_requests_total is equivalent to {__name__="http_requests_total"}. Matchers other than = (!=, =~, !~) may also be used. The following expression selects all metrics that have a name starting with job::

    {__name__=~"job:.*"}

All regular expressions in Prometheus use RE2 syntax.

Range vector selectors

Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a range duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.

Time durations are specified as a number, followed immediately by one of the following units:

  • s - seconds
  • m - minutes
  • h - hours
  • d - days
  • w - weeks
  • y - years

In this example, we select all the values we have recorded within the last 5 minutes for all time series that have the metric name http_requests_total and a job label set to prometheus:

    http_requests_total{job="prometheus"}[5m]
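A Python sketch of duration parsing and range selection follows. It assumes fixed unit lengths (a day is 24 hours, a week 7 days, a year 365 days), represents a series as a list of (timestamp, value) pairs in Unix seconds, and treats both edges of the window as inclusive; the boundary handling in particular is an assumption of this sketch. The function names are hypothetical.

```python
import re

# Seconds per unit, assuming fixed-length days, weeks and years.
UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 604800, "y": 31536000,
}


def parse_duration(text):
    """Convert a duration such as '5m' or '1w' into seconds."""
    m = re.fullmatch(r"([0-9]+)([smhdwy])", text)
    if m is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]


def range_select(samples, eval_time, duration):
    """Return the samples that fall inside the window ending at eval_time."""
    start = eval_time - parse_duration(duration)
    # Assumption: both window edges are inclusive in this sketch.
    return [(ts, v) for ts, v in samples if start <= ts <= eval_time]
```

For a [5m] selector evaluated at time 600, range_select keeps samples with timestamps from 300 to 600.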

Offset modifier

The offset modifier allows changing the time offset for individual instant and range vectors in a query.

For example, the following expression returns the value of http_requests_total 5 minutes in the past relative to the current query evaluation time:

    http_requests_total offset 5m

Note that the offset modifier always needs to follow the selector immediately, i.e. the following would be correct:

    sum(http_requests_total{method="GET"} offset 5m) # GOOD.

While the following would be incorrect:

    sum(http_requests_total{method="GET"}) offset 5m # INVALID.

The same works for range vectors. This returns the 5-minute rate that http_requests_total had a week ago:

    rate(http_requests_total[5m] offset 1w)
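The effect of offset on the data a selector reads can be pictured as shifting the evaluation time backwards. The following Python sketch, with a hypothetical function name and all times in seconds, computes the window a selector would cover under that reading.

```python
def selector_window(eval_time, range_seconds=0, offset_seconds=0):
    """Return (start, end) of the time window a selector reads from.

    The offset moves the end of the window into the past; the range
    (zero for an instant vector selector) extends it further back.
    """
    end = eval_time - offset_seconds
    start = end - range_seconds
    return start, end


# http_requests_total[5m] offset 1w, evaluated at t = 1_000_000
week = 7 * 24 * 3600
window = selector_window(1_000_000, range_seconds=300, offset_seconds=week)
```

With an offset of one week and a range of five minutes, the window ends exactly one week before the evaluation time and starts five minutes before that.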

Subquery

A subquery allows you to run an instant query for a given range and resolution. The result of a subquery is a range vector.

Syntax: <instant_query> '[' <range> ':' [<resolution>] ']' [ offset <duration> ]

  • <resolution> is optional. Default is the global evaluation interval.
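One way to picture a subquery is as a list of timestamps at which the inner instant query is evaluated. The Python sketch below generates such timestamps for integer seconds. It assumes that steps are aligned to multiples of the resolution and that both ends of the range are inclusive; both points are assumptions of this sketch rather than statements about the engine. The function name is hypothetical.

```python
def subquery_timestamps(eval_time, range_seconds, resolution_seconds, offset_seconds=0):
    """List the evaluation timestamps for <instant_query>[range:resolution]."""
    end = eval_time - offset_seconds
    start = end - range_seconds
    # Assumption: round the first step up to a multiple of the resolution.
    first = -(-start // resolution_seconds) * resolution_seconds
    return list(range(first, end + 1, resolution_seconds))


# Timestamps for a [5m:1m] subquery evaluated at t = 605 seconds.
steps = subquery_timestamps(605, 300, 60)
```

Each timestamp yields one sample per series from the inner query, and together those samples form the resulting range vector.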

Operators

Prometheus supports many binary and aggregation operators. These are described in detail in the expression language operators page.

Functions

Prometheus supports several functions to operate on data. These are described in detail in the expression language functions page.

Gotchas

Staleness

When queries are run, timestamps at which to sample data are selected independently of the actual present time series data. This is mainly to support cases like aggregation (sum, avg, and so on), where multiple aggregated time series do not exactly align in time. Because of their independence, Prometheus needs to assign a value at those timestamps for each relevant time series. It does so by simply taking the newest sample before this timestamp.

If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale. If a target is removed, its previously returned time series will be marked as stale soon afterwards.

If a query is evaluated at a sampling timestamp after a time series is marked stale, then no value is returned for that time series. If new samples are subsequently ingested for that time series, they will be returned as normal.

If no sample is found (by default) 5 minutes before a sampling timestamp, no value is returned for that time series at this point in time. This effectively means that time series "disappear" from graphs at times where their latest collected sample is older than 5 minutes or after they are marked stale.

Staleness will not be marked for time series that have timestamps included in their scrapes. Only the 5 minute threshold will be applied in that case.
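The lookup rules in this section can be summarised in a Python sketch. It is a simplified model under stated assumptions: a series is a sorted list of (timestamp, value) pairs in seconds, a sample exactly at the sampling timestamp counts as the newest one, a sample exactly 5 minutes old is still returned, and the STALE object is a hypothetical stand-in for a staleness marker.

```python
import bisect

LOOKBACK_SECONDS = 5 * 60  # the default 5 minute threshold
STALE = object()           # hypothetical stand-in for a staleness marker


def value_at(samples, t, lookback=LOOKBACK_SECONDS):
    """Return the value a query would see at time t, or None if there is none."""
    timestamps = [ts for ts, _ in samples]
    # Index of the newest sample at or before t.
    i = bisect.bisect_right(timestamps, t) - 1
    if i < 0:
        return None  # no sample exists yet at this time
    ts, value = samples[i]
    if value is STALE:
        return None  # the series was marked stale before t
    if t - ts > lookback:
        return None  # the newest sample is older than the threshold
    return value


samples = [(0, 1.0), (60, 2.0), (120, STALE), (1000, 5.0)]
```

In this example the series has no value between the stale marker at 120 and the new sample at 1000, and it disappears again once the sample at 1000 is more than 5 minutes old.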

Avoiding slow queries and overloads

If a query needs to operate on a very large amount of data, graphing it might time out or overload the server or browser. Thus, when constructing queries over unknown data, always start building the query in the tabular view of Prometheus's expression browser until the result set seems reasonable (hundreds, not thousands, of time series at most). Only when you have filtered or aggregated your data sufficiently, switch to graph mode. If the expression still takes too long to graph ad-hoc, pre-record it via a recording rule.

This is especially relevant for Prometheus's query language, where a bare metric name selector like api_http_requests_total could expand to thousands of time series with different labels. Also keep in mind that expressions which aggregate over many time series will generate load on the server even if the output is only a small number of time series. This is similar to how it would be slow to sum all values of a column in a relational database, even if the output value is only a single number.