Histograms and summaries

Histograms and summaries are more complex metric types. Not only does a single histogram or summary create a multitude of time series, it is also more difficult to use these metric types correctly. This section helps you to pick and configure the appropriate metric type for your use case.

Library support

First of all, check the library support for histograms and summaries.

Some libraries support only one of the two types, or they support summaries only in a limited fashion (lacking quantile calculation).

Count and sum of observations

Histograms and summaries both sample observations, typically request durations or response sizes. They track the number of observations and the sum of the observed values, allowing you to calculate the average of the observed values. Note that the number of observations (showing up in Prometheus as a time series with a _count suffix) is inherently a counter (as described above, it only goes up). The sum of observations (showing up as a time series with a _sum suffix) behaves like a counter, too, as long as there are no negative observations. Obviously, request durations or response sizes are never negative. In principle, however, you can use summaries and histograms to observe negative values (e.g. temperatures in centigrade). In that case, the sum of observations can go down, so you cannot apply rate() to it anymore.

To calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds, use the following expression:

  rate(http_request_duration_seconds_sum[5m])
/
  rate(http_request_duration_seconds_count[5m])

Apdex score

A straight-forward use of histograms (but not summaries) is to count observations falling into particular buckets of observation values.

You might have an SLA to serve 95% of requests within 300ms. In that case, configure a histogram to have a bucket with an upper limit of 0.3 seconds. You can then directly express the relative amount of requests served within 300ms and easily alert if the value drops below 0.95. The following expression calculates it by job for the requests served in the last 5 minutes. The request durations were collected with a histogram called http_request_duration_seconds.

  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
  sum(rate(http_request_duration_seconds_count[5m])) by (job)
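For reference, such a histogram could be declared like this with the Go client library. This is a hedged sketch rather than a prescribed setup: only the 0.3 boundary is dictated by the SLA, while the metric registration style and the remaining bucket boundaries are illustrative assumptions (the same boundaries reappear in the thought experiment further below).

  package main

  import (
      "net/http"
      "time"

      "github.com/prometheus/client_golang/prometheus"
      "github.com/prometheus/client_golang/prometheus/promauto"
      "github.com/prometheus/client_golang/prometheus/promhttp"
  )

  // A bucket boundary at 0.3s makes the SLA threshold directly observable.
  var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
      Name:    "http_request_duration_seconds",
      Help:    "HTTP request duration in seconds.",
      Buckets: []float64{0.1, 0.2, 0.3, 0.45},
  })

  func handle(w http.ResponseWriter, r *http.Request) {
      start := time.Now()
      // ... serve the request ...
      requestDuration.Observe(time.Since(start).Seconds())
  }

  func main() {
      http.HandleFunc("/", handle)
      http.Handle("/metrics", promhttp.Handler())
      http.ListenAndServe(":8080", nil)
  }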

You can approximate the well-known Apdex score in a similar way. Configure a bucket with the target request duration as the upper bound and another bucket with the tolerated request duration (usually 4 times the target request duration) as the upper bound. Example: The target request duration is 300ms. The tolerable request duration is 1.2s. The following expression yields the Apdex score for each job over the last 5 minutes:

  (
    sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
  +
    sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
  ) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)

Note that we divide the sum of both buckets. The reason is that the histogram buckets are cumulative. The le="0.3" bucket is also contained in the le="1.2" bucket; dividing it by 2 corrects for that.
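As a sanity check, let S be the number of satisfied requests (at most 0.3s), T the number of tolerating requests (between 0.3s and 1.2s), and N the total count (S, T, and N are shorthands introduced here, not metric names). Since buckets are cumulative, the le="0.3" bucket counts S and the le="1.2" bucket counts S + T, so the expression computes

  \frac{S + (S + T)}{2N} = \frac{S + T/2}{N},

which is precisely the classic Apdex formula: satisfied requests plus half of the tolerating requests, divided by the total.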

The calculation does not exactly match the traditional Apdex score, as it includes errors in the satisfied and tolerable parts of the calculation.

Quantiles

You can use both summaries and histograms to calculate so-called φ-quantiles, where 0 ≤ φ ≤ 1. The φ-quantile is the observation value that ranks at number φ*N among the N observations. Examples for φ-quantiles: The 0.5-quantile is known as the median. The 0.95-quantile is the 95th percentile.

The essential difference between summaries and histograms is that summaries calculate streaming φ-quantiles on the client side and expose them directly, while histograms expose bucketed observation counts and the calculation of quantiles from the buckets of a histogram happens on the server side using the histogram_quantile() function.

The two approaches have a number of different implications:

|  | Histogram | Summary |
| --- | --- | --- |
| Required configuration | Pick buckets suitable for the expected range of observed values. | Pick desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later. |
| Client performance | Observations are very cheap as they only need to increment counters. | Observations are expensive due to the streaming quantile calculation. |
| Server performance | The server has to calculate quantiles. You can use recording rules should the ad-hoc calculation take too long (e.g. in a large dashboard). | Low server-side cost. |
| Number of time series (in addition to the _sum and _count series) | One time series per configured bucket. | One time series per configured quantile. |
| Quantile error (see below for details) | Error is limited in the dimension of observed values by the width of the relevant bucket. | Error is limited in the dimension of φ by a configurable value. |
| Specification of φ-quantile and sliding time-window | Ad-hoc with Prometheus expressions. | Preconfigured by the client. |
| Aggregation | Ad-hoc with Prometheus expressions. | In general not aggregatable. |

Note the importance of the last item in the table. Let us return to the SLA of serving 95% of requests within 300ms. This time, you do not want to display the percentage of requests served within 300ms, but instead the 95th percentile, i.e. the request duration within which you have served 95% of requests. To do that, you can either configure a summary with a 0.95-quantile and (for example) a 5-minute decay time, or you configure a histogram with a few buckets around the 300ms mark, e.g. {le="0.1"}, {le="0.2"}, {le="0.3"}, and {le="0.45"}. If your service runs replicated with a number of instances, you will collect request durations from every single one of them, and then you want to aggregate everything into an overall 95th percentile. However, aggregating the precomputed quantiles from a summary rarely makes sense. In this particular case, averaging the quantiles yields statistically nonsensical values.

  avg(http_request_duration_seconds{quantile="0.95"}) // BAD!

Using histograms, the aggregation is perfectly possible with the histogram_quantile() function.

  histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) // GOOD.

Furthermore, should your SLA change and you now want to plot the 90th percentile, or you want to take into account the last 10 minutes instead of the last 5 minutes, you only have to adjust the expression above and you do not need to reconfigure the clients.
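For concreteness, here is a hedged sketch of the two client-side configurations discussed above, again using the Go client library. The 0.01 error value in Objectives and the exact bucket layout are illustrative assumptions, and in practice you would register only one of the two metrics, since they share a name:

  import (
      "time"

      "github.com/prometheus/client_golang/prometheus"
      "github.com/prometheus/client_golang/prometheus/promauto"
  )

  // Summary: the quantile and the decay window are fixed at
  // instrumentation time and cannot be changed later.
  var requestDurationSummary = promauto.NewSummary(prometheus.SummaryOpts{
      Name:       "http_request_duration_seconds",
      Help:       "HTTP request duration in seconds.",
      Objectives: map[float64]float64{0.95: 0.01}, // 0.95-quantile, ±0.01 error in φ
      MaxAge:     5 * time.Minute,                 // 5-minute decay time
  })

  // Histogram: only the buckets are fixed; quantile, window, and
  // aggregation remain server-side decisions.
  var requestDurationHistogram = promauto.NewHistogram(prometheus.HistogramOpts{
      Name:    "http_request_duration_seconds",
      Help:    "HTTP request duration in seconds.",
      Buckets: []float64{0.1, 0.2, 0.3, 0.45},
  })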

Errors of quantile estimation

Quantiles, whether calculated client-side or server-side, are estimated. It is important to understand the errors of that estimation.

Continuing the histogram example from above, imagine your usual request durations are almost all very close to 220ms, or in other words, if you could plot the "true" histogram, you would see a very sharp spike at 220ms. In the Prometheus histogram metric as configured above, almost all observations, and therefore also the 95th percentile, will fall into the bucket labeled {le="0.3"}, i.e. the bucket from 200ms to 300ms. The histogram implementation guarantees that the true 95th percentile is somewhere between 200ms and 300ms. To return a single value (rather than an interval), it applies linear interpolation, which yields 295ms in this case. The calculated quantile gives you the impression that you are close to breaking the SLA, but in reality, the 95th percentile is a tiny bit above 220ms, a quite comfortable distance to your SLA.

Next step in our thought experiment: A change in backend routing adds a fixed amount of 100ms to all request durations. Now the request duration has its sharp spike at 320ms and almost all observations will fall into the bucket from 300ms to 450ms. The 95th percentile is calculated to be 442.5ms, although the correct value is close to 320ms. While you are only a tiny bit outside of your SLA, the calculated 95th percentile looks much worse.
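For reference, histogram_quantile() assumes a uniform distribution of observations within each bucket and interpolates linearly. If the requested quantile falls into the bucket (b_lo, b_hi], the estimate is

  q_\varphi = b_{lo} + (b_{hi} - b_{lo}) \cdot \frac{\varphi N - N_{below}}{N_{bucket}},

where N_{below} counts the observations in lower buckets and N_{bucket} those in the containing bucket. With nearly all N observations in a single bucket, N_{below} ≈ 0 and N_{bucket} ≈ N, so the estimate reduces to b_{lo} + (b_{hi} - b_{lo}) · φ: that is 0.2s + 0.1s · 0.95 = 0.295s in the first case, and 0.3s + 0.15s · 0.95 = 0.4425s in the second.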

A summary would have had no problem calculating the correct percentile value in both cases, at least if it uses an appropriate algorithm on the client side (like the one used by the Go client). Unfortunately, you cannot use a summary if you need to aggregate the observations from a number of instances.

Luckily, due to your appropriate choice of bucket boundaries, even in this contrived example of very sharp spikes in the distribution of observed values, the histogram was able to identify correctly if you were within or outside of your SLA. Also, the closer the actual value of the quantile is to our SLA (or in other words, the value we are actually most interested in), the more accurate the calculated value becomes.

Let us now modify the experiment once more. In the new setup, the distribution of request durations has a spike at 150ms, but it is not quite as sharp as before and only comprises 90% of the observations. 10% of the observations are evenly spread out in a long tail between 150ms and 450ms. With that distribution, the 95th percentile happens to be exactly at our SLA of 300ms. With the histogram, the calculated value is accurate, as the value of the 95th percentile happens to coincide with one of the bucket boundaries. Even slightly different values would still be accurate as the (contrived) even distribution within the relevant buckets is exactly what the linear interpolation within a bucket assumes.
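To verify the arithmetic: 90% of observations sit at or below 150ms, so the 95th percentile needs another 5% of observations from the uniform tail, which holds 10% of all observations spread over the 300ms between 150ms and 450ms:

  q_{0.95} = 150\,ms + \frac{0.95 - 0.90}{0.10} \cdot 300\,ms = 300\,ms.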

The error of the quantile reported by a summary gets more interesting now. The error of the quantile in a summary is configured in the dimension of φ. In our case we might have configured 0.95±0.01, i.e. the calculated value will be between the 94th and 96th percentile. The 94th percentile with the distribution described above is 270ms, the 96th percentile is 330ms. The calculated value of the 95th percentile reported by the summary can be anywhere in the interval between 270ms and 330ms, which unfortunately is all the difference between clearly within the SLA vs. clearly outside the SLA.
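The same tail arithmetic as above yields these two boundary values:

  q_{0.94} = 150\,ms + \frac{0.04}{0.10} \cdot 300\,ms = 270\,ms, \qquad q_{0.96} = 150\,ms + \frac{0.06}{0.10} \cdot 300\,ms = 330\,ms.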

The bottom line is: If you use a summary, you control the error in the dimension of φ. If you use a histogram, you control the error in the dimension of the observed value (via choosing the appropriate bucket layout). With a broad distribution, small changes in φ result in large deviations in the observed value. With a sharp distribution, a small interval of observed values covers a large interval of φ.

Two rules of thumb:

  • If you need to aggregate, choose histograms.

  • Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.

What can I do if my client library does not support the metric type I need?

Implement it! Code contributions are welcome. In general, we expect histograms to be more urgently needed than summaries. Histograms are also easier to implement in a client library, so we recommend implementing histograms first, if in doubt.
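To illustrate why histograms are the easier of the two, here is a minimal, hedged sketch of the core data structure in Go. It is not the implementation of any existing client library and omits everything a real one needs (exposition format, labels, atomic operations instead of a mutex), but it shows that an observation boils down to incrementing three counters:

  package main

  import (
      "sort"
      "sync"
  )

  // naiveHistogram keeps non-cumulative per-bucket counts; the cumulative
  // le buckets of the exposition format would be summed up at scrape time.
  type naiveHistogram struct {
      mu           sync.Mutex
      upperBounds  []float64 // sorted upper bucket boundaries
      bucketCounts []uint64  // one extra slot serves as the implicit +Inf bucket
      count        uint64    // becomes the _count series
      sum          float64   // becomes the _sum series
  }

  func newNaiveHistogram(upperBounds []float64) *naiveHistogram {
      sort.Float64s(upperBounds)
      return &naiveHistogram{
          upperBounds:  upperBounds,
          bucketCounts: make([]uint64, len(upperBounds)+1),
      }
  }

  // Observe finds the first bucket whose upper bound is >= v (falling
  // through to the +Inf slot) and increments three counters. No streaming
  // quantile calculation is needed, which is what makes observations cheap.
  func (h *naiveHistogram) Observe(v float64) {
      i := sort.SearchFloat64s(h.upperBounds, v)
      h.mu.Lock()
      h.bucketCounts[i]++
      h.count++
      h.sum += v
      h.mu.Unlock()
  }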