CAUTION: This page documents an old version of Prometheus. Check out the latest stable version.

Unit Testing for Rules

You can use promtool to test your rules.

    # For a single test file.
    ./promtool test rules test.yml
    # If you have multiple test files, say test1.yml, test2.yml, test3.yml
    ./promtool test rules test1.yml test2.yml test3.yml

Test file format

    # This is a list of rule files to consider for testing. Globs are supported.
    rule_files:
      [ - <file_name> ]

    [ evaluation_interval: <duration> | default = 1m ]

    # The order in which group names are listed below will be the order of evaluation of
    # rule groups (at a given evaluation time). The order is guaranteed only for the groups mentioned below.
    # All the groups need not be mentioned below.
    group_eval_order:
      [ - <group_name> ]

    # All the tests are listed here.
    tests:
      [ - <test_group> ]
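
As a rough illustration of how these top-level keys fit together, a test file might be laid out like the sketch below. All file names, group names, and series here (rules/*.yml, extra_alerts.yml, recording_rules, alerting_rules, the demo job) are hypothetical and only show the shape of the file.

```yaml
# Hypothetical layout: every name below is made up for illustration.
rule_files:
  - rules/*.yml            # globs are supported
  - extra_alerts.yml

evaluation_interval: 30s   # overrides the 1m default

# Ask for recording_rules to be evaluated before alerting_rules.
# Groups that are not listed have no guaranteed position.
group_eval_order:
  - recording_rules
  - alerting_rules

tests:
  - interval: 1m
    input_series:
      - series: 'up{job="demo"}'
        values: '1 1 1'
```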

<test_group>

    # Series data
    interval: <duration>
    input_series:
      [ - <series> ]

    # Name of the test group
    [ name: <string> ]

    # Unit tests for the above data.

    # Unit tests for alerting rules. We consider the alerting rules from the input file.
    alert_rule_test:
      [ - <alert_test_case> ]

    # Unit tests for PromQL expressions.
    promql_expr_test:
      [ - <promql_test_case> ]

    # External labels accessible to the alert template.
    external_labels:
      [ <labelname>: <string> ... ]

    # External URL accessible to the alert template.
    # Usually set using --web.external-url.
    [ external_url: <string> ]
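
To make the optional fields more concrete, here is a minimal sketch of one test group that sets a name, external labels, and an external URL. The group name, the cluster label, the demo series, and the URL are assumptions chosen purely for illustration.

```yaml
tests:
  - name: instance availability          # hypothetical test group name
    interval: 1m
    input_series:
      - series: 'up{job="demo", instance="localhost:8080"}'
        values: '0+0x5'                  # six samples, all 0
    # Values made available to alert templates while this group runs.
    external_labels:
      cluster: test-cluster              # hypothetical label
    external_url: http://prometheus.example.org
    # alert_rule_test and promql_expr_test entries would follow here.
```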

<series>

    # This follows the usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #      series_name{label1="value1", label2="value2"}
    #      go_goroutines{job="prometheus", instance="localhost:9090"}
    series: <string>

    # This uses expanding notation.
    # Expanding notation:
    #     'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) … a+(c*b)'
    #     'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) … a-(c*b)'
    # There are special values to indicate missing and stale samples:
    #     '_' represents a missing sample from scrape
    #     'stale' indicates a stale sample
    # Examples:
    #     1. '-2+4x3' becomes '-2 2 6 10'
    #     2. ' 1-2x4' becomes '1 -1 -3 -5 -7'
    #     3. ' 1 _x3 stale' becomes '1 _ _ _ stale'
    values: <string>
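
A few more series written with expanding notation may help; the metric names and label values below are made up. Following the rules above, 'a+bxc' yields c+1 samples (the start value plus c steps), whereas '_xN' yields exactly N missing samples.

```yaml
input_series:
  # '0+5x4' expands to five samples: 0 5 10 15 20
  - series: 'http_requests_total{job="demo"}'
    values: '0+5x4'
  # '100-10x3' expands to four samples: 100 90 80 70
  - series: 'queue_length{job="demo"}'
    values: '100-10x3'
  # '1 _x2 1 stale' expands to: 1 _ _ 1 stale
  - series: 'up{job="demo"}'
    values: '1 _x2 1 stale'
```

Each sample is placed one `interval` apart, starting at time 0, so the third value of a series with `interval: 1m` belongs to the 2m mark.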

<alert_test_case>

Prometheus allows you to use the same alertname for different alerting rules. Hence, in unit tests you have to list the union of all firing alerts for that alertname under a single <alert_test_case>.

    # The time elapsed from time=0s when the alerts have to be checked.
    eval_time: <duration>

    # Name of the alert to be tested.
    alertname: <string>

    # List of expected alerts which are firing under the given alertname at
    # given evaluation time. If you want to test if an alerting rule should
    # not be firing, then you can mention the above fields and leave 'exp_alerts' empty.
    exp_alerts:
      [ - <alert> ]
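
The "should not be firing" case can be sketched as follows. This assumes the InstanceDown rule (`for: 5m`) and the input series from the Example section later on this page, where the prometheus target reports 0 from time 0 onward.

```yaml
alert_rule_test:
  # At 3m the prometheus target has been down for only 3 minutes, which is
  # less than the rule's `for: 5m`, so the alert is pending and not firing.
  - eval_time: 3m
    alertname: InstanceDown
    exp_alerts: []
```

Because only firing alerts are compared, an empty list is expected to pass here even though the alert is already pending.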

<alert>

    # These are the expanded labels and annotations of the expected alert.
    # Note: labels also include the labels of the sample associated with the
    # alert (same as what you see in `/alerts`, without series `__name__` and `alertname`)
    exp_labels:
      [ <labelname>: <string> ]
    exp_annotations:
      [ <labelname>: <string> ]

<promql_test_case>

    # Expression to evaluate
    expr: <string>

    # The time elapsed from time=0s when the expression has to be evaluated.
    eval_time: <duration>

    # Expected samples at the given evaluation time.
    exp_samples:
      [ - <sample> ]

<sample>

    # Labels of the sample in usual series notation '<metric name>{<label name>=<label value>, ...}'
    # Examples:
    #      series_name{label1="value1", label2="value2"}
    #      go_goroutines{job="prometheus", instance="localhost:9090"}
    labels: <string>

    # The expected value of the PromQL expression.
    value: <number>
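
Two further PromQL test cases are sketched below, assuming the `up` input series from the Example section later on this page (at 4m the prometheus target reports 0 and the node_exporter target reports 1). The second case relies on the assumption that a result without a metric name, such as the output of an aggregation, is written with only the label set in braces.

```yaml
promql_expr_test:
  # A comparison filter keeps the metric name and all labels.
  - expr: up == 0
    eval_time: 4m
    exp_samples:
      - labels: 'up{job="prometheus",instance="localhost:9090"}'
        value: 0
  # An aggregation drops the metric name; only the grouping label remains.
  - expr: sum by (job) (up)
    eval_time: 4m
    exp_samples:
      - labels: '{job="prometheus"}'
        value: 0
      - labels: '{job="node_exporter"}'
        value: 1
```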

Example

The following example input passes the unit test. test.yml is the test file, which follows the syntax above, and alerts.yml contains the alerting rules.

With alerts.yml in the same directory, run ./promtool test rules test.yml.

test.yml

    # This is the main input for unit testing.
    # Only this file is passed as command line argument.

    rule_files:
      - alerts.yml

    evaluation_interval: 1m

    tests:
      # Test 1.
      - interval: 1m
        # Series data.
        input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
          - series: 'up{job="node_exporter", instance="localhost:9100"}'
            values: '1+0x6 0 0 0 0 0 0 0 0' # 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
          - series: 'go_goroutines{job="prometheus", instance="localhost:9090"}'
            values: '10+10x2 30+20x5' # 10 20 30 30 50 70 90 110 130
          - series: 'go_goroutines{job="node_exporter", instance="localhost:9100"}'
            values: '10+10x7 10+30x4' # 10 20 30 40 50 60 70 80 10 40 70 100 130

        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 10m
            alertname: InstanceDown
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                  instance: localhost:9090
                  job: prometheus
                exp_annotations:
                  summary: "Instance localhost:9090 down"
                  description: "localhost:9090 of job prometheus has been down for more than 5 minutes."

        # Unit tests for promql expressions.
        promql_expr_test:
          # Unit test 1.
          - expr: go_goroutines > 5
            eval_time: 4m
            exp_samples:
              # Sample 1.
              - labels: 'go_goroutines{job="prometheus",instance="localhost:9090"}'
                value: 50
              # Sample 2.
              - labels: 'go_goroutines{job="node_exporter",instance="localhost:9100"}'
                value: 50

alerts.yml

    # This is the rules file.

    groups:
      - name: example
        rules:

          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

          - alert: AnotherInstanceDown
            expr: up == 0
            for: 10m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
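
As a further illustration, the second rule could be covered by an additional entry under alert_rule_test in test.yml. This is a sketch based on the input series above, not part of the original example: at 12m the prometheus target has been down for 12 minutes (more than the rule's `for: 10m`), while the node_exporter target has been down only since 7m and should therefore still be pending.

```yaml
alert_rule_test:
  - eval_time: 12m
    alertname: AnotherInstanceDown
    exp_alerts:
      # Only the prometheus target is expected to be firing at 12m.
      - exp_labels:
          severity: page
          instance: localhost:9090
          job: prometheus
        exp_annotations:
          summary: "Instance localhost:9090 down"
          # Must match the rendered template exactly. The rule's description
          # says "5 minutes" even though its `for` clause is 10m.
          description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
```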
