Alerting based on metrics

In this tutorial we will create alerts on the ping_request_count metric that we instrumented earlier in the "Instrumenting HTTP server written in Go" tutorial.

For the sake of this tutorial we will alert when the ping_request_count metric is greater than 5. Check out real-world best practices to learn more about alerting principles.

Download the latest release of Alertmanager for your operating system.

Alertmanager supports various receivers, such as email, webhook, PagerDuty and Slack, through which it can notify you when an alert is firing. The list of receivers and how to configure them is in the Alertmanager configuration documentation. We will use a webhook as the receiver for this tutorial. Head over to webhook.site and copy the webhook URL, which we will use later to configure Alertmanager.

First, let’s set up Alertmanager with a webhook receiver.

alertmanager.yml

    global:
      resolve_timeout: 5m
    route:
      receiver: webhook_receiver
    receivers:
      - name: webhook_receiver
        webhook_configs:
          - url: '<INSERT-YOUR-WEBHOOK>'
            send_resolved: false

In the alertmanager.yml file, replace <INSERT-YOUR-WEBHOOK> with the webhook URL that we copied earlier, then run Alertmanager using the following command.

alertmanager --config.file=alertmanager.yml

Once Alertmanager is up and running, navigate to http://localhost:9093 and you should be able to access its web interface.
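If you prefer to check from a script rather than a browser, the following is a minimal sketch that probes Alertmanager's /-/healthy management endpoint. The function name is_healthy and its opener parameter are illustrative choices made here (the parameter exists only so the check can be exercised without a running server), and the base URL assumes the default port used in this tutorial.

```python
import urllib.request


def is_healthy(base_url="http://localhost:9093", opener=urllib.request.urlopen, timeout=2.0):
    """Return True if GET <base_url>/-/healthy answers with HTTP 200.

    `opener` is a hypothetical hook for this sketch; by default it is
    urllib.request.urlopen from the standard library.
    """
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts and HTTP errors (urllib's error
        # classes derive from OSError) all count as "not healthy" here.
        return False
```

Calling is_healthy() with no arguments should return True while the Alertmanager started above is running, and False if nothing is listening on port 9093.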

Now that we have configured Alertmanager with a webhook receiver, let’s add the rules to the Prometheus config.

prometheus.yml

    global:
      scrape_interval: 15s
      evaluation_interval: 10s
    rule_files:
      - rules.yml
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ["localhost:9090"]
      - job_name: simple_server
        static_configs:
          - targets: ["localhost:8090"]

Notice that the evaluation_interval, rule_files and alerting sections have been added to the Prometheus config. evaluation_interval defines the interval at which the rules are evaluated, rule_files accepts an array of YAML files that define the rules, and the alerting section defines the Alertmanager configuration. As mentioned at the beginning of this tutorial, we will create a basic rule that raises an alert when the ping_request_count value is greater than 5.

rules.yml

    groups:
      - name: Count greater than 5
        rules:
          - alert: CountGreaterThan5
            expr: ping_request_count > 5
            for: 10s
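To make the interaction between the expr, for and evaluation_interval settings concrete, here is a simplified Python model of how an alert moves from inactive to pending to firing. It is only a sketch of the idea, not Prometheus's implementation: the function alert_states and the sample values are invented for illustration, and it ignores details such as labels and staleness handling.

```python
THRESHOLD = 5      # from expr: ping_request_count > 5
FOR_SECONDS = 10   # from for: 10s


def alert_states(samples, threshold=THRESHOLD, hold=FOR_SECONDS):
    """Model the alert state at each rule evaluation.

    samples: list of (evaluation_time_in_seconds, metric_value) pairs,
    one per evaluation. Returns a list of (time, state) pairs.
    """
    active_since = None  # time of the first evaluation where expr was true
    states = []
    for t, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = t
            # The alert fires once the condition has held for `hold` seconds.
            state = "firing" if t - active_since >= hold else "pending"
        else:
            active_since = None
            state = "inactive"
        states.append((t, state))
    return states


# Hypothetical evaluations every 10s (the evaluation_interval above);
# the last value models the counter resetting after a server restart.
for t, state in alert_states([(0, 3), (10, 6), (20, 7), (30, 0)]):
    print(f"t={t:>2}s {state}")
```

With these made-up samples the alert is inactive at 0s, pending at 10s (the condition has just become true), firing at 20s (it has held for the full 10s), and inactive again at 30s.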

Now let’s run Prometheus using the following command.

prometheus --config.file=./prometheus.yml

Open http://localhost:9090/rules in your browser to see the rules. Next, run the instrumented ping server, visit the http://localhost:8090/ping endpoint and refresh the page at least 6 times. You can check the ping count at the http://localhost:8090/metrics endpoint. To see the status of the alert, visit http://localhost:9090/alerts. The alert first shows as PENDING; once the condition ping_request_count > 5 has been true for 10s, the state will become FIRING. Now if you navigate back to your webhook.site URL, you will see the alert message.
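The alert message that arrives at the webhook is a JSON document. As a rough illustration of how a receiver might read it, the sketch below extracts one summary line per alert. The field names follow Alertmanager's webhook payload format, but the sample values are made up for this example (a real payload carries more fields, such as timestamps), and the helper name summarize is hypothetical.

```python
import json

# Illustrative payload only: the structure mirrors Alertmanager's webhook
# format (version "4"), but the values are invented for this sketch.
SAMPLE_PAYLOAD = json.loads("""
{
  "version": "4",
  "status": "firing",
  "receiver": "webhook_receiver",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "CountGreaterThan5",
        "instance": "localhost:8090",
        "job": "simple_server"
      },
      "annotations": {}
    }
  ]
}
""")


def summarize(payload):
    """Return one 'status: alertname on instance' line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "<unnamed>")
        instance = labels.get("instance", "<unknown instance>")
        lines.append(f"{alert.get('status', 'unknown')}: {name} on {instance}")
    return lines


for line in summarize(SAMPLE_PAYLOAD):
    print(line)  # prints: firing: CountGreaterThan5 on localhost:8090
```

The alertname label comes from the alert name in rules.yml, while job and instance come from the scrape configuration in prometheus.yml.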

Similarly, Alertmanager can be configured with other receivers to notify you when an alert is firing.
