Provision Grafana Alerting resources

Alerting infrastructure is often complex, with pieces of the pipeline living in different places. Scaling it across multiple teams and organizations is especially challenging. Grafana Alerting provisioning makes this easier by letting you create, manage, and maintain your alerting data in the way that best suits your organization.

There are three options to choose from:

  1. Use file provisioning to provision your Grafana Alerting resources, such as alert rules and contact points, through files on disk.

  2. Provision your alerting resources using the Grafana HTTP API.

    For more information on the Grafana Alerting provisioning API, refer to Alerting provisioning API.

  3. Provision your alerting resources using Terraform.

Note:

Currently, provisioning for Grafana Alerting supports alert rules, contact points, notification policies, mute timings, and templates. Provisioned alerting resources can only be edited in the source that created them, not from within Grafana or any other source. For example, if you provision your alerting resources using files from disk, you cannot edit the data in Terraform or from within Grafana.

Useful Links:

Grafana provisioning

Grafana Cloud provisioning

Grafana Alerting provisioning API

Create and manage alerting resources using file provisioning

Provision your alerting resources using files from disk. When you start Grafana, the data from these files is created in your Grafana system. Grafana adds any new resources you created, updates any that you changed, and deletes old ones.

Arrange your files in a directory in a way that best suits your use case. For example, you can choose a team-based layout where every team has its own file, keep one big file for all your teams, or use one file per resource type.
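
A one-file-per-resource-type layout can be derived mechanically from a single combined file. The following is a minimal Python sketch of that idea, assuming the combined configuration has already been loaded into a dictionary. The function name `split_by_resource_type` and the output file names are hypothetical; the top-level keys are the ones used in the examples on this page.

```python
import json

# Top-level keys used by the provisioning examples on this page.
RESOURCE_KEYS = ("groups", "contactPoints", "policies", "templates", "muteTimes")


def split_by_resource_type(config):
    """Return {file_name: document}, one document per resource type present."""
    api_version = config.get("apiVersion", 1)
    files = {}
    for key in RESOURCE_KEYS:
        if key in config:
            # File names are hypothetical and chosen only for illustration.
            files[f"{key}.json"] = {"apiVersion": api_version, key: config[key]}
    return files


if __name__ == "__main__":
    combined = {
        "apiVersion": 1,
        "templates": [
            {"orgId": 1, "name": "my_first_template", "template": "Alerting with a custom text template"}
        ],
        "muteTimes": [{"orgId": 1, "name": "mti_1", "time_intervals": []}],
    }
    for file_name, document in split_by_resource_type(combined).items():
        print(file_name)
        print(json.dumps(document, indent=2))
```

JSON output is used here only because it needs no third-party library; the same structure can be written as YAML.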

Details on how to set up the files and which fields are required for each object are listed below depending on which resource you are provisioning.

Note:

Provisioning takes place during the initial setup of your Grafana system, but you can re-run it at any time using the Grafana Alerting provisioning API.

Provision alert rules

Create or delete alert rules in your Grafana instance(s).

  1. Create an alert rule in Grafana.

  2. Use the Alerting provisioning API to extract the alert rule.

  3. Copy the contents into a YAML or JSON configuration file.

    Example configuration files can be found below.

  4. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

  5. Delete the alert rule in Grafana.

    Note:

    If you do not delete the alert rule, it will clash with the provisioned alert rule once uploaded.
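
Step 2 above can be scripted. The following Python sketch builds and sends the request using only the standard library. The endpoint path `/api/v1/provisioning/alert-rules/<uid>`, the bearer-token authentication, and the function names are assumptions of this sketch, so check them against the Alerting provisioning API reference for your Grafana version.

```python
import json
import urllib.request


def build_rule_request(base_url, rule_uid, token):
    """Build a GET request for one alert rule from the provisioning API."""
    # Assumed endpoint path; verify it against the provisioning API reference.
    url = f"{base_url.rstrip('/')}/api/v1/provisioning/alert-rules/{rule_uid}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_rule(base_url, rule_uid, token):
    """Send the request and return the rule as a dictionary."""
    with urllib.request.urlopen(build_rule_request(base_url, rule_uid, token)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical values; replace them with your instance URL, rule UID, and token.
    request = build_rule_request("http://localhost:3000", "my_id_1", "<api-token>")
    print(request.get_method(), request.full_url)
```

The API response may not match the file format field for field, so compare it with the example configuration file before copying it into your provisioning directory.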

Here is an example of a configuration file for creating alert rules.

    # config file version
    apiVersion: 1
    # List of rule groups to import or update
    groups:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the rule group
        name: my_rule_group
        # <string, required> name of the folder the rule group will be stored in
        folder: my_first_folder
        # <duration, required> interval that the rule group should be evaluated at
        interval: 60s
        # <list, required> list of rules that are part of the rule group
        rules:
          # <string, required> unique identifier for the rule
          - uid: my_id_1
            # <string, required> title of the rule that will be displayed in the UI
            title: my_first_rule
            # <string, required> which query should be used for the condition
            condition: A
            # <list, required> list of query objects that should be executed on each
            # evaluation - should be obtained through the API
            data:
              - refId: A
                datasourceUid: '-100'
                model:
                  conditions:
                    - evaluator:
                        params:
                          - 3
                        type: gt
                      operator:
                        type: and
                      query:
                        params:
                          - A
                      reducer:
                        type: last
                      type: query
                  datasource:
                    type: __expr__
                    uid: '-100'
                  expression: 1==0
                  intervalMs: 1000
                  maxDataPoints: 43200
                  refId: A
                  type: math
            # <string> UID of a dashboard that the alert rule should be linked to
            dashboardUid: my_dashboard
            # <int> ID of the panel that the alert rule should be linked to
            panelId: 123
            # <string> the state the alert rule will have when no data is returned
            # possible values: "NoData", "Alerting", "OK", default = NoData
            noDataState: Alerting
            # <string> the state the alert rule will have when the query execution
            # failed - possible values: "Error", "Alerting", "OK"
            # default = Alerting
            execErrState: Alerting
            # <duration, required> for how long should the alert fire before alerting
            for: 60s
            # <map<string, string>> a map of strings to pass around any data
            annotations:
              some_key: some_value
            # <map<string, string>> a map of strings that can be used to filter and
            # route alerts
            labels:
              team: sre_team_1
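
Before committing a rule file, it can help to check it against the fields marked as required in the example above. The following Python sketch does that for one rule group that has already been loaded into a dictionary. The function name `check_rule_group` is hypothetical, and the check that `condition` names a `refId` in `data` is an inference from the example rather than documented behavior.

```python
# Fields marked "required" in the example configuration file.
REQUIRED_GROUP_FIELDS = ("name", "folder", "interval", "rules")
REQUIRED_RULE_FIELDS = ("uid", "title", "condition", "data", "for")


def check_rule_group(group):
    """Return a list of problems found in one rule group; empty means none."""
    problems = [f"group: missing {f}" for f in REQUIRED_GROUP_FIELDS if not group.get(f)]
    for rule in group.get("rules") or []:
        label = rule.get("uid", "<no uid>")
        problems += [f"{label}: missing {f}" for f in REQUIRED_RULE_FIELDS if not rule.get(f)]
        # Assumption: the condition should refer to one of the queries in data.
        ref_ids = {query.get("refId") for query in rule.get("data") or []}
        if rule.get("condition") and rule["condition"] not in ref_ids:
            problems.append(f"{label}: condition {rule['condition']!r} is not a refId in data")
    return problems


if __name__ == "__main__":
    group = {
        "orgId": 1,
        "name": "my_rule_group",
        "folder": "my_first_folder",
        "interval": "60s",
        "rules": [{"uid": "my_id_1", "title": "my_first_rule", "condition": "A", "data": [{"refId": "A"}]}],
    }
    for problem in check_rule_group(group):
        print(problem)
```

This is a pre-flight check only; Grafana performs its own validation when it loads the file.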

Here is an example of a configuration file for deleting alert rules.

    # config file version
    apiVersion: 1
    # List of alert rule UIDs that should be deleted
    deleteRules:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> unique identifier for the rule
        uid: my_id_1

Provision contact points

Create or delete contact points in your Grafana instance(s).

  1. Create a YAML or JSON configuration file.

    Example configuration files can be found below.

  2. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

Here is an example of a configuration file for creating contact points.

    # config file version
    apiVersion: 1
    # List of contact points to import or update
    contactPoints:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the contact point
        name: cp_1
        receivers:
          # <string, required> unique identifier for the receiver
          - uid: first_uid
            # <string, required> type of the receiver
            type: prometheus-alertmanager
            # <object, required> settings for the specific receiver type
            settings:
              url: http://test:9000

Here is an example of a configuration file for deleting contact points.

    # config file version
    apiVersion: 1
    # List of receivers that should be deleted
    deleteContactPoints:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> unique identifier for the receiver
        uid: first_uid

Settings

Here are some examples of settings you can use for the different contact point types.

Alertmanager

    type: prometheus-alertmanager
    settings:
      # <string, required>
      url: http://localhost:9093
      # <string>
      basicAuthUser: abc
      # <string>
      basicAuthPassword: abc123

DingDing

    type: dingding
    settings:
      # <string, required>
      url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxx
      # <string> options: link, actionCard
      msgType: link
      # <string>
      message: |
        {{ template "default.message" . }}

Discord

    type: discord
    settings:
      # <string, required>
      url: https://discord/webhook
      # <string>
      avatar_url: https://my_avatar
      # <bool>
      use_discord_username: false
      # <string>
      message: |
        {{ template "default.message" . }}

E-Mail

    type: email
    settings:
      # <string, required>
      addresses: me@example.com;you@example.com
      # <bool>
      singleEmail: false
      # <string>
      message: my optional message to include
      # <string>
      subject: |
        {{ template "default.title" . }}

Google Hangouts Chat

    type: googlechat
    settings:
      # <string, required>
      url: https://google/webhook
      # <string>
      message: |
        {{ template "default.message" . }}

Kafka

    type: kafka
    settings:
      # <string, required>
      kafkaRestProxy: http://localhost:8082
      # <string, required>
      kafkaTopic: topic1

LINE

    type: line
    settings:
      # <string, required>
      token: xxx

Microsoft Teams

    type: teams
    settings:
      # <string, required>
      url: https://ms_teams_url
      # <string>
      title: |
        {{ template "default.title" . }}
      # <string>
      sectiontitle: ''
      # <string>
      message: |
        {{ template "default.message" . }}

OpsGenie

    type: opsgenie
    settings:
      # <string, required>
      apiKey: xxx
      # <string, required>
      apiUrl: https://api.opsgenie.com/v2/alerts
      # <string>
      message: |
        {{ template "default.title" . }}
      # <string>
      description: some descriptive description
      # <bool>
      autoClose: false
      # <bool>
      overridePriority: false
      # <string> options: tags, details, both
      sendTagsAs: both

PagerDuty

    type: pagerduty
    settings:
      # <string, required>
      integrationKey: XXX
      # <string> options: critical, error, warning, info
      severity: critical
      # <string>
      class: ping failure
      # <string>
      component: Grafana
      # <string>
      group: app-stack
      # <string>
      summary: |
        {{ template "default.message" . }}

Pushover

    type: pushover
    settings:
      # <string, required>
      apiToken: XXX
      # <string, required>
      userKey: user1,user2
      # <string>
      device: device1,device2
      # <string> options (high to low): 2,1,0,-1,-2
      priority: '2'
      # <string>
      retry: '30'
      # <string>
      expire: '120'
      # <string>
      sound: siren
      # <string>
      okSound: magic
      # <string>
      message: |
        {{ template "default.message" . }}

Slack

    type: slack
    settings:
      # <string, required>
      recipient: alerting-dev
      # <string, required>
      token: xxx
      # <string>
      username: grafana_bot
      # <string>
      icon_emoji: heart
      # <string>
      icon_url: https://icon_url
      # <string>
      mentionUsers: user_1,user_2
      # <string>
      mentionGroups: group_1,group_2
      # <string> options: here, channel
      mentionChannel: here
      # <string> Optionally provide a Slack incoming webhook URL for sending
      # messages; in this case the token isn't necessary
      url: https://some_webhook_url
      # <string>
      endpointUrl: https://custom_url/api/chat.postMessage
      # <string>
      title: |
        {{ template "slack.default.title" . }}
      # <string>
      text: |
        {{ template "slack.default.text" . }}

Sensu Go

    type: sensugo
    settings:
      # <string, required>
      url: http://sensu-api.local:8080
      # <string, required>
      apikey: xxx
      # <string>
      entity: default
      # <string>
      check: default
      # <string>
      handler: some_handler
      # <string>
      namespace: default
      # <string>
      message: |
        {{ template "default.message" . }}

Telegram

    type: telegram
    settings:
      # <string, required>
      bottoken: xxx
      # <string, required>
      chatid: some_chat_id
      # <string>
      message: |
        {{ template "default.message" . }}

Threema Gateway

    type: threema
    settings:
      # <string, required>
      api_secret: xxx
      # <string, required>
      gateway_id: A5K94S9
      # <string, required>
      recipient_id: A9R4KL4S

VictorOps

    type: victorops
    settings:
      # <string, required>
      url: XXX
      # <string> options: CRITICAL, WARNING
      messageType: CRITICAL

Webhook

    type: webhook
    settings:
      # <string, required>
      url: https://endpoint_url
      # <string> options: POST, PUT
      httpMethod: POST
      # <string>
      username: abc
      # <string>
      password: abc123
      # <string>
      authorization_scheme: Bearer
      # <string>
      authorization_credentials: abc123
      # <string>
      maxAlerts: '10'

WeCom

    type: wecom
    settings:
      # <string, required>
      url: https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxx
      # <string>
      message: |
        {{ template "default.message" . }}
      # <string>
      title: |
        {{ template "default.title" . }}
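
The required markers in the examples above lend themselves to an automated check before a contact point file is deployed. The following Python sketch validates one contact point that has already been loaded into a dictionary. The function name `validate_contact_point` and the `REQUIRED_SETTINGS` table are hypothetical, and the table covers only a few of the receiver types listed above.

```python
# Required settings per receiver type, taken from the "<string, required>"
# markers in the examples above. Only a few types are listed here.
REQUIRED_SETTINGS = {
    "prometheus-alertmanager": ["url"],
    "email": ["addresses"],
    "kafka": ["kafkaRestProxy", "kafkaTopic"],
    "telegram": ["bottoken", "chatid"],
    "webhook": ["url"],
}


def validate_contact_point(contact_point):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if not contact_point.get("name"):
        problems.append("contact point: missing name")
    for index, receiver in enumerate(contact_point.get("receivers", [])):
        label = receiver.get("uid") or f"receiver #{index}"
        for field in ("uid", "type", "settings"):
            if not receiver.get(field):
                problems.append(f"{label}: missing {field}")
        settings = receiver.get("settings") or {}
        # Types that are not in the table are not checked any further.
        for key in REQUIRED_SETTINGS.get(receiver.get("type"), []):
            if key not in settings:
                problems.append(f"{label}: missing settings.{key}")
    return problems


if __name__ == "__main__":
    contact_point = {
        "orgId": 1,
        "name": "cp_1",
        "receivers": [{"uid": "first_uid", "type": "kafka", "settings": {"kafkaTopic": "topic1"}}],
    }
    for problem in validate_contact_point(contact_point):
        print(problem)
```

Extending the table to the remaining receiver types follows the same pattern.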

Provision notification policies

Create or reset notification policies in your Grafana instance(s).

  1. Create a YAML or JSON configuration file.

    Example configuration files can be found below.

  2. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

Here is an example of a configuration file for creating notification policies.

    # config file version
    apiVersion: 1
    # List of notification policies
    policies:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string> name of the contact point that should be used for this route
        receiver: grafana-default-email
        # <list> The labels by which incoming alerts are grouped together. For example,
        # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
        # be batched into a single group.
        #
        # To aggregate by all possible labels use the special value '...' as
        # the sole label name, for example:
        # group_by: ['...']
        # This effectively disables aggregation entirely, passing through all
        # alerts as-is. This is unlikely to be what you want, unless you have
        # a very low alert volume or your upstream notification system performs
        # its own grouping.
        group_by: ['...']
        # <list> a list of matchers that an alert has to fulfill to match the node
        matchers:
          - alertname = Watchdog
          - severity =~ "warning|critical"
        # <list> Times when the route should be muted. These must match the name of a
        # mute time interval.
        # Additionally, the root node cannot have any mute times.
        # When a route is muted it will not send any notifications, but
        # otherwise acts normally (including ending the route-matching process
        # if the `continue` option is not set)
        mute_time_intervals:
          - abc
        # <duration> How long to initially wait to send a notification for a group
        # of alerts. Allows collecting more initial alerts for the same group.
        # (Usually ~0s to a few minutes), default = 30s
        group_wait: 30s
        # <duration> How long to wait before sending a notification about new alerts that
        # are added to a group of alerts for which an initial notification has
        # already been sent. (Usually ~5m or more), default = 5m
        group_interval: 5m
        # <duration> How long to wait before sending a notification again if it has already
        # been sent successfully for an alert. (Usually ~3h or more), default = 4h
        repeat_interval: 4h
        # <list> Zero or more child routes
        # routes:
        # ...
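
To make the `matchers` field more concrete, the following Python sketch evaluates matcher strings like the two in the example against an alert's labels. It is a simplified illustration, not Grafana's implementation: the function names are hypothetical, escape sequences inside quoted values are not handled, and regular expressions are assumed to have to match the whole label value.

```python
import re

# Longer operators come first so that "=~" is not mistaken for "=".
_MATCHER = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*(.*?)\s*$')


def parse_matcher(text):
    """Split a matcher such as 'alertname = Watchdog' into (label, op, value)."""
    match = _MATCHER.match(text)
    if match is None:
        raise ValueError(f"invalid matcher: {text!r}")
    label, op, value = match.groups()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # drop the surrounding double quotes
    return label, op, value


def matches(matcher, labels):
    """Check one matcher against an alert's labels; a missing label counts as ''."""
    label, op, value = parse_matcher(matcher)
    actual = labels.get(label, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    found = re.fullmatch(value, actual) is not None  # whole-value regex match
    return found if op == "=~" else not found


def route_matches(matchers, labels):
    """A route matches only if every one of its matchers matches."""
    return all(matches(m, labels) for m in matchers)


if __name__ == "__main__":
    route = ["alertname = Watchdog", 'severity =~ "warning|critical"']
    print(route_matches(route, {"alertname": "Watchdog", "severity": "critical"}))
    print(route_matches(route, {"alertname": "Watchdog", "severity": "info"}))
```

With the two matchers from the example, an alert has to carry `alertname=Watchdog` and a `severity` of either `warning` or `critical` to match the route.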

Here is an example of a configuration file for resetting notification policies.

    # config file version
    apiVersion: 1
    # List of orgIds that should be reset to the default policy
    resetPolicies:
      - 1

Provision templates

Create or delete templates in your Grafana instance(s).

  1. Create a YAML or JSON configuration file.

    Example configuration files can be found below.

  2. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

Here is an example of a configuration file for creating templates.

    # config file version
    apiVersion: 1
    # List of templates to import or update
    templates:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the template, must be unique
        name: my_first_template
        # <string, required> content of the template
        template: Alerting with a custom text template

Here is an example of a configuration file for deleting templates.

    # config file version
    apiVersion: 1
    # List of templates that should be deleted
    deleteTemplates:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the template, must be unique
        name: my_first_template

Provision mute timings

Create or delete mute timings in your Grafana instance(s).

  1. Create a YAML or JSON configuration file.

    Example configuration files can be found below.

  2. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

Here is an example of a configuration file for creating mute timings.

    # config file version
    apiVersion: 1
    # List of mute time intervals to import or update
    muteTimes:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the mute time interval, must be unique
        name: mti_1
        # <list> time intervals that should trigger the muting
        # refer to https://prometheus.io/docs/alerting/latest/configuration/#time_interval-0
        time_intervals:
          - times:
              - start_time: '06:00'
                end_time: '23:59'
            weekdays: ['monday:wednesday', 'saturday', 'sunday']
            months: ['1:3', 'may:august', 'december']
            years: ['2020:2022', '2030']
            days_of_month: ['1:5', '-3:-1']
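
To illustrate how a time interval is read, the following Python sketch checks whether a given moment falls inside an interval's `weekdays` and `times`. It is a deliberately reduced illustration with hypothetical function names: `months`, `years`, and `days_of_month` are ignored, weekday ranges are assumed not to wrap around the end of the week, the end time is treated as exclusive, and time zones are not considered.

```python
from datetime import datetime, time

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def expand_weekdays(specs):
    """Turn ['monday:wednesday', 'saturday'] into a set of indexes (monday = 0)."""
    days = set()
    for spec in specs:
        start, _, end = spec.partition(":")
        first = WEEKDAYS.index(start)
        last = WEEKDAYS.index(end) if end else first
        days.update(range(first, last + 1))
    return days


def parse_clock(text):
    """Parse 'HH:MM' into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def is_muted(interval, moment):
    """Return True if `moment` falls inside the interval's weekdays and times.

    Only `weekdays` and `times` are handled; an omitted field matches everything.
    """
    weekdays = interval.get("weekdays")
    if weekdays and moment.weekday() not in expand_weekdays(weekdays):
        return False
    times = interval.get("times")
    if not times:
        return True
    now = moment.time()
    return any(parse_clock(t["start_time"]) <= now < parse_clock(t["end_time"]) for t in times)


if __name__ == "__main__":
    interval = {
        "times": [{"start_time": "06:00", "end_time": "23:59"}],
        "weekdays": ["monday:wednesday", "saturday", "sunday"],
    }
    print(is_muted(interval, datetime(2022, 1, 3, 12, 0)))  # a Monday at noon
    print(is_muted(interval, datetime(2022, 1, 6, 12, 0)))  # a Thursday at noon
```

For the exact semantics of each field, refer to the Prometheus time interval documentation linked in the example.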

Here is an example of a configuration file for deleting mute timings.

    # config file version
    apiVersion: 1
    # List of mute time intervals that should be deleted
    deleteMuteTimes:
      # <int> organization ID, default = 1
      - orgId: 1
        # <string, required> name of the mute time interval, must be unique
        name: mti_1

File provisioning using Kubernetes

If you are a Kubernetes user, you can leverage file provisioning using Kubernetes configuration maps.

  1. Create one or more configuration maps as follows.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: grafana-alerting
    data:
      provisioning.yaml: |
        templates:
        - name: my_first_template
          template: the content for my template

  2. Add the file(s) to your GitOps workflow, so that they deploy alongside your Grafana instance(s).

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: grafana
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: grafana
      template:
        metadata:
          name: grafana
          labels:
            app: grafana
        spec:
          containers:
            - name: grafana
              image: grafana/grafana:latest
              ports:
                - name: grafana
                  containerPort: 3000
              volumeMounts:
                - mountPath: /etc/grafana/provisioning/alerting
                  name: grafana-alerting
                  readOnly: false
          volumes:
            - name: grafana-alerting
              configMap:
                defaultMode: 420
                name: grafana-alerting

This eliminates the need for a persistent database to use Grafana Alerting in Kubernetes; all your provisioned resources appear after each restart or re-deployment.
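
If your provisioning files already exist on disk, the configuration map can be generated instead of written by hand. The following Python sketch builds a ConfigMap manifest from a mapping of file names to file contents and prints it as JSON, which Kubernetes accepts alongside YAML. The function name `build_configmap` is hypothetical; the resource name and file name are taken from the example above.

```python
import json


def build_configmap(name, files):
    """Return a ConfigMap manifest whose data maps file names to file contents."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


if __name__ == "__main__":
    provisioning = (
        "templates:\n"
        "- name: my_first_template\n"
        "  template: the content for my template\n"
    )
    manifest = build_configmap("grafana-alerting", {"provisioning.yaml": provisioning})
    # JSON is printed here because it needs no third-party library.
    print(json.dumps(manifest, indent=2))
```

The printed manifest can be saved to a file and committed with the rest of your GitOps configuration.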