Alerting

Overview

The alerting of IoTDB is expected to support two modes:

  • Writing triggered: the user writes data to the original time series, and each inserted data point triggers the trigger's judgment logic. If the alerting requirements are met, an alert is sent to the data sink, which then forwards it to the external terminal.

    • This mode is suitable for scenarios that need to monitor every piece of data in real time.
    • Since the operation in the trigger will affect the data writing performance, it is suitable for scenarios that are not sensitive to the original data writing performance.
  • Continuous query: the user writes data to the original time series, and ContinuousQuery periodically queries the original time series and writes the query results into a new time series. Each such write triggers the trigger's judgment logic. If the alerting requirements are met, an alert is sent to the data sink, which then forwards it to the external terminal.

    • This mode is suitable for scenarios where data needs to be regularly queried within a certain period of time.
    • It is suitable for scenarios where the original data needs to be down-sampled and persisted.
    • Since the periodic query hardly affects writes to the original time series, it is suitable for scenarios that are sensitive to the write performance of the original data.

With the introduction of Trigger into IoTDB, users can currently use the trigger and data sink modules together with AlertManager to implement the writing-triggered alerting mode.

Deploying AlertManager

Installation

Precompiled binaries

The precompiled binary can be downloaded here.

Running command:

    ./alertmanager --config.file=<your_file>

Docker image

Available at Quay.io or Docker Hub.

Running command:

    docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager

Configuration

The following example covers most of the configuration rules. For the detailed configuration rules, see here.

Example:

    # alertmanager.yml
    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'alertmanager@example.org'
    # The root route on which each incoming alert enters.
    route:
      # The root route must not have any matchers as it is the entry point for
      # all alerts. It needs to have a receiver configured so alerts that do not
      # match any of the sub-routes are sent to someone.
      receiver: 'team-X-mails'
      # The labels by which incoming alerts are grouped together. For example,
      # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
      # be batched into a single group.
      #
      # To aggregate by all possible labels use '...' as the sole label name.
      # This effectively disables aggregation entirely, passing through all
      # alerts as-is. This is unlikely to be what you want, unless you have
      # a very low alert volume or your upstream notification system performs
      # its own grouping. Example: group_by: [...]
      group_by: ['alertname', 'cluster']
      # When a new group of alerts is created by an incoming alert, wait at
      # least 'group_wait' to send the initial notification.
      # This way ensures that you get multiple alerts for the same group that start
      # firing shortly after another are batched together on the first
      # notification.
      group_wait: 30s
      # When the first notification was sent, wait 'group_interval' to send a batch
      # of new alerts that started firing for that group.
      group_interval: 5m
      # If an alert has successfully been sent, wait 'repeat_interval' to
      # resend them.
      repeat_interval: 3h
      # All the above attributes are inherited by all child routes and can
      # be overwritten on each.
      # The child route trees.
      routes:
        # This route performs a regular expression match on alert labels to
        # catch alerts that are related to a list of services.
        - match_re:
            service: ^(foo1|foo2|baz)$
          receiver: team-X-mails
          # The service has a sub-route for critical alerts, any alerts
          # that do not match, i.e. severity != critical, fall-back to the
          # parent node and are sent to 'team-X-mails'
          routes:
            - match:
                severity: critical
              receiver: team-X-pager
        - match:
            service: files
          receiver: team-Y-mails
          routes:
            - match:
                severity: critical
              receiver: team-Y-pager
        # This route handles all alerts coming from a database service. If there's
        # no team to handle it, it defaults to the DB team.
        - match:
            service: database
          receiver: team-DB-pager
          # Also group alerts by affected database.
          group_by: [alertname, cluster, database]
          routes:
            - match:
                owner: team-X
              receiver: team-X-pager
            - match:
                owner: team-Y
              receiver: team-Y-pager
    # Inhibition rules allow to mute a set of alerts given that another alert is
    # firing.
    # We use this to mute any warning-level notifications if the same alert is
    # already critical.
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        # Apply inhibition if the alertname is the same.
        # CAUTION:
        # If all label names listed in `equal` are missing
        # from both the source and target alerts,
        # the inhibition rule will apply!
        equal: ['alertname']
    receivers:
      - name: 'team-X-mails'
        email_configs:
          - to: 'team-X+alerts@example.org, team-Y+alerts@example.org'
      - name: 'team-X-pager'
        email_configs:
          - to: 'team-X+alerts-critical@example.org'
        pagerduty_configs:
          - routing_key: <team-X-key>
      - name: 'team-Y-mails'
        email_configs:
          - to: 'team-Y+alerts@example.org'
      - name: 'team-Y-pager'
        pagerduty_configs:
          - routing_key: <team-Y-key>
      - name: 'team-DB-pager'
        pagerduty_configs:
          - routing_key: <team-DB-key>

The examples in the rest of this document use the following configuration:

    # alertmanager.yml
    global:
      smtp_smarthost: ''
      smtp_from: ''
      smtp_auth_username: ''
      smtp_auth_password: ''
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_wait: 1m
      group_interval: 10m
      repeat_interval: 10h
      receiver: 'email'
    receivers:
      - name: 'email'
        email_configs:
          - to: ''
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname']

API

The AlertManager API is divided into two versions, v1 and v2. The current AlertManager API version is v2 (for details, see api/v2/openapi.yaml).

By default, the prefix is /api/v1 or /api/v2, and the endpoint for sending alerts is /api/v1/alerts or /api/v2/alerts. If the user specifies --web.route-prefix, for example --web.route-prefix=/alertmanager/, the prefix becomes /alertmanager/api/v1 or /alertmanager/api/v2, and the endpoint for sending alerts becomes /alertmanager/api/v1/alerts or /alertmanager/api/v2/alerts.
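The prefix rule above can be sketched as a small helper. This is only an illustration: the class AlertEndpoints and its methods alertsPath and alertsUrl are hypothetical names introduced here, not part of IoTDB or AlertManager.

```java
// Hypothetical helper (not an IoTDB or AlertManager API): derives the alert
// endpoint from the value passed to --web.route-prefix and the API version.
final class AlertEndpoints {

  /** Returns the endpoint path, for example /alertmanager/api/v2/alerts. */
  static String alertsPath(String routePrefix, String apiVersion) {
    String prefix = routePrefix == null ? "" : routePrefix.trim();
    // Drop trailing slashes so "/alertmanager/" and "/alertmanager" behave the same.
    while (prefix.endsWith("/")) {
      prefix = prefix.substring(0, prefix.length() - 1);
    }
    // A non-empty prefix must start with a slash.
    if (!prefix.isEmpty() && !prefix.startsWith("/")) {
      prefix = "/" + prefix;
    }
    return prefix + "/api/" + apiVersion + "/alerts";
  }

  /** Joins a base address such as http://127.0.0.1:9093 with the endpoint path. */
  static String alertsUrl(String baseUrl, String routePrefix, String apiVersion) {
    String base = baseUrl;
    while (base.endsWith("/")) {
      base = base.substring(0, base.length() - 1);
    }
    return base + alertsPath(routePrefix, apiVersion);
  }

  public static void main(String[] args) {
    // Default deployment, no route prefix.
    System.out.println(alertsUrl("http://127.0.0.1:9093", null, "v2"));
    // Deployment started with --web.route-prefix=/alertmanager/
    System.out.println(alertsUrl("http://127.0.0.1:9093", "/alertmanager/", "v2"));
  }
}
```

The resulting URL is the kind of address that a sender of alerts would need to be configured with when a route prefix is in use.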

Creating trigger

Writing the trigger class

The user defines a trigger by creating a Java class and writing the logic in the hook. Please refer to Triggers for the specific configuration process.

The following example creates the org.apache.iotdb.trigger.ClusterAlertingExample class. Its alertManagerHandler member variable sends alerts to the AlertManager instance at http://127.0.0.1:9093/.

When value > 100.0, it sends an alert of critical severity; when 50.0 < value <= 100.0, it sends an alert of warning severity.

    /*
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements. See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership. The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing,
     * software distributed under the License is distributed on an
     * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     * KIND, either express or implied. See the License for the
     * specific language governing permissions and limitations
     * under the License.
     */
    package org.apache.iotdb.trigger;

    import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerConfiguration;
    import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerEvent;
    import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerHandler;
    import org.apache.iotdb.trigger.api.Trigger;
    import org.apache.iotdb.trigger.api.TriggerAttributes;
    import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
    import org.apache.iotdb.tsfile.write.record.Tablet;
    import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;

    public class ClusterAlertingExample implements Trigger {

      private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class);

      private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler();

      private final AlertManagerConfiguration alertManagerConfiguration =
          new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts");

      private String alertname;

      private final HashMap<String, String> labels = new HashMap<>();

      private final HashMap<String, String> annotations = new HashMap<>();

      @Override
      public void onCreate(TriggerAttributes attributes) throws Exception {
        alertname = "alert_test";

        labels.put("series", "root.ln.wf01.wt01.temperature");
        labels.put("value", "");
        labels.put("severity", "");

        annotations.put("summary", "high temperature");
        annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}");

        alertManagerHandler.open(alertManagerConfiguration);
      }

      @Override
      public void onDrop() throws IOException {
        alertManagerHandler.close();
      }

      @Override
      public boolean fire(Tablet tablet) throws Exception {
        List<MeasurementSchema> measurementSchemaList = tablet.getSchemas();
        for (int i = 0, n = measurementSchemaList.size(); i < n; i++) {
          if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) {
            // for example, we only deal with the columns of Double type
            double[] values = (double[]) tablet.values[i];
            for (double value : values) {
              if (value > 100.0) {
                LOGGER.info("trigger value > 100");
                labels.put("value", String.valueOf(value));
                labels.put("severity", "critical");
                AlertManagerEvent alertManagerEvent =
                    new AlertManagerEvent(alertname, labels, annotations);
                alertManagerHandler.onEvent(alertManagerEvent);
              } else if (value > 50.0) {
                LOGGER.info("trigger value > 50");
                labels.put("value", String.valueOf(value));
                labels.put("severity", "warning");
                AlertManagerEvent alertManagerEvent =
                    new AlertManagerEvent(alertname, labels, annotations);
                alertManagerHandler.onEvent(alertManagerEvent);
              }
            }
          }
        }
        return true;
      }
    }
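
The threshold rule inside fire can also be read in isolation. The following is a minimal sketch that uses only the JDK; the class SeverityRule and the method severityFor are hypothetical names introduced for illustration and do not appear in IoTDB.

```java
import java.util.Optional;

// Hypothetical, standalone restatement of the threshold rule used in fire():
// value > 100.0 -> critical, 50.0 < value <= 100.0 -> warning, otherwise no alert.
final class SeverityRule {

  static final double WARNING_THRESHOLD = 50.0;
  static final double CRITICAL_THRESHOLD = 100.0;

  /** Returns the severity label for a value, or empty when no alert should be sent. */
  static Optional<String> severityFor(double value) {
    if (value > CRITICAL_THRESHOLD) {
      return Optional.of("critical");
    }
    if (value > WARNING_THRESHOLD) {
      return Optional.of("warning");
    }
    // Values at or below the warning threshold (and NaN) produce no alert.
    return Optional.empty();
  }

  public static void main(String[] args) {
    double[] samples = {0.0, 30.0, 60.0, 90.0, 120.0};
    for (double sample : samples) {
      System.out.println(sample + " -> " + severityFor(sample).orElse("no alert"));
    }
  }
}
```

Keeping the rule in a pure function like this makes the boundaries (exactly 50.0 and exactly 100.0) easy to check without a running IoTDB or AlertManager instance.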

Creating trigger

The following SQL statement registers a trigger named root-ln-wf01-wt01-alert on the root.ln.wf01.wt01.temperature time series. Its logic is defined by the org.apache.iotdb.trigger.ClusterAlertingExample Java class.

    CREATE STATELESS TRIGGER `root-ln-wf01-wt01-alert`
    AFTER INSERT
    ON root.ln.wf01.wt01.temperature
    AS "org.apache.iotdb.trigger.ClusterAlertingExample"
    USING URI 'http://jar/ClusterAlertingExample.jar'

Writing data

Once AlertManager is deployed and running and the trigger has been created, we can test the alerting by writing data to the time series.

    INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (1, 0);
    INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (2, 30);
    INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (3, 60);
    INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (4, 90);
    INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (5, 120);

After executing the write statements above, we receive an alerting email. Because the AlertManager configuration above makes alerts of critical severity inhibit those of warning severity, the email contains only the alert triggered by the write of (5, 120).
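
The effect of the inhibition rule on these writes can be approximated with a small simulation. This is a deliberately simplified sketch: it ignores grouping, timing and alert resolution, and it is not how AlertManager is implemented. The class InhibitionSketch and its members are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the rule "critical inhibits warning when alertname is equal".
final class InhibitionSketch {

  /** Minimal stand-in for an alert: the two labels the rule inspects, plus the value. */
  static final class Alert {
    final String alertname;
    final String severity;
    final double value;

    Alert(String alertname, String severity, double value) {
      this.alertname = alertname;
      this.severity = severity;
      this.value = value;
    }
  }

  /** Returns the alerts that would still be notified after applying the rule. */
  static List<Alert> applyInhibition(List<Alert> firing) {
    // Collect the alertnames that currently have a critical (source) alert.
    Set<String> criticalNames = new HashSet<>();
    for (Alert alert : firing) {
      if ("critical".equals(alert.severity)) {
        criticalNames.add(alert.alertname);
      }
    }
    // A warning (target) alert is muted when a critical alert shares its alertname.
    List<Alert> notified = new ArrayList<>();
    for (Alert alert : firing) {
      boolean inhibited =
          "warning".equals(alert.severity) && criticalNames.contains(alert.alertname);
      if (!inhibited) {
        notified.add(alert);
      }
    }
    return notified;
  }

  public static void main(String[] args) {
    // Alerts produced by the writes (3, 60), (4, 90) and (5, 120);
    // the writes (1, 0) and (2, 30) stay below the warning threshold.
    List<Alert> firing =
        Arrays.asList(
            new Alert("alert_test", "warning", 60.0),
            new Alert("alert_test", "warning", 90.0),
            new Alert("alert_test", "critical", 120.0));
    for (Alert alert : applyInhibition(firing)) {
      System.out.println(alert.alertname + " " + alert.severity + " " + alert.value);
    }
  }
}
```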
