Monitor CockroachDB with Prometheus

CockroachDB generates detailed time series metrics for each node in a cluster. This page shows you how to pull these metrics into Prometheus, an open source tool for storing, aggregating, and querying time series data. It also shows you how to connect Grafana and Alertmanager to Prometheus for flexible data visualizations and notifications.

Tip:
For details about other monitoring options, see Monitoring and Alerting.

Before you begin

  • Make sure you have already started a CockroachDB cluster, either locally or in a production environment.

  • Note that all files used in this tutorial can be found in the monitoring directory of the CockroachDB repository.

Step 1. Install Prometheus

  • Download the 2.x Prometheus tarball for your OS.

  • Extract the binary and add it to your PATH. This makes it easy to start Prometheus from any shell.

  • Make sure Prometheus installed successfully:

    $ prometheus --version
    prometheus, version 2.2.1 (branch: HEAD, revision: bc6058c81272a8d938c05e75607371284236aadc)
    build user: root@149e5b3f0829
    build date: 20180314-14:21:40
    go version: go1.10

Step 2. Configure Prometheus

  • Download the starter Prometheus configuration file for CockroachDB:

    $ wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/prometheus.yml \
      -O prometheus.yml

When you examine the configuration file, you'll see that it is set up to scrape the time series metrics of a single, insecure local node every 10 seconds:

  • scrape_interval: 10s defines the scrape interval.
  • metrics_path: '/_status/vars' defines the Prometheus-specific CockroachDB endpoint for scraping time series metrics.
  • scheme: 'http' specifies that the cluster being scraped is insecure.
  • targets: ['localhost:8080'] specifies the hostname and http-port of the CockroachDB node to collect time series metrics on.

  • Edit the configuration file to match your deployment scenario:

| Scenario | Config Change |
| --- | --- |
| Multi-node local cluster | Expand the targets field to include 'localhost:<http-port>' for each additional node. |
| Production cluster | Change the targets field to include '<hostname>:<http-port>' for each node in the cluster. Also, be sure your network configuration allows TCP communication on the specified ports. |
| Secure cluster | Uncomment scheme: 'https' and comment out scheme: 'http'. |
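
To make these scenarios concrete, here is a minimal sketch of a scrape configuration for a multi-node cluster, written as a small shell function that prints YAML. It uses only the four fields described above. The function name render_scrape_config, the job name cockroachdb, and the example ports are assumptions for illustration, and the starter file you downloaded may contain additional settings.

```shell
# Hypothetical helper: print a Prometheus config for the given scheme and
# a list of '<hostname>:<http-port>' targets.
render_scrape_config() {
  scheme="$1"; shift
  targets=""
  for t in "$@"; do
    # Build a comma-separated list of single-quoted targets.
    targets="${targets:+$targets, }'$t'"
  done
  cat <<EOF
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'cockroachdb'   # assumed job name
    metrics_path: '/_status/vars'
    scheme: '$scheme'
    static_configs:
      - targets: [$targets]
EOF
}

# Example: three insecure local nodes on assumed HTTP ports.
render_scrape_config http localhost:8080 localhost:8081 localhost:8082
```

Passing https as the first argument corresponds to the secure-cluster row; a secure deployment may also need TLS settings that this sketch does not cover.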

  • Create a rules directory and download the starter aggregation rules and alerting rules for CockroachDB into it:

    $ mkdir rules
    $ wget -P rules https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/rules/aggregation.rules.yml
    $ wget -P rules https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/rules/alerts.rules.yml
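
Before starting the server, it can help to confirm that the configuration and rule files parse cleanly. The sketch below wraps promtool check config, the checker that ships in the Prometheus 2.x tarball. The function name validate_prometheus_config and its skip-if-missing behavior are assumptions for illustration, not part of the tutorial's files.

```shell
# Hypothetical helper: validate a Prometheus config file with promtool,
# or report that promtool is unavailable instead of failing.
validate_prometheus_config() {
  config="${1:-prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH; skipping validation"
    return 0
  fi
  # Exits non-zero if the config cannot be parsed.
  promtool check config "$config"
}
```

Run validate_prometheus_config prometheus.yml from the directory that holds the configuration file, so that relative paths inside it resolve as they will when Prometheus starts.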

Step 3. Start Prometheus

  • Start the Prometheus server, with the --config.file flag pointing to the configuration file:

    $ prometheus --config.file=prometheus.yml
    INFO[0000] Starting prometheus (version=1.4.1, branch=master, revision=2a89e8733f240d3cd57a6520b52c36ac4744ce12) source=main.go:77
    INFO[0000] Build context (go=go1.7.3, user=root@e685d23d8809, date=20161128-10:02:41) source=main.go:78
    INFO[0000] Loading configuration file prometheus.yml source=main.go:250
    INFO[0000] Loading series map and head chunks... source=storage.go:354
    INFO[0000] 0 series loaded. source=storage.go:359
    INFO[0000] Listening on :9090 source=web.go:248
    INFO[0000] Starting target manager... source=targetmanager.go:63

  • Point your browser to http://<hostname of machine running prometheus>:9090, where you can use the Prometheus UI to query, aggregate, and graph CockroachDB time series metrics.

    • Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, point your browser to http://<hostname of a CockroachDB node>:8080/_status/vars.
    • For more details on using the Prometheus UI, see the official Prometheus documentation.
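
The /_status/vars listing is plain text in the Prometheus exposition format, where each metric is preceded by a "# HELP <name> <description>" line. As a rough sketch, the hypothetical list_metrics function below extracts names and descriptions from that text with awk. The sample input is made up for illustration and is not real node output.

```shell
# Hypothetical helper: read exposition-format text on stdin and print
# "<metric name><TAB><description>" for every "# HELP" line.
list_metrics() {
  awk '$1 == "#" && $2 == "HELP" {
    name = $3
    $1 = $2 = $3 = ""      # drop the marker, keyword, and name fields
    sub(/^ +/, "")         # trim the leading spaces left behind
    print name "\t" $0
  }'
}

# Illustrative input only.
printf '%s\n' \
  '# HELP sys_uptime Process uptime' \
  '# TYPE sys_uptime gauge' \
  'sys_uptime 42' | list_metrics
```

Against a running node, the same function could be fed from curl, for example curl -s http://<hostname of a CockroachDB node>:8080/_status/vars | list_metrics.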

Step 4. Send notifications with Alertmanager

Active monitoring helps you spot problems early, but it is also essential to send notifications when there are events that require investigation or intervention. In step 2, you already downloaded CockroachDB's starter alerting rules. Now, download, configure, and start Alertmanager.

  • Download the latest Alertmanager tarball for your OS.

  • Extract the binary and add it to your PATH. This makes it easy to start Alertmanager from any shell.

  • Make sure Alertmanager installed successfully:

    $ alertmanager --version
    alertmanager, version 0.15.0-rc.1 (branch: HEAD, revision: acb111e812530bec1ac6d908bc14725793e07cf3)
    build user: root@f278953f13ef
    build date: 20180323-13:07:06
    go version: go1.10

  • Edit the Alertmanager configuration file that came with the binary, simple.yml, to specify the desired receivers for notifications.

  • Start the Alertmanager server, with the --config.file flag pointing to the configuration file:

    $ alertmanager --config.file=simple.yml

Step 5. Visualize metrics in Grafana

Although Prometheus lets you graph metrics, Grafana is a much more powerful visualization tool that integrates with Prometheus easily.

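  • Add Prometheus as a data source in Grafana, using the following settings:

| Field | Definition |
| --- | --- |
| Name | Prometheus |
| Default | True |
| Type | Prometheus |
| Url | http://<hostname of machine running prometheus>:9090 |
| Access | Direct |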
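
The same settings can also be supplied through Grafana's HTTP API rather than the UI. The sketch below prints a JSON payload mirroring the fields above. The function name render_datasource_payload is hypothetical, the commented curl call assumes a local Grafana on port 3000 with admin:admin credentials, and the accepted access values may differ between Grafana versions.

```shell
# Hypothetical helper: print a data source payload for the given
# Prometheus URL, mirroring the Name/Default/Type/Url/Access fields.
render_datasource_payload() {
  prometheus_url="$1"
  cat <<EOF
{
  "name": "Prometheus",
  "type": "prometheus",
  "url": "$prometheus_url",
  "access": "direct",
  "isDefault": true
}
EOF
}

# Example with an assumed local Prometheus.
render_datasource_payload http://localhost:9090

# To submit it (assumed Grafana address and credentials):
# render_datasource_payload http://localhost:9090 |
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -u admin:admin -d @- http://localhost:3000/api/datasources
```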

  • Download the starter Grafana dashboards for CockroachDB:

    # runtime dashboard: node status, including uptime, memory, and cpu.
    $ wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/runtime.json

    # storage dashboard: storage availability.
    $ wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/storage.json

    # sql dashboard: sql queries/transactions.
    $ wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/sql.json

    # replicas dashboard: replica information and operations.
    $ wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/replicas.json

  • Import the downloaded dashboards into Grafana.
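
The four downloads follow one URL pattern, so they can be scripted. A minimal sketch, assuming the dashboard file names stay as listed above; the function name dashboard_urls is hypothetical:

```shell
base_url="https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards"

# Hypothetical helper: print the URL of each starter dashboard.
dashboard_urls() {
  for name in runtime storage sql replicas; do
    echo "$base_url/$name.json"
  done
}

dashboard_urls

# To download all four files (requires network access):
# dashboard_urls | xargs -n 1 wget
```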
