Using Rancher, you can quickly deploy leading open-source monitoring and alerting solutions onto your cluster.

The rancher-monitoring operator, introduced in Rancher v2.5, is powered by Prometheus, Grafana, Alertmanager, the Prometheus Operator, and the Prometheus adapter. This page describes how to enable monitoring and alerting within a cluster using the new monitoring application.

Rancher’s solution allows users to:

Monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments via Prometheus, a leading open-source monitoring solution.
Define alerts based on metrics collected via Prometheus
Create custom dashboards to make it easy to visualize collected metrics via Grafana
Configure alert-based notifications via Email, Slack, PagerDuty, etc. using Prometheus Alertmanager
Define precomputed, frequently needed, or computationally expensive expressions as new time series based on metrics collected via Prometheus (only available in 2.5)
Expose collected metrics from Prometheus to the Kubernetes Custom Metrics API via Prometheus Adapter for use in HPA (only available in 2.5)

For more information about the resources that are deployed onto your cluster to support this solution, see the rancher-monitoring Helm chart. It closely tracks the upstream kube-prometheus-stack Helm chart maintained by the Prometheus community, and its differences from upstream are recorded in CHANGELOG.md.

If you enabled Monitoring, Alerting, or Notifiers in Rancher before v2.5, there is no upgrade path for switching to the new monitoring/alerting solution. You will need to disable monitoring, alerting, and notifiers in Cluster Manager before deploying the new monitoring solution via Cluster Explorer.

For more information about upgrading the Monitoring app in Rancher 2.5, please refer to the migration docs.

About Prometheus
Enable Monitoring
- Default Alerts, Targets, and Grafana Dashboards
Windows Cluster Support
Using Monitoring
Uninstall Monitoring
Setting Resource Limits and Requests
Known Issues

About Prometheus

Prometheus provides a time series of your data, which is, according to the Prometheus documentation:

A stream of timestamped values belonging to the same metric and the same set of labeled dimensions, along with comprehensive statistics and metrics of the monitored cluster.

In other words, Prometheus lets you view metrics from your different Rancher and Kubernetes objects. Using timestamps, Prometheus lets you query and view these metrics in easy-to-read graphs and visuals, either through the Rancher UI or Grafana, which is an analytics viewing platform deployed along with Prometheus.
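To make the definition of a time series concrete, here is a minimal sketch (not Prometheus code) of how samples that share a metric name and a label set fall into the same series. The sample values and the `instance` label values are made up for illustration; `node_load1` is used only as a familiar metric name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# A sample is (metric name, labels, unix timestamp, value).
# All values below are invented for illustration.
Sample = Tuple[str, Dict[str, str], float, float]
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def group_into_series(samples: List[Sample]) -> Dict[SeriesKey, List[Tuple[float, float]]]:
    """Group samples so each (metric, label set) pair forms one time series."""
    series: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)
    for metric, labels, timestamp, value in samples:
        key = (metric, frozenset(labels.items()))
        series[key].append((timestamp, value))
    # Keep the points of each series ordered by timestamp.
    for points in series.values():
        points.sort()
    return dict(series)


samples = [
    ("node_load1", {"instance": "node-a"}, 1700000030.0, 0.7),
    ("node_load1", {"instance": "node-a"}, 1700000000.0, 0.5),
    ("node_load1", {"instance": "node-b"}, 1700000000.0, 1.2),
]
series = group_into_series(samples)
```

Here the three samples form two series: one for `node-a` with two timestamped points and one for `node-b` with a single point.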

By viewing data that Prometheus scrapes from your cluster control plane, nodes, and deployments, you can stay on top of everything happening in your cluster. You can then use these analytics to better run your organization: stop system emergencies before they start, develop maintenance strategies, restore crashed servers, etc.

Enable Monitoring

As an administrator or cluster owner, you can configure Rancher to deploy Prometheus to monitor your Kubernetes cluster.

Requirements:

Make sure that traffic is allowed on port 9796 for each of your nodes, because Prometheus scrapes metrics from this port.
Make sure your cluster fulfills the resource requirements. The cluster should have at least 1950Mi memory available, 2700m CPU, and 50Gi storage. A breakdown of the resource limits and requests is here.
When installing monitoring on an RKE cluster using RancherOS or Flatcar Linux nodes, change the etcd node certificate directory to /opt/rke/etc/kubernetes/ssl.
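As a rough pre-flight check for the port requirement above, the following sketch tests whether a TCP connection to port 9796 can be opened on each node. The helper names and the idea of running it from a workstation are assumptions. A successful connection from your machine does not guarantee that the Prometheus pod can reach the port, and the port may not be listening until monitoring is installed.

```python
import socket
from typing import Iterable, List


def port_open(host: str, port: int = 9796, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_nodes(hosts: Iterable[str], port: int = 9796) -> List[str]:
    """Return the hosts on which the port could not be reached."""
    return [host for host in hosts if not port_open(host, port)]
```

For example, `unreachable_nodes(["10.0.0.11", "10.0.0.12"])` (hypothetical node addresses) returns the subset of nodes that should be checked for firewall or security group rules.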

Enable Monitoring for use without SSL

In the Rancher UI, go to the cluster where you want to install monitoring and click Cluster Explorer.
Click Apps.
Click the rancher-monitoring app.
Optional: Click Chart Options and configure alerting, Prometheus and Grafana. For help, refer to the configuration reference.
Scroll to the bottom of the Helm chart README and click Install.

Result: The monitoring app is deployed in the cattle-monitoring-system namespace.

Enable Monitoring for use with SSL

Follow the steps on this page to create a secret in order for SSL to be used for alerts.
- The secret should be created in the cattle-monitoring-system namespace. If it doesn’t exist, create it first.
- Add the ca, cert, and key files to the secret.
In the Rancher UI, go to the cluster where you want to install monitoring and click Cluster Explorer.
Click Apps.
Click the rancher-monitoring app.
Click Alerting.
Click Additional Secrets and add the secrets created earlier.

Result: The monitoring app is deployed in the cattle-monitoring-system namespace.

When creating a receiver, SSL-enabled receivers such as email or webhook will have an SSL section with fields for CA File Path, Cert File Path, and Key File Path. Fill in these fields with the paths to the ca, cert, and key files. Each path has the form /etc/alertmanager/secrets/name-of-file-in-secret.

For example, if you created a secret with these key-value pairs:

ca.crt=`base64-content`
cert.pem=`base64-content`
key.pfx=`base64-content`

Then Cert File Path would be set to /etc/alertmanager/secrets/cert.pem.
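The path rule above can be sketched as a small helper that maps the keys of the secret to the three receiver fields. The function name is hypothetical. Only the /etc/alertmanager/secrets prefix and the example key names come from the text.

```python
import posixpath

# Directory under which the secret's files appear, as described above.
SECRETS_DIR = "/etc/alertmanager/secrets"


def receiver_ssl_paths(ca_key: str, cert_key: str, key_key: str) -> dict:
    """Map secret key names to the values for the receiver's SSL fields."""
    return {
        "CA File Path": posixpath.join(SECRETS_DIR, ca_key),
        "Cert File Path": posixpath.join(SECRETS_DIR, cert_key),
        "Key File Path": posixpath.join(SECRETS_DIR, key_key),
    }


paths = receiver_ssl_paths("ca.crt", "cert.pem", "key.pfx")
for field, path in paths.items():
    print(f"{field}: {path}")
```

With the example secret, the CA File Path would be /etc/alertmanager/secrets/ca.crt and the Key File Path would be /etc/alertmanager/secrets/key.pfx.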

Default Alerts, Targets, and Grafana Dashboards

By default, Rancher Monitoring deploys exporters (such as node-exporter and kube-state-metrics) as well as default Prometheus alerts and Grafana dashboards (curated by the kube-prometheus project) onto a cluster.

To see the default alerts, go to the Alertmanager UI and click Expand all groups.

To see what services you are monitoring, you will need to see your targets. To view the default targets, refer to Viewing the Prometheus Targets.

To see the default dashboards, go to the Grafana UI. In the left navigation bar, click the icon with four boxes and click Manage.

Next Steps

To configure Prometheus resources from the Rancher UI, click Apps & Marketplace > Monitoring in the upper left corner.

Windows Cluster Support

Available as of v2.5.8

When deployed onto an RKE1 Windows cluster, Monitoring V2 will now automatically deploy a windows-exporter DaemonSet and set up a ServiceMonitor to collect metrics from each of the deployed Pods. This will populate Prometheus with windows_ metrics that are akin to the node_ metrics exported by node_exporter for Linux hosts.

To be able to fully deploy Monitoring V2 for Windows, all of your Windows hosts must have a minimum wins version of v0.1.0.

For more details on how to upgrade wins on existing Windows hosts, refer to the section on Windows cluster support for Monitoring V2.

Using Monitoring

Installing rancher-monitoring makes the following dashboards available from the Rancher UI.

Note: If you want to set up Alertmanager, Grafana or Ingress, it has to be done with the settings on the Helm chart deployment. It’s problematic to create Ingress outside the deployment.

Grafana UI

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.

Rancher allows any users who are authenticated by Kubernetes and have access to the Grafana service deployed by the Rancher Monitoring chart to access Grafana via the Rancher Dashboard UI. By default, all users who are able to access Grafana are given the Viewer role, which allows them to view any of the default dashboards deployed by Rancher.

However, users can choose to log in to Grafana as an Admin if necessary. The default Admin username and password for the Grafana instance will be admin/prom-operator, but alternative credentials can also be supplied on deploying or upgrading the chart.

Persistent Dashboards: To allow the Grafana dashboard to persist after it restarts, add the dashboard configuration JSON into a ConfigMap. ConfigMaps also allow the dashboards to be deployed with a GitOps or CD based approach. This allows the dashboard to be put under version control. For details, refer to this section.

To see the Grafana UI, install rancher-monitoring. Then go to the Cluster Explorer. In the top left corner, click Cluster Explorer > Monitoring. Then click Grafana.

Cluster Compute Resources Dashboard in Grafana

Default Dashboards in Grafana

Prometheus UI

To see the Prometheus UI, install rancher-monitoring. Then go to the Cluster Explorer. In the top left corner, click Cluster Explorer > Monitoring. Then click Prometheus Graph.

Prometheus Graph UI

Viewing the Prometheus Targets

To see the Prometheus Targets, install rancher-monitoring. Then go to the Cluster Explorer. In the top left corner, click Cluster Explorer > Monitoring. Then click Prometheus Targets.

Targets in the Prometheus UI

Viewing the PrometheusRules

To see the PrometheusRules, install rancher-monitoring. Then go to the Cluster Explorer. In the top left corner, click Cluster Explorer > Monitoring. Then click Prometheus Rules.

Rules in the Prometheus UI

For more information on PrometheusRules in Rancher, see this page.

Viewing Active Alerts in Alertmanager

When rancher-monitoring is installed, the Prometheus Alertmanager UI is deployed.

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

In the Alertmanager UI, you can view your alerts and the current Alertmanager configuration.

To see the Alertmanager UI, install rancher-monitoring. Then go to the Cluster Explorer. In the top left corner, click Cluster Explorer > Monitoring. Then click Alertmanager.

Result: The Alertmanager UI opens in a new tab. For help with configuration, refer to the official Alertmanager documentation.

For more information on configuring Alertmanager in Rancher, see this page.

The Alertmanager UI

Uninstall Monitoring

From the Cluster Explorer, click Apps & Marketplace.
Click Installed Apps.
Go to the cattle-monitoring-system namespace and check the boxes for rancher-monitoring-crd and rancher-monitoring.
Click Delete.
Confirm Delete.

Result: rancher-monitoring is uninstalled.

Note on Persistent Grafana Dashboards: For users who are using Monitoring V2 v9.4.203 or below, uninstalling the Monitoring chart will delete the cattle-dashboards namespace, which will delete all persisted dashboards, unless the namespace is marked with the annotation helm.sh/resource-policy: "keep". This annotation is added by default in Monitoring V2 v14.5.100+ but can be manually applied on the cattle-dashboards namespace before an uninstall if an older version of the Monitoring chart is currently installed onto your cluster.
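For older chart versions, the annotation described in the note has to be applied by hand. A minimal sketch of the patch body is shown below. The function name is hypothetical, and the annotation key and value are taken from the note above.

```python
import json


def keep_namespace_patch() -> str:
    """Build a JSON merge patch that adds the keep annotation to a namespace."""
    patch = {
        "metadata": {
            "annotations": {
                # Annotation named in the note above.
                "helm.sh/resource-policy": "keep",
            }
        }
    }
    return json.dumps(patch)


print(keep_namespace_patch())
```

One way to apply the result, assuming kubectl access to the cluster, is to pass it to `kubectl patch namespace cattle-dashboards --type merge -p '<patch>'` before uninstalling the chart.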

Setting Resource Limits and Requests

The resource requests and limits can be configured when installing rancher-monitoring.

The default values are in the values.yaml in the rancher-monitoring Helm chart.

The default values in the table below are the minimum required resource limits and requests.

Resource Name	Memory Limit	CPU Limit	Memory Request	CPU Request
alertmanager	500Mi	1000m	100Mi	100m
grafana	200Mi	200m	100Mi	100m
kube-state-metrics subchart	200Mi	100m	130Mi	100m
prometheus-node-exporter subchart	50Mi	200m	30Mi	100m
prometheusOperator	500Mi	200m	100Mi	100m
prometheus	2500Mi	1000m	1750Mi	750m
Total	3950Mi	2700m	2210Mi	1250m

At least 50Gi storage is recommended.
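The Total row can be reproduced by summing each column of the table. The sketch below copies the defaults from the table, with memory in Mi and CPU in millicores; the variable and function names are illustrative only.

```python
# (memory limit Mi, CPU limit m, memory request Mi, CPU request m)
DEFAULTS = {
    "alertmanager": (500, 1000, 100, 100),
    "grafana": (200, 200, 100, 100),
    "kube-state-metrics": (200, 100, 130, 100),
    "prometheus-node-exporter": (50, 200, 30, 100),
    "prometheusOperator": (500, 200, 100, 100),
    "prometheus": (2500, 1000, 1750, 750),
}


def totals(table: dict) -> tuple:
    """Sum each column across all components."""
    return tuple(sum(column) for column in zip(*table.values()))


mem_limit, cpu_limit, mem_request, cpu_request = totals(DEFAULTS)
print(mem_limit, cpu_limit, mem_request, cpu_request)  # prints: 3950 2700 2210 1250
```

If you override any default during installation, adjusting the corresponding tuple gives the new totals to compare against the free capacity of your cluster.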

Known Issues

There is a known issue that K3s clusters require more default memory. If you are enabling monitoring on a K3s cluster, we recommend setting prometheus.prometheusSpec.resources.memory.limit to 2500Mi and prometheus.prometheusSpec.resources.memory.request to 1750Mi.
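The two dotted keys above describe a nested values structure. As a sketch, the helper below (a hypothetical name) expands dotted keys into a nested dictionary that could be serialized into a values override. The key paths are copied verbatim from the text and should be checked against the values.yaml of the chart version you install.

```python
import json


def set_dotted(values: dict, dotted_key: str, value) -> dict:
    """Set a value in a nested dict using a dot-separated key path."""
    node = values
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate dictionaries as needed.
        node = node.setdefault(part, {})
    node[leaf] = value
    return values


overrides = {}
# Key paths as written in the known issue above (not verified against the chart).
set_dotted(overrides, "prometheus.prometheusSpec.resources.memory.limit", "2500Mi")
set_dotted(overrides, "prometheus.prometheusSpec.resources.memory.request", "1750Mi")
print(json.dumps(overrides, indent=2))
```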
