Orchestrate CockroachDB in a Single Kubernetes Cluster

This page shows you how to orchestrate the deployment, management, and monitoring of a secure 3-node CockroachDB cluster in a single Kubernetes cluster, using the StatefulSet feature directly or via the Helm package manager for Kubernetes.

To deploy across multiple Kubernetes clusters in different geographic regions instead, see Kubernetes Multi-Cluster Deployment. Also, for details about potential performance bottlenecks to be aware of when running CockroachDB in Kubernetes and guidance on how to optimize your deployment for better performance, see CockroachDB Performance on Kubernetes.

Before you begin

Before getting started, it's helpful to review some Kubernetes-specific terminology and current limitations.

Kubernetes terminology

  • instance: A physical or virtual machine. In this tutorial, you'll create GCE or AWS instances and join them into a single Kubernetes cluster from your local workstation.
  • pod: A pod is a group of one or more Docker containers. In this tutorial, each pod will run on a separate instance and include one Docker container running a single CockroachDB node. You'll start with 3 pods and grow to 4.
  • StatefulSet: A StatefulSet is a group of pods treated as stateful units, where each pod has distinguishable network identity and always binds back to the same persistent storage on restart. StatefulSets are considered stable as of Kubernetes version 1.9 after reaching beta in version 1.5.
  • persistent volume: A persistent volume is a piece of networked storage (Persistent Disk on GCE, Elastic Block Store on AWS) mounted into a pod. The lifetime of a persistent volume is decoupled from the lifetime of the pod that's using it, ensuring that each CockroachDB node binds back to the same storage on restart. This tutorial assumes that dynamic volume provisioning is available; when that is not the case, persistent volume claims need to be created manually, as shown in the sketch after this list.
  • CSR: A CSR, or Certificate Signing Request, is a request to have a TLS certificate signed by the Kubernetes cluster's built-in CA. As each pod is created, it issues a CSR for the CockroachDB node running in the pod, which must be manually checked and approved. The same is true for clients as they connect to the cluster.
  • RBAC: RBAC, or Role-Based Access Control, is the system Kubernetes uses to manage permissions within the cluster. In order to take an action (e.g., get or create) on an API resource (e.g., a pod or CSR), the client must have a Role that allows it to do so. This tutorial creates the RBAC resources necessary for CockroachDB to create and access certificates.
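
If dynamic volume provisioning is not available in your environment, a persistent volume claim must exist for each pod before the StatefulSet is created. The following is a minimal sketch of one such claim; the claim name must follow the datadir-<statefulset name>-<ordinal> pattern the StatefulSet expects, and the 100Gi size shown here is only an assumption to adjust for your environment:

    # Hypothetical manually created claim for the first CockroachDB pod;
    # repeat for datadir-cockroachdb-1 and datadir-cockroachdb-2.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: datadir-cockroachdb-0
      labels:
        app: cockroachdb
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi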

Limitations

Kubernetes version

Kubernetes 1.8 or higher is required in order to use our most up-to-date configuration files. Earlier Kubernetes releases do not support some of the options used in our configuration files. If you need to run on an older version of Kubernetes, we have kept around configuration files that work on older Kubernetes releases in the versioned subdirectories of https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes (e.g., v1.7).

Storage

At this time, orchestrations of CockroachDB with Kubernetes use external persistent volumes that are often replicated by the provider. Because CockroachDB already replicates data automatically, this additional layer of replication is unnecessary and can negatively impact performance. High-performance use cases on a private Kubernetes cluster may want to consider a DaemonSet deployment until StatefulSets support node-local storage.

Step 1. Start Kubernetes

Choose whether you want to orchestrate CockroachDB with Kubernetes using the hosted Google Kubernetes Engine (GKE) service or manually on Google Compute Engine (GCE) or AWS. The instructions below will change slightly depending on your choice.

Hosted GKE

  • Complete the Before you begin steps described in the Google Kubernetes Engine Quickstart documentation. This includes installing gcloud, which is used to create and delete Kubernetes Engine clusters, and kubectl, which is the command-line tool used to manage Kubernetes from your workstation.

Tip:
The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.

  • From your local workstation, start the Kubernetes cluster:
  1. $ gcloud container clusters create cockroachdb
  1. Creating cluster cockroachdb...done.

This creates GKE instances and joins them into a single Kubernetes cluster named cockroachdb.

The process can take a few minutes, so do not move on to the next step until you see a Creating cluster cockroachdb…done message and details about your cluster.

  • Get the email address associated with your Google Cloud account:
  1. $ gcloud info | grep Account
  1. Account: [your.google.cloud.email@example.org]

Warning:

This command returns your email address in all lowercase. However, in the next step, you must enter the address using its exact capitalization. For example, if your address is YourName@example.com, you must use YourName@example.com and not yourname@example.com.

  • Create the RBAC roles CockroachDB needs for running on GKE, using the email address from the previous step:
  1. $ kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org>
  1. clusterrolebinding "<your-username>-cluster-admin-binding" created

Manual GCE

From your local workstation, install prerequisites and start a Kubernetes cluster as described in the Running Kubernetes on Google Compute Engine documentation.

The process includes:

  • Creating a Google Cloud Platform account, installing gcloud, and other prerequisites.
  • Downloading and installing the latest Kubernetes release.
  • Creating GCE instances and joining them into a single Kubernetes cluster.
  • Installing kubectl, the command-line tool used to manage Kubernetes from your workstation.

Manual AWS

From your local workstation, install prerequisites and start a Kubernetes cluster as described in the Running Kubernetes on AWS EC2 documentation.

Step 2. Start CockroachDB

To start your CockroachDB cluster, you can either use our StatefulSet configuration and related files directly, or you can use the Helm package manager for Kubernetes to simplify the process.

Note:

If you want to use a different certificate authority than the one Kubernetes uses, or if your Kubernetes cluster doesn't fully support certificate-signing requests (e.g., in Amazon EKS), use these configuration files instead of the ones referenced below.

  • From your local workstation, use our cockroachdb-statefulset-secure.yaml file to create the StatefulSet that automatically creates 3 pods, each with a CockroachDB node running inside it:
  1. $ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml
  1. serviceaccount "cockroachdb" created
  2. role "cockroachdb" created
  3. clusterrole "cockroachdb" created
  4. rolebinding "cockroachdb" created
  5. clusterrolebinding "cockroachdb" created
  6. service "cockroachdb-public" created
  7. service "cockroachdb" created
  8. poddisruptionbudget "cockroachdb-budget" created
  9. statefulset "cockroachdb" created

Alternatively, if you'd rather start with a configuration file that has been customized for performance:

  1. $ curl -O https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/performance/cockroachdb-statefulset-secure.yaml
  • Modify the file wherever there is a TODO comment.

  • Use the file to create the StatefulSet and start the cluster:

  1. $ kubectl create -f cockroachdb-statefulset-secure.yaml
  • As each pod is created, it issues a Certificate Signing Request, or CSR, to have the node's certificate signed by the Kubernetes CA. You must manually check and approve each node's certificates, at which point the CockroachDB node is started in the pod.

    • Get the name of the Pending CSR for the first pod:
  1. $ kubectl get csr
  1. NAME AGE REQUESTOR CONDITION
  2. default.node.cockroachdb-0 1m system:serviceaccount:default:default Pending
  3. node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4 4m kubelet Approved,Issued
  4. node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY 4m kubelet Approved,Issued
  5. node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o 5m kubelet Approved,Issued

If you do not see a Pending CSR, wait a minute and try again.

  • Examine the CSR for the first pod:
  1. $ kubectl describe csr default.node.cockroachdb-0
  1. Name: default.node.cockroachdb-0
  2. Labels: <none>
  3. Annotations: <none>
  4. CreationTimestamp: Thu, 09 Nov 2017 13:39:37 -0500
  5. Requesting User: system:serviceaccount:default:default
  6. Status: Pending
  7. Subject:
  8. Common Name: node
  9. Serial Number:
  10. Organization: Cockroach
  11. Subject Alternative Names:
  12. DNS Names: localhost
  13. cockroachdb-0.cockroachdb.default.svc.cluster.local
  14. cockroachdb-public
  15. IP Addresses: 127.0.0.1
  16. 10.48.1.6
  17. Events: <none>
  • If everything looks correct, approve the CSR for the first pod:
  1. $ kubectl certificate approve default.node.cockroachdb-0
  1. certificatesigningrequest "default.node.cockroachdb-0" approved
  • Repeat steps 1-3 for the other 2 pods.
  • Initialize the cluster:

    • Confirm that three pods are Running successfully. Note that they will not be considered Ready until after the cluster has been initialized:
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. cockroachdb-0 0/1 Running 0 2m
  3. cockroachdb-1 0/1 Running 0 2m
  4. cockroachdb-2 0/1 Running 0 2m
  • Confirm that the persistent volumes and corresponding claims were created successfully for all three pods:
  1. $ kubectl get persistentvolumes
  1. NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
  2. pvc-52f51ecf-8bd5-11e6-a4f4-42010a800002 1Gi RWO Delete Bound default/datadir-cockroachdb-0 26s
  3. pvc-52fd3a39-8bd5-11e6-a4f4-42010a800002 1Gi RWO Delete Bound default/datadir-cockroachdb-1 27s
  4. pvc-5315efda-8bd5-11e6-a4f4-42010a800002 1Gi RWO Delete Bound default/datadir-cockroachdb-2 27s
  • Use our cluster-init-secure.yaml file to perform a one-time initialization that joins the nodes into a single cluster:
  1. $ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init-secure.yaml
  1. job "cluster-init-secure" created
  • Approve the CSR for the one-off pod from which cluster initialization happens:
  1. $ kubectl certificate approve default.client.root
  1. certificatesigningrequest "default.client.root" approved
  • Confirm that cluster initialization has completed successfully. The job should be considered successful and the CockroachDB pods should soon be considered Ready:
  1. $ kubectl get job cluster-init-secure
  1. NAME DESIRED SUCCESSFUL AGE
  2. cluster-init-secure 1 1 2m
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. cockroachdb-0 1/1 Running 0 3m
  3. cockroachdb-1 1/1 Running 0 3m
  4. cockroachdb-2 1/1 Running 0 3m

Tip:

The StatefulSet configuration sets all CockroachDB nodes to log to stderr, so if you ever need access to a pod/node's logs to troubleshoot, use kubectl logs <podname> rather than checking the log on the persistent volume.
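
For example, to view the logs for the first pod, or the logs from its previous container if the pod has been restarted:

    $ kubectl logs cockroachdb-0
    $ kubectl logs cockroachdb-0 --previous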

To use the Helm package manager instead, first install the Helm client on your local workstation and set up the Helm server, known as Tiller. In the likely case that your Kubernetes cluster uses RBAC (e.g., if you are using GKE), you need to create RBAC resources to grant Tiller access to the Kubernetes API before starting it:

  • Create a rbac-config.yaml file to define a role and service account:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: tiller
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: tiller
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-admin
    subjects:
    - kind: ServiceAccount
      name: tiller
      namespace: kube-system
  • Create the service account:
  1. $ kubectl create -f rbac-config.yaml
  1. serviceaccount "tiller" created
  2. clusterrolebinding "tiller" created
  • Start the Helm server:
  1. $ helm init --service-account tiller
  • Install the CockroachDB Helm chart, providing a "release" name to identify and track this particular deployment of the chart and setting the Secure.Enabled parameter to true:

Note:

This tutorial uses my-release as the release name. If you use a different value, be sure to adjust the release name in subsequent commands.

  1. $ helm install --name my-release --set Secure.Enabled=true stable/cockroachdb

Behind the scenes, this command uses our cockroachdb-statefulset.yaml file to create the StatefulSet that automatically creates 3 pods, each with a CockroachDB node running inside it, where each pod has distinguishable network identity and always binds back to the same persistent storage on restart.

Note:

You can customize your deployment by passing additional configuration parameters to helm install using the --set key=value[,key=value] flag. For a production cluster, you should consider modifying the Storage and StorageClass parameters. This chart defaults to 100 GiB of disk space per pod, but you may want more or less depending on your use case, and the default persistent volume StorageClass in your environment may not be what you want for a database (e.g., on GCE and Azure the default is not SSD).
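
For example, a hypothetical install that raises the per-pod disk space and requests SSD-backed volumes might look like the following (the ssd storage class name is an assumption; use a class that exists in your cluster):

    $ helm install --name my-release --set Secure.Enabled=true,Storage=200Gi,StorageClass=ssd stable/cockroachdb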

  • As each pod is created, it issues a Certificate Signing Request, or CSR, to have the node's certificate signed by the Kubernetes CA. You must manually check and approve each node's certificates, at which point the CockroachDB node is started in the pod.

    • Get the name of the Pending CSR for the first pod:
  1. $ kubectl get csr
  1. NAME AGE REQUESTOR CONDITION
  2. default.client.root 21s system:serviceaccount:default:my-release-cockroachdb Pending
  3. default.node.my-release-cockroachdb-0 15s system:serviceaccount:default:my-release-cockroachdb Pending
  4. default.node.my-release-cockroachdb-1 16s system:serviceaccount:default:my-release-cockroachdb Pending
  5. default.node.my-release-cockroachdb-2 15s system:serviceaccount:default:my-release-cockroachdb Pending

If you do not see a Pending CSR, wait a minute and try again.

  • Examine the CSR for the first pod:
  1. $ kubectl describe csr default.node.my-release-cockroachdb-0
  1. Name: default.node.my-release-cockroachdb-0
  2. Labels: <none>
  3. Annotations: <none>
  4. CreationTimestamp: Mon, 10 Dec 2018 05:36:35 -0500
  5. Requesting User: system:serviceaccount:default:my-release-cockroachdb
  6. Status: Pending
  7. Subject:
  8. Common Name: node
  9. Serial Number:
  10. Organization: Cockroach
  11. Subject Alternative Names:
  12. DNS Names: localhost
  13. my-release-cockroachdb-0.my-release-cockroachdb.default.svc.cluster.local
  14. my-release-cockroachdb-0.my-release-cockroachdb
  15. my-release-cockroachdb-public
  16. my-release-cockroachdb-public.default.svc.cluster.local
  17. IP Addresses: 127.0.0.1
  18. 10.48.1.6
  19. Events: <none>
  • If everything looks correct, approve the CSR for the first pod:
  1. $ kubectl certificate approve default.node.my-release-cockroachdb-0
  1. certificatesigningrequest "default.node.my-release-cockroachdb-0" approved
  • Repeat steps 1-3 for the other 2 pods.
  • Confirm that three pods are Running successfully:
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. my-release-cockroachdb-0 0/1 Running 0 6m
  3. my-release-cockroachdb-1 0/1 Running 0 6m
  4. my-release-cockroachdb-2 0/1 Running 0 6m
  5. my-release-cockroachdb-init-hxzsc 0/1 Init:0/1 0 6m
  • Approve the CSR for the one-off pod from which cluster initialization happens:
  1. $ kubectl certificate approve default.client.root
  1. certificatesigningrequest "default.client.root" approved
  • Confirm that cluster initialization has completed successfully, with each pod showing 1/1 under READY:
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. my-release-cockroachdb-0 1/1 Running 0 8m
  3. my-release-cockroachdb-1 1/1 Running 0 8m
  4. my-release-cockroachdb-2 1/1 Running 0 8m
  • Confirm that the persistent volumes and corresponding claims were created successfully for all three pods:
  1. $ kubectl get persistentvolumes
  1. NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
  2. pvc-71019b3a-fc67-11e8-a606-080027ba45e5 100Gi RWO Delete Bound default/datadir-my-release-cockroachdb-0 standard 11m
  3. pvc-7108e172-fc67-11e8-a606-080027ba45e5 100Gi RWO Delete Bound default/datadir-my-release-cockroachdb-1 standard 11m
  4. pvc-710dcb66-fc67-11e8-a606-080027ba45e5 100Gi RWO Delete Bound default/datadir-my-release-cockroachdb-2 standard 11m

Tip:

The StatefulSet configuration sets all CockroachDB nodes to log to stderr, so if you ever need access to a pod/node's logs to troubleshoot, use kubectl logs <podname> rather than checking the log on the persistent volume.

Step 3. Use the built-in SQL client

To use the built-in SQL client, you need to launch a pod that runs indefinitely with the cockroach binary inside it, get a shell into the pod, and then start the built-in SQL client.

  • From your local workstation, use our client-secure.yaml file to launch a pod and keep it running indefinitely:
  1. $ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml
  1. pod "cockroachdb-client-secure" created

The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.

  • Get a shell into the pod and start the CockroachDB built-in SQL client:
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
  1. # Welcome to the cockroach SQL interface.
  2. # All statements must be terminated by a semicolon.
  3. # To exit: CTRL + D.
  4. #
  5. # Server version: CockroachDB CCL v1.1.2 (linux amd64, built 2017/11/02 19:32:03, go1.8.3) (same version as client)
  6. # Cluster ID: 3292fe08-939f-4638-b8dd-848074611dba
  7. #
  8. # Enter \? for a brief introduction.
  9. #
  10. root@cockroachdb-public:26257/>
  • Run some basic CockroachDB SQL statements:
  1. > CREATE DATABASE bank;
  1. > CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
  1. > INSERT INTO bank.accounts VALUES (1, 1000.50);
  1. > SELECT * FROM bank.accounts;
  1. +----+---------+
  2. | id | balance |
  3. +----+---------+
  4. | 1 | 1000.5 |
  5. +----+---------+
  6. (1 row)
  • Create a user with a password:
  1. > CREATE USER roach WITH PASSWORD 'Q7gc8rEdS';

You will need this username and password to access the Admin UI later.

  • Exit the SQL shell and pod:
  1. > \q
  • From your local workstation, use our client-secure.yaml file to launch a pod and keep it running indefinitely.

    • Download the file:
  1. $ curl -O https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml
  • In the file, change serviceAccountName: cockroachdb to serviceAccountName: my-release-cockroachdb.

  • Use the file to launch a pod and keep it running indefinitely:

  1. $ kubectl create -f client-secure.yaml
  1. pod "cockroachdb-client-secure" created

The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.

  • Get a shell into the pod and start the CockroachDB built-in SQL client:
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=my-release-cockroachdb-public
  1. # Welcome to the cockroach SQL interface.
  2. # All statements must be terminated by a semicolon.
  3. # To exit: CTRL + D.
  4. #
  5. # Server version: CockroachDB CCL v1.1.2 (linux amd64, built 2017/11/02 19:32:03, go1.8.3) (same version as client)
  6. # Cluster ID: 3292fe08-939f-4638-b8dd-848074611dba
  7. #
  8. # Enter \? for a brief introduction.
  9. #
  10. root@my-release-cockroachdb-public:26257/>
  • Run some basic CockroachDB SQL statements:
  1. > CREATE DATABASE bank;
  1. > CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
  1. > INSERT INTO bank.accounts VALUES (1, 1000.50);
  1. > SELECT * FROM bank.accounts;
  1. +----+---------+
  2. | id | balance |
  3. +----+---------+
  4. | 1 | 1000.5 |
  5. +----+---------+
  6. (1 row)
  • Create a user with a password:
  1. > CREATE USER roach WITH PASSWORD 'Q7gc8rEdS';

You will need this username and password to access the Admin UI later.

  • Exit the SQL shell and pod:
  1. > \q

Tip:

This pod will continue running indefinitely, so any time you need to reopen the built-in SQL client or run any other cockroach client commands (e.g., cockroach node), repeat step 2 using the appropriate cockroach command.

If you'd prefer to delete the pod and recreate it when needed, run kubectl delete pod cockroachdb-client-secure.
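
For example, to check node status from the same pod at any time (shown for the direct-configuration case; if you deployed with Helm, use --host=my-release-cockroachdb-public instead):

    $ kubectl exec -it cockroachdb-client-secure -- ./cockroach node status --certs-dir=/cockroach-certs --host=cockroachdb-public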

Step 4. Access the Admin UI

To access the cluster's Admin UI:

  • Port-forward from your local machine to one of the pods:
  1. $ kubectl port-forward cockroachdb-0 8080
  1. $ kubectl port-forward my-release-cockroachdb-0 8080
  1. Forwarding from 127.0.0.1:8080 -> 8080

Note:
The port-forward command must be run on the same machine as the web browser in which you want to view the Admin UI. If you have been running these commands from a cloud instance or other non-local shell, you will not be able to view the UI without configuring kubectl locally and running the above port-forward command on your local machine.

  • Go to https://localhost:8080 and log in with the username and password you created earlier.

  • In the UI, verify that the cluster is running as expected:

    • Click View nodes list on the right to ensure that all nodes successfully joined the cluster.
    • Click the Databases tab on the left to verify that bank is listed.

Step 5. Simulate node failure

Based on the replicas: 3 line in the StatefulSet configuration, Kubernetes ensures that three pods/nodes are running at all times. When a pod/node fails, Kubernetes automatically creates another pod/node with the same network identity and persistent storage.
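
The relevant portion of the StatefulSet configuration is sketched below (trimmed for illustration; see the full cockroachdb-statefulset-secure.yaml for the complete spec):

    kind: StatefulSet
    metadata:
      name: cockroachdb
    spec:
      serviceName: "cockroachdb"
      replicas: 3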

To see this in action:

  • Kill one of the CockroachDB nodes:
  1. $ kubectl delete pod cockroachdb-2
  1. pod "cockroachdb-2" deleted
  1. $ kubectl delete pod my-release-cockroachdb-2
  1. pod "my-release-cockroachdb-2" deleted
  • In the Admin UI, the Cluster Overview will soon show one node as Suspect. As Kubernetes auto-restarts the node, watch how the node once again becomes healthy.

  • Back in the terminal, verify that the pod was automatically restarted:

  1. $ kubectl get pod cockroachdb-2
  1. NAME READY STATUS RESTARTS AGE
  2. cockroachdb-2 1/1 Running 0 12s
  1. $ kubectl get pod my-release-cockroachdb-2
  1. NAME READY STATUS RESTARTS AGE
  2. my-release-cockroachdb-2 1/1 Running 0 44s

Step 6. Set up monitoring and alerting

Despite CockroachDB's various built-in safeguards against failure, it is critical to actively monitor the overall health and performance of a cluster running in production and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.

Configure Prometheus

Every node of a CockroachDB cluster exports granular timeseries metrics formatted for easy integration with Prometheus, an open source tool for storing, aggregating, and querying timeseries data. This section shows you how to orchestrate Prometheus as part of your Kubernetes cluster and pull these metrics into Prometheus for external monitoring.

This guidance is based on CoreOS's Prometheus Operator, which allows a Prometheus instance to be managed using native Kubernetes concepts.

Note:

If you're on Hosted GKE, before starting, make sure the email address associated with your Google Cloud account is part of the cluster-admin RBAC group, as shown in Step 1. Start Kubernetes.

  • From your local workstation, edit the cockroachdb service to add the prometheus: cockroachdb label:
  1. $ kubectl label svc cockroachdb prometheus=cockroachdb
  1. service "cockroachdb" labeled

This ensures that there is a prometheus job and monitoring data only for the cockroachdb service, not for the cockroachdb-public service.

  1. $ kubectl label svc my-release-cockroachdb prometheus=cockroachdb
  1. service "my-release-cockroachdb" labeled

This ensures that there is a prometheus job and monitoring data only for the my-release-cockroachdb service, not for the my-release-cockroachdb-public service.
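
In either case, the label is what the ServiceMonitor created by our prometheus.yaml uses to discover which service to scrape. A simplified sketch of such a ServiceMonitor is shown below; the endpoint port name and other details are assumptions, so refer to prometheus.yaml for the actual definition:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: cockroachdb
      labels:
        app: cockroachdb
    spec:
      selector:
        matchLabels:
          prometheus: cockroachdb
      endpoints:
      - port: http
        path: /_status/vars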

  • Install the Prometheus Operator:
  1. $ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml
  1. clusterrolebinding "prometheus-operator" created
  2. clusterrole "prometheus-operator" created
  3. serviceaccount "prometheus-operator" created
  4. deployment "prometheus-operator" created
  • Confirm that the prometheus-operator has started:
  1. $ kubectl get deploy prometheus-operator
  1. NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
  2. prometheus-operator 1 1 1 1 1m
  • Use our prometheus.yaml file to create the various objects necessary to run a Prometheus instance:
  1. $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/prometheus.yaml
  1. clusterrole "prometheus" created
  2. clusterrolebinding "prometheus" created
  3. servicemonitor "cockroachdb" created
  4. prometheus "cockroachdb" created
  • Access the Prometheus UI locally and verify that CockroachDB is feeding data into Prometheus:

    • Port-forward from your local machine to the pod running Prometheus:
  1. $ kubectl port-forward prometheus-cockroachdb-0 9090
  • Go to http://localhost:9090 in your browser.

  • To verify that each CockroachDB node is connected to Prometheus, go to Status > Targets. The screen should look like this:

Prometheus targets

  • To verify that data is being collected, go to Graph, enter the sys_uptime variable in the field, click Execute, and then click the Graph tab. The screen should look like this:

Prometheus graph

Tip:

Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, port-forward as described in Access the Admin UI and then point your browser to http://localhost:8080/_status/vars.

For more details on using the Prometheus UI, see their official documentation.

Configure Alertmanager

Active monitoring helps you spot problems early, but it is also essential to send notifications when there are events that require investigation or intervention. This section shows you how to use Alertmanager and CockroachDB's starter alerting rules to do this.
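
The commands below assume you have already created an alertmanager-config.yaml file defining where notifications should be sent. A minimal sketch, assuming a hypothetical Slack incoming webhook (replace the URL and channel with your own receiver details):

    route:
      group_by: ['alertname']
      receiver: 'slack-alerts'
    receivers:
    - name: 'slack-alerts'
      slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/THIS/URL'
        channel: '#cockroachdb-alerts'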

  • Add your Alertmanager configuration to the Kubernetes cluster as a secret, renaming it to alertmanager.yaml, and apply the required label:
  1. $ kubectl create secret generic alertmanager-cockroachdb --from-file=alertmanager.yaml=alertmanager-config.yaml
  1. secret "alertmanager-cockroachdb" created
  1. $ kubectl label secret alertmanager-cockroachdb app=cockroachdb
  1. secret "alertmanager-cockroachdb" labeled

Warning:

The name of the secret, alertmanager-cockroachdb, must match the name used in the alertmanager.yaml file. If they differ, the Alertmanager instance will start without configuration, and nothing will happen.

  • Use our alertmanager.yaml file to create the various objects necessary to run an Alertmanager instance, including a ClusterIP service so that Prometheus can forward alerts:
  1. $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/alertmanager.yaml
  1. alertmanager "cockroachdb" created
  2. service "alertmanager-cockroachdb" created
  • Verify that Alertmanager is running:

    • Port-forward from your local machine to the pod running Alertmanager:
  1. $ kubectl port-forward alertmanager-cockroachdb-0 9093

    • Go to http://localhost:9093 in your browser to verify that the Alertmanager UI loads.

  • Use our alert-rules.yaml file to add CockroachDB's starter alerting rules, including an example alert for verifying that Alertmanager is working:
  1. $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/alert-rules.yaml
  1. prometheusrule "prometheus-cockroachdb-rules" created

  • In the Alertmanager UI at http://localhost:9093, verify that the example TestAlertManager alert is firing.

  • To remove the example alert:

    • Use the kubectl edit command to open the rules for editing:
  1. $ kubectl edit prometheusrules prometheus-cockroachdb-rules
  • Remove the dummy.rules block and save the file:
    - name: rules/dummy.rules
      rules:
      - alert: TestAlertManager
        expr: vector(1)

Step 7. Maintain the cluster

Add nodes

Depending on how you deployed Kubernetes, your cluster contains either 4 nodes (one master and 3 workers, with pods placed only on the workers) or 3 nodes that pods can be run on. To ensure that you do not have two pods on the same node (as recommended in our production best practices), you need to add a new worker node and then edit your StatefulSet configuration to add another pod.

  • Add a worker node to your Kubernetes cluster, then use the kubectl scale command to add a pod for the new CockroachDB node:
  1. $ kubectl scale statefulset cockroachdb --replicas=4
  1. statefulset "cockroachdb" scaled
  1. $ kubectl scale statefulset my-release-cockroachdb --replicas=4
  1. statefulset "my-release-cockroachdb" scaled
  • Get the name of the Pending CSR for the new pod:
  1. $ kubectl get csr
  1. NAME AGE REQUESTOR CONDITION
  2. default.client.root 1h system:serviceaccount:default:default Approved,Issued
  3. default.node.cockroachdb-0 1h system:serviceaccount:default:default Approved,Issued
  4. default.node.cockroachdb-1 1h system:serviceaccount:default:default Approved,Issued
  5. default.node.cockroachdb-2 1h system:serviceaccount:default:default Approved,Issued
  6. default.node.cockroachdb-3 2m system:serviceaccount:default:default Pending
  7. node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4 1h kubelet Approved,Issued
  8. node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY 1h kubelet Approved,Issued
  9. node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o 1h kubelet Approved,Issued

If you do not see a Pending CSR, wait a minute and try again.

  • Examine the CSR for the new pod:
  1. $ kubectl describe csr default.node.cockroachdb-3
  1. Name: default.node.cockroachdb-3
  2. Labels: <none>
  3. Annotations: <none>
  4. CreationTimestamp: Thu, 09 Nov 2017 13:39:37 -0500
  5. Requesting User: system:serviceaccount:default:default
  6. Status: Pending
  7. Subject:
  8. Common Name: node
  9. Serial Number:
  10. Organization: Cockroach
  11. Subject Alternative Names:
  12. DNS Names: localhost
  13. cockroachdb-3.cockroachdb.default.svc.cluster.local
  14. cockroachdb-public
  15. IP Addresses: 127.0.0.1
  16. 10.48.1.6
  17. Events: <none>
  • If everything looks correct, approve the CSR for the new pod:
  1. $ kubectl certificate approve default.node.cockroachdb-3
  1. certificatesigningrequest "default.node.cockroachdb-3" approved
  • Verify that the new pod started successfully:
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. cockroachdb-0 1/1 Running 0 51m
  3. cockroachdb-1 1/1 Running 0 47m
  4. cockroachdb-2 1/1 Running 0 3m
  5. cockroachdb-3 1/1 Running 0 1m
  6. cockroachdb-client-secure 1/1 Running 0 15m
  • Back in the Admin UI, view Node List to ensure that the fourth node successfully joined the cluster.

Remove nodes

To safely remove a node from your cluster, you must first decommission the node and only then adjust the --replicas value of your StatefulSet configuration to permanently remove it. This sequence is important because the decommissioning process lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.

Warning:

If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Decommission Nodes.

  • Get a shell into the cockroachdb-client-secure pod you created earlier and use the cockroach node status command to get the internal IDs of nodes:
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach node status --certs-dir=/cockroach-certs --host=cockroachdb-public
  1. id | address | build | started_at | updated_at | is_available | is_live
  2. +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
  3. 1 | cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true
  4. 2 | cockroachdb-2.cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true
  5. 3 | cockroachdb-1.cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true
  6. 4 | cockroachdb-3.cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true
  7. (4 rows)
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach node status --certs-dir=/cockroach-certs --host=my-release-cockroachdb-public
  1. id | address | build | started_at | updated_at | is_available | is_live
  2. +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
  3. 1 | my-release-cockroachdb-0.my-release-cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true
  4. 2 | my-release-cockroachdb-2.my-release-cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true
  5. 3 | my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true
  6. 4 | my-release-cockroachdb-3.my-release-cockroachdb.default.svc.cluster.local:26257 | v2.1.1 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true
  7. (4 rows)

The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.

  • Note the ID of the node with the highest number in its address (in this case, the address including cockroachdb-3) and use the cockroach node decommission command to decommission it:

Note:

It's important to decommission the node with the highest number in its address because, when you reduce the --replicas count, Kubernetes will remove the pod for that node.

  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach node decommission <node ID> --certs-dir=/cockroach-certs --host=cockroachdb-public
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach node decommission <node ID> --certs-dir=/cockroach-certs --host=my-release-cockroachdb-public

You'll then see the decommissioning status print to stderr as it changes:

  1. id | is_live | replicas | is_decommissioning | is_draining
  2. +---+---------+----------+--------------------+-------------+
  3. 4 | true | 73 | true | false
  4. (1 row)

Once the node has been fully decommissioned and stopped, you'll see a confirmation:

  1. id | is_live | replicas | is_decommissioning | is_draining
  2. +---+---------+----------+--------------------+-------------+
  3. 4 | true | 0 | true | false
  4. (1 row)
  5. No more data reported on target nodes. Please verify cluster health before removing the nodes.
  • Once the node has been decommissioned, use the kubectl scale command to remove a pod from your StatefulSet:
  1. $ kubectl scale statefulset cockroachdb --replicas=3
  1. statefulset "cockroachdb" scaled
  1. $ kubectl scale statefulset my-release-cockroachdb --replicas=3
  1. statefulset "my-release-cockroachdb" scaled

Upgrade the cluster

As new versions of CockroachDB are released, it's strongly recommended to upgrade to newer versions in order to pick up bug fixes, performance improvements, and new features. The general CockroachDB upgrade documentation provides best practices for how to prepare for and execute upgrades of CockroachDB clusters, but the mechanism of actually stopping and restarting processes in Kubernetes is somewhat special.

Kubernetes knows how to carry out a safe rolling upgrade process of the CockroachDB nodes. When you tell it to change the Docker image used in the CockroachDB StatefulSet, Kubernetes will go one-by-one, stopping a node, restarting it with the new image, and waiting for it to be ready to receive client requests before moving on to the next one. For more information, see the Kubernetes documentation.
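
If you want to watch the rolling restart as it progresses, kubectl can report the rollout status of the StatefulSet (shown here for the direct-configuration name; substitute my-release-cockroachdb if you deployed with Helm):

    $ kubectl rollout status statefulset/cockroachdb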

  • Decide how the upgrade will be finalized.

Note:

This step is relevant only when upgrading from v2.1.x to v19.1. For upgrades within the v19.1.x series, skip this step.

By default, after all nodes are running the new version, the upgrade process will be auto-finalized. This will enable certain performance improvements and bug fixes introduced in v19.1. After finalization, however, it will no longer be possible to perform a downgrade to v2.1. In the event of a catastrophic failure or corruption, the only option will be to start a new cluster using the old binary and then restore from one of the backups created prior to performing the upgrade.

We recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade:

  • Get a shell into the pod with the cockroach binary created earlier and start the CockroachDB built-in SQL client:
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=my-release-cockroachdb-public
  • Set the cluster.preserve_downgrade_option cluster setting to the version you are upgrading from:
  1. > SET CLUSTER SETTING cluster.preserve_downgrade_option = '2.1';
  • Kick off the upgrade process by changing the desired Docker image. To do so, pick the version that you want to upgrade to, then run the following command, replacing "VERSION" with your desired new version:
  1. $ kubectl patch statefulset cockroachdb --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
  1. statefulset "cockroachdb" patched
  1. $ kubectl patch statefulset my-release-cockroachdb --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
  1. statefulset "my-release0-cockroachdb" patched
  • If you then check the status of your cluster's pods, you should see one of them being restarted:
  1. $ kubectl get pods
  1. NAME READY STATUS RESTARTS AGE
  2. cockroachdb-0 1/1 Running 0 2m
  3. cockroachdb-1 1/1 Running 0 2m
  4. cockroachdb-2 1/1 Running 0 2m
  5. cockroachdb-3 0/1 Terminating 0 1m
  1. NAME READY STATUS RESTARTS AGE
  2. my-release-cockroachdb-0 1/1 Running 0 2m
  3. my-release-cockroachdb-1 1/1 Running 0 2m
  4. my-release-cockroachdb-2 1/1 Running 0 2m
  5. my-release-cockroachdb-3 0/1 Terminating 0 1m
  • This will continue until all of the pods have restarted and are running the new image. To check the image of each pod and determine whether they've all been upgraded, run:
  1. $ kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
  1. cockroachdb-0 cockroachdb/cockroach:v19.1.0
  2. cockroachdb-1 cockroachdb/cockroach:v19.1.0
  3. cockroachdb-2 cockroachdb/cockroach:v19.1.0
  4. cockroachdb-3 cockroachdb/cockroach:v19.1.0
  1. my-release-cockroachdb-0 cockroachdb/cockroach:v19.1.0
  2. my-release-cockroachdb-1 cockroachdb/cockroach:v19.1.0
  3. my-release-cockroachdb-2 cockroachdb/cockroach:v19.1.0
  4. my-release-cockroachdb-3 cockroachdb/cockroach:v19.1.0
  • Finish the upgrade.

Note:
This step is relevant only when upgrading from v2.1.x to v19.1. For upgrades within the v19.1.x series, skip this step.

If you disabled auto-finalization in step 1 above, monitor the stability and performance of your cluster for as long as you require to feel comfortable with the upgrade (generally at least a day). If during this time you decide to roll back the upgrade, repeat the rolling restart procedure with the old binary.

Once you are satisfied with the new version, re-enable auto-finalization:

  • Get a shell into the pod with the cockroach binary created earlier and start the CockroachDB built-in SQL client:
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
  1. $ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=my-release-cockroachdb-public
  • Re-enable auto-finalization:
  1. > RESET CLUSTER SETTING cluster.preserve_downgrade_option;

Stop the cluster

To shut down the CockroachDB cluster:

  • Delete all of the resources associated with the cockroachdb label, including the logs, remote persistent volumes, and Prometheus and Alertmanager resources:
  1. $ kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes,poddisruptionbudget,jobs,rolebinding,clusterrolebinding,role,clusterrole,serviceaccount,alertmanager,prometheus,prometheusrule,serviceMonitor -l app=cockroachdb
  1. pod "cockroachdb-0" deleted
  2. pod "cockroachdb-1" deleted
  3. pod "cockroachdb-2" deleted
  4. service "alertmanager-cockroachdb" deleted
  5. service "cockroachdb" deleted
  6. service "cockroachdb-public" deleted
  7. persistentvolumeclaim "datadir-cockroachdb-0" deleted
  8. persistentvolumeclaim "datadir-cockroachdb-1" deleted
  9. persistentvolumeclaim "datadir-cockroachdb-2" deleted
  10. poddisruptionbudget "cockroachdb-budget" deleted
  11. job "cluster-init-secure" deleted
  12. rolebinding "cockroachdb" deleted
  13. clusterrolebinding "cockroachdb" deleted
  14. clusterrolebinding "prometheus" deleted
  15. role "cockroachdb" deleted
  16. clusterrole "cockroachdb" deleted
  17. clusterrole "prometheus" deleted
  18. serviceaccount "cockroachdb" deleted
  19. serviceaccount "prometheus" deleted
  20. alertmanager "cockroachdb" deleted
  21. prometheus "cockroachdb" deleted
  22. prometheusrule "prometheus-cockroachdb-rules" deleted
  23. servicemonitor "cockroachdb" deleted
  • Delete the pod created for cockroach client commands, if you didn't do so earlier:
  1. $ kubectl delete pod cockroachdb-client-secure
  1. pod "cockroachdb-client-secure" deleted
  • Get the names of the CSRs for the cluster:
  1. $ kubectl get csr
  1. NAME AGE REQUESTOR CONDITION
  2. default.client.root 1h system:serviceaccount:default:default Approved,Issued
  3. default.node.cockroachdb-0 1h system:serviceaccount:default:default Approved,Issued
  4. default.node.cockroachdb-1 1h system:serviceaccount:default:default Approved,Issued
  5. default.node.cockroachdb-2 1h system:serviceaccount:default:default Approved,Issued
  6. default.node.cockroachdb-3 12m system:serviceaccount:default:default Approved,Issued
  7. node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4 1h kubelet Approved,Issued
  8. node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY 1h kubelet Approved,Issued
  9. node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o 1h kubelet Approved,Issued
  • Delete the CSRs that you created:
  1. $ kubectl delete csr default.client.root default.node.cockroachdb-0 default.node.cockroachdb-1 default.node.cockroachdb-2 default.node.cockroachdb-3
  1. certificatesigningrequest "default.client.root" deleted
  2. certificatesigningrequest "default.node.cockroachdb-0" deleted
  3. certificatesigningrequest "default.node.cockroachdb-1" deleted
  4. certificatesigningrequest "default.node.cockroachdb-2" deleted
  5. certificatesigningrequest "default.node.cockroachdb-3" deleted
  • Get the names of the secrets for the cluster:
  1. $ kubectl get secrets
  1. NAME TYPE DATA AGE
  2. alertmanager-cockroachdb Opaque 1 1h
  3. default-token-d9gff kubernetes.io/service-account-token 3 5h
  4. default.client.root Opaque 2 5h
  5. default.node.cockroachdb-0 Opaque 2 5h
  6. default.node.cockroachdb-1 Opaque 2 5h
  7. default.node.cockroachdb-2 Opaque 2 5h
  8. default.node.cockroachdb-3 Opaque 2 5h
  9. prometheus-operator-token-bpdv8 kubernetes.io/service-account-token 3 3h
  • Delete the secrets that you created:
  1. $ kubectl delete secrets alertmanager-cockroachdb default.client.root default.node.cockroachdb-0 default.node.cockroachdb-1 default.node.cockroachdb-2 default.node.cockroachdb-3
  1. secret "alertmanager-cockroachdb" deleted
  2. secret "default.client.root" deleted
  3. secret "default.node.cockroachdb-0" deleted
  4. secret "default.node.cockroachdb-1" deleted
  5. secret "default.node.cockroachdb-2" deleted
  6. secret "default.node.cockroachdb-3" deleted
  • Stop Kubernetes:

    • Hosted GKE:
  1. $ gcloud container clusters delete cockroachdb
  • Manual GCE:
  1. $ cluster/kube-down.sh
  • Manual AWS:
  1. $ cluster/kube-down.sh

Warning:

If you stop Kubernetes without first deleting the persistent volumes, they will still exist in your cloud project.
