Orchestrate CockroachDB with Mesosphere DC/OS (Insecure)

This page shows you how to orchestrate the deployment and management of an insecure 3-node CockroachDB cluster with Mesosphere DC/OS.

Warning:
Deploying an insecure cluster is not recommended for data in production. We'll update this page once it's possible to orchestrate a secure cluster with Mesosphere DC/OS.

Before you begin

Before getting started, it's important to review some current requirements and limitations.

Requirements

  • Your cluster must have at least 3 private nodes.
  • If you are using Enterprise DC/OS, you may need to provision a service account before installing CockroachDB. Only someone with superuser permission can create the service account.

Security Mode | Service Account
--------------|----------------
strict        | Required
permissive    | Optional
disabled      | Not Required

Limitations

CockroachDB in DC/OS works the same as in other environments with the exception of the following limitations:

  • The cockroachdb DC/OS service has been tested only on DC/OS versions 1.9 and 1.10.
  • Running in secure mode is not supported at this time.
  • Running a multi-datacenter cluster is not supported at this time.
  • Removing a node is not supported at this time.
  • Neither volume type nor volume size requirements may be changed after initial deployment.
  • Rack placement and awareness are not supported at this time.

Step 1. Install and Launch DC/OS

The fastest way to get up and running is to use the open source DC/OS template on AWS CloudFormation. However, you can find details about other open source or enterprise DC/OS installation methods in the official DC/OS documentation.

Step 2. Start CockroachDB

  • Review the default CockroachDB configuration:
  1. $ dcos package describe --config cockroachdb

The default CockroachDB configuration creates a 3-node CockroachDB cluster with reasonable defaults, but you may require different settings depending on the context of your deployment. To customize the settings for your deployment, create a cockroach.json file based on the output of the command above. Be sure to consider our Recommended Production Settings.
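As a minimal sketch of this workflow, the script below writes a custom options file and validates it before install. The key names shown (service.name, nodes.count) are illustrative assumptions; take the real keys from the `dcos package describe --config cockroachdb` output for your package version.

```shell
# Hypothetical options file: the key names below are assumptions for
# illustration; copy the real keys from `dcos package describe --config`.
cat > cockroach.json <<'EOF'
{
  "service": {
    "name": "cockroachdb"
  },
  "nodes": {
    "count": 3
  }
}
EOF

# Validate the JSON before handing it to:
#   dcos package install cockroachdb --options=cockroach.json
python3 -m json.tool cockroach.json
```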

  • Start the CockroachDB cluster as a DC/OS service.

    • If you are using the default configuration, run:
  1. $ dcos package install cockroachdb
  • If you created a custom cockroach.json configuration, run:
  1. $ dcos package install cockroachdb --options=cockroach.json
  • Monitor the cluster's deployment from the Services tab of the DC/OS UI.
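If the package follows the standard dcos-commons service CLI (an assumption worth verifying for your package version), deployment progress can also be watched from the command line. The sketch below only echoes the command so it is safe to run outside a DC/OS cluster:

```shell
# Hypothetical CLI check, assuming the standard dcos-commons `plan`
# subcommand; echoed rather than executed.
CMD="dcos cockroachdb plan status deploy"
echo "$CMD"
```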

Note:
You can install CockroachDB from the DC/OS UI as well.

Step 3. Test the cluster

  • Discover the endpoints for your CockroachDB cluster:
  1. $ dcos cockroachdb endpoints pg
  1. {
  2.   "address": [
  3.     "10.0.0.212:26257",
  4.     "10.0.2.57:26257",
  5.     "10.0.3.81:26257"
  6.   ],
  7.   "dns": [
  8.     "cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257",
  9.     "cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257",
  10.     "cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257"
  11.   ],
  12.   "vip": "pg.cockroachdb.l4lb.thisdcos.directory:26257"
  13. }

The endpoints returned will include:

  • .mesos hostnames for each instance that will follow the instances if they're moved within the DC/OS cluster.
  • A direct IP address for each instance, if .mesos hostnames are not resolvable.
  • A vip address, which is an HA-enabled hostname for accessing any of the instances. You'll use the vip address in some of the next steps.
    In general, the .mesos endpoints will only work from within the same DC/OS cluster. From outside the cluster, you can either use the direct IPs or set up a proxy service that acts as a frontend to your CockroachDB instance. For development and testing purposes, you can use a DC/OS tunnel to access services from outside the cluster, but this option is not suitable for production use. See monitor the cluster below for more details.
  • SSH into the DC/OS master node:
  1. $ dcos node ssh --master-proxy --leader
  • Start a temporary container and open the built-in SQL shell inside it, using the vip endpoint as the --host:
  1. $ docker run -it cockroachdb/cockroach:v19.1.0 sql --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
  1. # Welcome to the cockroach SQL interface.
  2. # All statements must be terminated by a semicolon.
  3. # To exit: CTRL + D.
  4. root@pg.cockroachdb.l4lb.thisdcos.directory:26257/>
  1. > CREATE DATABASE bank;
  1. > CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
  1. > INSERT INTO bank.accounts VALUES (1, 1000.50);
  1. > SELECT * FROM bank.accounts;
  1. +----+---------+
  2. | id | balance |
  3. +----+---------+
  4. | 1 | 1000.5 |
  5. +----+---------+
  6. (1 row)
  • Exit the SQL shell, which also stops the temporary container:
  1. > \q
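The vip endpoint passed to --host above can also be extracted from the endpoints JSON programmatically. A minimal sketch using the sample output shown earlier; against a live cluster you would pipe `dcos cockroachdb endpoints pg` into the same filter instead:

```shell
# JSON literal mirrors the sample `dcos cockroachdb endpoints pg` output.
ENDPOINTS='{"vip": "pg.cockroachdb.l4lb.thisdcos.directory:26257"}'

# Pull out the HA vip endpoint for use as a --host value.
VIP=$(echo "$ENDPOINTS" | python3 -c 'import json,sys; print(json.load(sys.stdin)["vip"])')
echo "$VIP"
```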

Step 4. Monitor the cluster

To access the cluster's Admin UI, you can use a DC/OS tunnel to run an HTTP proxy:

  • Install the DC/OS tunnel package:
  1. $ dcos package install tunnel-cli --cli
  • Start a DC/OS tunnel:
  1. $ sudo dcos tunnel http

Step 5. Scale the cluster

The default cockroachdb service creates a 3-node CockroachDB cluster. You can add nodes to the cluster after the service has been launched by updating the Scheduler process:

  • In the DC/OS UI, go to the Services tab.
  • Select the cockroachdb service.
  • In the upper right, select Edit.
  • Select Environment.
  • Update the NODE_COUNT variable to match the number of CockroachDB nodes you want.
  • Click Review & Run and then Run Service.
    The Scheduler process will restart with the new configuration and will validate any detected changes. To check that nodes were successfully added to the cluster, go back to the Admin UI, view Node List, and check for the new nodes.

Alternatively, you can SSH to the DC/OS master node and then run the cockroach node status command in a temporary container, again using the vip endpoint as the --host:

  1. $ dcos node ssh --master-proxy --leader
  1. $ docker run -it cockroachdb/cockroach:v19.1.0 node status --all --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
  1. +----+--------------------------------------------------------------------------+----------------------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+-----------------------+
  2. | id | address | build | updated_at | started_at | live_bytes | key_bytes | value_bytes | intent_bytes | system_bytes | replicas_leaders | replicas_leaseholders | ranges | ranges_unavailable | ranges_underreplicated |
  3. +----+--------------------------------------------------------------------------+----------------------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+-----------------------+
  4. | 1 | cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:12 | 2017-12-11 19:14:42 | 41183973 | 1769 | 41187432 | 0 | 6018 | 2 | 2 | 2 | 0 | 0 |
  5. | 2 | cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:12 | 2017-12-11 19:14:52 | 115448 | 71037 | 209282 | 0 | 6218 | 4 | 4 | 4 | 0 | 0 |
  6. | 3 | cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:03 | 2017-12-11 19:14:53 | 120325 | 72652 | 217422 | 0 | 6732 | 4 | 3 | 4 | 0 | 0 |
  7. | 4 | cockroachdb-3-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:03 | 2017-12-11 20:21:43 | 41248030 | 79147 | 41338632 | 0 | 6569 | 1 | 1 | 1 | 0 | 0 |
  8. | 5 | cockroachdb-4-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257 | v19.1.0 | 2017-12-11 20:59:04 | 2017-12-11 20:56:54 | 41211967 | 30550 | 41181417 | 0 | 6854 | 1 | 1 | 1 | 0 | 0 |
  9. +----+--------------------------------------------------------------------------+----------------------+---------------------+---------------------+------------+-----------+-------------+--------------+--------------+------------------+-----------------------+--------+--------------------+------------------------+
  10. (5 rows)

Step 6. Maintain the cluster

Choose the relevant maintenance task:

Update configurations

In addition to adding nodes, you can change the CPU and Memory requirements for nodes, update placement constraints, and make changes to other service settings. Just follow the instructions in the previous step, but change the environment variable for the update you want to make.

  • After making a change, the scheduler will restart and automatically deploy any detected changes to the service, one node at a time. For example, a given change will first be applied to cockroachdb-0, then cockroachdb-1, and so on.
  • Nodes are configured with a "Readiness check" to ensure that the underlying service appears to be in a healthy state before applying a given change to the next node in the sequence. However, this basic check is not foolproof and reasonable care should be taken to ensure that a given configuration change will not negatively affect the behavior of the service.

Restart a Node

You can restart a node while keeping it at its current location with its current persistent volume data. This may be thought of as similar to restarting a system process, but it also deletes any data that is not on a persistent volume.

  • Get the pod name for the node you want to restart:
  1. $ dcos cockroachdb pod list
  1. [
  2.   "cockroachdb-0",
  3.   "cockroachdb-1",
  4.   "cockroachdb-2",
  5.   "cockroachdb-3",
  6.   "cockroachdb-4",
  7.   "metrics-0"
  8. ]
  • Restart the relevant pod:
  1. $ dcos cockroachdb pods restart cockroachdb-<NUM>

Replace a Node

You can move a node to a new system and discard the persistent volumes at the prior system to be rebuilt at the new system. Nodes are not moved automatically, so this step must be performed manually, for example, before a system is taken offline or after a system has already gone offline.

  • Get the pod name for the node you want to replace:
  1. $ dcos cockroachdb pod list
  1. [
  2.   "cockroachdb-0",
  3.   "cockroachdb-1",
  4.   "cockroachdb-2",
  5.   "cockroachdb-3",
  6.   "cockroachdb-4",
  7.   "metrics-0"
  8. ]
  • Stop and restart the pod at a new location in the DC/OS cluster:
  1. $ dcos cockroachdb pods replace cockroachdb-<NUM>

Troubleshoot (access logs)

Logs for the Scheduler and service (i.e., CockroachDB) can be viewed from the DC/OS web interface.

  • Scheduler logs are useful for determining why a node isn't being launched (this is under the purview of the Scheduler).
  • Node logs are useful for examining problems in the service itself, i.e., CockroachDB.
    In all cases, logs are generally piped to files named stdout and/or stderr.

To view logs for a given node:

  • In the DC/OS UI, go to the Services tab.
  • Select the cockroachdb service.
  • In the list of tasks for the service, select the task to be examined. The Scheduler is named after the service, and nodes are cockroachdb-0-node-init or cockroachdb-#-node-join.
  • In the task details, go to the Logs tab.

Backup and restore

The cockroachdb DC/OS service provides an easy way to use CockroachDB's open source cockroach dump command to back up data on a per-database basis to an S3 bucket and to restore data from such a backup. Note that using datastores other than S3 is not yet supported.

Tip:
If you need to back up to/restore from datastores other than S3, or you have a very large database and need faster backups, incremental backups, or a faster, distributed restore process, consider contacting Cockroach Labs about an enterprise license.

Backup

To back up the tables in a database, run the following command:

  1. $ dcos cockroachdb backup [<flags>] <database> <s3-bucket>

You can configure the communication with S3 using the following optional flags:

Flag              | Description
------------------|---------------------------
--aws-access-key  | AWS access key
--aws-secret-key  | AWS secret key
--s3-dir          | AWS S3 target path
--s3-backup-dir   | Target path within s3-dir
--region          | AWS region

By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, respectively. You must either have these environment variables defined or specify the flags for the backup to work.

Make sure that you provision your nodes with enough disk space to perform a backup. Backups are staged on local disk before being uploaded to S3 and take up as much space as the data currently in the tables, so at least half of your total disk space must be free to back up all of your data at once.
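A minimal sketch of a backup invocation, showing the environment-variable fallback for credentials. The credentials and bucket name are placeholders; the command is echoed rather than executed so the sketch is safe to run outside a DC/OS cluster (`bank` is the database created in Step 3):

```shell
# Placeholder credentials, for illustration only. When the
# --aws-access-key/--aws-secret-key flags are omitted, the backup
# command reads these environment variables instead.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"

# Bucket name is hypothetical; run the echoed command against a live cluster.
CMD="dcos cockroachdb backup --region=us-east-1 bank my-backup-bucket"
echo "$CMD"
```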

Restore

To restore cluster data, run the following command:

  1. $ dcos cockroachdb restore [<flags>] <database> <s3-bucket> <s3-backup-dir>

You can configure the communication with S3 using the following optional flags to the CLI command:

Flag              | Description
------------------|---------------------------
--aws-access-key  | AWS access key
--aws-secret-key  | AWS secret key
--s3-dir          | AWS S3 target path
--s3-backup-dir   | Target path within s3-dir
--region          | AWS region

By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, respectively. You must either have these environment variables defined or specify the flags for the restore to work.
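The restore counterpart to the backup sketch above. All three positional arguments are placeholders (the backup directory name depends on how the backup was stored under --s3-dir), and the command is echoed rather than executed so it is safe to run anywhere:

```shell
# Placeholder database, bucket, and backup-directory names; run the echoed
# command against a live DC/OS cluster.
CMD="dcos cockroachdb restore bank my-backup-bucket my-backup-dir"
echo "$CMD"
```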

Step 7. Stop the cluster

To shut down the CockroachDB cluster:

  • Uninstall the cockroachdb service:
  1. $ MY_SERVICE_NAME=cockroachdb
  1. $ dcos package uninstall --app-id=$MY_SERVICE_NAME $MY_SERVICE_NAME
  • If you're using DC/OS version 1.9, use the following command to clean up remaining reserved resources with the framework cleaner script, janitor.py. Note that this step is not needed for DC/OS 1.10.
  1. $ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
  2. -r $MY_SERVICE_NAME-role \
  3. -p $MY_SERVICE_NAME-principal \
  4. -z dcos-service-$MY_SERVICE_NAME"

See also
