Upgrading to Latest 1.2.x

Introduction

This guide explains how to best upgrade a multi-datacenter Consul deployment that is using a version of Consul >= 0.8.5 and < 1.2.4 while maintaining replication. If you are on a version older than 0.8.5, but are in the 0.8.x series, please upgrade to 0.8.5 by following our General Upgrade Process. If you are on a version older than 0.8.0, please contact support.

In this guide, we will be using an example with two datacenters (DCs) and will be referring to them as DC1 and DC2. DC1 will be the primary datacenter.

Requirements

  • All Consul servers should be on a version of Consul >= 0.8.5 and < 1.2.4.
  • You need a Consul cluster with at least 3 nodes to perform this upgrade as documented. If you either have a single node cluster or several single node clusters joined via WAN, the servers will come up in a No cluster leader loop after upgrading. If that happens, you will need to recover the cluster using the method described here. You can avoid this issue entirely by growing your cluster to 3 nodes prior to upgrading.

Assumptions

This guide makes the following assumptions:

  • You have at least two datacenters configured and have ACL replication enabled. If you are not using multiple datacenters, you can follow along and simply skip the instructions related to replication.

Considerations

There are not too many major changes that might cause issues upgrading from 1.0.8, but notable changes are called out in our Specific Version Details page. You can find more granular details in the full changelog. Looking through these changes prior to upgrading is highly recommended.

Procedure

1. Check replication status in DC1 by issuing the following curl command from a consul server in that DC:

  1. $ curl --silent --header "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty

You should receive output similar to this:

  1. {
  2. "Enabled": false,
  3. "Running": false,
  4. "SourceDatacenter": "",
  5. "ReplicatedIndex": 0,
  6. "LastSuccess": "0001-01-01T00:00:00Z",
  7. "LastError": "0001-01-01T00:00:00Z"
  8. }

The primary datacenter (indicated by acl_datacenter) will always show as having replication disabled, so this is normal even if replication is happening.

2. Check replication status in DC2 by issuing the following curl command from a consul server in that DC:

  1. $ curl --silent --header "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty

You should receive output similar to this:

  1. {
  2. "Enabled": true,
  3. "Running": true,
  4. "SourceDatacenter": "dc1",
  5. "ReplicatedIndex": 9,
  6. "LastSuccess": "2020-09-10T21:16:15Z",
  7. "LastError": "0001-01-01T00:00:00Z"
  8. }

3. Upgrade the Consul agents in all DCs to version 1.2.4 by following our General Upgrade Process. This should be done one DC at a time, leaving the primary DC for last.

4. Confirm that replication is still working in DC2 by issuing the following curl command from a consul server in that DC:

  1. $ curl --silent --header "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication?pretty

You should receive output similar to this:

  1. {
  2. "Enabled": true,
  3. "Running": true,
  4. "SourceDatacenter": "dc1",
  5. "ReplicatedIndex": 9,
  6. "LastSuccess": "2020-09-10T21:16:15Z",
  7. "LastError": "0001-01-01T00:00:00Z"
  8. }

Post-Upgrade Configuration Changes

If you moved from a pre-1.0.0 version of Consul, you will find that many of the configuration options were renamed. Backwards compatibility has been maintained, so your old config options will continue working after upgrading, but you will want to update those now to avoid issues when moving to newer versions.

The full list of changes is available here:

You can make sure your config changes are valid by copying your existing configuration files, making the changes, and then verifying them by using consul validate $CONFIG_FILE1_PATH $CONFIG_FILE2_PATH ....

Once your config is passing the validation check, replace your old config files with the new ones and slowly roll your cluster again one server at a time – leaving the leader agent for last in each datacenter.