Following an upgrade to the latest version of Rancher, downstream Kubernetes clusters can be upgraded to use the latest supported version of Kubernetes.

Rancher calls RKE (Rancher Kubernetes Engine) as a library when provisioning and editing RKE clusters. For more information on configuring the upgrade strategy for RKE clusters, refer to the RKE documentation.

This section covers the following topics:

  • New Features
  • Tested Kubernetes Versions
  • How Upgrades Work
  • Recommended Best Practice for Upgrades
  • Upgrading the Kubernetes Version
  • Rolling Back
  • Configuring the Upgrade Strategy
  • Troubleshooting

New Features

As of Rancher v2.3.0, the Kubernetes metadata feature was added, which allows Rancher to ship Kubernetes patch versions without upgrading Rancher. For details, refer to the section on Kubernetes metadata.

As of Rancher v2.4.0,

  • The ability to import K3s Kubernetes clusters into Rancher was added, along with the ability to upgrade Kubernetes when editing those clusters. For details, refer to the section on imported clusters.
  • New advanced options are exposed in the Rancher UI for configuring the upgrade strategy of an RKE cluster: Maximum Worker Nodes Unavailable and Drain nodes. These options leverage the new cluster upgrade process of RKE v1.1.0, in which worker nodes are upgraded in batches, so that applications can remain available during cluster upgrades, under certain conditions.

Tested Kubernetes Versions

Before a new version of Rancher is released, it’s tested with the latest minor versions of Kubernetes to ensure compatibility. For example, Rancher v2.3.0 was tested with Kubernetes v1.15.4, v1.14.7, and v1.13.11. For details on which versions of Kubernetes were tested with each Rancher version, refer to the support maintenance terms.

How Upgrades Work

RKE v1.1.0 changed the way that clusters are upgraded.

In this section of the RKE documentation, you’ll learn what happens when you edit or upgrade your RKE Kubernetes cluster.

Recommended Best Practice for Upgrades

When upgrading the Kubernetes version of a cluster, we recommend the following steps, depending on your Rancher version.

Rancher v2.4+:

  1. Take a snapshot.
  2. Initiate a Kubernetes upgrade.
  3. If the upgrade fails, revert the cluster to the pre-upgrade Kubernetes version by selecting the Restore etcd and Kubernetes version option. This returns your cluster to the pre-upgrade Kubernetes version before restoring the etcd snapshot.

The restore operation works even on a cluster that is not in a healthy or active state.

Rancher before v2.4:

  1. Take a snapshot.
  2. Initiate a Kubernetes upgrade.
  3. If the upgrade fails, restore the cluster from the etcd snapshot.

The cluster cannot be downgraded to a previous Kubernetes version.

Upgrading the Kubernetes Version

Prerequisite: Take a snapshot of the cluster before upgrading, so that the cluster can be restored if the upgrade fails.

  1. From the Global view, find the cluster for which you want to upgrade Kubernetes. Select ⋮ > Edit.

  2. Expand Cluster Options.

  3. From the Kubernetes Version drop-down, choose the version of Kubernetes that you want to use for the cluster.

  4. Click Save.

Result: Kubernetes begins upgrading for the cluster.

Rolling Back

Available as of v2.4

A cluster can be restored to a backup in which the previous Kubernetes version was used. For more information, refer to the sections on backing up and restoring clusters.

Configuring the Upgrade Strategy

As of RKE v1.1.0, additional upgrade options became available to give you more granular control over the upgrade process. These options can be used to maintain availability of your applications during a cluster upgrade if certain conditions and requirements are met.

The upgrade strategy can be configured in the Rancher UI or by editing the cluster.yml; more advanced options are available through the cluster.yml.

Configuring the Maximum Unavailable Worker Nodes in the Rancher UI

From the Rancher UI, the maximum number of unavailable worker nodes can be configured. During a cluster upgrade, worker nodes will be upgraded in batches of this size.

By default, the maximum number of unavailable worker nodes is defined as 10 percent of all worker nodes. This number can be configured as a percentage or as an integer. When defined as a percentage, the batch size is rounded down to the nearest node, with a minimum of one node.
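For example, with the default of 10 percent, a cluster with 25 worker nodes is upgraded in batches of two nodes (10 percent of 25 is 2.5, rounded down to 2), while a cluster with five worker nodes is upgraded one node at a time (0.5 rounds down to zero, so the one-node minimum applies).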

To change the default number or percentage of worker nodes,

  1. Go to the cluster view in the Rancher UI.
  2. Click ⋮ > Edit.
  3. In the Advanced Options section, go to the Maximum Worker Nodes Unavailable field. Enter the percentage of worker nodes that can be upgraded in a batch. Optionally, select Count from the drop-down menu and enter the maximum unavailable worker nodes as an integer.
  4. Click Save.

Result: The cluster is updated to use the new upgrade strategy.

Enabling Draining Nodes During Upgrades from the Rancher UI

By default, RKE cordons each node before upgrading it, and draining during upgrades is disabled. If draining is enabled in the cluster configuration, RKE will both cordon and drain the node before it is upgraded.

To enable draining each node during a cluster upgrade,

  1. Go to the cluster view in the Rancher UI.
  2. Click ⋮ > Edit.
  3. In the Advanced Options section, go to the Drain nodes field and click Yes.
  4. Choose a safe or aggressive drain option. For more information about each option, refer to this section.
  5. Optionally, configure a grace period. The grace period is the timeout given to each pod to clean up, so that it has a chance to exit gracefully. Pods might need to finish outstanding requests, roll back transactions, or save state to external storage. If this value is negative, the default value specified in the pod is used. (The grace period and the timeout in the next step map to cluster.yml fields, as shown in the sketch after these steps.)
  6. Optionally, configure a timeout, which is the amount of time the drain should continue to wait before giving up.
  7. Click Save.

Result: The cluster is updated to use the new upgrade strategy.
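
Expressed in the cluster.yml, the drain toggle and its options correspond to the drain and node_drain_input fields of the upgrade strategy. A minimal sketch with illustrative values (refer to the RKE documentation for the full set of fields):

    upgrade_strategy:
      # Cordon and drain each node before upgrading it; the default, false,
      # cordons nodes without draining them.
      drain: true
      node_drain_input:
        # Seconds each pod is given to exit gracefully; a negative value
        # falls back to the default defined in the pod itself.
        grace_period: 60
        # Seconds to keep retrying the drain before giving up.
        timeout: 120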

Note: As of Rancher v2.4.0, there is a known issue in which the Rancher UI doesn’t show the state of etcd and controlplane nodes as drained, even though they are being drained.

Maintaining Availability for Applications During Upgrades

Available as of RKE v1.1.0

In this section of the RKE documentation, you’ll learn the requirements to prevent downtime for your applications when upgrading the cluster.

Configuring the Upgrade Strategy in the cluster.yml

More advanced upgrade strategy configuration options are available by editing the cluster.yml.

For details, refer to Configuring the Upgrade Strategy in the RKE documentation. The section also includes an example cluster.yml for configuring the upgrade strategy.
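
As a quick reference, an upgrade strategy in the cluster.yml can look like the following sketch. The values are illustrative; consult the RKE documentation for the authoritative schema:

    upgrade_strategy:
      # Upgrade worker nodes in batches of at most 20% of the worker pool.
      max_unavailable_worker: 20%
      # Allow at most one control plane node to be unavailable at a time.
      max_unavailable_controlplane: "1"
      # Cordon and drain each node before upgrading it.
      drain: true
      node_drain_input:
        # DaemonSet pods cannot be evicted, so ignore them during the drain.
        ignore_daemonsets: true
        # Do not delete pods that use local (emptyDir) data.
        delete_local_data: false
        # Per-pod graceful shutdown window, in seconds.
        grace_period: 60
        # Give up the drain after this many seconds.
        timeout: 120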

Troubleshooting

If a node doesn’t come up after an upgrade, the rke up command errors out.

No upgrade will proceed if the number of unavailable nodes exceeds the configured maximum.

If an upgrade stops, you may need to fix an unavailable node or remove it from the cluster before the upgrade can continue.

A failed node could be in many different states:

  • Powered off
  • Unavailable
  • The user drained the node while the upgrade was in progress, so the kubelet is no longer running on it
  • The upgrade itself failed

If the maximum number of unavailable nodes is reached during an upgrade, a Rancher user cluster will be stuck in an updating state and will not move forward with upgrading any other control plane nodes. Rancher will continue to evaluate the set of unavailable nodes in case one of the nodes becomes available. If a node cannot be fixed, you must remove it in order to continue the upgrade.