Cluster Shutdown and Restart

This document describes how to gracefully shut down your cluster and how to restart it. You might need to temporarily shut down your cluster for maintenance.

Warning

Shutting down a cluster is very dangerous. You must fully understand the operation and its consequences. Back up etcd before you proceed. In most cases, it is better to maintain your nodes one at a time than to restart the whole cluster.

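Prerequisites

  • Take an etcd backup before you shut down the cluster.
  • Set up passwordless SSH login from the machine where you run kubectl to every node, and make sure the SSH user can run sudo without a password prompt, because the shutdown steps run sudo on each node over SSH.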

Shutting Down Cluster

Tip

  • You must back up your etcd data before you shut down the cluster so that you can restore the cluster if you encounter issues when restarting it.
  • The method in this tutorial shuts down a cluster gracefully, but data corruption is still possible.
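One way to take the etcd backup is etcdctl's `snapshot save` command. The sketch below assumes etcdctl (v3 API) is installed on an etcd member; the function name backup_etcd, the ETCD_* variables, and the default output path are hypothetical, so substitute your cluster's own endpoint and certificate paths.

```shell
# Hypothetical helper: save an etcd v3 snapshot before the shutdown.
# ASSUMPTIONS: etcdctl is installed on this etcd member, and the ETCD_*
# variables point at ONE member's endpoint and its client certificates.
backup_etcd() {
  # Output file: first argument, or a timestamped file under /var/backups
  local out="${1:-/var/backups/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db}"
  ETCDCTL_API=3 etcdctl \
    --endpoints="${ETCD_ENDPOINTS:-https://127.0.0.1:2379}" \
    --cacert="${ETCD_CACERT:?set ETCD_CACERT}" \
    --cert="${ETCD_CERT:?set ETCD_CERT}" \
    --key="${ETCD_KEY:?set ETCD_KEY}" \
    snapshot save "$out"
}
```

For example: `ETCD_CACERT=/path/to/ca.pem ETCD_CERT=/path/to/client.pem ETCD_KEY=/path/to/client-key.pem backup_etcd /tmp/etcd-snapshot.db`. `snapshot save` talks to a single endpoint, so keep ETCD_ENDPOINTS pointed at one member, and copy the snapshot to a machine outside the cluster before the nodes power off.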

Step 1: Get Node List

  # Each entry is printed as "node/<name>"
  nodes=$(kubectl get nodes -o name)

Step 2: Shut Down All Nodes

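  for node in $nodes
  do
      # kubectl -o name prints "node/<name>"; drop the prefix to get the SSH host
      host="${node#node/}"
      echo "==== Shut down $host ===="
      # "+1" schedules the shutdown one minute from now, so every node gets the
      # command before any machine, including one running this loop, powers off
      ssh "$host" sudo shutdown -h +1
  done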
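The two steps above can also be wrapped in one function that reports the nodes it could not reach, so you can shut those down by hand. This is a sketch under the same assumptions as the loop (node names resolve as SSH hosts, passwordless SSH and sudo); the function name shutdown_all_nodes and the DRY_RUN switch are hypothetical additions.

```shell
# Hypothetical helper: schedule a shutdown on every node listed by kubectl.
# ASSUMPTIONS: node names resolve as SSH hostnames; passwordless SSH and
# passwordless sudo are configured. DRY_RUN=1 only prints the commands.
shutdown_all_nodes() {
  local nodes node failed=""
  # `-o name` prints "node/<name>"; strip the prefix to get the host name
  nodes=$(kubectl get nodes -o name | sed 's|^node/||')
  if [ -z "$nodes" ]; then
    echo "No nodes found" >&2
    return 1
  fi
  for node in $nodes; do
    echo "==== Shut down $node ===="
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "DRY RUN: ssh $node sudo shutdown -h +1"
      continue
    fi
    # BatchMode makes ssh fail instead of prompting for a password
    if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$node" sudo shutdown -h +1; then
      failed="$failed $node"
    fi
  done
  if [ -n "$failed" ]; then
    echo "Could not schedule a shutdown on:$failed" >&2
    return 1
  fi
}
```

Running `DRY_RUN=1 shutdown_all_nodes` first lets you check the node list before anything is actually shut down.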

After all nodes have powered off, you can shut down other cluster dependencies, such as external storage.

Restart Cluster Gracefully

If you shut down the cluster gracefully as described above, you can restart it gracefully with the following steps.

Prerequisites

You have shut down your cluster gracefully.

Tip

In most cases, the cluster is usable again after the restart, but it may remain unavailable because of unexpected conditions, for example:

  • etcd data corruption during the shutdown.
  • Node failures.
  • Unexpected network errors.

Step 1: Check All Cluster Dependencies’ Status

Ensure that all cluster dependencies, such as external storage, are ready.
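A quick first check is whether each dependency accepts network connections. The sketch below assumes your dependencies can be described as host:port pairs and that bash (for /dev/tcp) and the coreutils `timeout` command are available; the function name check_dependencies and the example endpoints are hypothetical. An open port only shows the service is listening, not that it is healthy.

```shell
# Hypothetical helper: report whether each host:port dependency accepts
# TCP connections. ASSUMPTIONS: bash's /dev/tcp and coreutils `timeout`.
check_dependencies() {
  local dep host port failed=0
  for dep in "$@"; do
    host="${dep%:*}"   # everything before the last ":"
    port="${dep##*:}"  # everything after the last ":"
    # Opening /dev/tcp/<host>/<port> in bash attempts a TCP connection
    if timeout 5 bash -c "</dev/tcp/$host/$port" 2>/dev/null; then
      echo "reachable   $dep"
    else
      echo "UNREACHABLE $dep"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_dependencies nfs.example.internal:2049 10.0.0.50:9000` (hypothetical endpoints) prints one line per dependency and returns non-zero if any of them is unreachable.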

Step 2: Power on Cluster Machines

Power on all cluster machines, then wait for the cluster to be up and running, which may take about 10 minutes.
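Rather than waiting a fixed time, you can poll until every node reports Ready. This sketch assumes the default `kubectl get nodes` column layout (STATUS in the second column); the function names and the 1200-second default timeout are hypothetical. Failed kubectl calls count as "not ready yet", because the API server may not answer during the first minutes after power-on.

```shell
# Hypothetical helper: succeed only if every node's STATUS starts with "Ready"
# ("Ready,SchedulingDisabled" counts as ready; "NotReady" does not).
all_nodes_ready() {
  local out
  out=$(kubectl get nodes --no-headers 2>/dev/null) || return 1
  [ -n "$out" ] || return 1
  printf '%s\n' "$out" | awk '$2 !~ /^Ready/ { bad = 1 } END { exit bad }'
}

# Hypothetical helper: poll until all nodes are Ready or the timeout expires.
# Arguments: timeout in seconds (default 1200), poll interval (default 15).
wait_for_nodes_ready() {
  local timeout="${1:-1200}" interval="${2:-15}" waited=0
  until all_nodes_ready; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${waited}s waiting for nodes to become Ready" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "All nodes are Ready"
}
```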

Step 3: Check All Master Nodes’ Status

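Check the status of core components, such as etcd, and make sure they are running. Then confirm that all master nodes are Ready. On clusters where control-plane nodes carry only the node-role.kubernetes.io/control-plane label (the kubeadm default from version 1.24), use that label in the selector instead.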

  kubectl get nodes -l node-role.kubernetes.io/master
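To check etcd itself, `etcdctl endpoint health` queries each listed member. The sketch assumes etcdctl (v3 API) is installed; the function name check_etcd_health and the ETCD_* variables are hypothetical placeholders for your cluster's endpoints and certificates.

```shell
# Hypothetical helper: report the health of each etcd member.
# ASSUMPTIONS: ETCD_ENDPOINTS is a comma-separated list of client URLs;
# ETCD_CACERT, ETCD_CERT and ETCD_KEY are valid client TLS files.
check_etcd_health() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="${ETCD_ENDPOINTS:?set ETCD_ENDPOINTS}" \
    --cacert="${ETCD_CACERT:?set ETCD_CACERT}" \
    --cert="${ETCD_CERT:?set ETCD_CERT}" \
    --key="${ETCD_KEY:?set ETCD_KEY}" \
    endpoint health
}
```

Once etcd reports healthy, `kubectl get pods -n kube-system` shows whether the other control-plane components are running.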

Step 4: Check All Worker Nodes’ Status

  kubectl get nodes -l node-role.kubernetes.io/worker
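Instead of reading the output of the master and worker checks by eye, a small filter can print only the nodes that are not Ready. This sketch assumes the default `kubectl get nodes` columns (NAME, STATUS, ...); the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: print nodes matching a label selector whose STATUS
# does not start with "Ready"; return 1 if any such node exists.
not_ready_nodes() {
  local selector="$1"
  kubectl get nodes -l "$selector" --no-headers |
    awk '$2 !~ /^Ready/ { print $1; bad = 1 } END { exit bad }'
}
```

For example, `not_ready_nodes node-role.kubernetes.io/worker && echo "All worker nodes are Ready"`. An empty selection also returns success, so make sure the expected number of nodes appears in the kubectl output.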

If your cluster fails to restart, try restoring etcd from the backup you took before the shutdown.