Upgrade Kubernetes on Azure AKS

    AKS provides az aks upgrade for in-place node upgrades, which reimages the nodes. Reimaging causes the original Longhorn disks to be lost, leaving the upgraded nodes with no disks available for replica rebuilding.

    To keep the data safe, we suggest upgrading the agent nodes by replacing the node-pool, and using az aks upgrade only for the control plane.

    1. In Longhorn, set replica-replenishment-wait-interval to 0.
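
      This can be changed in the Longhorn UI settings page, or with kubectl. The snippet below is a minimal sketch that assumes Longhorn runs in the default longhorn-system namespace and exposes its settings as settings.longhorn.io custom resources:

      # Assumes the longhorn-system namespace; adjust if Longhorn was installed elsewhere.
      kubectl -n longhorn-system patch settings.longhorn.io replica-replenishment-wait-interval \
        --type merge -p '{"value": "0"}'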

    2. Upgrade the AKS control plane.

      AKS_RESOURCE_GROUP=<aks-resource-group>
      AKS_CLUSTER_NAME=<aks-cluster-name>
      AKS_K8S_VERSION_UPGRADE=<aks-k8s-version>

      az aks upgrade \
        --resource-group ${AKS_RESOURCE_GROUP} \
        --name ${AKS_CLUSTER_NAME} \
        --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \
        --control-plane-only
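
      To confirm the control plane finished upgrading, you can query the cluster version afterwards (a quick sanity check; the exact property name may vary between Azure CLI versions):

      az aks show \
        --resource-group ${AKS_RESOURCE_GROUP} \
        --name ${AKS_CLUSTER_NAME} \
        --query kubernetesVersion \
        --output tsv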
    3. Add a new node-pool.

      AKS_NODEPOOL_NAME_NEW=<new-nodepool-name>
      AKS_DISK_SIZE=<disk-size-in-gb>
      AKS_NODE_NUM=<number-of-nodes>

      az aks nodepool add \
        --resource-group ${AKS_RESOURCE_GROUP} \
        --cluster-name ${AKS_CLUSTER_NAME} \
        --name ${AKS_NODEPOOL_NAME_NEW} \
        --node-count ${AKS_NODE_NUM} \
        --node-osdisk-size ${AKS_DISK_SIZE} \
        --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \
        --mode System
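
      Once the new node-pool is ready, it is worth checking that its nodes have joined the cluster at the target Kubernetes version (on AKS the node names contain the node-pool name):

      kubectl get nodes -o wide | grep ${AKS_NODEPOOL_NAME_NEW}-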
    4. Use the Longhorn UI to disable disk scheduling and request eviction for the nodes in the old node-pool.
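
      The same can also be done with kubectl by patching the Longhorn node objects directly. This is a minimal sketch, assuming the longhorn-system namespace and that the nodes.longhorn.io objects are named after the Kubernetes nodes; it disables scheduling at the node level and requests replica eviction:

      # Hypothetical kubectl equivalent of the UI steps; verify field names against your Longhorn version.
      AKS_NODEPOOL_NAME_OLD=<old-nodepool-name>
      for n in `kubectl get nodes | grep ${AKS_NODEPOOL_NAME_OLD}- | awk '{print $1}'`; do
        kubectl -n longhorn-system patch nodes.longhorn.io $n --type merge \
          -p '{"spec": {"allowScheduling": false, "evictionRequested": true}}'
      done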

    5. Cordon and drain Kubernetes nodes in the old node-pool.

      AKS_NODEPOOL_NAME_OLD=<old-nodepool-name>

      for n in `kubectl get nodes | grep ${AKS_NODEPOOL_NAME_OLD}- | awk '{print $1}'`; do
        kubectl cordon $n && \
        kubectl drain $n --ignore-daemonsets --delete-emptydir-data
      done
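
      Before deleting the old node-pool, you may want to confirm that the evicted replicas have been rebuilt and the volumes report a healthy state again, either in the Longhorn UI or via the volume custom resources (assuming the longhorn-system namespace):

      kubectl -n longhorn-system get volumes.longhorn.io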
    6. Delete the old node-pool.

      az aks nodepool delete \
        --cluster-name ${AKS_CLUSTER_NAME} \
        --name ${AKS_NODEPOOL_NAME_OLD} \
        --resource-group ${AKS_RESOURCE_GROUP}
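
      As a final check, list the nodes to confirm that only the new node-pool remains and every node reports the upgraded Kubernetes version:

      kubectl get nodes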