Manage Node-Group on GCP GKE

See "Migrating workloads to different machine types" in the GKE documentation for more information.

The following example replaces the cluster nodes with nodes that have a new storage size.

Storage Expansion

GKE supports adding disks to a node with local-ssd-count. However, each local SSD has a fixed size of 375 GB. We suggest expanding the node storage by replacing the node pool instead.
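To illustrate why the fixed local SSD size is limiting, the following minimal sketch computes how many 375 GB local SSDs would be needed to reach a target capacity. The function name local_ssd_count_for and the 500 GB target are hypothetical and chosen only for illustration.

```shell
# Each GKE local SSD is a fixed 375 GB, so capacity only grows in 375 GB steps.
LOCAL_SSD_SIZE_GB=375

# Print the number of local SSDs needed to reach at least the requested capacity.
# (Hypothetical helper, not part of gcloud or Longhorn.)
local_ssd_count_for() {
  local target_gb=$1
  # Integer ceiling division: round up to the next whole disk.
  echo $(( (target_gb + LOCAL_SSD_SIZE_GB - 1) / LOCAL_SSD_SIZE_GB ))
}

local_ssd_count_for 500   # prints 2
```

A 500 GB requirement already needs two local SSDs (750 GB in total), whereas a replacement node pool can request the exact disk size.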

  1. In Longhorn, set replica-replenishment-wait-interval to 0.

  2. Add a new node-pool. Longhorn components will later be deployed automatically on the nodes in this pool.

      GKE_NODEPOOL_NAME_NEW=<new-nodepool-name>
      GKE_REGION=<gke-region>
      GKE_CLUSTER_NAME=<gke-cluster-name>
      GKE_IMAGE_TYPE=Ubuntu
      GKE_MACHINE_TYPE=<gcp-machine-type>
      GKE_DISK_SIZE_NEW=<new-disk-size-in-gb>
      GKE_NODE_NUM=<number-of-nodes>

      # Create the new node-pool with the new disk size.
      gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \
        --region ${GKE_REGION} \
        --cluster ${GKE_CLUSTER_NAME} \
        --image-type ${GKE_IMAGE_TYPE} \
        --machine-type ${GKE_MACHINE_TYPE} \
        --disk-size ${GKE_DISK_SIZE_NEW} \
        --num-nodes ${GKE_NODE_NUM}

      # Confirm that the new node-pool is listed.
      gcloud container node-pools list \
        --region ${GKE_REGION} \
        --cluster ${GKE_CLUSTER_NAME}

  3. Use the Longhorn UI to disable disk scheduling and request eviction for the nodes in the old node-pool.

  4. Cordon and drain Kubernetes nodes in the old node-pool.

      GKE_NODEPOOL_NAME_OLD=<old-nodepool-name>

      # Cordon and drain every node whose name matches the old node-pool.
      for n in $(kubectl get nodes | grep "${GKE_CLUSTER_NAME}-${GKE_NODEPOOL_NAME_OLD}-" | awk '{print $1}'); do
        kubectl cordon "$n" && \
        kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data
      done

  5. Delete the old node-pool.

      gcloud container node-pools delete ${GKE_NODEPOOL_NAME_OLD} \
        --region ${GKE_REGION} \
        --cluster ${GKE_CLUSTER_NAME}
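The cordon-and-drain step above selects nodes by matching their names. As an alternative, the following minimal sketch selects them by node-pool label. It assumes that GKE sets the cloud.google.com/gke-nodepool label on each node, and it cordons every node in the pool before draining any of them so that evicted pods are not rescheduled onto another old node. The function names nodes_in_pool and cordon_and_drain_pool are hypothetical helpers, not part of kubectl or Longhorn.

```shell
# Print the names of all nodes in the given GKE node-pool, space separated.
# Assumption: every node carries the cloud.google.com/gke-nodepool label.
nodes_in_pool() {
  kubectl get nodes -l "cloud.google.com/gke-nodepool=$1" \
    -o jsonpath='{.items[*].metadata.name}'
}

# Cordon all nodes in the pool first, then drain them one by one.
# Stops at the first failing command and returns a non-zero status.
cordon_and_drain_pool() {
  local pool=$1 node
  for node in $(nodes_in_pool "$pool"); do
    kubectl cordon "$node" || return 1
  done
  for node in $(nodes_in_pool "$pool"); do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  done
}
```

With these definitions loaded in the shell, the old pool could be drained with cordon_and_drain_pool "${GKE_NODEPOOL_NAME_OLD}". It is worth confirming in the Longhorn UI that replica eviction has finished before running it.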