Upgrading kubeadm clusters

This page explains how to upgrade a Kubernetes cluster created with kubeadm from version 1.18.x to version 1.19.x, or from version 1.19.x to 1.19.y (where y > x).

For information about upgrading clusters created using older versions of kubeadm, refer to the pages for those releases instead.

The upgrade workflow at a high level is the following:

  1. Upgrade the primary control plane node.
  2. Upgrade additional control plane nodes.
  3. Upgrade worker nodes.

Before you begin

  • You need to have a kubeadm Kubernetes cluster running version 1.18.0 or later.
  • Swap must be disabled.
  • The cluster should use a static control plane and etcd Pods, or external etcd.
  • Make sure you read the release notes carefully.
  • Make sure to back up any important components, such as app-level state stored in a database. kubeadm upgrade does not touch your workloads, only components internal to Kubernetes, but backups are always a best practice; see the hedged snapshot sketch below.
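For clusters that run the built-in (stacked) etcd, one common precaution is to snapshot etcd before starting. A minimal sketch, assuming kubeadm's default certificate paths and an etcdctl binary available on the node; the output file name is illustrative:

  # Snapshot the local etcd member before upgrading (paths assume kubeadm defaults).
  sudo ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-pre-upgrade.db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key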

Additional information

  • All containers are restarted after the upgrade, because the container spec hash value changes.
  • You can only upgrade from one minor version to the next minor version, or between patch versions of the same minor version. That is, you cannot skip minor versions when upgrading. For example, you can upgrade from 1.y to 1.y+1, but not from 1.y to 1.y+2. (A quick way to confirm your starting point follows below.)
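Before choosing a target version, it can help to confirm what the cluster currently reports; a quick check, assuming kubectl access to the cluster:

  # Show the client and API server versions, plus the kubelet version on each node.
  kubectl version --short
  kubectl get nodes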

Determine which version to upgrade to

Find the latest stable 1.19 version:

  # On Ubuntu, Debian or HypriotOS:
  apt update
  apt-cache policy kubeadm
  # find the latest 1.19 version in the list
  # it should look like 1.19.x-00, where x is the latest patch

  # On CentOS, RHEL or Fedora:
  yum list --showduplicates kubeadm --disableexcludes=kubernetes
  # find the latest 1.19 version in the list
  # it should look like 1.19.x-0, where x is the latest patch version

Upgrade control plane nodes

Upgrade the first control plane node

  # On Ubuntu, Debian or HypriotOS:
  # replace x in 1.19.x-00 with the latest patch version
  apt-mark unhold kubeadm && \
  apt-get update && apt-get install -y kubeadm=1.19.x-00 && \
  apt-mark hold kubeadm

  # since apt-get version 1.1 you can also use the following method
  apt-get update && \
  apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00

  # On CentOS, RHEL or Fedora:
  # replace x in 1.19.x-0 with the latest patch version
  yum install -y kubeadm-1.19.x-0 --disableexcludes=kubernetes
  • Verify that the download works and that kubeadm has the expected version:

    kubeadm version
  • Drain the control plane node:

    # replace <cp-node-name> with the name of your control plane node
    kubectl drain <cp-node-name> --ignore-daemonsets
  • On the control plane node, run:

    sudo kubeadm upgrade plan

    You should see output similar to this:

    [upgrade/config] Making sure the configuration is correct:
    [upgrade/config] Reading configuration from the cluster...
    [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    [preflight] Running pre-flight checks.
    [upgrade] Running cluster health checks
    [upgrade] Fetching available versions to upgrade to
    [upgrade/versions] Cluster version: v1.18.4
    [upgrade/versions] kubeadm version: v1.19.0
    [upgrade/versions] Latest stable version: v1.19.0
    [upgrade/versions] Latest version in the v1.18 series: v1.18.4

    Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
    COMPONENT   CURRENT       AVAILABLE
    Kubelet     1 x v1.18.4   v1.19.0

    Upgrade to the latest version in the v1.18 series:

    COMPONENT            CURRENT   AVAILABLE
    API Server           v1.18.4   v1.19.0
    Controller Manager   v1.18.4   v1.19.0
    Scheduler            v1.18.4   v1.19.0
    Kube Proxy           v1.18.4   v1.19.0
    CoreDNS              1.6.7     1.7.0
    Etcd                 3.4.3-0   3.4.7-0

    You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.19.0

    _____________________________________________________________________

    The table below shows the current state of component configs as understood by this version of kubeadm.
    Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
    resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
    upgrade to is denoted in the "PREFERRED VERSION" column.

    API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
    kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
    kubelet.config.k8s.io     v1beta1           v1beta1             no
    _____________________________________________________________________

    This command checks that your cluster can be upgraded, and fetches the versions you can upgrade to. It also shows a table with the component config version states.

Note: kubeadm upgrade also automatically renews the certificates that it manages on this node. To opt out of certificate renewal, the flag --certificate-renewal=false can be used. For more information see the certificate management guide.
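For example, a hedged sketch of opting out (the target version shown is illustrative):

  # Upgrade without renewing the kubeadm-managed certificates on this node.
  sudo kubeadm upgrade apply v1.19.x --certificate-renewal=false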

Note:

If kubeadm upgrade plan shows any component configs that require manual upgrade, the user must provide a config file with replacement configs to kubeadm upgrade apply via the --config command line flag.
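A minimal sketch of that invocation, assuming you have prepared a replacement configuration; the file name and version are illustrative:

  # Supply replacement component configs while applying the upgrade.
  sudo kubeadm upgrade apply v1.19.x --config kubeadm-config.yaml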

  • Choose a version to upgrade to, and run the appropriate command. For example:

    # replace x with the patch version you picked for this upgrade
    sudo kubeadm upgrade apply v1.19.x

    You should see output similar to this:

    [upgrade/config] Making sure the configuration is correct:
    [upgrade/config] Reading configuration from the cluster...
    [upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    [preflight] Running pre-flight checks.
    [upgrade] Running cluster health checks
    [upgrade/version] You have chosen to change the cluster version to "v1.19.0"
    [upgrade/versions] Cluster version: v1.18.4
    [upgrade/versions] kubeadm version: v1.19.0
    [upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
    [upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
    [upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
    [upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
    [upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.19.0"...
    Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
    Static pod: kube-controller-manager-kind-control-plane hash: 9ac092f0ca813f648c61c4d5fcbf39f2
    Static pod: kube-scheduler-kind-control-plane hash: 7da02f2c78da17af7c2bf1533ecf8c9a
    [upgrade/etcd] Upgrading to TLS for etcd
    Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
    [upgrade/staticpods] Preparing for "etcd" upgrade
    [upgrade/staticpods] Renewing etcd-server certificate
    [upgrade/staticpods] Renewing etcd-peer certificate
    [upgrade/staticpods] Renewing etcd-healthcheck-client certificate
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/etcd.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
    Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
    Static pod: etcd-kind-control-plane hash: 59e40b2aab1cd7055e64450b5ee438f0
    [apiclient] Found 1 Pods for label selector component=etcd
    [upgrade/staticpods] Component "etcd" upgraded successfully!
    [upgrade/etcd] Waiting for etcd to become available
    [upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests999800980"
    [upgrade/staticpods] Preparing for "kube-apiserver" upgrade
    [upgrade/staticpods] Renewing apiserver certificate
    [upgrade/staticpods] Renewing apiserver-kubelet-client certificate
    [upgrade/staticpods] Renewing front-proxy-client certificate
    [upgrade/staticpods] Renewing apiserver-etcd-client certificate
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-apiserver.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
    Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
    Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
    Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
    Static pod: kube-apiserver-kind-control-plane hash: f717874150ba572f020dcd89db8480fc
    [apiclient] Found 1 Pods for label selector component=kube-apiserver
    [upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
    [upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
    [upgrade/staticpods] Renewing controller-manager.conf certificate
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-controller-manager.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    Static pod: kube-controller-manager-kind-control-plane hash: 9ac092f0ca813f648c61c4d5fcbf39f2
    Static pod: kube-controller-manager-kind-control-plane hash: b155b63c70e798b806e64a866e297dd0
    [apiclient] Found 1 Pods for label selector component=kube-controller-manager
    [upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
    [upgrade/staticpods] Preparing for "kube-scheduler" upgrade
    [upgrade/staticpods] Renewing scheduler.conf certificate
    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-scheduler.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    Static pod: kube-scheduler-kind-control-plane hash: 7da02f2c78da17af7c2bf1533ecf8c9a
    Static pod: kube-scheduler-kind-control-plane hash: 260018ac854dbf1c9fe82493e88aec31
    [apiclient] Found 1 Pods for label selector component=kube-scheduler
    [upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
    [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    W0713 16:26:14.074656    2986 dns.go:282] the CoreDNS Configuration will not be migrated due to unsupported version of CoreDNS. The existing CoreDNS Corefile configuration and deployment has been retained.
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy
    [upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.19.0". Enjoy!
    [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
  • Manually upgrade your CNI provider plugin.

    Your Container Network Interface (CNI) provider may have its own upgrade instructions to follow. Check the addons page to find your CNI provider and see whether additional upgrade steps are required.

    This step is not required on additional control plane nodes if the CNI provider runs as a DaemonSet; a hedged check follows below.
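    To see whether your provider runs as a DaemonSet, and which image version it currently uses, something like the following works (the kube-system namespace is an assumption; most providers install there):

    # List DaemonSets and their images; look for your CNI provider.
    kubectl get daemonsets -n kube-system -o wide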

  • Uncordon the control plane node:

    # replace <cp-node-name> with the name of your control plane node
    kubectl uncordon <cp-node-name>

Upgrade additional control plane nodes

Same as the first control plane node, but use:

  sudo kubeadm upgrade node

instead of:

  sudo kubeadm upgrade apply

Also, running sudo kubeadm upgrade plan is not needed on these nodes.

Upgrade kubelet and kubectl

  # On Ubuntu, Debian or HypriotOS:
  # replace x in 1.19.x-00 with the latest patch version
  apt-mark unhold kubelet kubectl && \
  apt-get update && apt-get install -y kubelet=1.19.x-00 kubectl=1.19.x-00 && \
  apt-mark hold kubelet kubectl

  # since apt-get version 1.1 you can also use the following method:
  apt-get update && \
  apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00

  # On CentOS, RHEL or Fedora:
  # replace x in 1.19.x-0 with the latest patch version
  yum install -y kubelet-1.19.x-0 kubectl-1.19.x-0 --disableexcludes=kubernetes

Restart the kubelet:

  sudo systemctl daemon-reload
  sudo systemctl restart kubelet

Upgrade worker nodes

The upgrade procedure on worker nodes should be executed one node at a time, or a few nodes at a time, without compromising the minimum required capacity for running your workloads; a loop sketch follows below.
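As a hedged sketch of that pacing, assuming hypothetical node names worker-1 and worker-2, the per-node steps described in this section can be wrapped in a loop:

  # Drain, upgrade, and uncordon workers one at a time (node names are hypothetical).
  for node in worker-1 worker-2; do
    kubectl drain "$node" --ignore-daemonsets
    # ...run the kubeadm/kubelet upgrade steps below on this node...
    kubectl uncordon "$node"
  done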

Upgrade kubeadm

  # On Ubuntu, Debian or HypriotOS:
  # replace x in 1.19.x-00 with the latest patch version
  apt-mark unhold kubeadm && \
  apt-get update && apt-get install -y kubeadm=1.19.x-00 && \
  apt-mark hold kubeadm

  # since apt-get version 1.1 you can also use the following method:
  apt-get update && \
  apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00

  # On CentOS, RHEL or Fedora:
  # replace x in 1.19.x-0 with the latest patch version
  yum install -y kubeadm-1.19.x-0 --disableexcludes=kubernetes

Drain the node

  • Prepare the node for maintenance by marking it unschedulable and evicting the workloads. Run:

    # replace <node-to-drain> with the name of the node you are draining
    kubectl drain <node-to-drain> --ignore-daemonsets

    You should see output similar to this:

    node/ip-172-31-85-18 cordoned
    WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-dj7d7, kube-system/weave-net-z65qx
    node/ip-172-31-85-18 drained

Upgrade the kubelet configuration

  • Upgrade the kubelet configuration:

    sudo kubeadm upgrade node

Upgrade kubelet and kubectl

  # On Ubuntu, Debian or HypriotOS:
  # replace x in 1.19.x-00 with the latest patch version
  apt-mark unhold kubelet kubectl && \
  apt-get update && apt-get install -y kubelet=1.19.x-00 kubectl=1.19.x-00 && \
  apt-mark hold kubelet kubectl

  # since apt-get version 1.1 you can also use the following method:
  apt-get update && \
  apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00

  # On CentOS, RHEL or Fedora:
  # replace x in 1.19.x-0 with the latest patch version
  yum install -y kubelet-1.19.x-0 kubectl-1.19.x-0 --disableexcludes=kubernetes
  • Restart the kubelet:

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet

Uncordon the node

  • Bring the node back online by marking it schedulable:

    # replace <node-to-drain> with the name of the node
    kubectl uncordon <node-to-drain>

Verify the status of the cluster

After the kubelet is upgraded on all nodes, verify that all nodes are available again by running the following command from anywhere kubectl can access the cluster:

  kubectl get nodes

The STATUS column should show Ready for all your nodes, and the version number should be updated.
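If you prefer to block until every node reports Ready rather than re-running the command by hand, a hedged alternative is:

  # Wait up to five minutes for every node to report the Ready condition.
  kubectl wait --for=condition=Ready nodes --all --timeout=5m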

Recovering from a failure state

If kubeadm upgrade fails and does not roll back, for example because of an unexpected shutdown during execution, you can run kubeadm upgrade again. This command is idempotent and eventually makes sure that the actual state matches the state you declared. To recover from a bad state, you can also run kubeadm upgrade apply --force without changing the version that your cluster is running.
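For example, a hedged sketch of recovering in place (the version shown is illustrative and, with --force, should match what your cluster already runs):

  # Idempotent: safe to run again after an interrupted upgrade.
  sudo kubeadm upgrade apply v1.19.x
  # Or force a re-apply of the same version to repair a bad state:
  sudo kubeadm upgrade apply v1.19.x --force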

During upgrade, kubeadm writes the following backup folders under /etc/kubernetes/tmp:

  • kubeadm-backup-etcd-<date>-<time>
  • kubeadm-backup-manifests-<date>-<time>

kubeadm-backup-etcd contains a backup of the local etcd member data for this control plane node. In case of an etcd upgrade failure where the automatic rollback does not work, the contents of this folder can be manually copied to /var/lib/etcd for repair. If external etcd is used, this backup folder will be empty.
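A hedged sketch of that manual restore; the timestamped folder name comes from listing /etc/kubernetes/tmp yourself, and you should inspect the backup layout before copying anything:

  # Find the timestamped backup folder, then copy the etcd member data back.
  sudo ls /etc/kubernetes/tmp/
  sudo cp -r /etc/kubernetes/tmp/kubeadm-backup-etcd-<date>-<time>/* /var/lib/etcd/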

kubeadm-backup-manifests contains a backup of the static Pod manifest files for this control plane node. In case of an upgrade failure where the automatic rollback does not work, the contents of this folder can be copied to /etc/kubernetes/manifests for manual recovery. If for some reason there is no difference between the pre-upgrade and post-upgrade manifest of a component, kubeadm does not write a backup for it.
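Similarly, a hedged sketch of restoring the manifests; the kubelet picks the static Pods up again once the files are back in place:

  # Copy the backed-up static Pod manifests back into place.
  sudo cp /etc/kubernetes/tmp/kubeadm-backup-manifests-<date>-<time>/*.yaml /etc/kubernetes/manifests/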

How it works

kubeadm upgrade apply does the following:

  • Checks that your cluster is in an upgradeable state:
    • The API server is reachable
    • All nodes are in the Ready state
    • The control plane is healthy
  • Enforces the version skew policies.
  • Makes sure the control plane images are available or pullable to the machine.
  • Generates replacements and/or uses user supplied overwrites if component configs require version upgrades.
  • Upgrades the control plane components, or rolls back if any of them fails to come up.
  • Applies the new kube-dns and kube-proxy manifests and makes sure that all necessary RBAC rules are created.
  • Creates new certificate and key files for the API server and backs up the old files if they are about to expire within 180 days.

kubeadm upgrade node does the following on additional control plane nodes:

  • Fetches the kubeadm ClusterConfiguration from the cluster.
  • Optionally backs up the kube-apiserver certificate.
  • Upgrades the static Pod manifests for the control plane components.
  • Upgrades the kubelet configuration for this node.

kubeadm upgrade node does the following on worker nodes:

  • Fetches the kubeadm ClusterConfiguration from the cluster.
  • Upgrades the kubelet configuration for this node.