Kubernetes HA

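Since Kubernetes 1.5, clusters deployed with kops or kube-up.sh are automatically set up as a highly available system, which includes:

  • etcd running in cluster mode
  • load-balanced apiservers
  • automatic leader election for the controller manager, scheduler, and cluster autoscaler (exactly one active instance of each at any time)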

This is shown in the figure below.

Figure 1: High availability
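
As background for the etcd cluster mode listed above: an etcd cluster stays writable only while a majority (quorum) of its members is reachable, which is why odd member counts such as 3 or 5 are the usual choice. The sketch below works through that arithmetic. The function names are illustrative only and are not part of any etcd or Kubernetes tooling.

```shell
# Hypothetical helpers: majority arithmetic for an etcd cluster of N members.

# Number of members that must be reachable for the cluster to accept writes.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while quorum is still met.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

etcd_quorum 3            # prints 2
etcd_fault_tolerance 3   # prints 1
```

Note that a fourth member does not improve on three: quorum rises to 3 while the tolerated failures stay at 1.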

Note: the following steps assume that kubelet and Docker are already configured and running on every machine.

Etcd cluster

Install cfssl:

    # On all etcd nodes
    curl -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
    curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
    chmod +x /usr/local/bin/cfssl*

Generate the CA certs:

    # SSH etcd0
    mkdir -p /etc/kubernetes/pki/etcd
    cd /etc/kubernetes/pki/etcd

    # Signing profiles for server, client, and peer certificates
    cat >ca-config.json <<EOF
    {
      "signing": {
        "default": {
          "expiry": "43800h"
        },
        "profiles": {
          "server": {
            "expiry": "43800h",
            "usages": [
              "signing",
              "key encipherment",
              "server auth",
              "client auth"
            ]
          },
          "client": {
            "expiry": "43800h",
            "usages": [
              "signing",
              "key encipherment",
              "client auth"
            ]
          },
          "peer": {
            "expiry": "43800h",
            "usages": [
              "signing",
              "key encipherment",
              "server auth",
              "client auth"
            ]
          }
        }
      }
    }
    EOF

    # CA signing request
    cat >ca-csr.json <<EOF
    {
      "CN": "etcd",
      "key": {
        "algo": "rsa",
        "size": 2048
      }
    }
    EOF

    # Create the CA (writes ca.pem and ca-key.pem)
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca -

    # Generate client certs
    cat >client.json <<EOF
    {
      "CN": "client",
      "key": {
        "algo": "ecdsa",
        "size": 256
      }
    }
    EOF
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
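
The expiry values in ca-config.json are expressed in hours. As a quick sanity check, the hypothetical helper below converts such a value to days; it handles whole-hour values only and is not part of cfssl.

```shell
# Hypothetical helper: convert a duration in whole hours, such as "43800h", to days.
expiry_days() {
  hours=${1%h}            # strip the trailing "h" unit
  echo $(( hours / 24 ))
}

expiry_days 43800h   # prints 1825
```

1825 days is five 365-day years, so certificates issued with these profiles need to be rotated within roughly five years.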

Generate the etcd server/peer certs:

    # On the other etcd nodes: copy the CA and client files from etcd0
    mkdir -p /etc/kubernetes/pki/etcd
    cd /etc/kubernetes/pki/etcd
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/ca.pem .
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/ca-key.pem .
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/client.pem .
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/client-key.pem .
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/ca-config.json .

    # Run on all etcd nodes
    # (PEER_NAME, PRIVATE_IP and PUBLIC_IP must already be set in the shell for this node)
    cfssl print-defaults csr > config.json
    sed -i '0,/CN/{s/example\.net/'"$PEER_NAME"'/}' config.json
    sed -i 's/www\.example\.net/'"$PRIVATE_IP"'/' config.json
    sed -i 's/example\.net/'"$PUBLIC_IP"'/' config.json
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer
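
The sed commands above substitute $PEER_NAME, $PRIVATE_IP and $PUBLIC_IP into config.json, but the text does not show where those variables are set. If any of them is empty, the CSR ends up with blank host entries. A minimal guard, assuming you export the variables yourself beforehand, might look like this; require_vars is a hypothetical helper and the values shown are examples only.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"      # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "required variable is not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values only; substitute this node's real name and addresses.
PEER_NAME=etcd0
PRIVATE_IP=10.0.0.10
PUBLIC_IP=203.0.113.10

require_vars PEER_NAME PRIVATE_IP PUBLIC_IP && echo "ready to generate certs"
```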

Finally, run etcd. Write the following YAML to /etc/kubernetes/manifests/etcd.yaml on every etcd node, replacing:

  • <podname> with the etcd node name (e.g. etcd0, etcd1, etcd2)
  • <etcd0-ip-address>, <etcd1-ip-address> and <etcd2-ip-address> with the private IP addresses of the etcd nodes

    # ${PEER_NAME} and ${PRIVATE_IP} are expanded by the shell when the file is
    # written, so both must be set on this node before running the command.
    cat >/etc/kubernetes/manifests/etcd.yaml <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        component: etcd
        tier: control-plane
      name: <podname>
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --name=${PEER_NAME}
        - --data-dir=/var/lib/etcd
        - --listen-client-urls=https://${PRIVATE_IP}:2379
        - --advertise-client-urls=https://${PRIVATE_IP}:2379
        - --listen-peer-urls=https://${PRIVATE_IP}:2380
        - --initial-advertise-peer-urls=https://${PRIVATE_IP}:2380
        - --cert-file=/certs/server.pem
        - --key-file=/certs/server-key.pem
        - --client-cert-auth
        - --trusted-ca-file=/certs/ca.pem
        - --peer-cert-file=/certs/peer.pem
        - --peer-key-file=/certs/peer-key.pem
        - --peer-client-cert-auth
        - --peer-trusted-ca-file=/certs/ca.pem
        - --initial-cluster=etcd0=https://<etcd0-ip-address>:2380,etcd1=https://<etcd1-ip-address>:2380,etcd2=https://<etcd2-ip-address>:2380
        - --initial-cluster-token=my-etcd-token
        - --initial-cluster-state=new
        image: gcr.io/google_containers/etcd-amd64:3.1.0
        livenessProbe:
          httpGet:
            path: /health
            port: 2379
            scheme: HTTP
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: etcd
        env:
        - name: PUBLIC_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: PRIVATE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: PEER_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - mountPath: /var/lib/etcd
          name: etcd
        - mountPath: /certs
          name: certs
      hostNetwork: true
      volumes:
      - hostPath:
          path: /var/lib/etcd
          type: DirectoryOrCreate
        name: etcd
      - hostPath:
          path: /etc/kubernetes/pki/etcd
        name: certs
    EOF
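
The --initial-cluster value must list every member exactly once, under the same name that member passes to --name, and it is easy to get wrong when edited by hand. One way to avoid that is to generate it. The sketch below assumes the peer port 2380 used in this guide; initial_cluster is a hypothetical helper and the IP addresses are placeholders.

```shell
# Hypothetical helper: build the value of etcd's --initial-cluster flag
# from name=ip pairs, rejecting duplicate member names.
initial_cluster() {
  out=""
  seen=","
  for pair in "$@"; do
    name=${pair%%=*}      # text before the first "="
    ip=${pair#*=}         # text after the first "="
    case "$seen" in
      *",$name,"*)
        echo "duplicate member name: $name" >&2
        return 1
        ;;
    esac
    seen="$seen$name,"
    out="${out:+$out,}${name}=https://${ip}:2380"
  done
  printf '%s\n' "$out"
}

initial_cluster etcd0=10.0.0.10 etcd1=10.0.0.11 etcd2=10.0.0.12
# prints etcd0=https://10.0.0.10:2380,etcd1=https://10.0.0.11:2380,etcd2=https://10.0.0.12:2380
```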

Note: the approach above requires kubelet to be running on every etcd node. If you would rather not use kubelet, etcd can also be started through systemd:

    export ETCD_VERSION=v3.1.10
    curl -sSL https://github.com/coreos/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz | tar -xzv --strip-components=1 -C /usr/local/bin/
    rm -rf etcd-$ETCD_VERSION-linux-amd64*

    touch /etc/etcd.env
    echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
    echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

    cat >/etc/systemd/system/etcd.service <<EOF
    [Unit]
    Description=etcd
    Documentation=https://github.com/coreos/etcd
    Conflicts=etcd.service
    Conflicts=etcd2.service

    [Service]
    EnvironmentFile=/etc/etcd.env
    Type=notify
    Restart=always
    RestartSec=5s
    LimitNOFILE=40000
    TimeoutStartSec=0
    ExecStart=/usr/local/bin/etcd --name ${PEER_NAME} \
        --data-dir /var/lib/etcd \
        --listen-client-urls https://${PRIVATE_IP}:2379 \
        --advertise-client-urls https://${PRIVATE_IP}:2379 \
        --listen-peer-urls https://${PRIVATE_IP}:2380 \
        --initial-advertise-peer-urls https://${PRIVATE_IP}:2380 \
        --cert-file=/etc/kubernetes/pki/etcd/server.pem \
        --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
        --client-cert-auth \
        --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
        --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
        --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
        --peer-client-cert-auth \
        --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
        --initial-cluster etcd0=https://<etcd0-ip-address>:2380,etcd1=https://<etcd1-ip-address>:2380,etcd2=https://<etcd2-ip-address>:2380 \
        --initial-cluster-token my-etcd-token \
        --initial-cluster-state new

    [Install]
    WantedBy=multi-user.target
    EOF

    systemctl daemon-reload
    systemctl start etcd
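
Once all members are running, it is worth confirming that the cluster is healthy before pointing the apiserver at it. Because client certificate authentication is enabled, etcdctl has to present the client certificate generated earlier. The sketch below only prints the command so it can be reviewed first; etcd_health_cmd is a hypothetical helper, the IP addresses are placeholders, and the flags assume the v3 etcdctl API.

```shell
# Hypothetical helper: print an etcdctl (v3 API) health-check command for the
# given member IPs, using the certificate paths from this guide.
etcd_health_cmd() {
  pki=/etc/kubernetes/pki/etcd
  endpoints=""
  for ip in "$@"; do
    endpoints="${endpoints:+$endpoints,}https://${ip}:2379"
  done
  printf 'ETCDCTL_API=3 etcdctl --endpoints=%s --cacert=%s/ca.pem --cert=%s/client.pem --key=%s/client-key.pem endpoint health\n' \
    "$endpoints" "$pki" "$pki" "$pki"
}

# Print the command for a three-member cluster (placeholder addresses).
etcd_health_cmd 10.0.0.10 10.0.0.11 10.0.0.12
```

Running the printed command on a node that holds the client certificate should report the health of each endpoint.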

kube-apiserver

Place kube-apiserver.yaml in /etc/kubernetes/manifests/ on every master node and put the following configuration files in /srv/kubernetes/; kubelet then creates and starts the apiserver automatically:

  • basic_auth.csv - basic auth user and password
  • ca.crt - Certificate Authority cert
  • known_tokens.csv - tokens that entities (e.g. the kubelet) can use to talk to the apiserver
  • kubecfg.crt - Client certificate, public key
  • kubecfg.key - Client certificate, private key
  • server.cert - Server certificate, public key
  • server.key - Server certificate, private key

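Once the apiservers are running, they need to be load balanced. You can use your cloud platform's elastic load balancing service, or put haproxy, LVS, or a similar tool in front of the master nodes.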
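
For the haproxy option, a plain TCP pass-through is usually enough, since the apiservers terminate TLS themselves. The sketch below prints a minimal configuration fragment for a list of master IPs. It assumes the apiservers listen on port 6443; haproxy_apiserver_cfg, the frontend and backend names, and the addresses are all illustrative.

```shell
# Hypothetical helper: print a minimal haproxy TCP configuration fragment
# that balances port 6443 across the given master IPs.
haproxy_apiserver_cfg() {
  printf 'frontend kube-apiserver\n'
  printf '    bind *:6443\n'
  printf '    mode tcp\n'
  printf '    default_backend kube-masters\n'
  printf '\n'
  printf 'backend kube-masters\n'
  printf '    mode tcp\n'
  printf '    balance roundrobin\n'
  i=0
  for ip in "$@"; do
    # "check" enables haproxy health checks for this server
    printf '    server master%s %s:6443 check\n' "$i" "$ip"
    i=$((i + 1))
  done
}

haproxy_apiserver_cfg 10.0.0.20 10.0.0.21 10.0.0.22
```

The output is meant to be merged into an existing haproxy.cfg that already has suitable global and defaults sections.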

If the cluster is deployed with kubeadm, the configuration above can be generated automatically:

    # on master0
    # copy etcd certs
    mkdir -p /etc/kubernetes/pki/etcd
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/ca.pem /etc/kubernetes/pki/etcd
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/client.pem /etc/kubernetes/pki/etcd
    scp root@<etcd0-ip-address>:/etc/kubernetes/pki/etcd/client-key.pem /etc/kubernetes/pki/etcd

    # deploy master0
    cat >config.yaml <<EOF
    apiVersion: kubeadm.k8s.io/v1alpha1
    kind: MasterConfiguration
    api:
      advertiseAddress: <private-ip>
    etcd:
      endpoints:
      - https://<etcd0-ip-address>:2379
      - https://<etcd1-ip-address>:2379
      - https://<etcd2-ip-address>:2379
      caFile: /etc/kubernetes/pki/etcd/ca.pem
      certFile: /etc/kubernetes/pki/etcd/client.pem
      keyFile: /etc/kubernetes/pki/etcd/client-key.pem
    networking:
      podSubnet: <podCIDR>
    apiServerCertSANs:
    - <load-balancer-ip>
    apiServerExtraArgs:
      apiserver-count: "3"
    EOF
    kubeadm init --config=config.yaml

    # on other master nodes
    scp root@<master0-ip-address>:/etc/kubernetes/pki/* /etc/kubernetes/pki
    rm /etc/kubernetes/pki/apiserver.crt
    # then repeat all of the master0 steps above

kube-controller-manager and kube-scheduler

Only one instance each of kube-controller-manager and kube-scheduler may be active at any time, which requires a leader election process. Both must therefore be started with --leader-elect=true, for example:

    kube-scheduler --master=127.0.0.1:8080 --v=2 --leader-elect=true
    kube-controller-manager --master=127.0.0.1:8080 --cluster-cidr=10.245.0.0/16 --allocate-node-cidrs=true --service-account-private-key-file=/srv/kubernetes/server.key --v=2 --leader-elect=true

Then place kube-scheduler.yaml and kube-controller-manager.yaml in /etc/kubernetes/manifests/ on every master node.
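
To see which instance currently holds the lock, you can inspect the leader-election record. In Kubernetes releases of this era the record is stored as a JSON annotation (control-plane.alpha.kubernetes.io/leader) on an Endpoints object in kube-system; newer releases may store it elsewhere, so treat the kubectl command in the comments as an assumption to verify against your version. The leader_of helper is hypothetical and the sample value is made up for illustration.

```shell
# Hypothetical helper: extract holderIdentity from a leader-election
# annotation value (a compact JSON string).
leader_of() {
  printf '%s\n' "$1" | sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Sample annotation value, made up for illustration.
sample='{"holderIdentity":"master1","leaseDurationSeconds":15,"leaderTransitions":2}'
leader_of "$sample"   # prints master1

# Against a live cluster (assumes kubectl access and the Endpoints-based lock):
# leader_of "$(kubectl -n kube-system get endpoints kube-scheduler \
#   -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}')"
```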

kube-dns

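kube-dns can be deployed as a Deployment, and kubeadm creates it automatically by default. In large clusters, however, its resource limits need to be relaxed, for example:

    dns_replicas: 6
    dns_cpu_limit: 100m
    dns_memory_limit: 512Mi
    dns_cpu_requests: 70m
    dns_memory_requests: 70Mi

dnsmasq needs more resources as well, for example by raising its cache size to 10000 and its limit on concurrent forwarded queries with --dns-forward-max=1000.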

kube-proxy

By default kube-proxy uses iptables to load balance Services, which introduces significant latency at large scale. Consider using IPVS as an alternative (note that IPVS is still in beta in v1.9).

Also make sure kube-proxy is configured to use the load-balanced address of kube-apiserver:

    kubectl get configmap -n kube-system kube-proxy -o yaml > kube-proxy-cm.yaml
    sed -i 's#server:.*#server: https://<masterLoadBalancerFQDN>:6443#g' kube-proxy-cm.yaml
    kubectl apply -f kube-proxy-cm.yaml --force
    # restart all kube-proxy pods to ensure that they load the new configmap
    kubectl delete pod -n kube-system -l k8s-app=kube-proxy

kubelet

kubelet also needs to be configured with the load-balanced address of kube-apiserver:

    sudo sed -i 's#server:.*#server: https://<masterLoadBalancerFQDN>:6443#g' /etc/kubernetes/kubelet.conf
    sudo systemctl restart kubelet
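
The same substitution is applied to both the kube-proxy ConfigMap and kubelet.conf, so it can be wrapped in a small function and tried on a copy of the file first. This is a sketch only: set_api_server is a hypothetical helper, lb.example.com is a placeholder, and port 6443 is taken from the commands above.

```shell
# Hypothetical helper: point every "server:" entry in a kubeconfig-style file
# at https://<endpoint>:6443, keeping the rest of the file unchanged.
set_api_server() {
  file=$1
  endpoint=$2
  tmp=$(mktemp) || return 1
  # Only the text from "server:" to the end of the line is replaced,
  # so the YAML indentation in front of it is preserved.
  sed "s#server:.*#server: https://${endpoint}:6443#g" "$file" >"$tmp" &&
    cat "$tmp" >"$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Try it on a throwaway file (placeholder contents and load balancer name).
demo=$(mktemp)
printf 'clusters:\n- cluster:\n    server: https://10.0.0.20:6443\n  name: kubernetes\n' >"$demo"
set_api_server "$demo" lb.example.com
cat "$demo"
```

After editing the real kubelet.conf this way, kubelet still has to be restarted for the change to take effect.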

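Data persistence

Beyond the configuration described above, persistent storage is also essential for a highly available Kubernetes cluster.

  • For clusters deployed on a public cloud, consider the persistent storage offered by the cloud platform, such as AWS EBS or GCE Persistent Disk
  • For clusters deployed on bare metal, consider network storage such as iSCSI, NFS, Gluster, or Ceph; RAID is also an option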

References