StatefulSet

StatefulSet is designed for stateful services (whereas Deployments and ReplicaSets target stateless services). Its use cases include:

  • Stable persistent storage: a Pod can still access the same persisted data after being rescheduled, implemented with PVCs
  • Stable network identity: a Pod keeps its PodName and HostName after being rescheduled, implemented with a Headless Service (a Service without a Cluster IP)
  • Ordered deployment and scaling: Pods are ordered and are deployed or scaled up strictly in the defined order (from 0 to N-1; all preceding Pods must be Running and Ready before the next Pod starts), implemented with init containers
  • Ordered scale-down and deletion (from N-1 down to 0)
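The ordering guarantees above can be sketched as follows; this is a minimal illustration of the ordinal sequencing (the `web` name is a hypothetical StatefulSet used only for the example):

```python
def rollout_order(replicas: int) -> list[str]:
    """Pods are created sequentially, from ordinal 0 up to N-1."""
    return [f"web-{i}" for i in range(replicas)]

def teardown_order(replicas: int) -> list[str]:
    """Pods are terminated in reverse, from ordinal N-1 down to 0."""
    return [f"web-{i}" for i in reversed(range(replicas))]

print(rollout_order(3))   # ['web-0', 'web-1', 'web-2']
print(teardown_order(3))  # ['web-2', 'web-1', 'web-0']
```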

From these use cases it follows that a StatefulSet consists of the following parts:

  • A Headless Service that defines the network identity (DNS domain)
  • volumeClaimTemplates for creating PersistentVolumes
  • The StatefulSet itself, which defines the application

The DNS name of each Pod in a StatefulSet has the form statefulSetName-{0..N-1}.serviceName.namespace.svc.cluster.local, where

  • serviceName is the name of the Headless Service
  • 0..N-1 is the Pod's ordinal, ranging from 0 to N-1
  • statefulSetName is the name of the StatefulSet
  • namespace is the namespace the service lives in; the Headless Service and the StatefulSet must be in the same namespace
  • .cluster.local is the cluster domain
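The naming scheme above can be captured in a small helper; this is a sketch for illustration only, assuming the default cluster domain:

```python
def pod_fqdn(statefulset: str, ordinal: int, service: str,
             namespace: str = "default",
             cluster_domain: str = "cluster.local") -> str:
    # <statefulSetName>-<ordinal>.<serviceName>.<namespace>.svc.<clusterDomain>
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"

print(pod_fqdn("web", 0, "nginx"))
# web-0.nginx.default.svc.cluster.local
```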

API version compatibility

Kubernetes version   Apps API version
v1.6 - v1.7          apps/v1beta1
v1.8                 apps/v1beta2
v1.9 and later       apps/v1
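The table above can be encoded as a simple lookup; `statefulset_api_version` is a hypothetical helper for illustration:

```python
# Mapping taken from the version table above.
APPS_API = {
    "v1.6": "apps/v1beta1",
    "v1.7": "apps/v1beta1",
    "v1.8": "apps/v1beta2",
}

def statefulset_api_version(k8s_version: str) -> str:
    # v1.9 and later all use the stable apps/v1 API.
    return APPS_API.get(k8s_version, "apps/v1")

print(statefulset_api_version("v1.8"))  # apps/v1beta2
```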

Basic example

Take a simple nginx service, web.yaml, as an example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```
```sh
$ kubectl create -f web.yaml
service "nginx" created
statefulset "web" created

# Check the headless service and statefulset that were created
$ kubectl get service nginx
NAME      CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
nginx     None         <none>        80/TCP    1m
$ kubectl get statefulset web
NAME      DESIRED   CURRENT   AGE
web       2         2         2m

# PVCs are created automatically from volumeClaimTemplates
# (on GCE a kubernetes.io/gce-pd volume is provisioned automatically)
$ kubectl get pvc
NAME        STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
www-web-0   Bound     pvc-d064a004-d8d4-11e6-b521-42010a800002   1Gi        RWO           16s
www-web-1   Bound     pvc-d06a3946-d8d4-11e6-b521-42010a800002   1Gi        RWO           16s

# The Pods are created in order
$ kubectl get pods -l app=nginx
NAME      READY     STATUS    RESTARTS   AGE
web-0     1/1       Running   0          5m
web-1     1/1       Running   0          4m

# Use nslookup to inspect the Pods' DNS records
$ kubectl run -i --tty --image busybox dns-test --restart=Never --rm /bin/sh
/ # nslookup web-0.nginx
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web-0.nginx
Address 1: 10.244.2.10
/ # nslookup web-1.nginx
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web-1.nginx
Address 1: 10.244.3.12
/ # nslookup web-0.nginx.default.svc.cluster.local
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web-0.nginx.default.svc.cluster.local
Address 1: 10.244.2.10
```

Other operations are also supported:

```sh
# Scale up
$ kubectl scale statefulset web --replicas=5
# Scale down
$ kubectl patch statefulset web -p '{"spec":{"replicas":3}}'
# Update the image (older versions do not support updating the image field
# directly, so a patch is used instead)
$ kubectl patch statefulset web --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"gcr.io/google_containers/nginx-slim:0.7"}]'
# Delete the StatefulSet and the Headless Service
$ kubectl delete statefulset web
$ kubectl delete service nginx
# PVCs are retained after the StatefulSet is deleted; delete them too once
# the data is no longer needed
$ kubectl delete pvc www-web-0 www-web-1
```
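The JSON Patch string passed to `kubectl patch --type='json'` above can be built programmatically rather than written by hand; a minimal sketch:

```python
import json

# A JSON Patch document is a list of operations. This one replaces the
# image of the first container in the StatefulSet's Pod template.
patch = [{
    "op": "replace",
    "path": "/spec/template/spec/containers/0/image",
    "value": "gcr.io/google_containers/nginx-slim:0.7",
}]

# The serialized form is what gets passed via -p=... on the command line.
print(json.dumps(patch))
```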

Updating a StatefulSet

v1.7+ supports automatic StatefulSet updates, configured through spec.updateStrategy. Two strategies are currently supported:

  • OnDelete: when .spec.template changes, old Pods are not deleted immediately; instead, the user deletes them manually and new Pods are created automatically to replace them. This is the default strategy and matches v1.6 behavior
  • RollingUpdate: when .spec.template changes, old Pods are deleted automatically and replaced with new ones. Pods are updated in reverse ordinal order, one at a time: each Pod is deleted, recreated, and must become Ready before the next Pod is updated

Partitions

RollingUpdate also supports partitions, configured through .spec.updateStrategy.rollingUpdate.partition. When a partition is set, only Pods with an ordinal greater than or equal to the partition roll to the new revision when .spec.template changes; all other Pods stay unchanged (even if deleted, they are recreated from the previous version).
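The partition rule can be expressed as a one-line predicate over Pod ordinals; a minimal sketch for illustration:

```python
def pods_to_update(replicas: int, partition: int) -> list[int]:
    """Only Pods whose ordinal is >= partition roll to the new template
    revision; ordinals below the partition stay on the old revision."""
    return [i for i in range(replicas) if i >= partition]

print(pods_to_update(5, 3))  # [3, 4]
print(pods_to_update(3, 0))  # [0, 1, 2]  (partition 0: everything rolls)
```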

```sh
# Set partition to 3
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
statefulset "web" patched

# Update the StatefulSet
$ kubectl patch statefulset web --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"gcr.io/google_containers/nginx-slim:0.7"}]'
statefulset "web" patched

# Verify the update
$ kubectl delete po web-2
pod "web-2" deleted
$ kubectl get po -lapp=nginx -w
NAME      READY     STATUS              RESTARTS   AGE
web-0     1/1       Running             0          4m
web-1     1/1       Running             0          4m
web-2     0/1       ContainerCreating   0          11s
web-2     1/1       Running             0          18s
```

Pod management policies

v1.7+ supports Pod management policies via .spec.podManagementPolicy. Two policies are available:

  • OrderedReady: the default; Pods are created one at a time in ordinal order, each waiting for the previous Pod to become Ready before the next one is created
  • Parallel: Pods are created or deleted in parallel (all Pods are launched without waiting for earlier Pods to become Ready)
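The difference between the two policies can be sketched as the batches in which Pods are launched; a minimal illustration (the `web` name is hypothetical):

```python
def creation_batches(replicas: int, policy: str) -> list[list[str]]:
    """OrderedReady launches one Pod per batch, each batch gated on the
    previous Pod becoming Ready; Parallel launches all Pods at once."""
    pods = [f"web-{i}" for i in range(replicas)]
    if policy == "Parallel":
        return [pods]          # single batch, no readiness gating
    return [[p] for p in pods]  # OrderedReady: one Pod at a time

print(creation_batches(3, "OrderedReady"))  # [['web-0'], ['web-1'], ['web-2']]
print(creation_batches(3, "Parallel"))      # [['web-0', 'web-1', 'web-2']]
```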

Parallel example

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  podManagementPolicy: "Parallel"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

As the watch output shows, all Pods are created in parallel:

```sh
$ kubectl create -f webp.yaml
service "nginx" created
statefulset "web" created
$ kubectl get po -lapp=nginx -w
NAME      READY     STATUS              RESTARTS   AGE
web-0     0/1       Pending             0          0s
web-0     0/1       Pending             0          0s
web-1     0/1       Pending             0          0s
web-1     0/1       Pending             0          0s
web-0     0/1       ContainerCreating   0          0s
web-1     0/1       ContainerCreating   0          0s
web-0     1/1       Running             0          10s
web-1     1/1       Running             0          10s
```

zookeeper

A more compelling demonstration of StatefulSet's capabilities is the zookeeper.yaml example:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: zk-headless
  labels:
    app: zk-headless
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zk-config
data:
  ensemble: "zk-0;zk-1;zk-2"
  jvm.heap: "2G"
  tick: "2000"
  init: "10"
  sync: "5"
  client.cnxns: "60"
  snap.retain: "3"
  purge.interval: "1"
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-budget
spec:
  selector:
    matchLabels:
      app: zk
  minAvailable: 2
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-headless
  replicas: 3
  template:
    metadata:
      labels:
        app: zk
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        scheduler.alpha.kubernetes.io/affinity: >
            {
              "podAntiAffinity": {
                "requiredDuringSchedulingRequiredDuringExecution": [{
                  "labelSelector": {
                    "matchExpressions": [{
                      "key": "app",
                      "operator": "In",
                      "values": ["zk-headless"]
                    }]
                  },
                  "topologyKey": "kubernetes.io/hostname"
                }]
              }
            }
    spec:
      containers:
      - name: k8szk
        imagePullPolicy: Always
        image: gcr.io/google_samples/k8szk:v1
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        env:
        - name: ZK_ENSEMBLE
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: ensemble
        - name: ZK_HEAP_SIZE
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: jvm.heap
        - name: ZK_TICK_TIME
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: tick
        - name: ZK_INIT_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: init
        - name: ZK_SYNC_LIMIT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: sync
        - name: ZK_MAX_CLIENT_CNXNS
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: client.cnxns
        - name: ZK_SNAP_RETAIN_COUNT
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: snap.retain
        - name: ZK_PURGE_INTERVAL
          valueFrom:
            configMapKeyRef:
              name: zk-config
              key: purge.interval
        - name: ZK_CLIENT_PORT
          value: "2181"
        - name: ZK_SERVER_PORT
          value: "2888"
        - name: ZK_ELECTION_PORT
          value: "3888"
        command:
        - sh
        - -c
        - zkGenConfig.sh && zkServer.sh start-foreground
        readinessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 15
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkOk.sh"
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
```
```sh
$ kubectl create -f zookeeper.yaml
```
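The `ensemble` value in the ConfigMap above follows directly from the StatefulSet's stable Pod names; a minimal sketch that reproduces it from the replica count:

```python
def zk_ensemble(name: str, replicas: int) -> str:
    """Join the predictable StatefulSet Pod names (name-0 .. name-N-1)
    into the semicolon-separated ensemble string used by zk-config."""
    return ";".join(f"{name}-{i}" for i in range(replicas))

print(zk_ensemble("zk", 3))  # zk-0;zk-1;zk-2
```

This predictability is exactly why StatefulSet suits ZooKeeper: the member list can be fixed in configuration before any Pod exists.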

See the zookeeper stateful application tutorial for detailed usage instructions.

StatefulSet caveats

  1. Recommended for use with Kubernetes v1.9 or later
  2. Every Pod's volumes must be backed by a PersistentVolume provisioner or pre-created by an administrator
  3. To protect data, deleting a StatefulSet does not delete its volumes
  4. A StatefulSet requires a Headless Service for its DNS domain, which must be created before the StatefulSet