PodUnavailableBudget
FEATURE STATE: Kruise v0.10.0
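In many voluntary disruption scenarios, Kubernetes Pod Disruption Budget (PDB) keeps applications highly available by limiting the number of Pods that are disrupted at the same time. However, PDB can only guard against Pod disruptions triggered through the Eviction API, for example kubectl drain evicting all Pods on a node.
In the following voluntary disruption scenarios, business interruption or service degradation can still occur even with Kubernetes PDB in place:
- The application owner is rolling out a new version through a Deployment while the cluster administrator is scaling in nodes because of low machine resource utilization.
- The middleware team is using SidecarSet to upgrade sidecars in place across the cluster (for example, the ServiceMesh envoy) while HPA is scaling in the same set of applications.
- The application owner and the middleware team are upgrading the same batch of Pods at the same time, using the in-place update capabilities of CloneSet and SidecarSet.
In these scenarios, which Kubernetes PDB cannot protect well, Kruise PodUnavailableBudget (PUB) intercepts Pod requests through a validating webhook. This lets it cover more voluntary disruption scenarios and provide stronger protection for applications.
API Definition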
apiVersion: policy.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: web-server-pub
  namespace: web
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    # cloneset, deployment, statefulset, etc.
    kind: CloneSet
    name: web-server
  # selector is a label query over the Pods managed by the budget.
  # selector and targetRef are mutually exclusive; targetRef takes precedence.
  # selector is commonly used when an application is deployed using multiple workloads,
  # and targetRef is used to protect a single workload.
  # selector:
  #   matchLabels:
  #     app: web-server
  # Maximum number of unavailable Pods for the current cloneset; in this example, cloneset.replicas(5) * 60% = 3.
  # maxUnavailable and minAvailable are mutually exclusive; maxUnavailable takes precedence.
  maxUnavailable: 60%
  # Minimum number of available Pods for the current cloneset; in this example, cloneset.replicas(5) * 40% = 2.
  # minAvailable: 40%
---
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  labels:
    app: web-server
  name: web-server
  namespace: web
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
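The percentage arithmetic described in the manifest comments can be sketched in Python. This is a minimal illustration rather than Kruise's actual implementation: the helper names `scaled_value` and `desired_available` are hypothetical, and rounding fractional results up is an assumption (the example values divide evenly, so the rounding mode does not affect them).

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an absolute count or a percentage string such as '60%' against total."""
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an integer or a percentage, got {value!r}")
    percent = int(value[:-1])
    # Assumption: fractional results are rounded up (integer ceiling division).
    return (total * percent + 99) // 100


def desired_available(
    total: int,
    max_unavailable: Optional[IntOrPercent] = None,
    min_available: Optional[IntOrPercent] = None,
) -> int:
    """Return the minimum number of Pods that should stay available."""
    # maxUnavailable takes precedence when both fields are set,
    # as stated in the manifest comments.
    if max_unavailable is not None:
        return max(0, total - scaled_value(max_unavailable, total))
    if min_available is not None:
        return scaled_value(min_available, total)
    raise ValueError("either max_unavailable or min_available must be set")


print(scaled_value("60%", 5))                        # 3
print(desired_available(5, max_unavailable="60%"))   # 2
print(desired_available(5, min_available="40%"))     # 2
```

With 5 replicas, `maxUnavailable: 60%` and `minAvailable: 40%` describe the same budget: at most 3 Pods unavailable, at least 2 available.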
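Support for Custom Workloads
FEATURE STATE: Kruise v1.2.0
To meet more complex application deployment requirements, many companies manage their business Pods through custom workloads. Starting with Kruise v1.2.0, PUB can protect custom workloads that implement the scale subresource. The following example protects an Argo-Rollout: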
apiVersion: policy.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: rollouts-demo
spec:
  targetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollouts-demo
  minAvailable: 80%
Implementation
For the implementation principle and detailed design of PUB, refer to the Pub Proposal.

Comparison with Kubernetes native PDB
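Kubernetes PDB protects Pods through the Eviction API, whereas Kruise PUB intercepts Pod validating requests to protect against many more voluntary disruption scenarios. Kruise PUB covers everything PDB does (protection against Pod eviction), so you can use the two together as needed or use Kruise PUB alone (the recommended approach).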
feature-gates
PodUnavailableBudget protection is disabled by default. To enable it, turn on the feature gates PodUnavailableBudgetDeleteGate and PodUnavailableBudgetUpdateGate:
$ helm install kruise https://... --set featureGates="PodUnavailableBudgetDeleteGate=true\,PodUnavailableBudgetUpdateGate=true"
PodUnavailableBudget Status
# kubectl describe podunavailablebudgets web-server-pub
Name:         web-server-pub
Kind:         PodUnavailableBudget
Status:
  unavailableAllowed: 3  # number of Pods that are currently allowed to become unavailable
  currentAvailable: 5    # current number of available Pods
  desiredAvailable: 2    # minimum desired number of available Pods
  totalReplicas: 5       # total number of Pods counted by this PUB
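The relationship between these status fields can be sketched in Python as a reading aid. The formula below is an assumption inferred from the example values (5 available minus 2 desired leaves a budget of 3), not a statement of Kruise's controller logic. The `PubStatus` class and the `admit` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PubStatus:
    total_replicas: int      # totalReplicas
    current_available: int   # currentAvailable
    desired_available: int   # desiredAvailable

    @property
    def unavailable_allowed(self) -> int:
        # Assumption: the remaining budget is the surplus of available Pods
        # over the desired minimum, and it never goes below zero.
        return max(0, self.current_available - self.desired_available)


def admit(status: PubStatus, pods_to_disrupt: int = 1) -> bool:
    """Return True if disrupting this many Pods stays within the budget."""
    return pods_to_disrupt <= status.unavailable_allowed


status = PubStatus(total_replicas=5, current_available=5, desired_available=2)
print(status.unavailable_allowed)  # 3
print(admit(status, 3))            # True
print(admit(status, 4))            # False
```

Under this reading, a request that would make a fourth Pod unavailable is the one a budget like this is meant to reject.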