Pod Scheduling Readiness

FEATURE STATE: Kubernetes v1.26 [alpha]

By default, a Pod is considered ready for scheduling as soon as it is created. The Kubernetes scheduler does its due diligence to find nodes on which to place all pending Pods. In practice, however, some Pods may stay in a “miss-essential-resources” state for a long period. These Pods churn the scheduler (and downstream integrators such as Cluster Autoscaler) unnecessarily.

By specifying or removing a Pod’s .spec.schedulingGates, you can control when a Pod is ready to be considered for scheduling.

Configuring Pod schedulingGates

The schedulingGates field contains a list of named gates, and each gate name is treated as a criterion that the Pod must satisfy before it is considered schedulable. This field can be initialized only when a Pod is created (either by the client, or by mutation during admission). After creation, scheduling gates can be removed in any order, but adding a new scheduling gate is not allowed.

  stateDiagram-v2
      s1: pod created
      s2: pod scheduling gated
      s3: pod scheduling ready
      s4: pod running
      if: empty scheduling gates?
      [*] --> s1
      s1 --> if
      s2 --> if: scheduling gate removed
      if --> s2: no
      if --> s3: yes
      s3 --> s4
      s4 --> [*]
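
The rules above can be sketched as a small toy model. This is only an illustration of the documented behavior, not the API server's actual validation code; the function names `is_valid_gate_update` and `is_scheduling_ready` are hypothetical, and gates are represented by their names only.

```python
def is_valid_gate_update(old_gates, new_gates):
    """An update is allowed only if it removes gates and never adds one."""
    return set(new_gates) <= set(old_gates)


def is_scheduling_ready(gates):
    """A Pod is considered for scheduling once its gate list is empty."""
    return len(gates) == 0


# Walk a Pod created with two gates through a few proposed updates.
gates = ["foo", "bar"]
for proposed in (["bar"], ["bar", "baz"], []):
    if is_valid_gate_update(gates, proposed):
        gates = proposed  # accepted: only removals
    # A rejected update (one that adds "baz") leaves the gates unchanged.
    state = "ready" if is_scheduling_ready(gates) else "gated"
    print(proposed, "->", gates, state)
```

In this sketch the second update is rejected because it introduces a gate that was not present at creation, which mirrors the "no" and "yes" branches of the diagram.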


Usage example

To mark a Pod not-ready for scheduling, you can create it with one or more scheduling gates like this:

pods/pod-with-scheduling-gates.yaml

  apiVersion: v1
  kind: Pod
  metadata:
    name: test-pod
  spec:
    schedulingGates:
    - name: foo
    - name: bar
    containers:
    - name: pause
      image: registry.k8s.io/pause:3.6

After the Pod’s creation, you can check its state using:

  kubectl get pod test-pod

The output shows that the Pod is in the SchedulingGated state:

  NAME       READY   STATUS            RESTARTS   AGE
  test-pod   0/1     SchedulingGated   0          7s

You can also check its schedulingGates field by running:

  kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'

The output is:

  [{"name":"foo"},{"name":"bar"}]

To inform the scheduler that this Pod is ready for scheduling, you can remove its schedulingGates entirely by re-applying a modified manifest:

pods/pod-without-scheduling-gates.yaml

  apiVersion: v1
  kind: Pod
  metadata:
    name: test-pod
  spec:
    containers:
    - name: pause
      image: registry.k8s.io/pause:3.6

You can check whether the schedulingGates field has been cleared by running:

  kubectl get pod test-pod -o jsonpath='{.spec.schedulingGates}'

The output is expected to be empty. You can then check the Pod’s latest status by running:

  kubectl get pod test-pod -o wide

Because test-pod does not request any CPU or memory resources, its state is expected to transition from SchedulingGated to Running:

  NAME       READY   STATUS    RESTARTS   AGE   IP         NODE
  test-pod   1/1     Running   0          15s   10.0.0.4   node-2
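
The example above clears every gate at once. Because gates can be removed in any order, a controller may prefer to remove only the gate it owns. The following is a minimal sketch, assuming the Pod's current gates have already been read (for instance from the jsonpath query shown earlier). The helper name `build_gate_removal_patch` is hypothetical; it only builds an RFC 6902 JSON Patch document and does not contact a cluster.

```python
import json


def build_gate_removal_patch(gates, name):
    """Build a JSON Patch that removes a single scheduling gate by name.

    `gates` is the current value of .spec.schedulingGates, for example
    [{"name": "foo"}, {"name": "bar"}].
    """
    for index, gate in enumerate(gates):
        if gate.get("name") == name:
            path = f"/spec/schedulingGates/{index}"
            return [
                # Guard: the patch fails if the entry at this index is no
                # longer the gate we expect (the list changed after reading).
                {"op": "test", "path": f"{path}/name", "value": name},
                {"op": "remove", "path": path},
            ]
    raise ValueError(f"scheduling gate {name!r} not found")


current = [{"name": "foo"}, {"name": "bar"}]
patch = build_gate_removal_patch(current, "bar")
print(json.dumps(patch))
```

The printed document could then be supplied to `kubectl patch pod test-pod --type=json -p '<patch>'`, where `<patch>` stands for the JSON printed above. The Pod stays in the SchedulingGated state until the last remaining gate is removed.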

Observability

The scheduler_pending_pods metric has a new value, "gated", for its queue label. It distinguishes Pods that the scheduler tried to schedule but found unschedulable from Pods that are explicitly marked as not ready for scheduling. You can use scheduler_pending_pods{queue="gated"} to check the metric result.
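
As an illustration of reading this metric outside a monitoring system, the sketch below parses scheduler metrics in the Prometheus text format and groups the pending Pod counts by queue. The sample text and its values are made up for the example, the helper name `pending_pods_by_queue` is hypothetical, and the parser assumes `queue` is the only label on each sample line.

```python
import re

# Hypothetical sample of scraped metrics text; the numbers are invented.
SAMPLE_METRICS = """\
# TYPE scheduler_pending_pods gauge
scheduler_pending_pods{queue="active"} 0
scheduler_pending_pods{queue="backoff"} 0
scheduler_pending_pods{queue="gated"} 2
scheduler_pending_pods{queue="unschedulable"} 1
"""

# Matches lines such as: scheduler_pending_pods{queue="gated"} 2
SAMPLE_LINE = re.compile(r'^scheduler_pending_pods\{queue="([^"]+)"\}\s+(\S+)$')


def pending_pods_by_queue(metrics_text):
    """Return a dict mapping each queue label value to its pending Pod count."""
    counts = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = float(match.group(2))
    return counts


counts = pending_pods_by_queue(SAMPLE_METRICS)
print("gated:", counts.get("gated", 0.0))
print("unschedulable:", counts.get("unschedulable", 0.0))
```

Keeping the gated and unschedulable counts separate makes it possible to tell Pods that are intentionally held back from Pods that the scheduler could not place.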

What’s next