Determine cluster failures

Karmada supports both Push and Pull modes to manage member clusters. More details about cluster registration please refer to Cluster Registration.

For clusters there are two forms of heartbeats:

  • updates to the .status of a Cluster(include both Push and Pull mode).
  • Lease objects within the karmada-cluster namespace in karmada control plane. Each Pull mode cluster has an associated Lease object.

Collect cluster status

For Push mode clusters, the clusterStatus controller in karmada control plane will continually collect cluster’s status for a configured interval. For Pull mode clusters, the karmada-agent is responsible for creating and updating the .status of clusters with configured interval.

The interval for .status updates to Cluster can be configured via --cluster-status-update-frequency flag(default is 10 seconds).

Cluster’s Ready condition might be set to the False with following conditions:

  • cluster is unreachable for a period of configured interval.
  • cluster’s health endpoint responded without ok persists for a period of configured interval.

The above two intervals can be configured via --cluster-failure-threshold flag(default is 30 seconds).

Update cluster Lease object

Karmada will create a Lease object and a lease controller for each Pull mode cluster when clusters are joined.

Each lease controller is responsible for updating the related Leases. The lease renewing time can be configured via --cluster-lease-duration and --cluster-lease-renew-interval-fraction flags(default is 10 seconds).

Lease’s updating process is independent with cluster’s status updating process, since cluster’s .status field is maintained by clusterStatus controller.

The cluster controller in Karmada control plane would check the state of each pull mode cluster every --cluster-monitor-period period(default is 5 seconds).

The cluster’s Ready condition would be changed to Unknown when cluster controller has not heard from the cluster in the last --cluster-monitor-grace-period(default is 40 seconds).

Check cluster status

You can use kubectl to check a Cluster’s status and other details:

  1. kubectl describe cluster <cluster-name>

The Ready condition in Status field indicates the cluster is healthy and ready to accept workloads. It will be set to False if the cluster is not healthy and is not accepting workloads, and Unknown if the cluster controller has not heard from the cluster in the last cluster-monitor-grace-period.

The following example describes an unhealthy cluster:

unfold me to see the yaml

  1. kubectl describe cluster member1
  2. Name: member1
  3. Namespace:
  4. Labels: <none>
  5. Annotations: <none>
  6. API Version: cluster.karmada.io/v1alpha1
  7. Kind: Cluster
  8. Metadata:
  9. Creation Timestamp: 2021-12-29T08:49:35Z
  10. Finalizers:
  11. karmada.io/cluster-controller
  12. Resource Version: 152047
  13. UID: 53c133ab-264e-4e8e-ab63-a21611f7fae8
  14. Spec:
  15. API Endpoint: https://172.23.0.7:6443
  16. Impersonator Secret Ref:
  17. Name: member1-impersonator
  18. Namespace: karmada-cluster
  19. Secret Ref:
  20. Name: member1
  21. Namespace: karmada-cluster
  22. Sync Mode: Push
  23. Status:
  24. Conditions:
  25. Last Transition Time: 2021-12-31T03:36:08Z
  26. Message: cluster is not reachable
  27. Reason: ClusterNotReachable
  28. Status: False
  29. Type: Ready
  30. Events: <none>