Citadel Health Checking

You can enable Citadel’s health checking feature to detect failures of the Citadel CSR (Certificate Signing Request) service. When a failure is detected, Kubelet automatically restarts the Citadel container.

When the health checking feature is enabled, the prober client module in Citadel periodically checks the health status of Citadel’s CSR gRPC server. It does this by sending CSRs to the gRPC server and verifying the responses. If Citadel is healthy, the prober client updates the modification time of the health status file. Otherwise, it does nothing. Citadel relies on a Kubernetes liveness and readiness probe with a command line to check the modification time of the health status file on the pod. If the file is not updated for a period, Kubelet restarts the Citadel container.
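
The liveness check amounts to comparing the modification time of the health status file with the current time. The following shell sketch only illustrates that idea; the actual check is performed by the istio_ca probe command shown later, and it assumes GNU stat, the default /tmp/ca.liveness path, and the default 125 second threshold from the configuration below:

  $ # Illustration only: consider Citadel unhealthy if the status file
  $ # has not been updated within the last 125 seconds.
  $ age=$(( $(date +%s) - $(stat -c %Y /tmp/ca.liveness) ))
  $ [ "$age" -lt 125 ] && echo healthy || echo unhealthy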

Since Citadel health checking currently only monitors the health status of the CSR service API, this feature is not needed if the production setup is not using SDS or adding virtual machines.

Before you begin

Follow the Istio installation guide to install Istio with mutual TLS enabled.
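
For example, one way to do this with istioctl is the following command; your installation profile and method may differ:

  $ istioctl manifest apply --set values.global.mtls.enabled=true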

Deploying Citadel with health checking

To enable health checking, redeploy Citadel:

  $ istioctl manifest generate --set values.global.mtls.enabled=true,values.security.citadelHealthCheck=true > citadel-health-check.yaml
  $ kubectl apply -f citadel-health-check.yaml
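
After applying the manifest, you can confirm that the Citadel pod is running again. This assumes the pod name starts with istio-citadel in the istio-system namespace, the same convention used by the log command in the next section:

  $ kubectl get pods -n istio-system | grep istio-citadel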

Verify that health checking works

Citadel logs the health checking results. Run the following command:

  $ kubectl logs `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -n istio-system | grep "CSR signing service"

You will see output similar to:

  ... CSR signing service is healthy (logged every 100 times).

The log above indicates that the periodic health checking is working. The default health checking interval is 15 seconds, and the healthy status is logged once every 100 checks.
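
To see the mechanism at work, you can also check the modification time of the health status file inside the Citadel container. This sketch assumes the default /tmp/ca.liveness path from the configuration below and that the stat utility is available in the Citadel image:

  $ kubectl exec `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -n istio-system -- stat -c %y /tmp/ca.liveness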

(Optional) Configuring the health checking

This section describes how to modify the health checking configuration. Open the file citadel-health-check.yaml and locate the following lines.

  ...
  - --liveness-probe-path=/tmp/ca.liveness # path to the liveness health checking status file
  - --liveness-probe-interval=60s # interval for health checking file update
  - --probe-check-interval=15s # interval for health status check
  livenessProbe:
    exec:
      command:
      - /usr/local/bin/istio_ca
      - probe
      - --probe-path=/tmp/ca.liveness # path to the liveness health checking status file
      - --interval=125s # the maximum time gap allowed between the file mtime and the current sys clock.
    initialDelaySeconds: 60
    periodSeconds: 60
  ...

The paths to the health status file are specified by liveness-probe-path and probe-path. If you change them, update the path in the Citadel arguments and in the livenessProbe command at the same time. If Citadel is healthy, the value of the liveness-probe-interval entry determines the interval used to update the health status file. The Citadel health checking controller uses the value of the probe-check-interval entry to determine the interval at which it calls the Citadel CSR service. The interval entry in the livenessProbe command is the maximum time that may elapse since the last update of the health status file for the prober to still consider Citadel healthy. The values of the initialDelaySeconds and periodSeconds entries determine the initial delay and the interval between each activation of the livenessProbe.
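
If you want to exercise the probe by hand, you can run the same command that the livenessProbe executes inside the Citadel container; Kubelet treats a non-zero exit status from this exec probe as a failed check. The default path and interval from the configuration above are assumed:

  $ kubectl exec `kubectl get po -n istio-system | grep istio-citadel | awk '{print $1}'` -n istio-system -- /usr/local/bin/istio_ca probe --probe-path=/tmp/ca.liveness --interval=125s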

Prolonging probe-check-interval reduces the health checking overhead, but introduces a greater lag before the prober notices an unhealthy status. To avoid the prober restarting Citadel due to temporary unavailability, the interval on the prober can be configured to be more than N times the liveness-probe-interval. This allows the prober to tolerate N-1 consecutive failed health checks.
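
For example, keeping the default liveness-probe-interval of 60 seconds, setting the prober’s interval just above three times that value (N=3) tolerates two consecutive missed updates. A hypothetical adjustment of the generated manifest could look like:

  ...
  - --liveness-probe-interval=60s # Citadel updates the status file every 60 seconds when healthy
  ...
      - --interval=185s # slightly more than 3 x 60s, so up to 2 consecutive missed updates are tolerated
  ...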

Cleanup

  • To disable health checking on Citadel:
    $ istioctl manifest apply --set values.global.mtls.enabled=true
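
To verify that the health checking arguments are gone, you can inspect the Citadel deployment; the deployment name istio-citadel is an assumption based on the default installation. No output means the flags have been removed:

  $ kubectl get deployment istio-citadel -n istio-system -o yaml | grep liveness-probe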
