BroadcastJob + Advanced CronJob Help You Maintain Kubernetes Nodes

Kubernetes node operation and maintenance is always tedious work. For example, in a native Kubernetes cluster, the available storage space on a node tends to decrease almost monotonically over time. Excessive disk pressure, in turn, can lead to a series of problems, such as nodes becoming unschedulable and pods being evicted, which affects the stability of the cluster.

A Kubernetes Job is obviously well suited to this kind of one-off, temporary work, such as cleaning up a disk: unlike an agent process running on the host, a Job only occupies resources temporarily, and those resources are released automatically once the task completes. However, native Kubernetes Jobs have the following limitations in node operation and maintenance scenarios:

  1. The default scheduling rule is unsuitable. Multiple pods may be scheduled to the same node, causing the job to be executed repeatedly on that node;
  2. It cannot automatically perceive the scale of the cluster. When a node is added to or removed from the cluster, the job configuration must be updated manually.

OpenKruise provides the BroadcastJob and Advanced CronJob features to solve these problems. BroadcastJob schedules pods in a way similar to DaemonSet: when a user applies a BroadcastJob, by default it creates a pod on every worker node in the cluster, and these pods are cleaned up automatically once the task completes. Furthermore, Advanced CronJob can create BroadcastJobs periodically. This article demonstrates how to use Advanced CronJob and BroadcastJob to periodically clean up unused images stored on Kubernetes nodes, to help you understand these features.
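
For reference, a minimal standalone BroadcastJob looks like the sketch below. The busybox image and echo command are placeholders for illustration, not part of this demo; the spec structure matches the broadcastJobTemplate used later in this article:

  apiVersion: apps.kruise.io/v1alpha1
  kind: BroadcastJob
  metadata:
    name: bcj-example
  spec:
    template:
      spec:
        containers:
          - name: hello
            image: busybox
            command: ["echo", "hello from every node"]
        restartPolicy: Never
    completionPolicy:
      type: Always
      # delete the finished job and its pods automatically after 90s
      ttlSecondsAfterFinished: 90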

Environment

We deployed a kind cluster on an ECS instance (the host), and all kind nodes use containerd as the container runtime. The kind cluster consists of three nodes: one master node and two worker nodes:

  $ k get node
  NAME            STATUS   ROLES                  AGE   VERSION
  control-plane   Ready    control-plane,master   42d   v1.21.1
  worker1         Ready    <none>                 42d   v1.21.1
  worker2         Ready    <none>                 42d   v1.21.1
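
For reference, a kind configuration that produces this topology could look like the following sketch (kind nodes use containerd by default):

  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  nodes:
    - role: control-plane
    - role: worker
    - role: worker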

Before the demonstration, let's take a look at the disk pressure on the ECS host, so that we can compare it with the state after the demonstration:

  root@kruise:~# df -h
  Filesystem      Size  Used Avail Use% Mounted on
  udev            7.7G     0  7.7G   0% /dev
  tmpfs           1.6G  1.4M  1.6G   1% /run
  /dev/vda1        79G   63G   13G  84% /
  tmpfs           7.7G     0  7.7G   0% /dev/shm
  tmpfs           5.0M     0  5.0M   0% /run/lock
  tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
  tmpfs           1.6G     0  1.6G   0% /run/user/0
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/94e3ec1c3a45a43e4ffa34c654bc3639007eb2fb5d4e9724fed056c6bb8d119f/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/7718d5a17be239ade398f907f82acf2c90fb7752a90a667114a573c60757d23b/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/0f78036c619c03fb37ec8029e5718bb206472971169bb2711bee06af21228763/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/029e008a7c5b754e4246c8fc55bf189c83a0b8b1df50c2ecb67d1734095b935b/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/899a50ca07b4e2de08d627dbb1e6f1cc9e1eb0c048a71c4905854f31bf51f056/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/c72de0669810b5dcbf4b2726c0c32765fbbb1e4c21826f59533414fb474c826a/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/af8c22b65e7ae64f15f0132baed91550adfe81cd4e088e2bb84e01476619340a/merged
  overlay          79G   63G   13G  84% /var/lib/docker/overlay2/454a7e90cb3c723dc6b22b0d54e60714700b4c0bcf947b29206d882c6a2c25fe/merged

Also, let's take a look at the images on the worker1 node. We can see that this node currently has about 125 images (the wc -l count below includes the header line):

  root@kruise:~# docker exec -it worker1 /bin/sh
  $ crictl images | wc -l
  125
  $ crictl images
  REPOSITORY                   TAG      IMAGE ID       SIZE
  docker.io/minchou/cleaner    v1       7e36ca8e9d40   68.6MB
  docker.io/minchou/rollout    v0.7.3   120dc8c670ef   57MB
  docker.io/minchou/rollout    v0.7.2   2f1f320cd94a   57MB
  docker.io/minchou/rollout    v0.7.1   c90679a2e4ff   57MB
  docker.io/minchou/rollout    v0.7.0   a81db48ec891   57MB
  docker.io/minchou/rollout    v0.6.2   af5ef616c30e   55.9MB
  docker.io/minchou/rollout    v0.6.1   71ba2e84e92e   55.9MB
  docker.io/minchou/rollout    v0.6.0   3fe9eb8f0144   55.9MB
  ...                          ...      ...            ...

Advanced CronJob Configuration

job.yaml

  apiVersion: apps.kruise.io/v1alpha1
  kind: AdvancedCronJob
  metadata:
    name: acj-test
  spec:
    schedule: "*/5 * * * *"
    startingDeadlineSeconds: 60
    template:
      broadcastJobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: node-cleaner
                  image: minchou/cleaner:v1
                  imagePullPolicy: IfNotPresent
                  env:
                    # crictl uses this env to find the container runtime socket.
                    # Its value should be consistent with the path of the mounted
                    # container runtime socket file.
                    - name: CONTAINER_RUNTIME_ENDPOINT
                      value: unix:///var/run/containerd/containerd.sock
                  volumeMounts:
                    # Mount the container runtime socket file to this path.
                    - name: containerd
                      mountPath: /var/run/containerd
              volumes:
                - name: containerd
                  hostPath:
                    path: /var/run/containerd
              restartPolicy: OnFailure
          completionPolicy:
            # Delete the finished BroadcastJob and its pods 90s after completion.
            type: Always
            ttlSecondsAfterFinished: 90
          failurePolicy:
            # Keep the job running on other nodes even if some pods fail;
            # retry a failed pod up to 3 times.
            type: Continue
            restartLimit: 3

We need access to containerd.sock to execute image-cleaning commands such as crictl rmi inside the pod, so the containerd socket file of the host must be mounted into the pod via hostPath. If your nodes use a different container runtime, you need to mount its socket file into the pods in the same way.
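
For example, for nodes running CRI-O instead of containerd, the corresponding parts of the pod spec might look like the sketch below (this assumes CRI-O's default socket path, /var/run/crio/crio.sock; adjust the path for your runtime):

  env:
    # assumed CRI-O default socket path
    - name: CONTAINER_RUNTIME_ENDPOINT
      value: unix:///var/run/crio/crio.sock
  volumeMounts:
    - name: crio
      mountPath: /var/run/crio
  volumes:
    - name: crio
      hostPath:
        path: /var/run/crio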

Similarly, if your application writes logs directly to a host path, you can mount that path in the same way and clean it up as part of the same job.

To make it easier to observe how the Advanced CronJob operates, we set its schedule period to 5 minutes, that is, the schedule field is defined as */5 * * * *. In a real scenario, you would more likely clean up every few days or weeks instead of every 5 minutes. You can refer to the cron expression format to customize the schedule.
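
For instance, here are a few standard five-field cron expressions that would make more sense in production:

  0 3 * * *    # every day at 03:00
  0 3 * * 0    # every Sunday at 03:00
  0 3 1 * *    # at 03:00 on the first day of every month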

Build Image

File directory structure:

  $ tree
  .
  ├── Dockerfile
  ├── cleaner.sh
  └── crictl-v1.23.0-linux-amd64.tar.gz

To build the image faster, we downloaded crictl-v1.23.0-linux-amd64.tar.gz in advance and put it in the same directory as the Dockerfile.
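
The archive can be fetched from the cri-tools releases on GitHub, for example:

  $ wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.23.0/crictl-v1.23.0-linux-amd64.tar.gz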

Script Sample

Note: if you use this in production, please verify your script strictly!

cleaner.sh

  #!/bin/sh
  echo "container runtime endpoint:" $CONTAINER_RUNTIME_ENDPOINT
  # Clean up container runtime resources if crictl can reach the runtime.
  crictl ps > /dev/null
  if [ $? -eq 0 ]
  then
    # Implement your customized script here, for example:
    # Get the images that are in use; these images cannot be deleted.
    crictl ps | awk '{if(NR>1){print $2}}' > used-images.txt
    # @@ You can choose the images you want to clean according to your requirements @@
    # ** Here, we will clean all images from my docker.io/minchou repo! **
    crictl images | grep -i "docker.io/minchou" | awk '{print $3}' > target-images.txt
    # Filter out the used images and delete the unused ones.
    sort target-images.txt used-images.txt used-images.txt | uniq -u | xargs -r crictl rmi
  else
    echo "crictl does not exist"
  fi
  exit 0
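
The trickiest line is the sort | uniq -u pipeline: listing used-images.txt twice guarantees that every in-use image ID appears at least twice in the sorted stream, so uniq -u, which prints only lines that occur exactly once, emits only the unused IDs. A tiny standalone demonstration with made-up image IDs:

  $ printf 'sha256:aaa\n' > used-images.txt
  $ printf 'sha256:aaa\nsha256:bbb\n' > target-images.txt
  $ sort target-images.txt used-images.txt used-images.txt | uniq -u
  sha256:bbb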

Dockerfile Sample

  FROM alpine
  COPY crictl-v1.23.0-linux-amd64.tar.gz ./
  RUN tar zxvf crictl-v1.23.0-linux-amd64.tar.gz -C /bin && rm crictl-v1.23.0-linux-amd64.tar.gz
  COPY cleaner.sh /bin/
  RUN chmod +x /bin/cleaner.sh
  # Use /bin/sh here: the alpine base image does not ship bash, and
  # cleaner.sh uses a POSIX sh shebang anyway.
  CMD ["/bin/sh", "/bin/cleaner.sh"]

Results

Build the image and push it to your own image repository. Here we use my own Docker Hub repo as an example:

  $ docker build . -t minchou/cleaner:v1 && docker push minchou/cleaner:v1
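
Alternatively, since this demo runs on kind, you could skip the registry push and load the image into the kind nodes directly (assuming the default cluster name):

  $ kind load docker-image minchou/cleaner:v1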

Then apply the Advanced CronJob configuration:

  $ kubectl apply -f job.yaml
  advancedcronjob.apps.kruise.io/acj-test created
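
You can check the AdvancedCronJob, and later the BroadcastJobs it spawns, with the usual kubectl commands (the resource names come from the Kruise CRDs; output columns may vary between Kruise versions):

  $ kubectl get advancedcronjob acj-test
  $ kubectl get broadcastjob          # populated once a scheduled run fires
  $ kubectl get pods -o wide          # one cleaner pod per worker node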

In the kruise-controller-manager log, we can see that the next execution time is 2022-03-24 08:50:00 +0000 UTC:

  $ kubectl -n kruise-system logs kruise-controller-manager-745594ff76-9nwwx --tail 1000 | grep "no upcoming scheduled times, sleeping until next now"
  I0324 08:45:08.131928       1 advancedcronjob_broadcastjob_controller.go:290] no upcoming scheduled times, sleeping until next now 2022-03-24 08:45:08.131896998 +0000 UTC m=+535162.957711312 and next run 2022-03-24 08:50:00 +0000 UTC default/acj-test

When the time came, the Advanced CronJob created a BroadcastJob. Let's take a look at the log of the pod that the BroadcastJob created for the worker1 node:

  $ kubectl logs acj-test-1648111800-8t8bx
  container runtime endpoint: unix:///var/run/containerd/containerd.sock
  Deleted: docker.io/minchou/rollout:v0.2.7
  Deleted: docker.io/minchou/rollout:v0.4.1
  Deleted: docker.io/minchou/rollout:v0.7.3
  Deleted: docker.io/minchou/rollout:br-5
  Deleted: docker.io/minchou/rollout:v0.4.2
  Deleted: docker.io/minchou/kruiserollout:br-f
  Deleted: docker.io/minchou/rollout:v0.7.2
  Deleted: docker.io/minchou/rollout:v0.4.0
  Deleted: docker.io/minchou/rollout:v0.3.8
  Deleted: docker.io/minchou/rollout:v0.3.0
  Deleted: docker.io/minchou/kruiserollout:br-2
  Deleted: docker.io/minchou/rollout:br-3
  ... ... ... ...
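
To double-check on the node itself, you can rerun the image count from earlier; it should be well below the 125 lines we saw before:

  root@kruise:~# docker exec -it worker1 crictl images | wc -l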

We can see that the cleaner.sh script works: the target images have been deleted. Now, let's take another look at the disk pressure on the ECS host:

  root@kruise011162126109:~# df -h
  Filesystem      Size  Used Avail Use% Mounted on
  udev            7.7G     0  7.7G   0% /dev
  tmpfs           1.6G  1.4M  1.6G   1% /run
  /dev/vda1        79G   44G   32G  59% /
  tmpfs           7.7G     0  7.7G   0% /dev/shm
  tmpfs           5.0M     0  5.0M   0% /run/lock
  tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
  tmpfs           1.6G     0  1.6G   0% /run/user/0
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/94e3ec1c3a45a43e4ffa34c654bc3639007eb2fb5d4e9724fed056c6bb8d119f/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/7718d5a17be239ade398f907f82acf2c90fb7752a90a667114a573c60757d23b/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/0f78036c619c03fb37ec8029e5718bb206472971169bb2711bee06af21228763/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/029e008a7c5b754e4246c8fc55bf189c83a0b8b1df50c2ecb67d1734095b935b/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/899a50ca07b4e2de08d627dbb1e6f1cc9e1eb0c048a71c4905854f31bf51f056/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/c72de0669810b5dcbf4b2726c0c32765fbbb1e4c21826f59533414fb474c826a/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/af8c22b65e7ae64f15f0132baed91550adfe81cd4e088e2bb84e01476619340a/merged
  overlay          79G   44G   32G  59% /var/lib/docker/overlay2/454a7e90cb3c723dc6b22b0d54e60714700b4c0bcf947b29206d882c6a2c25fe/merged

As shown above, disk usage has dropped from 84% to 59%, a significant improvement. Finally, we can again find the next execution time in kruise's log; the next run is indeed 5 minutes later (2022-03-24 08:55:00 +0000 UTC):

  $ kubectl -n kruise-system logs kruise-controller-manager-745594ff76-9nwwx --tail 1000 | grep "no upcoming scheduled times, sleeping until next now"
  I0324 08:50:02.226008       1 advancedcronjob_broadcastjob_controller.go:290] no upcoming scheduled times, sleeping until next now 2022-03-24 08:50:02.225973654 +0000 UTC m=+535457.051787976 and next run 2022-03-24 08:55:00 +0000 UTC default/acj-test

Conclusion

As the above demonstration shows, Advanced CronJob + BroadcastJob + a customized script can help you clean up unused images on your nodes periodically. Of course, this is just one simple example of node operation and maintenance; if you encounter similar problems, I hope this article helps and inspires you.