Performance Test Setup for Karmada

Abstract

As Karmada is adopted by more and more enterprises and organizations, the scalability and scale of Karmada are gradually becoming new concerns for the community. In this article, we will introduce how to conduct large-scale testing for Karmada and how to monitor metrics from the Karmada control plane.

Build large scale environment

Create member clusters using kind

Why kind

Kind is a tool for running local Kubernetes clusters using Docker containers. Kind was primarily designed for testing Kubernetes itself, which makes it well suited to simulating member clusters.

Usage

Follow the kind installation guide.
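
For example, kind can be installed with Go (a minimal sketch, assuming a recent Go toolchain is on the PATH; prebuilt binaries from the kind releases page are an alternative):

  go install sigs.k8s.io/kind@latest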

Create 10 member clusters:

  for ((i=1; i<=10; i++)); do
    kind create cluster --name member$i
  done
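
The new clusters also need to be registered with the Karmada control plane. Below is a minimal sketch using karmadactl in push mode, assuming karmadactl is installed, $KARMADA_KUBECONFIG is a placeholder for the Karmada control-plane kubeconfig path, and the kind contexts follow the default kind-<name> naming:

  # Register each kind cluster with Karmada in push mode.
  # $KARMADA_KUBECONFIG is a placeholder for the Karmada control-plane kubeconfig;
  # kind names its kubeconfig contexts "kind-<cluster-name>".
  for ((i=1; i<=10; i++)); do
    karmadactl join member$i --kubeconfig=$KARMADA_KUBECONFIG \
      --cluster-kubeconfig=$HOME/.kube/config --cluster-context=kind-member$i
  done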

Simulate a large number of fake nodes using fake-kubelet

Why fake-kubelet

Comparison with Kubemark

Kubemark is built directly from the kubelet code with the container runtime replaced: apart from not actually starting containers, its behavior is exactly the same as the kubelet's. It is mainly used for Kubernetes' own e2e tests, so simulating a large number of nodes and pods consumes about the same amount of memory as a real cluster would.

Fake-kubelet is a tool used to simulate any number of nodes and maintain the pods on those nodes. It only does the minimum work of maintaining nodes and pods, so it is very well suited to simulating a large number of nodes and pods when stress-testing the control plane.

Usage

Deploy the fake-kubelet:

Note: set the container env GENERATE_REPLICAS in the fake-kubelet deployment to the number of fake nodes you want to create.

  export GENERATE_REPLICAS=your_replicas
  curl https://raw.githubusercontent.com/wzshiming/fake-kubelet/master/deploy.yaml > fakekubelet.yml
  # GENERATE_REPLICAS default value is 5
  sed -i "s/5/$GENERATE_REPLICAS/g" fakekubelet.yml
  kubectl apply -f fakekubelet.yml

Run kubectl get node and you will find the fake nodes:

  > kubectl get node -o wide
  NAME     STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE    KERNEL-VERSION   CONTAINER-RUNTIME
  fake-0   Ready    agent   10s   fake      10.88.0.136   <none>        <unknown>   <unknown>        <unknown>
  fake-1   Ready    agent   10s   fake      10.88.0.136   <none>        <unknown>   <unknown>        <unknown>
  fake-2   Ready    agent   10s   fake      10.88.0.136   <none>        <unknown>   <unknown>        <unknown>
  fake-3   Ready    agent   10s   fake      10.88.0.136   <none>        <unknown>   <unknown>        <unknown>
  fake-4   Ready    agent   10s   fake      10.88.0.136   <none>        <unknown>   <unknown>        <unknown>

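In this setup every member cluster needs its own fake nodes, so the same manifest can be applied to each kind cluster in turn. A minimal sketch, assuming the member clusters created earlier and kind's default kind-<name> context naming:

  # Give every member cluster its own fake nodes by applying the same manifest
  # to each kind cluster (kind contexts are named "kind-<cluster-name>").
  for ((i=1; i<=10; i++)); do
    kubectl --context kind-member$i apply -f fakekubelet.yml
  done
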
Deploy a sample deployment to test:

  > kubectl apply -f - <<EOF
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: fake-pod
    namespace: default
  spec:
    replicas: 4
    selector:
      matchLabels:
        app: fake-pod
    template:
      metadata:
        labels:
          app: fake-pod
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: type
                  operator: In
                  values:
                  - fake-kubelet
        tolerations: # a taint is added to the automatically created nodes; remove it from the nodes or add this toleration
        - key: "fake-kubelet/provider"
          operator: "Exists"
          effect: "NoSchedule"
        # nodeName: fake-0 # or schedule directly onto a specific fake node
        containers:
        - name: fake-pod
          image: fake
  EOF

Run kubectl get pod and you will find that the pods have started, even though the image does not exist:

  > kubectl get pod -o wide
  NAME                        READY   STATUS    RESTARTS   AGE   IP          NODE     NOMINATED NODE   READINESS GATES
  fake-pod-78884479b7-52qcx   1/1     Running   0          6s    10.0.0.23   fake-4   <none>           <none>
  fake-pod-78884479b7-bd6nk   1/1     Running   0          6s    10.0.0.13   fake-2   <none>           <none>
  fake-pod-78884479b7-dqjtn   1/1     Running   0          6s    10.0.0.15   fake-2   <none>           <none>
  fake-pod-78884479b7-h2fv6   1/1     Running   0          6s    10.0.0.31   fake-0   <none>           <none>

Distribute resources using ClusterLoader2

ClusterLoader2

ClusterLoader2 is an open source Kubernetes cluster testing tool. It tests against the Kubernetes-defined SLIs/SLOs to verify that clusters meet various quality-of-service standards. ClusterLoader2 is oriented towards a single cluster, and it is complex to test the Karmada control plane while also distributing resources to member clusters. Therefore, we only use ClusterLoader2 to distribute resources to the clusters managed by Karmada.
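
ClusterLoader2 lives in the kubernetes/perf-tests repository, so fetch it first (a sketch, assuming Git and Go are installed):

  git clone https://github.com/kubernetes/perf-tests.git
  cd perf-tests/clusterloader2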

Prepare a simple config

Let’s prepare our config (config.yaml) to distribute resources. This config will:

  • Create 10 namespaces

  • Create 20 deployments, each with 1000 pods, in each of those namespaces

We will create a file config.yaml that describes this test. First, we start by defining the test name:

  name: test

ClusterLoader2 will create namespaces automatically, but we need to specify how many namespaces we want and whether to delete them after the resources have been distributed:

  namespace:
    number: 10
    deleteAutomanagedNamespaces: false

Next, we need to specify tuningSets. A TuningSet describes how actions are executed; with qpsLoad, the interval between actions is 1/qps seconds. In order to distribute resources slowly and relieve the pressure on the apiserver, the qps of Uniformtinyqps is set to 0.1, which means that after distributing one deployment we wait 10s before distributing the next.

  tuningSets:
  - name: Uniformtinyqps
    qpsLoad:
      qps: 0.1
  - name: Uniform1qps
    qpsLoad:
      qps: 1

Finally, we will create a step with phases that create the deployments and the propagation policy. We need to specify in which namespaces the deployments and propagation policy are created and how many of them per namespace. We also need to specify templates for the deployment and the propagation policy, which we will do later. For now, let's assume that these templates allow us to set the number of replicas in the deployment and the propagation policy.

  steps:
  - name: Create deployment
    phases:
    - namespaceRange:
        min: 1
        max: 10
      replicasPerNamespace: 20
      tuningSet: Uniformtinyqps
      objectBundle:
      - basename: test-deployment
        objectTemplatePath: "deployment.yaml"
        templateFillMap:
          Replicas: 1000
    - namespaceRange:
        min: 1
        max: 10
      replicasPerNamespace: 1
      tuningSet: Uniform1qps
      objectBundle:
      - basename: test-policy
        objectTemplatePath: "policy.yaml"
        templateFillMap:
          Replicas: 1
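
To get a feel for the load this generates: 10 namespaces × 20 deployments = 200 deployments, each with 1000 replicas, i.e. about 200,000 pods in total. At 0.1 qps the 200 deployments alone take roughly 200 × 10 s ≈ 33 minutes to distribute, while the 10 propagation policies at 1 qps take about 10 seconds.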

The whole config.yaml will look like this:

  name: test
  namespace:
    number: 10
    deleteAutomanagedNamespaces: false
  tuningSets:
  - name: Uniformtinyqps
    qpsLoad:
      qps: 0.1
  - name: Uniform1qps
    qpsLoad:
      qps: 1
  steps:
  - name: Create deployment
    phases:
    - namespaceRange:
        min: 1
        max: 10
      replicasPerNamespace: 20
      tuningSet: Uniformtinyqps
      objectBundle:
      - basename: test-deployment
        objectTemplatePath: "deployment.yaml"
        templateFillMap:
          Replicas: 1000
    - namespaceRange:
        min: 1
        max: 10
      replicasPerNamespace: 1
      tuningSet: Uniform1qps
      objectBundle:
      - basename: test-policy
        objectTemplatePath: "policy.yaml"

Now, we need to specify the deployment and propagation policy templates. ClusterLoader2 by default provides a Name parameter that you can use in your template. In our config, we also pass a Replicas parameter. So our templates for the deployment and the propagation policy look like the following:

  # deployment.yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: {{.Name}}
    labels:
      group: test-deployment
  spec:
    replicas: {{.Replicas}}
    selector:
      matchLabels:
        app: fake-pod
    template:
      metadata:
        labels:
          app: fake-pod
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: type
                  operator: In
                  values:
                  - fake-kubelet
        tolerations: # a taint is added to the automatically created nodes; remove it from the nodes or add this toleration
        - key: "fake-kubelet/provider"
          operator: "Exists"
          effect: "NoSchedule"
        containers:
        - image: fake-pod
          name: {{.Name}}
  # policy.yaml
  apiVersion: policy.karmada.io/v1alpha1
  kind: PropagationPolicy
  metadata:
    name: test
  spec:
    resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
    placement:
      replicaScheduling:
        replicaDivisionPreference: Weighted
        replicaSchedulingType: Divided
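
Note that no weightPreference is given here; in that case Karmada treats all candidate clusters as equally weighted (the documented default for Weighted division), so each deployment's 1000 replicas should be divided roughly evenly, i.e. about 1000 / 10 = 100 replicas per member cluster.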

Start Distributing

To distribute the resources, run:

  export KARMADA_APISERVERCONFIG=your_config
  export KARMADA_APISERVERIP=your_ip
  cd clusterloader2/
  go run cmd/clusterloader.go --testconfig=config.yaml --provider=local --kubeconfig=$KARMADA_APISERVERCONFIG --v=2 --k8s-clients-number=1 --skip-cluster-verification=true --masterip=$KARMADA_APISERVERIP --enable-exec-service=false

The arguments above have the following meanings:

  • k8s-clients-number: the number of clients created for the karmada-apiserver.
  • skip-cluster-verification: whether to skip the cluster verification, which expects at least one schedulable node in the cluster.
  • enable-exec-service: whether to enable exec service that allows executing arbitrary commands from a pod running in the cluster.

Since member-cluster resources cannot be accessed from the Karmada control plane, we have to turn off enable-exec-service and cluster verification.

Note: if the deleteAutomanagedNamespaces parameter in the config file is set to true, the resources will be deleted immediately after the whole distribution completes.
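
After the run finishes, a quick sanity check is to count the objects that were created through the Karmada apiserver, for example (a sketch, reusing the kubeconfig exported above):

  # Count the objects created through the Karmada apiserver.
  kubectl --kubeconfig=$KARMADA_APISERVERCONFIG get deployments -A | grep test-deployment | wc -l
  kubectl --kubeconfig=$KARMADA_APISERVERCONFIG get propagationpolicies -A | grep test-policy | wc -l
  # Each propagated deployment also gets a ResourceBinding in the control plane.
  kubectl --kubeconfig=$KARMADA_APISERVERCONFIG get resourcebindings -A | grep test-deployment | wc -l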

Monitor Karmada control plane using Prometheus and Grafana

Deploy Prometheus and Grafana

Follow the Prometheus and Grafana Deploy Guide

Create Grafana dashboards to observe Karmada control plane metrics

Here's an example of monitoring the mutating API call latency for works and resourcebindings on the karmada-apiserver through Grafana. You can monitor other metrics by modifying the Query statement.

Create a dashboard

Follow the Grafana Support for Prometheus document.

Modify Query Statement

Enter the following Prometheus expression into the Query field.

  histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|GET|LIST", resource=~"works|resourcebindings"}[5m])) by (resource, verb, le))
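
If you want to check the numbers without Grafana, the same expression can be sent to the Prometheus HTTP API, for example (a sketch, assuming Prometheus is port-forwarded to localhost:9090; the service name and namespace below are assumptions, adjust them to your deployment):

  # Prometheus service name/namespace below are assumptions; adjust to your install.
  kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
  curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|GET|LIST", resource=~"works|resourcebindings"}[5m])) by (resource, verb, le))'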

The graph will be shown as follows:

[figure: grafana-dashboard]