Pool Coordinator Performance Test

Background

Pool Coordinator is an important component of an edge node pool. OpenYurt uses the pool coordinator to elect a leader yurthub and to back up resources within the edge node pool.

In this article, we test the performance of the pool-coordinator pod and give a suggested resource configuration.

Test Environment

Kubernetes Version

Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:57:25Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"

Node Configuration

The master and worker node are virtual machines running on VMware Fusion.

Operating System

                 Master                            Node
LSB Version      core-4.1-amd64:core-4.1-noarch    core-4.1-amd64:core-4.1-noarch
Distributor ID   CentOS                            CentOS
Description      CentOS Linux release 8.4.2105     CentOS Linux release 8.4.2105
Release          8.4.2105                          8.4.2105

CPU

                      Master                                      Node
Architecture          x86_64                                      x86_64
CPU op-mode(s)        32-bit, 64-bit                              32-bit, 64-bit
Byte Order            Little Endian                               Little Endian
CPU(s)                4                                           4
On-line CPU(s) list   0-3                                         0-3
Thread(s) per core    1                                           1
Core(s) per socket    1                                           1
Socket(s)             4                                           4
NUMA node(s)          1                                           1
Vendor ID             GenuineIntel                                GenuineIntel
CPU family            6                                           6
Model                 158                                         158
Model name            Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz    Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Stepping              10                                          10
CPU MHz               2592.000                                    2592.000

Memory

               Master       Node
Total memory   7829472 K    7829472 K

Disk

             Master    Node
Total Size   20GiB     20GiB

Test Method

  • Start the pool-coordinator pod and record the initial resource usage.
  • Write a large amount of resources and record the resource usage of pool-coordinator. In this test we write 1000 pods and 500 nodes into pool-coordinator; the size of each pod and node object is 8KB (a sketch of the load generator follows this list).
  • Delete all resources in pool-coordinator to see whether resource usage drops back to the initial level.
  • Rewrite the resources into pool-coordinator and patch them frequently and randomly, then check the resource usage in this situation.
  • Check the result of leader election.
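A minimal sketch of the load generator for the write step, assuming the pool-coordinator apiserver is reachable through a local kubeconfig file; the kubeconfig path, object names, and the 8KB annotation padding are illustrative, not the exact tooling used in the test:

  package main

  import (
      "context"
      "fmt"
      "strings"

      corev1 "k8s.io/api/core/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
  )

  func main() {
      // Connect to the pool-coordinator apiserver (the kubeconfig path is an assumption).
      cfg, err := clientcmd.BuildConfigFromFlags("", "pool-coordinator.kubeconfig")
      if err != nil {
          panic(err)
      }
      client := kubernetes.NewForConfigOrDie(cfg)

      // Pad every object with an annotation so each one is roughly 8KB.
      padding := strings.Repeat("x", 8*1024)

      // Write 500 nodes, then 1000 pods (two pods per node).
      for i := 0; i < 500; i++ {
          node := &corev1.Node{ObjectMeta: metav1.ObjectMeta{
              Name:        fmt.Sprintf("load-node-%d", i),
              Annotations: map[string]string{"padding": padding},
          }}
          if _, err := client.CoreV1().Nodes().Create(context.TODO(), node, metav1.CreateOptions{}); err != nil {
              fmt.Println("create node:", err)
          }
      }
      for i := 0; i < 1000; i++ {
          pod := &corev1.Pod{
              ObjectMeta: metav1.ObjectMeta{
                  Name:        fmt.Sprintf("load-pod-%d", i),
                  Namespace:   "default",
                  Annotations: map[string]string{"padding": padding},
              },
              Spec: corev1.PodSpec{
                  NodeName:   fmt.Sprintf("load-node-%d", i%500),
                  Containers: []corev1.Container{{Name: "app", Image: "nginx"}},
              },
          }
          if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
              fmt.Println("create pod:", err)
          }
      }
  }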

Test Result

Phase one

Start the pool-coordinator pod and record the initial resource usage.

  • CPU used: 70m ~ 90m
  • mem used: 370MB.
    • apiserver used 205MB.
    • etcd used 165MB.

(Figures 1-4)

Phase two

Write a large amount of resources and record the resource usage of pool-coordinator. In this test we write 1000 pods and 500 nodes into pool-coordinator; the size of each pod and node object is 8KB.

  • top CPU used: 310m
  • top mem used: 450MB.
    • apiserver used 240MB.
    • etcd used 210MB.

(Figures 5-8)

Phase three

Delete all resources in pool-coordinator to see whether resource usage drops back to the initial level.

  • top CPU used: 260m
  • top mem used: 590MB.
    • apiserver used 350MB.
    • etcd used 240MB.

(Figures 9-12)

Phase four

Rewrite the resources into pool-coordinator and patch them frequently and randomly, then check the resource usage in this situation.
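A minimal sketch of such a patch loop, reusing the objects written in phase two; the random annotation value forces etcd to store a new revision of the pod on every patch (the kubeconfig path and object names follow the illustrative load generator shown earlier):

  package main

  import (
      "context"
      "encoding/base64"
      "fmt"
      "math/rand"

      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/apimachinery/pkg/types"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
  )

  func main() {
      cfg, err := clientcmd.BuildConfigFromFlags("", "pool-coordinator.kubeconfig")
      if err != nil {
          panic(err)
      }
      client := kubernetes.NewForConfigOrDie(cfg)

      payload := make([]byte, 6*1024)
      for {
          // Pick a random pod and overwrite its padding annotation with fresh
          // random data, so etcd has to persist a new revision every time.
          rand.Read(payload)
          name := fmt.Sprintf("load-pod-%d", rand.Intn(1000))
          patch := fmt.Sprintf(`{"metadata":{"annotations":{"padding":"%s"}}}`,
              base64.StdEncoding.EncodeToString(payload))
          if _, err := client.CoreV1().Pods("default").Patch(context.TODO(), name,
              types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{}); err != nil {
              fmt.Println("patch pod:", err)
          }
      }
  }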

  • top CPU used: 640m.
  • memory usage rose continually and eventually caused the etcd container to be OOM killed.

(Figures 13-16)

Phase five

Run another program to perform leader election and check the result.

The Go program runs 500 goroutines, and every goroutine does the same thing: try to acquire the leader lock. After acquiring leadership successfully, the client sleeps for 1 second and quits.
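The test program itself is not shown in this article; the sketch below, based on the client-go leaderelection package, illustrates what each goroutine does. The lease name default/test-lock matches the logs, while the kubeconfig path and timing constants are assumptions:

  package main

  import (
      "context"
      "sync"
      "time"

      "github.com/google/uuid"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
      "k8s.io/client-go/tools/leaderelection"
      "k8s.io/client-go/tools/leaderelection/resourcelock"
      "k8s.io/klog/v2"
  )

  func runCandidate(client kubernetes.Interface) {
      // Every candidate uses a unique identity, which shows up as the lease holder.
      id := uuid.New().String()
      ctx, cancel := context.WithCancel(context.Background())
      defer cancel()

      lock := &resourcelock.LeaseLock{
          LeaseMeta:  metav1.ObjectMeta{Name: "test-lock", Namespace: "default"},
          Client:     client.CoordinationV1(),
          LockConfig: resourcelock.ResourceLockConfig{Identity: id},
      }

      leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
          Lock:            lock,
          LeaseDuration:   15 * time.Second,
          RenewDeadline:   10 * time.Second,
          RetryPeriod:     2 * time.Second,
          ReleaseOnCancel: true, // release the lease so the next candidate can take it
          Callbacks: leaderelection.LeaderCallbacks{
              OnStartedLeading: func(ctx context.Context) {
                  klog.Info("Controller loop...")
                  time.Sleep(time.Second) // hold leadership for one second
                  klog.Info("Controller quit.")
                  cancel() // step down; RunOrDie returns after the lease is released
              },
              OnStoppedLeading: func() { klog.Infof("leader lost: %s", id) },
              OnNewLeader:      func(leader string) { klog.Infof("new leader elected: %s", leader) },
          },
      })
  }

  func main() {
      cfg, err := clientcmd.BuildConfigFromFlags("", "pool-coordinator.kubeconfig")
      if err != nil {
          panic(err)
      }
      client := kubernetes.NewForConfigOrDie(cfg)

      var wg sync.WaitGroup
      for i := 0; i < 500; i++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              runCandidate(client)
          }()
      }
      wg.Wait()
  }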

We can see that the program acquired leadership successfully. After one client quit, another client could acquire leadership.

  I1212 14:58:43.652733 41875 leaderelection.go:258] successfully acquired lease default/test-lock
  I1212 14:58:43.652766 41875 main.go:656] new leader elected: ff43ffde-3551-47d6-b2af-1fa3ef115b86
  I1212 14:58:43.652779 41875 main.go:562] Controller loop...
  I1212 14:58:44.653060 41875 main.go:564] Controller quit.
  I1212 14:58:44.662196 41875 main.go:648] leader lost: ff43ffde-3551-47d6-b2af-1fa3ef115b86
  I1212 14:58:44.679782 41875 leaderelection.go:258] successfully acquired lease default/test-lock
  I1212 14:58:44.679826 41875 main.go:656] new leader elected: 76870bb5-eaa0-44b0-a8a8-203c36a2d373
  I1212 14:58:44.679915 41875 main.go:562] Controller loop...
  I1212 14:58:45.680211 41875 main.go:564] Controller quit.
  I1212 14:58:45.686105 41875 main.go:648] leader lost: 76870bb5-eaa0-44b0-a8a8-203c36a2d373
  I1212 14:58:45.697108 41875 leaderelection.go:258] successfully acquired lease default/test-lock
  I1212 14:58:45.697131 41875 main.go:656] new leader elected: b127e7bc-beeb-474a-b0e9-5023b1563d94
  I1212 14:58:45.697210 41875 main.go:562] Controller loop...
  I1212 14:58:46.698199 41875 main.go:564] Controller quit.
  I1212 14:58:46.702313 41875 main.go:648] leader lost: b127e7bc-beeb-474a-b0e9-5023b1563d94
  I1212 14:58:46.733931 41875 leaderelection.go:258] successfully acquired lease default/test-lock
  I1212 14:58:46.733953 41875 main.go:656] new leader elected: 7a4dd5d7-5e25-4f69-a882-d32e17bb703a
  I1212 14:58:46.734007 41875 main.go:562] Controller loop...
  I1212 14:58:47.739147 41875 main.go:564] Controller quit.
  I1212 14:58:47.743684 41875 main.go:648] leader lost: 7a4dd5d7-5e25-4f69-a882-d32e17bb703a
  ...

Check the lease resource in pool-coordinator. We can see that the holder of the lease changes periodically.

  $ kubectl get lease
  NAME        HOLDER                                 AGE
  test-lock   ff43ffde-3551-47d6-b2af-1fa3ef115b86   5m
  $ kubectl get lease
  NAME        HOLDER                                 AGE
  test-lock   76870bb5-eaa0-44b0-a8a8-203c36a2d373   5m
  $ kubectl get lease
  NAME        HOLDER                                 AGE
  test-lock   b127e7bc-beeb-474a-b0e9-5023b1563d94   5m
  $ kubectl get lease
  NAME        HOLDER                                 AGE
  test-lock   7a4dd5d7-5e25-4f69-a882-d32e17bb703a   5m

Conclusion

From the test we get the minimum resources that pool-coordinator needs: 310m CPU and 450MB of memory.

We also see that deleting resources in etcd does not bring the resource usage of pool-coordinator back down.

This is caused by etcd's storage mechanism: a delete adds a tombstone revision instead of removing the data immediately.
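The space held by tombstones is only reclaimed after the revision history is compacted and the backend is defragmented. Below is a minimal sketch with the etcd v3 Go client, assuming direct access to pool-coordinator's etcd; the endpoint and the absence of TLS are assumptions made for brevity:

  package main

  import (
      "context"
      "fmt"
      "time"

      clientv3 "go.etcd.io/etcd/client/v3"
  )

  func main() {
      endpoint := "127.0.0.1:2379" // illustrative; point this at pool-coordinator's etcd

      cli, err := clientv3.New(clientv3.Config{
          Endpoints:   []string{endpoint},
          DialTimeout: 5 * time.Second,
      })
      if err != nil {
          panic(err)
      }
      defer cli.Close()

      ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
      defer cancel()

      // Compact the key-value history up to the current revision; deleted keys
      // keep occupying space in the history until this happens.
      status, err := cli.Status(ctx, endpoint)
      if err != nil {
          panic(err)
      }
      if _, err := cli.Compact(ctx, status.Header.Revision); err != nil {
          panic(err)
      }

      // Defragment the backend so the freed space is actually returned.
      if _, err := cli.Defragment(ctx, endpoint); err != nil {
          panic(err)
      }
      fmt.Println("compacted and defragmented", endpoint)
  }

Note that the kube-apiserver already compacts etcd periodically; defragmentation is a separate maintenance operation.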

When resources are patched frequently, the memory usage of etcd keeps growing. If we set a memory limit for etcd, the etcd container will eventually be OOM killed.

Finally, we conclude that CPU is not the limiting resource of pool-coordinator, and that the memory of the etcd container should be limited to an acceptable level to protect the other pods in the edge node pool.

So, for an edge node pool with fewer than 500 nodes and fewer than 1000 pods, we recommend the resource configuration below:

  apiserverResources:
    requests:
      cpu: 250m
  ---
  etcdResources:
    limits:
      cpu: 200m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 256Mi