Job Scheduling

How to schedule a job with gang-scheduling

Alpha

This Kubeflow component has alpha status with limited support. See the Kubeflow versioning policies. The Kubeflow team is interested in your feedback about the usability of the feature.

This guide describes how to use the Volcano scheduler to support gang-scheduling in Kubeflow, so that all the pods of a job are scheduled and run at the same time.

Running jobs with gang-scheduling

To use gang-scheduling, you first have to install the Volcano scheduler in your cluster as a secondary Kubernetes scheduler. Then configure the operator to enable gang-scheduling.

Note: The Volcano scheduler and the operator in Kubeflow achieve gang-scheduling by using a PodGroup. The operator creates the job's PodGroup automatically.
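As a rough illustration, the PodGroup for a job can be thought of as a Volcano `PodGroup` object whose `minMember` equals the total number of replicas in the job, i.e. the gang size. The sketch below builds such a manifest as a Python dict. The helper name `podgroup_for_tfjob` is hypothetical. The use of Volcano's `scheduling.volcano.sh/v1beta1` API and the exact fields the operator sets (for example, whether it also fills in `minResources` or a queue) are assumptions, not taken from this guide.

```python
from typing import Any, Dict


def podgroup_for_tfjob(tfjob: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: derive a Volcano PodGroup manifest from a TFJob dict.

    Assumption: the gang size (minMember) is the sum of replicas across all
    replica types. The real operator may set additional fields.
    """
    replica_specs = tfjob["spec"]["tfReplicaSpecs"]
    # Every replica of every type must be schedulable before any pod starts.
    # A missing "replicas" field is treated as 1 (assumed default).
    min_member = sum(spec.get("replicas", 1) for spec in replica_specs.values())
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "PodGroup",
        "metadata": {"name": tfjob["metadata"]["name"]},
        "spec": {"minMember": min_member},
    }


# Minimal TFJob with one Worker and one PS, mirroring the example job in this guide.
job = {
    "metadata": {"name": "tfjob-gang-scheduling"},
    "spec": {"tfReplicaSpecs": {"Worker": {"replicas": 1}, "PS": {"replicas": 1}}},
}
print(podgroup_for_tfjob(job)["spec"]["minMember"])  # 2
```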

The YAML for scheduling your job as a gang with the Volcano scheduler is the same as for a job that does not use gang-scheduling. For example:

  apiVersion: "kubeflow.org/v1beta1"
  kind: "TFJob"
  metadata:
    name: "tfjob-gang-scheduling"
  spec:
    tfReplicaSpecs:
      Worker:
        replicas: 1
        template:
          spec:
            containers:
            - args:
              - python
              - tf_cnn_benchmarks.py
              - --batch_size=32
              - --model=resnet50
              - --variable_update=parameter_server
              - --flush_stdout=true
              - --num_gpus=1
              - --local_parameter_device=cpu
              - --device=gpu
              - --data_format=NHWC
              image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
              name: tensorflow
              resources:
                limits:
                  nvidia.com/gpu: 1
              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
            restartPolicy: OnFailure
      PS:
        replicas: 1
        template:
          spec:
            containers:
            - args:
              - python
              - tf_cnn_benchmarks.py
              - --batch_size=32
              - --model=resnet50
              - --variable_update=parameter_server
              - --flush_stdout=true
              - --num_gpus=1
              - --local_parameter_device=cpu
              - --device=cpu
              - --data_format=NHWC
              image: gcr.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
              name: tensorflow
              resources:
                limits:
                  cpu: '1'
              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
            restartPolicy: OnFailure
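For the example job above, the gang consists of two pods: the Worker asks for one GPU and the PS asks for one CPU. With gang-scheduling, neither pod starts unless both can be placed. The sketch below totals the pods and resource limits across all replicas of a TFJob, to show what the whole gang needs at once. It works on a plain Python dict rather than parsing the YAML, to stay dependency-free. The helper name `gang_requirements` is hypothetical, and it only handles plain integer quantities.

```python
from collections import Counter
from typing import Any, Dict


def gang_requirements(tfjob: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: total pods and resource limits for a whole TFJob gang."""
    pods = 0
    limits: Counter = Counter()
    for spec in tfjob["spec"]["tfReplicaSpecs"].values():
        replicas = spec.get("replicas", 1)
        pods += replicas
        for container in spec["template"]["spec"]["containers"]:
            for name, value in container.get("resources", {}).get("limits", {}).items():
                # Only integer quantities (e.g. 1 or '1') are handled; values such
                # as '500m' or '1Gi' would need real Kubernetes quantity parsing.
                limits[name] += int(value) * replicas
    return {"pods": pods, "limits": dict(limits)}


# The resource-relevant parts of the example TFJob, as a dict.
job = {
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {"replicas": 1, "template": {"spec": {"containers": [
                {"name": "tensorflow", "resources": {"limits": {"nvidia.com/gpu": 1}}}]}}},
            "PS": {"replicas": 1, "template": {"spec": {"containers": [
                {"name": "tensorflow", "resources": {"limits": {"cpu": "1"}}}]}}},
        }
    }
}
print(gang_requirements(job))
# {'pods': 2, 'limits': {'nvidia.com/gpu': 1, 'cpu': 1}}
```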

About the Volcano scheduler and gang-scheduling

When the Volcano scheduler applies gang-scheduling, a job can run only if there are enough resources for all of its pods. Otherwise, all the pods stay pending until enough resources become available. For example, if a job requiring N pods is created and there are only enough resources to schedule N-2 pods, then all N pods of the job stay pending.
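The all-or-nothing rule can be modeled in a few lines. The sketch below reduces resources to a single scalar "slot" count and compares a toy gang admission check with pod-by-pod binding, using the N and N-2 numbers from the example. Both function names are hypothetical. Volcano's actual implementation considers multiple resource types and other scheduling policies.

```python
from typing import List


def gang_schedule(pod_demands: List[int], free_capacity: int) -> List[int]:
    """All-or-nothing admission: bind every pod of the job, or none of them."""
    if sum(pod_demands) <= free_capacity:
        return list(pod_demands)  # the whole gang fits: bind all pods
    return []  # not enough room for the whole gang: every pod stays pending


def greedy_schedule(pod_demands: List[int], free_capacity: int) -> List[int]:
    """Pod-by-pod binding, roughly what happens without gang-scheduling."""
    bound: List[int] = []
    for demand in pod_demands:
        if demand <= free_capacity:
            bound.append(demand)
            free_capacity -= demand
    return bound


N = 5
demands = [1] * N  # N pods, each needing one "slot"
capacity = N - 2   # room for only N-2 pods
print(len(gang_schedule(demands, capacity)))    # 0: all N pods stay pending
print(len(greedy_schedule(demands, capacity)))  # 3: N-2 pods start, 2 wait
```

The greedy case shows what gang-scheduling avoids: the N-2 bound pods hold resources even though a distributed job typically cannot make progress without its remaining pods.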

Note: Under a high workload, if a pod of the job dies while the job is still running, other pods may get the chance to occupy its resources, which can cause a deadlock.

Troubleshooting

If you keep running into RBAC-related problems with your Volcano scheduler, you can try adding the following rules to the ClusterRole used by the Volcano scheduler.

  - apiGroups:
    - '*'
    resources:
    - '*'
    verbs:
    - '*'
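These rules clear RBAC errors because a ClusterRole rule grants a request when its `apiGroups`, `resources`, and `verbs` lists each contain either the requested value or `'*'`. The sketch below is a simplified model of that check. It ignores `resourceNames`, `nonResourceURLs`, and partial-wildcard patterns. The helper `rule_allows` and the narrower `pods_only` rule are hypothetical, not part of any Kubernetes client library. Because the rule above uses `'*'` in all three lists, it grants every verb on every resource in every API group.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]


def _matches(allowed: List[str], requested: str) -> bool:
    # '*' in a rule list matches any value.
    return "*" in allowed or requested in allowed


def rule_allows(rule: Rule, api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: does a single rule grant this request?"""
    return (
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
    )


wildcard_rule: Rule = {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}
pods_only: Rule = {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}

# A scheduler binds a pod to a node by creating the pods/binding subresource
# in the core API group ("").
print(rule_allows(pods_only, "", "pods/binding", "create"))      # False
print(rule_allows(wildcard_rule, "", "pods/binding", "create"))  # True
```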

Last modified 24.02.2021: Move job-scheduling under /training (#2508) (455b7ec6)