# GPU support

```
kops create cluster gpu.example.com --zones us-east-1c --node-size p2.xlarge --node-count 1 --kubernetes-version 1.6.1
```

(Note that the p2.xlarge instance type is not cheap, but no GPU instances are)

You can use the experimental hooks feature to install the nvidia drivers:

```
kops edit cluster gpu.example.com
```

```
spec:
  ...
  hooks:
  - execContainer:
      image: kopeio/nvidia-bootstrap:1.6
```

(TODO: Apply this only on instance groups that have GPUs, or have nvidia-bootstrap detect whether GPUs are present.)

In addition, you will likely want to enable the Accelerators=true feature gate on the kubelet:

```
kops edit cluster gpu.example.com
```

```
spec:
  ...
  kubelet:
    featureGates:
      Accelerators: "true"
```

```
kops update cluster gpu.example.com --yes
```
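Once the new nodes register, a quick sanity check (assuming the bootstrap hook and feature gate took effect) is to look for the GPU in the node capacity; on these Kubernetes versions the resource is named alpha.kubernetes.io/nvidia-gpu:

```
# The Capacity section of the node description should list
# "alpha.kubernetes.io/nvidia-gpu: 1" on the GPU nodes
kubectl describe nodes | grep -i nvidia
```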

Here is an example pod that runs TensorFlow; note that it mounts libcuda from the host:

(TODO: Is there some way to have a well-known volume or similar?)

```
apiVersion: v1
kind: Pod
metadata:
  name: tf
spec:
  containers:
  - image: gcr.io/tensorflow/tensorflow:1.0.1-gpu
    imagePullPolicy: IfNotPresent
    name: gpu
    command:
    - /bin/bash
    - -c
    - "cp -d /rootfs/usr/lib/x86_64-linux-gnu/libcuda.* /usr/lib/x86_64-linux-gnu/ && cp -d /rootfs/usr/lib/x86_64-linux-gnu/libnvidia* /usr/lib/x86_64-linux-gnu/ && /run_jupyter.sh"
    resources:
      limits:
        cpu: 2000m
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: rootfs-usr-lib
      mountPath: /rootfs/usr/lib
  volumes:
  - name: rootfs-usr-lib
    hostPath:
      path: /usr/lib
```
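Assuming you save the manifest above as `tf.yaml` (the filename is arbitrary), you can create the pod and then spot-check that the startup command copied the driver libraries from the host:

```
kubectl create -f tf.yaml
kubectl get pod tf

# Once the pod is Running, libcuda should be visible inside the container
kubectl exec tf -- ls /usr/lib/x86_64-linux-gnu/ | grep libcuda
```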

To use this particular TensorFlow image, port-forward to the pod and get the URL from its log:

```
kubectl port-forward tf 8888 &
kubectl logs tf
```

Then browse to the URL printed in the log.
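If you would rather pull the URL out non-interactively: the Jupyter server in this image prints a tokenized localhost URL at startup, so (assuming that log format) something like this should extract it:

```
# Jupyter tokens are hex strings, so match them directly
kubectl logs tf | grep -o 'http://localhost:8888/?token=[a-f0-9]*'
```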