Customizing Kubeflow on AWS

Tailoring a deployment of Kubeflow on AWS and Amazon EKS

This guide describes how to customize your deployment of Kubeflow on AWS and Amazon EKS. These steps can optionally be done before you run the kfctl apply command with your local configuration file in place. For information and instructions on the deployment process, see the Deploy guide.

Customize your Amazon EKS cluster

The first thing to consider for customization is your cluster configuration. Both the Amazon EKS User Guide and the eksctl documentation describe the available methods and options. You can configure cluster-wide options as well as a large number of customizations on the cluster's nodes. For example, you may want to provision GPU-based compute resources and use the EKS-optimized AMI with GPU support built in.
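
Once GPU nodes join the cluster, you can confirm that they advertise GPU resources to the Kubernetes scheduler. A quick check with kubectl, assuming the NVIDIA device plugin is already running on the nodes:

  # Show allocatable GPUs per node. Nodes report the nvidia.com/gpu
  # resource only after the NVIDIA device plugin is running.
  kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"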

The easiest way to manage configuration for cluster creation as well as ongoing maintenance operations is eksctl, an easy-to-use command-line utility that also supports declarative cluster configuration files.

For example, the following cluster manifest defines one node group with two p2.xlarge instances. Note that it also shows a simple method for enabling remote node access over SSH with a configured public key.

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    # AWS_CLUSTER_NAME and AWS_REGION will override `name` and `region` here.
    name: kubeflow-example
    region: us-west-2
    version: "1.18"
  # If your region has multiple availability zones, you can specify 3 of them.
  #availabilityZones: ["us-west-2b", "us-west-2c", "us-west-2d"]

  # NodeGroup holds all configuration attributes that are specific to a nodegroup.
  # You can have several node groups in your cluster.
  nodeGroups:
    - name: eks-gpu
      instanceType: p2.xlarge
      availabilityZones: ["us-west-2b"]
      desiredCapacity: 2
      minSize: 0
      maxSize: 2
      volumeSize: 30
      ssh:
        allow: true
        publicKeyPath: '~/.ssh/id_rsa.pub'

    # Example of a GPU node group
    # - name: Tesla-V100
    #   # Choose your instance type for the node group.
    #   instanceType: p3.2xlarge
    #   # A GPU node group can use a single availability zone to improve network performance.
    #   availabilityZones: ["us-west-2b"]
    #   # Autoscaling group settings
    #   desiredCapacity: 0
    #   minSize: 0
    #   maxSize: 4
    #   # Node root disk size (GiB)
    #   volumeSize: 50
    #   # Enable SSH access from outside your VPC.
    #   ssh:
    #     allow: true
    #     publicKeyPath: '~/.ssh/id_rsa.pub'
    #   # Customize labels
    #   labels:
    #     'k8s.amazonaws.com/accelerator': 'nvidia-tesla-v100'
    #   # Attach predefined IAM policies to the node group.
    #   iam:
    #     withAddonPolicies:
    #       autoScaler: true
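
With a manifest like this in place, you can create and inspect the cluster with eksctl. A minimal sketch, assuming the manifest above is saved as cluster.yaml (a hypothetical filename):

  # Create the cluster from the configuration file.
  eksctl create cluster -f cluster.yaml

  # List the node groups to confirm the settings took effect.
  eksctl get nodegroup --cluster kubeflow-example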

Customize Authentication

Basic authentication is the default when using the standard configuration file. If you use this method, remember to change the default password in the following section of the configuration file:

  spec:
    auth:
      basicAuth:
        password: 12341234
        username: admin@kubeflow.org
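
After changing the password, deploy as usual. A minimal sketch, assuming your edited configuration file is saved as kfctl_aws.yaml (a hypothetical filename; see the Deploy guide for the exact workflow):

  # Apply the local configuration file with kfctl.
  export CONFIG_FILE=./kfctl_aws.yaml
  kfctl apply -V -f ${CONFIG_FILE}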

You can optionally configure your deployment to use AWS Cognito or OpenID Connect. For more information on these options, see Authentication.
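
If you choose one of these options, you start from a different configuration file rather than editing the basicAuth section. As a sketch, assuming Kubeflow 1.2 and the Cognito configuration URI from that release (check the Deploy guide for the current URI):

  # Build the deployment from the Cognito configuration file.
  export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_aws_cognito.v1.2.0.yaml"
  kfctl build -V -f ${CONFIG_URI}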

Further customizations

Please refer to the following configuration sections for more information and options.

  • For information on Control Plane and Kubernetes Node logging, refer to Logging.
  • For information on using a custom domain with your Kubeflow deployment, refer to Custom Domain.
  • For information on configuring Private Clusters in Amazon EKS, refer to Private Access.
  • For information and instructions on using different types of storage, refer to Storage Options.
  • For information on using Amazon RDS for pipeline and metadata store, refer to Configure External Database Using Amazon RDS.
