Configure Kubeflow with Amazon RDS

Using Amazon RDS for storing pipelines and metadata

This guide describes how to use Amazon RDS as your pipelines and metadata store.

Amazon Relational Database Service (Amazon RDS)

Amazon RDS is a managed service that makes it easy to set up, operate, and scale a relational database in the AWS Cloud. It provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks. It has support for several engines such as Amazon Aurora, MySQL, MariaDB, PostgreSQL, Oracle Database, and SQL Server.

Deploy Amazon RDS MySQL

Before deploying a MySQL database with Amazon RDS, retrieve the configuration parameters it needs: your cluster's VPC ID, private subnet IDs, and node security group ID.

    # Use these commands to find VpcId, SubnetId and SecurityGroupId if you created your EKS cluster using eksctl
    # For clusters created in other ways, retrieve these values before moving on to deploying your database
    export AWS_CLUSTER_NAME=<your_cluster_name>
    # Retrieve your VpcId
    aws ec2 describe-vpcs \
      --filters "Name=tag:alpha.eksctl.io/cluster-name,Values=$AWS_CLUSTER_NAME" \
      | jq -r '.Vpcs[].VpcId'
    # Retrieve the SubnetIds of your cluster's private subnets; select at least two
    aws ec2 describe-subnets \
      --filters "Name=tag:alpha.eksctl.io/cluster-name,Values=$AWS_CLUSTER_NAME" "Name=tag:aws:cloudformation:logical-id,Values=SubnetPrivate*" \
      | jq -r '.Subnets[].SubnetId'
    # Retrieve the SecurityGroupId for your nodes
    # Note: this assumes all of your nodes share the same security group
    INSTANCE_IDS=$(aws ec2 describe-instances --query 'Reservations[*].Instances[*].InstanceId' --filters "Name=tag-key,Values=eks:cluster-name" "Name=tag-value,Values=$AWS_CLUSTER_NAME" --output text)
    # INSTANCE_IDS is a whitespace-separated string, so let the shell split it into IDs
    for i in $INSTANCE_IDS
    do
      echo "SecurityGroup for EC2 instance $i ..."
      aws ec2 describe-instances --instance-ids "$i" | jq -r '.Reservations[].Instances[].SecurityGroups[].GroupId'
    done
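The subnet command above prints one subnet ID per line, while a CloudFormation list parameter takes a single comma-separated value. The following minimal sketch, using a hypothetical helper named join_subnet_ids, joins the IDs you pass it and refuses to continue if you supply fewer than two, matching the "select at least two" requirement. It assumes the template's Subnets parameter is such a list type.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join subnet IDs into one comma-separated value, as
# CloudFormation list parameters expect (assumed for the template's Subnets
# parameter). Fails if fewer than two subnets are given.
join_subnet_ids() {
  if [ "$#" -lt 2 ]; then
    echo "error: select at least two private subnets (got $#)" >&2
    return 1
  fi
  # "$*" joins all arguments with the first character of IFS, here a comma
  local IFS=,
  echo "$*"
}
```

For example, `SUBNET_IDS=$(join_subnet_ids <subnet_id_1> <subnet_id_2>)` stores the two IDs you selected, ready to paste into the Subnets field or a parameters file.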

With this information in hand, you can deploy your database using either the Amazon RDS console or the provided CloudFormation template.

Warning

The CloudFormation template deploys an Amazon RDS for MySQL instance intended for development and test environments. For production use, we strongly recommend deploying a Multi-AZ database. See the Amazon RDS documentation to learn more.


Select your desired Region in the AWS CloudFormation console, then click Next. We recommend changing DBPassword; otherwise it defaults to Kubefl0w. Select the VpcId, Subnets, and SecurityGroupId values, then click Next. Accept the remaining defaults by clicking Next, then click Create Stack.
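If you prefer the AWS CLI to the console, one option is to pass the same values in a JSON parameters file. The sketch below writes such a file with a hypothetical helper, write_rds_stack_params. It assumes the template's parameter keys are exactly VpcId, Subnets, SecurityGroupId, and DBPassword (the names shown in the console steps) and that the password contains no double quotes or backslashes; it rejects such passwords rather than escaping them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a CloudFormation parameters file.
# Assumes the template's parameter keys are VpcId, Subnets,
# SecurityGroupId and DBPassword.
# Usage: write_rds_stack_params OUT_FILE VPC_ID SUBNET_IDS_CSV SG_ID DB_PASSWORD
write_rds_stack_params() {
  local out="$1" vpc="$2" subnets="$3" sg="$4" pw="$5"
  # Reject characters that would need JSON escaping instead of escaping them
  case "$pw" in
    *'"'* | *'\'*)
      echo "error: DBPassword must not contain \" or \\" >&2
      return 1
      ;;
  esac
  # Subnets is a single comma-separated value, e.g. subnet-a,subnet-b
  cat > "$out" <<EOF
[
  {"ParameterKey": "VpcId", "ParameterValue": "$vpc"},
  {"ParameterKey": "Subnets", "ParameterValue": "$subnets"},
  {"ParameterKey": "SecurityGroupId", "ParameterValue": "$sg"},
  {"ParameterKey": "DBPassword", "ParameterValue": "$pw"}
]
EOF
}
```

You could then create the stack with `aws cloudformation create-stack --stack-name <stack_name> --template-body file://<template_file> --parameters file://rds-params.json`, where the stack name, template file, and rds-params.json are placeholders for your own values.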

Once the CloudFormation stack creation is complete, open the Outputs tab to get the RDS endpoint.


If you didn’t use CloudFormation, you can find the RDS endpoint in the RDS console on the Connectivity & Security tab, under Endpoint & Port. You will use it in the next step when installing Kubeflow.
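The endpoint can also be read with the AWS CLI. This is a minimal sketch: the function names are hypothetical, the stack name and DB instance identifier are whatever you chose, and the first helper lists every stack output rather than assuming a particular output key name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the RDS endpoint from the CLI instead of
# the console. Both require configured AWS credentials.

# Print all outputs of the CloudFormation stack (the endpoint is among them).
# Usage: show_stack_outputs STACK_NAME
show_stack_outputs() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Outputs' \
    --output table
}

# Print the endpoint address of an RDS instance, looked up by its identifier.
# Usage: rds_endpoint DB_INSTANCE_IDENTIFIER
rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}
```

For example, `export RDS_ENDPOINT=$(rds_endpoint <db_instance_identifier>)` keeps the value at hand for the configuration step.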

Deploy Kubeflow Pipeline and Metadata using Amazon RDS

  1. Follow the install documentation up to the Deploy Kubeflow section. Modify ${CONFIG_FILE} to add external-mysql to both the pipeline and metadata kustomizeConfigs, and remove the mysql database, as shown in the diff below.

    (Figure: diff of ${CONFIG_FILE} with these changes)

  2. Run the following commands to build the Kubeflow installation configuration:

        cd ${KF_DIR}
        kfctl build -V -f ${CONFIG_FILE}

    This creates two directories, aws_config and kustomize. Edit the params.env file for the external-mysql pipeline service (kustomize/api-service/overlays/external-mysql/params.env) and update the values to match your configuration:

        mysqlHost=<$RDSEndpoint>
        mysqlUser=<$DBUsername>
        mysqlPassword=<$DBPassword>

    Edit the params.env file for the external-mysql metadata service (kustomize/metadata/overlays/external-mysql/params.env) and update the values based on your configuration:

        MYSQL_HOST=<$RDSEndpoint>
        MYSQL_DATABASE=metadb
        MYSQL_PORT=3306
        MYSQL_ALLOW_EMPTY_PASSWORD=true

    Edit the secrets.env file for the external-mysql metadata service (kustomize/metadata/overlays/external-mysql/secrets.env) and update the values based on your configuration:

        MYSQL_USERNAME=<$DBUsername>
        MYSQL_PASSWORD=<$DBPassword>

  3. Invoke the Kubeflow installation:

        cd ${KF_DIR}
        kfctl apply -V -f ${CONFIG_FILE}
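The params.env and secrets.env edits in step 2 can also be scripted, which helps if you rebuild the configuration. The following is a minimal sketch with hypothetical function names, to run between `kfctl build` and `kfctl apply`. It assumes the overlay files already exist at the paths listed in step 2, sets MYSQL_HOST (not MYSQL_DATABASE) to the RDS endpoint, and leaves every other key, such as MYSQL_DATABASE and MYSQL_PORT, unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# "KEY=" line or appending one if the key is missing. The key and value are
# passed through the environment so awk does not interpret backslash escapes.
set_env_value() {
  local file="$1" key="$2" value="$3"
  [ -f "$file" ] || { echo "error: $file not found (did kfctl build run?)" >&2; return 1; }
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Fill in the external-mysql overlays generated by kfctl build.
# Usage: configure_external_mysql KF_DIR RDS_ENDPOINT DB_USERNAME DB_PASSWORD
configure_external_mysql() {
  local kf_dir="$1" host="$2" user="$3" pw="$4"
  local api="$kf_dir/kustomize/api-service/overlays/external-mysql/params.env"
  local meta="$kf_dir/kustomize/metadata/overlays/external-mysql"
  # Pipeline API service
  set_env_value "$api" mysqlHost "$host" || return 1
  set_env_value "$api" mysqlUser "$user" || return 1
  set_env_value "$api" mysqlPassword "$pw" || return 1
  # Metadata service: host in params.env, credentials in secrets.env
  set_env_value "$meta/params.env" MYSQL_HOST "$host" || return 1
  set_env_value "$meta/secrets.env" MYSQL_USERNAME "$user" || return 1
  set_env_value "$meta/secrets.env" MYSQL_PASSWORD "$pw" || return 1
}
```

For example: `configure_external_mysql "${KF_DIR}" "<RDSEndpoint>" "<DBUsername>" "<DBPassword>"`. Editing keys in place, rather than overwriting whole files, keeps any other settings the overlays may contain.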

Your pipelines and metadata now use Amazon RDS. If you run into any issues, see the troubleshooting section.
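One way to confirm the switch is to connect to the RDS instance from inside the cluster and check that databases created by Kubeflow appear alongside the MySQL system schemas. This sketch assumes kubectl access to the kubeflow namespace and uses the public mysql:5.7 image as a throwaway client; the function name and the mysql-client pod name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the databases on the RDS instance from a temporary
# MySQL client pod. The mysql client prompts for the password (-p).
# Usage: check_rds_from_cluster RDS_ENDPOINT DB_USERNAME
check_rds_from_cluster() {
  if [ "$#" -ne 2 ]; then
    echo "usage: check_rds_from_cluster RDS_ENDPOINT DB_USERNAME" >&2
    return 1
  fi
  # --rm deletes the pod when the interactive session ends
  kubectl run mysql-client --rm -it --restart=Never \
    --namespace kubeflow \
    --image=mysql:5.7 \
    -- mysql -h "$1" -u "$2" -p -e 'SHOW DATABASES;'
}
```

Running the check from a pod, rather than from your workstation, also exercises the same network path (VPC, subnets, and security group) that the Kubeflow services use.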

Last modified 04.05.2021: refactor and refresh aws docs (#2688) (ef4cda60)