Istio Usage in Kubeflow

Managing access to Kubeflow applications and resources via Istio

From v0.6 onwards, Kubeflow deploys Istio along with the configuration needed to enable end-to-end authentication and access control. This setup is the foundation of multi-tenancy support in Kubeflow. A Kubeflow deployment without Istio is not possible.

A gentle introduction to Istio

Most modern applications are built using a distributed microservices architecture. This ensures that each individual service is simple and has a well-defined responsibility. Complex systems and platforms are generally built by combining many such microservices. Each microservice defines its own APIs, and the services interact with each other using these APIs in order to serve end-user requests.

The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them. As a service mesh grows in size and complexity, it can become harder to understand and manage. Its requirements can include discovery, load balancing, failure recovery, metrics, and monitoring. A service mesh also often has more complex operational requirements, like A/B testing, canary rollouts, rate limiting, access control, and end-to-end authentication.

Istio is an open source service mesh implementation, originally developed by Google, IBM, and Lyft. For further details, you can read the conceptual overview of Istio.

Why Kubeflow needs Istio

Kubeflow is a collection of tools, frameworks, and services that are deployed together into a single Kubernetes cluster to enable end-to-end ML workflows. Most of these components or services are developed independently and help with different parts of the workflow. Developing a complete ML workflow or an ML development environment requires combining multiple services and components. Kubeflow provides the underlying infrastructure that makes it possible to put such disparate components together.

Kubeflow uses Istio as a uniform way to secure, connect, and monitor microservices. Specifically:

  • Securing service-to-service communication in a Kubeflow deployment with strong identity-based authentication and authorization.
  • A policy layer for supporting access controls and quotas.
  • Automatic metrics, logs, and traces for traffic within the deployment, including cluster ingress and egress.

Istio in Kubeflow

The following diagram illustrates how user requests interact with services in Kubeflow. It walks through the process when a user requests to create a new notebook server via the Notebook Servers UI, which is accessible through the Kubeflow Central Dashboard.

  • The user request is intercepted by an identification proxy, which talks to an SSO service provider such as IAM on a cloud services provider or Active Directory/LDAP on-premises.
  • When the user is authenticated, the request is modified by the Istio Gateway to include a JWT header token containing the identity of the user. All requests throughout the service mesh carry this token along.
  • The Istio RBAC policies are applied to the incoming request to validate access to the service and the requested namespace. If either of those is inaccessible to the user, an error response is sent back.
  • If the request is validated, it is forwarded to the appropriate controller (Notebooks Controller in this case).
  • Notebooks Controller validates authorization with Kubernetes RBAC and creates the notebook pod in the namespace that the user requested.

Further actions by the user with the notebook to create training jobs or other resources in the namespace go through a similar process. Profiles Controller manages the creation of profiles, and creates and applies appropriate Istio policies. For more details, please see multi-user isolation.
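
The request flow described above can be sketched as a toy model in Python. This is only an illustration of the sequence (authenticate, attach identity, check policy, forward), not how Istio or Kubeflow implement it. The header name `x-user-identity`, the `Request` class, the policy table, and the user and service names are all hypothetical, and a plain string stands in for the JWT.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical header name; a plain string stands in for the JWT.
IDENTITY_HEADER = "x-user-identity"

# Hypothetical policy table: user -> set of (service, namespace) pairs allowed.
Policies = Dict[str, Set[Tuple[str, str]]]


@dataclass
class Request:
    service: str
    namespace: str
    headers: Dict[str, str] = field(default_factory=dict)


def attach_identity(request: Request, user: Optional[str]) -> Request:
    """Stand-in for the identification proxy and gateway steps."""
    if user is None:
        raise PermissionError("user is not authenticated")
    request.headers[IDENTITY_HEADER] = user
    return request


def is_allowed(policies: Policies, request: Request) -> bool:
    """Stand-in for the RBAC check on both service and namespace."""
    user = request.headers.get(IDENTITY_HEADER, "")
    return (request.service, request.namespace) in policies.get(user, set())


def handle(policies: Policies, request: Request, user: Optional[str]) -> str:
    """Run the request through the steps in order and report the outcome."""
    attach_identity(request, user)
    if not is_allowed(policies, request):
        return "403 Forbidden"
    return f"forwarded to {request.service} for namespace {request.namespace}"


POLICIES: Policies = {"alice@example.com": {("notebooks-controller", "alice")}}

print(handle(POLICIES, Request("notebooks-controller", "alice"), "alice@example.com"))
print(handle(POLICIES, Request("notebooks-controller", "bob"), "alice@example.com"))
```

In this sketch the first call is forwarded because the user has access to both the service and the namespace, while the second is rejected because the namespace is not in the user's policy entry.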

Deploying Kubeflow without Istio

Currently it is not possible to deploy Kubeflow without Istio. Kubeflow needs the Istio Custom Resource Definitions (CRDs) to express the new route to access the created Notebook from the Gateway.
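
To give a rough idea of what such a route might look like, the sketch below builds an Istio `VirtualService` manifest as a Python dictionary. The `VirtualService` kind and its `hosts`, `gateways`, `http`, `match`, and `route` fields come from the Istio networking API, but the gateway name, the URL prefix scheme, the resource naming, and the port are assumptions made for illustration and may not match what Kubeflow actually generates.

```python
import json


def notebook_virtual_service(name: str, namespace: str) -> dict:
    """Build a VirtualService manifest routing gateway traffic to a notebook.

    The gateway name, path prefix, resource name, and port below are
    illustrative assumptions, not values taken from Kubeflow.
    """
    prefix = f"/notebook/{namespace}/{name}/"  # assumed URL scheme
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {
            "name": f"notebook-{namespace}-{name}",  # assumed naming
            "namespace": namespace,
        },
        "spec": {
            "hosts": ["*"],
            "gateways": ["kubeflow/kubeflow-gateway"],  # assumed gateway
            "http": [
                {
                    # Match requests whose path starts with the notebook prefix.
                    "match": [{"uri": {"prefix": prefix}}],
                    # Send them to the notebook's in-cluster service.
                    "route": [
                        {
                            "destination": {
                                "host": f"{name}.{namespace}.svc.cluster.local",
                                "port": {"number": 80},  # assumed port
                            }
                        }
                    ],
                }
            ],
        },
    }


manifest = notebook_virtual_service("my-notebook", "alice")
print(json.dumps(manifest, indent=2))
```

Because a resource like this can only be created when the Istio CRDs are installed in the cluster, a controller that relies on it has a hard dependency on Istio.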

Last modified 18.02.2020: Refactor multiuser guides (#1682) (688286b9)