Introduction to Katib

Overview of Katib for hyperparameter tuning and neural architecture search

Use Katib for automated tuning of your machine learning (ML) model’s hyperparameters and architecture.

This page introduces the concepts of hyperparameter tuning, neural architecture search, and the Katib system as a component of Kubeflow.

Hyperparameters and hyperparameter tuning

Hyperparameters are the variables that control the model training process. For example:

  • Learning rate.
  • Number of layers in a neural network.
  • Number of nodes in each layer.

Hyperparameter values are not learned. In other words, in contrast to the node weights and other training parameters, the model training process does not adjust the hyperparameter values.

Hyperparameter tuning is the process of optimizing the hyperparameter values to maximize the predictive accuracy of the model. If you don’t use Katib or a similar system for hyperparameter tuning, you need to run many training jobs yourself, manually adjusting the hyperparameters to find the optimal values.

Automated hyperparameter tuning works by optimizing a target variable, also called the objective metric, that you specify in the configuration for the hyperparameter tuning job. A common metric is the model’s accuracy in the validation pass of the training job (validation-accuracy). You also specify whether you want the hyperparameter tuning job to maximize or minimize the metric.
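
For illustration, here is a sketch of how the objective might look in an Experiment manifest. This assumes the Katib v1beta1 Experiment API; the experiment name and goal value are placeholders:

```yaml
# Sketch of the objective section of a Katib Experiment
# (assumes the v1beta1 Experiment API; adjust for your Katib version).
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-example              # placeholder name
spec:
  objective:
    type: maximize                  # or "minimize"
    goal: 0.99                      # stop early once this value is reached
    objectiveMetricName: validation-accuracy
```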

For example, the following graph from Katib shows the level of accuracy for various combinations of hyperparameter values (learning rate, number of layers, and optimizer):

Graph produced by the random example

(To run the example that produced this graph, follow the getting-started guide.)

Katib runs several training jobs (known as trials) within each hyperparameter tuning job (experiment). Each trial tests a different set of hyperparameter configurations. At the end of the experiment, Katib outputs the optimized values for the hyperparameters.

Neural architecture search

Alpha version

Neural architecture search is currently in Alpha with limited support. The Kubeflow team is interested in any feedback you may have, in particular with regard to the usability of the feature. You can log issues and comments in the Katib issue tracker.

In addition to hyperparameter tuning, Katib offers a neural architecture search (NAS) feature. You can use NAS to design your artificial neural network, with a goal of maximizing the predictive accuracy and performance of your model.

NAS is closely related to hyperparameter tuning. Both are subsets of automated machine learning (AutoML). While hyperparameter tuning optimizes the model’s hyperparameters, a NAS system optimizes the model’s structure, node weights, and hyperparameters.

NAS technology in general uses various techniques to find the optimal neural network design. The NAS in Katib uses the reinforcement learning technique.

You can submit Katib jobs from the command line or from the UI. (Read more about the Katib interfaces later on this page.) The following screenshot shows part of the form for submitting a NAS job from the Katib UI:

Submitting a neural architecture search from the Katib UI

The Katib project

Katib is a Kubernetes-based system for hyperparameter tuning and neural architecture search. Katib supports a number of ML frameworks, including TensorFlow, MXNet, PyTorch, XGBoost, and others.

The Katib project is open source. The developer guide is a good starting point for developers who want to contribute to the project.

Katib interfaces

You can use the following interfaces to interact with Katib:

  • A web UI that you can use to submit experiments and to monitor your results. See the getting-started guide for information on how to access the UI. The Katib home page within Kubeflow looks like this:

The Katib home page within the Kubeflow UI

  • A REST API. See the API reference on GitHub.

  • Command-line interfaces (CLIs):

    • Kfctl is the Kubeflow CLI that you can use to install and configure Kubeflow. Read about kfctl in the guide to configuring Kubeflow.

    • The Kubernetes CLI, kubectl, is useful for running commands against your Kubeflow cluster. Read about kubectl in the Kubernetes documentation.

Katib concepts

This section describes the terms used in Katib.

Experiment

An experiment is a single tuning run, also called an optimization run.

You specify configuration settings to define the experiment. The following are the main configurations:

  • Objective: What you want to optimize. This is the objective metric, also called the target variable. A common metric is the model’s accuracy in the validation pass of the training job (validation-accuracy). You also specify whether you want the hyperparameter tuning job to maximize or minimize the metric.

  • Search space: The set of all possible hyperparameter values that the hyperparameter tuning job should consider for optimization, and the constraints for each hyperparameter. Other names for search space include feasible set and solution space. For example, you may provide the names of the hyperparameters that you want to optimize. For each hyperparameter, you may provide a minimum and maximum value or a list of allowable values.

  • Search algorithm: The algorithm to use when searching for the optimal hyperparameter values.

For details of how to define your experiment, see the guide to running an experiment.
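
As a sketch of how the search space and search algorithm might appear together in an Experiment spec (assuming the v1beta1 Experiment API; the parameter names and ranges are illustrative):

```yaml
# Sketch: search space and algorithm settings in an Experiment spec
# (v1beta1 API assumed; parameter names and ranges are examples only).
spec:
  algorithm:
    algorithmName: random            # the search algorithm to use
  parameters:
    - name: lr                       # learning rate, searched in a range
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.1"
    - name: num-layers               # integer-valued hyperparameter
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: optimizer                # categorical: a list of allowed values
      parameterType: categorical
      feasibleSpace:
        list: ["sgd", "adam", "ftrl"]
```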

Suggestion

A suggestion is a set of hyperparameter values that the hyperparameter tuning process has proposed. Katib creates a trial to evaluate the suggested set of values.

Trial

A trial is one iteration of the hyperparameter tuning process. A trial corresponds to one worker job instance with a list of parameter assignments. The list of parameter assignments corresponds to a suggestion.

Each experiment runs several trials. The experiment runs the trials until it reaches either the objective or the configured maximum number of trials.
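
A sketch of how this trial budget might be expressed in the experiment spec (assuming the v1beta1 Experiment API; the counts are placeholders):

```yaml
# Sketch: trial budget settings (v1beta1 API assumed; counts are examples).
spec:
  parallelTrialCount: 3      # how many trials run concurrently
  maxTrialCount: 12          # stop after this many trials complete
  maxFailedTrialCount: 3     # tolerate up to this many failed trials
```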

Worker job

The worker job is the process that runs to evaluate a trial and calculate its objective value.

The worker job can be one of the following types:

  • Kubernetes Job
  • Kubeflow TFJob, for TensorFlow training jobs
  • Kubeflow PyTorchJob, for PyTorch training jobs

By offering the above worker job types, Katib supports multiple ML frameworks.
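
For illustration, here is a sketch of a trial template that runs the worker job as a plain Kubernetes Job (assuming the v1beta1 Experiment API; the image, command, and parameter names are placeholders):

```yaml
# Sketch: trial template using a Kubernetes Job as the worker job
# (v1beta1 API assumed; image and arguments are placeholders).
spec:
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        reference: lr                # maps to the "lr" search parameter
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: docker.io/example/train:latest   # placeholder image
                command:
                  - python
                  - train.py
                  - --lr=${trialParameters.learningRate}
            restartPolicy: Never
```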

Next steps

Follow the getting-started guide to set up Katib and run some hyperparameter tuning examples.