TFJob TensorFlow

Reference documentation for TFJob

Out of date

This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.

Packages:

kubeflow.org

Package v1 is the v1 version of the API.

Resource Types:

TFJob

Represents a TFJob resource.

| Field | Type | Description |
| --- | --- | --- |
| apiVersion | string | kubeflow.org/v1 |
| kind | string | TFJob |
| metadata | Kubernetes meta/v1.ObjectMeta | Standard Kubernetes object’s metadata. Refer to the Kubernetes API documentation for the fields of metadata. |
| spec | TFJobSpec | Specification of the desired state of the TFJob. Its fields are the spec.* rows that follow. |
| spec.activeDeadlineSeconds | int64 | (Optional) Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always. |
| spec.backoffLimit | int32 | (Optional) Number of retries before marking this job as failed. |
| spec.cleanPodPolicy | common/v1.CleanPodPolicy | Defines the policy for cleaning up pods after the TFJob completes. Defaults to Running. |
| spec.ttlSecondsAfterFinished | int32 | Defines the TTL for cleaning up finished TFJobs (a temporary mechanism until Kubernetes adds a cleanup controller). Cleanup may take an extra ReconcilePeriod seconds, because reconcile is called periodically. Defaults to infinite. |
| spec.tfReplicaSpecs | map[github.com/kubeflow/tf-operator/pkg/apis/tensorflow/v1.TFReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec | A map of TFReplicaType (key) to ReplicaSpec (value). Specifies the TF cluster configuration. For example, { “PS”: ReplicaSpec, “Worker”: ReplicaSpec } |
| status | common/v1.JobStatus | Most recently observed status of the TFJob. Read-only (modified by the system). |
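
To make the top-level layout of the resource concrete, here is a minimal sketch that assembles a TFJob manifest as a plain Python dictionary and prints it as JSON. The helper name build_tfjob, the job name, and the image are hypothetical. The ReplicaSpec layout (replicas, restartPolicy, template) and the container name are assumptions taken from the upstream common/v1 package and common usage; they are not documented on this page.

```python
import json


def build_tfjob(name, image, workers=2, ps=1):
    """Return a TFJob manifest as a plain dict (hypothetical helper)."""

    def replica(count):
        # Assumption: ReplicaSpec carries replicas, restartPolicy and a pod
        # template. These fields are not described on this page.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # Assumption: the training container is named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",  # fixed value from the table above
        "kind": "TFJob",                  # fixed value from the table above
        "metadata": {"name": name},
        "spec": {
            "cleanPodPolicy": "Running",  # the documented default, set explicitly
            "tfReplicaSpecs": {"PS": replica(ps), "Worker": replica(workers)},
        },
        # status is omitted on purpose: it is read-only and set by the system.
    }


manifest = build_tfjob("mnist-example", "example.com/tf-train:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with the usual Kubernetes tooling, since JSON is also valid input wherever a YAML manifest is accepted.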

TFJobSpec

(Appears on: TFJob)

TFJobSpec is a desired state description of the TFJob.

| Field | Type | Description |
| --- | --- | --- |
| activeDeadlineSeconds | int64 | (Optional) Specifies the duration (in seconds) since startTime during which the job can remain active before it is terminated. Must be a positive integer. This setting applies only to pods where restartPolicy is OnFailure or Always. |
| backoffLimit | int32 | (Optional) Number of retries before marking this job as failed. |
| cleanPodPolicy | common/v1.CleanPodPolicy | Defines the policy for cleaning up pods after the TFJob completes. Defaults to Running. |
| ttlSecondsAfterFinished | int32 | Defines the TTL for cleaning up finished TFJobs (a temporary mechanism until Kubernetes adds a cleanup controller). Cleanup may take an extra ReconcilePeriod seconds, because reconcile is called periodically. Defaults to infinite. |
| tfReplicaSpecs | map[github.com/kubeflow/tf-operator/pkg/apis/tensorflow/v1.TFReplicaType]*github.com/kubeflow/tf-operator/pkg/apis/common/v1.ReplicaSpec | A map of TFReplicaType (key) to ReplicaSpec (value). Specifies the TF cluster configuration. For example, { “PS”: ReplicaSpec, “Worker”: ReplicaSpec } |
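
The constraints in this table can be checked on the client side before a job is submitted. The following is a rough sketch, not the operator's own validation: validate_tfjob_spec is a hypothetical helper, and the rules that backoffLimit and ttlSecondsAfterFinished must be non-negative and that tfReplicaSpecs must be non-empty are assumptions that go beyond what this page states. Only the positive-integer rule for activeDeadlineSeconds and the int32/int64 widths come from the table.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_tfjob_spec(spec):
    """Return a list of problems found in a TFJobSpec-like dict."""
    problems = []

    # Documented: must be a positive integer (int64).
    deadline = spec.get("activeDeadlineSeconds")
    if deadline is not None and not (_is_int(deadline) and 0 < deadline <= INT64_MAX):
        problems.append("activeDeadlineSeconds must be a positive int64")

    # Assumption: both int32 fields are treated as non-negative here.
    for field in ("backoffLimit", "ttlSecondsAfterFinished"):
        value = spec.get(field)
        if value is not None and not (_is_int(value) and 0 <= value <= INT32_MAX):
            problems.append(f"{field} must be a non-negative int32")

    # Assumption: a job without any replica type is rejected.
    if not spec.get("tfReplicaSpecs"):
        problems.append("tfReplicaSpecs must contain at least one replica type")

    return problems


example = {"activeDeadlineSeconds": 0, "backoffLimit": 3, "tfReplicaSpecs": {"Worker": {}}}
print(validate_tfjob_spec(example))
# prints ['activeDeadlineSeconds must be a positive int64']
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a spec at once.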

TFReplicaType (string alias)

TFReplicaType is the type for TFReplica. Can be one of: “Chief”/“Master” (semantically equivalent), “Worker”, “PS”, or “Evaluator”.
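
Because “Chief” and “Master” are semantically equivalent, client code that inspects tfReplicaSpecs keys may want to fold them into one name. The sketch below shows one way to do that. The helper canonical_replica_type is hypothetical, picking “Chief” as the canonical spelling is an arbitrary choice, and the exact-case matching is an assumption rather than documented behavior.

```python
# The replica types listed in the TFReplicaType description.
VALID_REPLICA_TYPES = {"Chief", "Master", "Worker", "PS", "Evaluator"}


def canonical_replica_type(name):
    """Validate a replica type and map "Master" to "Chief" (hypothetical helper)."""
    # Assumption: names are matched with exact case, so "worker" is rejected.
    if name not in VALID_REPLICA_TYPES:
        raise ValueError(f"unknown TFReplicaType: {name!r}")
    return "Chief" if name == "Master" else name


for key in ("Master", "Chief", "Worker", "PS", "Evaluator"):
    print(key, "->", canonical_replica_type(key))
```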


Generated with gen-crd-api-reference-docs on git commit fd76deec.

Last modified 26.04.2021: fix broken link (#2652) (9f7d687f)