Python Based Visualizations

Predefined and custom visualizations of pipeline outputs

This page describes Python based visualizations, how to create them, and how to use them to visualize results within the Kubeflow Pipelines UI. Python based visualizations are available in Kubeflow Pipelines version 0.1.29 and later, and in Kubeflow version 0.7.0 and later.

While Python based visualizations are intended to be the main method of visualizing data within the Kubeflow Pipelines UI, they do not replace the previous method. When considering which method to use within your pipeline, check the limitations of Python based visualizations in the section below and compare them with the requirements of your visualizations.

Introduction

Python based visualizations are a new method of visualizing results within the Kubeflow Pipelines UI. This method generates visualizations through nbconvert. Because the process of visualizing results is now decoupled from the pipeline, results can be visualized without including a visualization component in the pipeline itself.

Python based visualizations fall into two categories. The first is predefined visualizations, which are provided by default in Kubeflow Pipelines and serve as a way for you and your customers to quickly generate powerful visualizations. The second is custom visualizations, which allow you and your customers to provide the Python code used to generate the visualization. Custom visualizations allow for rapid development, experimentation, and customization when visualizing results.

Confusion matrix visualization from a pipeline component

Using predefined visualizations

Predefined matrix visualization

  • Open the details of a run.
  • Select a component.
    • The component you select does not matter. However, if you want to visualize the output of a specific component, it is easier to do that within that component.
  • Select the Artifacts tab.
  • At the top of the tab you should see a card named Visualization Creator.
  • Within the card, provide a visualization type, a source, and any necessary arguments.
    • Any required or optional arguments will be shown as a placeholder.
  • Click Generate Visualization.
  • View generated visualization by scrolling down.
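The three inputs of the Visualization Creator card can be pictured as a small record. The sketch below is only an illustration: the `VisualizationRequest` class and its methods are hypothetical names, not part of Kubeflow Pipelines, and the assumption that arguments travel as a JSON object is ours. It encodes the rule stated on this page that only custom visualizations may omit the source.

```python
import json
from dataclasses import dataclass, field


@dataclass
class VisualizationRequest:
    """Hypothetical container for the Visualization Creator inputs."""

    type: str
    source: str = ""
    arguments: dict = field(default_factory=dict)

    def validate(self) -> None:
        if not self.type:
            raise ValueError("a visualization type is required")
        # Per this page, the source is optional only for CUSTOM visualizations.
        if self.type.upper() != "CUSTOM" and not self.source:
            raise ValueError(f"{self.type} visualizations need a source")

    def arguments_json(self) -> str:
        # Assumption: arguments are serialized as a JSON object.
        return json.dumps(self.arguments, sort_keys=True)


request = VisualizationRequest(
    type="TFDV",
    source="gs://ml-pipeline-playground/tfx_taxi_simple/data/data.csv",
)
request.validate()
print(request.type, request.source, request.arguments_json())
```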

Predefined TFX visualization

  • On the Pipelines page, click [Sample] Unified DSL - Taxi Tip Prediction Model Trainer to open the Pipeline Details page.
  • On the Pipeline Details page, click Create run.
  • On the Create run page,
    • Use a run name and an experiment name of your choice.
    • In the pipeline-root field, specify a storage bucket that you have permission to write to. For example, enter the path to a Google Cloud Storage bucket or an Amazon S3 bucket.
    • Click Start to create the run.
  • After the run is complete, on the Run Details page, click any step. For example, click the first step, csvexamplegen.
  • In the side panel of the selected step,
    • Click the Artifacts tab.
    • In the Visualization Creator section, choose TFDV from the drop-down menu.
    • In the Source field, use gs://ml-pipeline-playground/tfx_taxi_simple/data/data.csv, which is the input data used for this run.
    • Click Generate Visualization and wait.
  • Move to the bottom of the Artifacts tab to find the generated visualization.

Using custom visualizations

  • Enable custom visualizations within Kubeflow Pipelines.

    • If you have not yet deployed Kubeflow Pipelines to your cluster, you can edit the frontend deployment YAML file to include the following YAML, which specifies through an environment variable that custom visualizations are allowed. The value is quoted because Kubernetes expects environment variable values to be strings.

          - env:
            - name: ALLOW_CUSTOM_VISUALIZATIONS
              value: "true"

    • If you already have Kubeflow Pipelines deployed within a cluster, you can edit the frontend deployment YAML to specify that custom visualizations are allowed in the same way described above. Details about updating deployments can be found in the Kubernetes documentation about updating a deployment.

  • Open the details of a run.

  • Select a component.

    • The component you select does not matter. However, if you want to visualize the output of a specific component, it is easier to do that within that component.
  • Select the Artifacts tab.

  • At the top of the tab you should see a card named Visualization Creator.

  • Within the card, select the CUSTOM visualization type, then provide a source and any necessary arguments (the source and argument variables are optional for custom visualizations).

  • Provide the custom visualization code.

  • Click Generate Visualization.

  • View generated visualization by scrolling down.
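The first step above, enabling custom visualizations on an existing deployment, can also be expressed as a patch. The following is a minimal sketch that only builds and prints the patch document. The deployment and container names are assumptions and may differ in your installation, and `env_patch` is a hypothetical helper.

```python
import json

# Assumptions: placeholder names; check the names used by your installation.
DEPLOYMENT = "ml-pipeline-ui"
CONTAINER = "ml-pipeline-ui"


def env_patch(container: str, name: str, value: str) -> dict:
    """Build a strategic merge patch that sets one env var on a container."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            # Kubernetes requires env values to be strings.
                            "env": [{"name": name, "value": str(value)}],
                        }
                    ]
                }
            }
        }
    }


patch = env_patch(CONTAINER, "ALLOW_CUSTOM_VISUALIZATIONS", "true")
print(f"# patch for deployment {DEPLOYMENT}")
print(json.dumps(patch, indent=2))
```

The printed JSON could then be supplied to a command such as kubectl patch deployment, together with the namespace where Kubeflow Pipelines is installed.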

A demo of the above instructions is as follows.

  • On the Pipelines page, click [Sample] Unified DSL - Taxi Tip Prediction Model Trainer to open the Pipeline Details page.
  • On the Pipeline Details page, click Create run.
  • On the Create run page,
    • Use a run name and an experiment name of your choice, or simply use the default names chosen for you.
    • In the pipeline-root field, specify a storage bucket that you have permission to write to. For example, enter the path to a Google Cloud Storage bucket or an Amazon S3 bucket.
    • Click Start to create the run.
  • After the run is complete, on the Run Details page, click the statisticsgen step. This step’s output is statistics data generated by TensorFlow Data Validation.
  • In the side panel of the selected step,
    • Click the Input/Output tab, find the mlpipeline-ui-metadata item, and click the minio link there. This opens a new browser tab with information about the output file path. Copy the output file path.
    • Go back to the Run Details page, and click the Artifacts tab.
    • At the top of the tab you should see a card named Visualization Creator. Choose Custom from the drop-down menu.
    • In the Custom Visualization Code field, enter the following code snippet and replace [output file path] with the output file path you just copied from mlpipeline-ui-metadata.

          import tensorflow_data_validation as tfdv
          stats = tfdv.load_statistics('[output file path]/stats_tfrecord')
          tfdv.visualize_statistics(stats)

    • Click Generate Visualization and wait.
  • Move to the bottom of the Artifacts tab to find the generated visualization.
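When pasting the copied path into the snippet, it is easy to end up with a doubled slash or with typographic quotes that Python rejects. The sketch below shows one way to assemble the three-line snippet from a path. `build_tfdv_snippet` is a hypothetical helper and the example path is a placeholder, not a real output location.

```python
def build_tfdv_snippet(output_path: str) -> str:
    """Return the custom visualization code for a statistics output path."""
    # Drop any trailing slash so the join produces exactly one separator.
    stats_file = output_path.rstrip("/") + "/stats_tfrecord"
    return "\n".join(
        [
            "import tensorflow_data_validation as tfdv",
            # !r emits a string literal delimited by plain ASCII quotes.
            f"stats = tfdv.load_statistics({stats_file!r})",
            "tfdv.visualize_statistics(stats)",
        ]
    )


# Placeholder path; use the one copied from mlpipeline-ui-metadata.
snippet = build_tfdv_snippet("minio://example-bucket/statisticsgen/output/")
print(snippet)
```

The printed text is what would go into the Custom Visualization Code field. The helper itself does not import tensorflow_data_validation, so it can run anywhere.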

Known limitations

  • Multiple visualizations cannot be generated concurrently.
    • This is because a single Python kernel is used to generate visualizations.
    • If visualizations are a major part of your workflow, it is recommended to increase the number of replicas within the visualization deployment YAML file or within the visualization service deployment itself.
      • Please note that this does not directly solve the issue. Instead, it decreases the likelihood of experiencing delays when generating visualizations.
  • Visualizations that take longer than 30 seconds will fail to generate.

    • For visualizations where the 30 second timeout is reached, you can add the TimeoutValue header to the request made by the frontend, specifying a positive integer as an ASCII string of at most 8 digits for the length of time required to generate the visualization, as specified by the gRPC documentation.
    • For visualizations that take longer than 100 seconds, you must specify a TimeoutValue within the request headers AND change the default kernel timeout of the visualization service. To change the default kernel timeout, set the KERNEL_TIMEOUT environment variable of the visualization service deployment to the new timeout length in seconds, either within the visualization deployment YAML file or within the visualization service deployment itself. The value is quoted because Kubernetes expects environment variable values to be strings.

          - env:
            - name: KERNEL_TIMEOUT
              value: "100"

  • The HTML content of the generated visualizations cannot be larger than 4MB.

    • gRPC by default imposes a limit of 4MB as the maximum size that can be sent and received by a server. To allow visualizations larger than 4MB to be generated, you must manually set MaxCallRecvMsgSize for gRPC. This can be done by changing the options given to the gRPC server within main.go to the following:

          var maxCallRecvMsgSize = 4 * 1024 * 1024
          if serviceName == "Visualization" {
              // Only change maxCallRecvMsgSize if it is for visualizations
              maxCallRecvMsgSize = 50 * 1024 * 1024
          }
          opts := []grpc.DialOption{
              grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxCallRecvMsgSize)),
              grpc.WithInsecure(),
          }
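The numeric limits above can be summarized in a small check. This is a rough sketch under the thresholds stated on this page (30 seconds, 100 seconds, 4MB). The function names are hypothetical, and the script only reports which settings would need to change; it does not talk to Kubeflow Pipelines.

```python
DEFAULT_REQUEST_TIMEOUT_S = 30   # longer requests fail unless TimeoutValue is set
KERNEL_TIMEOUT_THRESHOLD_S = 100  # beyond this, KERNEL_TIMEOUT must also change
GRPC_DEFAULT_MAX_BYTES = 4 * 1024 * 1024


def timeout_value(seconds: int) -> str:
    """Format a timeout as a positive ASCII integer of at most 8 digits."""
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive integer")
    text = str(seconds)
    if len(text) > 8:
        raise ValueError("timeout must have at most 8 digits")
    return text


def required_changes(seconds: int, html: str) -> list:
    """List the settings from this page that a visualization would need."""
    changes = []
    if seconds > DEFAULT_REQUEST_TIMEOUT_S:
        changes.append(f"TimeoutValue={timeout_value(seconds)}")
    if seconds > KERNEL_TIMEOUT_THRESHOLD_S:
        changes.append(f"KERNEL_TIMEOUT={seconds}")
    # The limit applies to the encoded size of the generated HTML.
    if len(html.encode("utf-8")) > GRPC_DEFAULT_MAX_BYTES:
        changes.append("MaxCallRecvMsgSize")
    return changes


print(required_changes(120, "<html></html>"))
```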

Next steps

If you’d like to add a predefined visualization to Kubeflow, take a look at the developer docs.

Last modified 27.01.2020: typo fix (#1565) (6a6b35bd)