Common Questions

This page describes the solutions to some common questions for PyFlink users.

Preparing Python Virtual Environment

You can download a convenience script to prepare a Python virtual env zip which can be used on Mac OS and most Linux distributions. You can specify the PyFlink version to generate a Python virtual environment required for the corresponding PyFlink version, otherwise the most recent version will be installed.

  1. $ sh setup-pyflink-virtual-env.sh 1.11.0

Execute PyFlink jobs with Python virtual environment

After setting up a python virtual environment, as described in the previous section, you should activate the environment before executing the PyFlink job.

Local

  1. # activate the conda python virtual environment
  2. $ source venv/bin/activate
  3. $ python xxx.py

Cluster

  1. $ # specify the Python virtual environment
  2. $ table_env.add_python_archive("venv.zip")
  3. $ # specify the path of the python interpreter which is used to execute the python UDF workers
  4. $ table_env.get_config().set_python_executable("venv.zip/venv/bin/python")

For details on the usage of add_python_archive and set_python_executable, you can refer to the relevant documentation.

Adding Jar Files

A PyFlink job may depend on jar files, i.e. connectors, Java UDFs, etc. You can specify the dependencies with the following Python Table APIs or through command-line arguments directly when submitting the job.

  1. # NOTE: Only local file URLs (start with "file:") are supported.
  2. table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/udf.jar")
  3. # NOTE: The Paths must specify a protocol (e.g. "file") and users should ensure that the URLs are accessible on both the client and the cluster.
  4. table_env.get_config().get_configuration().set_string("pipeline.classpaths", "file:///my/jar/path/connector.jar;file:///my/jar/path/udf.jar")

For details about the APIs of adding Java dependency, you can refer to the relevant documentation

Adding Python Files

You can use the command-line arguments pyfs or the API add_python_file of TableEnvironment to add python file dependencies which could be python files, python packages or local directories. For example, if you have a directory named myDir which has the following hierarchy:

  1. myDir
  2. ├──utils
  3. ├──__init__.py
  4. ├──my_util.py

You can add the Python files of directory myDir as following:

  1. table_env.add_python_file('myDir')
  2. def my_udf():
  3. from utils import my_util