Hadoop Integration

Flink uses the environment variable HADOOP_CLASSPATH to augment the classpath that is used when starting Flink components such as the Client, JobManager, or TaskManager. Most Hadoop distributions and cloud environments do not set this variable by default, so if Flink should pick up the Hadoop classpath, the environment variable must be exported on all machines that run Flink components.
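One way to confirm that the variable is actually exported on a given machine, and not merely set in the current shell, is a small pre-start check. The following is a minimal sketch, assuming a POSIX shell; the function name check_hadoop_classpath is hypothetical and not part of Flink. It asks a child shell for the value, because only exported variables are inherited by child processes such as the scripts that start Flink components.

```shell
#!/usr/bin/env sh
# Hypothetical pre-start check (not part of Flink): verify that
# HADOOP_CLASSPATH is exported, not just set, in the current shell.
check_hadoop_classpath() {
  # A child shell only sees exported variables, so query the value via `sh -c`.
  if [ -z "$(sh -c 'printf %s "$HADOOP_CLASSPATH"')" ]; then
    echo "HADOOP_CLASSPATH is not exported; Flink will not pick up the Hadoop classpath" >&2
    return 1
  fi
  echo "HADOOP_CLASSPATH is exported"
  return 0
}
```

Source the file and call check_hadoop_classpath in the same shell session that later starts the Flink component, on every machine that runs one.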

When running on YARN, this is usually not a problem because the components running inside YARN are started with the Hadoop classpath, but it can happen that the Hadoop dependencies must be on the classpath when submitting a job to YARN. For this, it is usually enough to run

  export HADOOP_CLASSPATH=`hadoop classpath`

in the shell. Note that hadoop refers to the Hadoop command-line binary and classpath is an argument that makes it print the configured Hadoop classpath.
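A slightly more defensive form of this command can be wrapped in a shell function that fails clearly when the hadoop binary is missing. This is a minimal sketch, assuming a POSIX shell and that hadoop is on the PATH; the function name export_hadoop_classpath is hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical helper: export HADOOP_CLASSPATH from the local Hadoop installation.
export_hadoop_classpath() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop binary not found on PATH; HADOOP_CLASSPATH left unchanged" >&2
    return 1
  fi
  # `hadoop classpath` prints the configured Hadoop classpath as a
  # colon-separated list of entries; stop if the command itself fails.
  HADOOP_CLASSPATH="$(hadoop classpath)" || return 1
  export HADOOP_CLASSPATH
}
```

Because export only affects the current shell and its children, the function must run in the shell that later submits the job, for example by sourcing the file rather than executing it as a separate script.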