MapR Setup

This documentation provides instructions on how to prepare Flink for YARN executions on a MapR cluster.

The instructions below assume MapR version 5.2.0. They will guide you to the point where you can start submitting Flink on YARN jobs or sessions to a MapR cluster.

In order to run Flink on MapR, Flink needs to be built with MapR’s own Hadoop and Zookeeper distributions. Simply build Flink using Maven with the following command from the project root directory:

    mvn clean install -DskipTests -Pvendor-repos,mapr \
        -Dhadoop.version=2.7.0-mapr-1607 \
        -Dzookeeper.version=3.4.5-mapr-1604

The vendor-repos build profile adds MapR’s repository to the build so that MapR’s Hadoop / Zookeeper dependencies can be fetched. The mapr build profile additionally resolves some dependency clashes between MapR and Flink, and ensures that the native MapR libraries on the cluster nodes are used. Both profiles must be activated.

By default the mapr profile builds with Hadoop / Zookeeper dependencies for MapR version 5.2.0, so you don’t need to explicitly override the hadoop.version and zookeeper.version properties. For a different MapR version, simply override these properties with the appropriate values. The corresponding Hadoop / Zookeeper versions for each MapR release can be found in MapR documentation sources such as here.
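As a sketch, overriding the properties for another MapR release follows the same command pattern as above; the version strings below are placeholders, not real values, and must be replaced with the Hadoop / Zookeeper versions listed in MapR’s documentation for your release:

```shell
# Placeholder sketch: substitute the real version strings for your MapR release.
mvn clean install -DskipTests -Pvendor-repos,mapr \
    -Dhadoop.version=<mapr-hadoop-version> \
    -Dzookeeper.version=<mapr-zookeeper-version>
```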

Job Submission Client Setup

The client submitting Flink jobs to MapR also needs to be prepared with the following setup.

Ensure that MapR’s JAAS config file is picked up to avoid login failures:

    export JVM_ARGS=-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf

Make sure that the yarn.nodemanager.resource.cpu-vcores property is set in yarn-site.xml:

    <!-- in /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml -->
    <configuration>
    ...
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>...</value>
    </property>
    ...
    </configuration>

Also remember to set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variables to the path where yarn-site.xml is located:

    export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
    export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/

Make sure that the MapR native libraries are picked up in the classpath:

    export FLINK_CLASSPATH=/opt/mapr/lib/*

If you will be starting Flink on YARN sessions with yarn-session.sh, the following is also required:

    export CC_CLASSPATH=/opt/mapr/lib/*
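For convenience, the client-side environment setup above can be collected into a single snippet. The paths assume MapR’s default install prefix /opt/mapr, as used throughout this page:

```shell
# Client-side environment setup for submitting Flink jobs to a MapR cluster.
# Paths assume MapR's default install prefix /opt/mapr, as used above.

# Pick up MapR's JAAS configuration to avoid login failures.
export JVM_ARGS=-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf

# Point Flink at the Hadoop configuration directory containing yarn-site.xml.
export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/

# Put the MapR native libraries on the classpath.
export FLINK_CLASSPATH=/opt/mapr/lib/*

# Only needed when starting YARN sessions with yarn-session.sh.
export CC_CLASSPATH=/opt/mapr/lib/*
```

A snippet like this could be sourced from the shell profile of the submitting user so that the variables are set for every session.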

Note: In Flink 1.2.0, Flink’s Kerberos authentication for YARN execution has a bug that prevents it from working with MapR Security. Please upgrade to a later Flink version in order to use Flink with a secured MapR cluster. For more details, please see FLINK-5949.

Flink’s Kerberos authentication is independent of MapR’s Security authentication. With the build procedure and environment variable setup above, Flink does not require any additional configuration to work with MapR Security.

Users simply need to log in using MapR’s maprlogin authentication utility. Users who haven’t acquired MapR login credentials will not be able to submit Flink jobs, failing with:

    java.lang.Exception: unable to establish the security context
    Caused by: o.a.f.r.security.modules.SecurityModule$SecurityInstallException: Unable to set the Hadoop login user
    Caused by: java.io.IOException: failure to login: Unable to obtain MapR credentials
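A typical login flow might look like the following sketch. The maprlogin subcommands shown here are assumptions based on MapR’s tooling, and the job submission line is an illustrative Flink on YARN invocation; verify both against the documentation for your MapR and Flink versions:

```shell
# Hypothetical sketch: acquire a MapR ticket before submitting Flink jobs.
# 'maprlogin password' prompts for the cluster password and obtains a ticket;
# 'maprlogin print' displays the currently held ticket. Confirm these
# subcommands against your MapR installation's documentation.
maprlogin password
maprlogin print

# With a valid ticket, job submission should now succeed, e.g.:
./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
```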