MapReduce Node

Overview

The MapReduce (MR) task type is used for executing MapReduce programs. For MapReduce nodes, the worker submits the task by using the Hadoop command hadoop jar. See the Hadoop Command Manual for more details.

Create Task

  • Click Project Management -> Project Name -> Workflow Definition, and click the Create Workflow button to enter the DAG editing page.
  • Drag the MapReduce task node from the toolbar to the canvas.

Task Parameters

General

| Parameter | Description |
|---|---|
| Custom parameters | A local user-defined parameter for MapReduce. Its value replaces the matching ${variable} placeholder in the script. |
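
The placeholder replacement described above can be pictured with a small sketch. This is only an illustration of the idea, not DolphinScheduler's actual implementation: the function name substitute and the parameter names in_dir, out_dir, and run_id are hypothetical, and leaving unknown placeholders untouched is an assumption.

```python
import re

# Matches ${name} placeholders; the allowed name characters are an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text, params):
    """Replace each ${name} in text with params[name].

    Hypothetical helper: placeholders without a matching parameter
    are left unchanged.
    """
    def _lookup(match):
        name = match.group(1)
        return params.get(name, match.group(0))

    return _PLACEHOLDER.sub(_lookup, text)


if __name__ == "__main__":
    # Hypothetical custom parameters defined on the task node.
    custom_params = {
        "in_dir": "/journey/words.txt",
        "out_dir": "/journey/out/mr",
        "run_id": "42",
    }
    args = "-input ${in_dir} -output ${out_dir}/${run_id}"
    print(substitute(args, custom_params))
    # prints: -input /journey/words.txt -output /journey/out/mr/42
```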

JAVA or SCALA Program

| Parameter | Description |
|---|---|
| Program type | Select JAVA or SCALA program. |
| The class of the main function | The fully qualified name of the main class, the entry point of the MapReduce program. |
| Main jar package | The jar package of the MapReduce program. |
| Task name | MapReduce task name. |
| Command line parameters | The input parameters of the MapReduce program. Custom parameter variables are substituted. |
| Other parameters | Supports the -D, -files, -libjars, and -archives options. |
| User-defined parameter | A local user-defined parameter for MapReduce. Its value replaces the matching ${variable} placeholder in the script. |
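
As a rough sketch of how these fields could map onto a hadoop jar invocation, the snippet below assembles an argument list from them. The argument order (jar, main class, other parameters, then command line parameters) is an assumption based on how Hadoop's generic options are usually placed, not a statement of what the worker does internally. The function name, the jar name wordcount.jar, and the class com.example.WordCount are hypothetical.

```python
import shlex


def build_hadoop_jar_command(main_jar, main_class,
                             other_params="", command_line_params=""):
    """Assemble a `hadoop jar` argument list from the task fields.

    Hypothetical helper. Assumed order: generic options such as -D or
    -libjars come after the main class and before the program arguments.
    """
    argv = ["hadoop", "jar", main_jar, main_class]
    argv += shlex.split(other_params)          # "Other parameters" field
    argv += shlex.split(command_line_params)   # "Command line parameters" field
    return argv


if __name__ == "__main__":
    argv = build_hadoop_jar_command(
        main_jar="wordcount.jar",               # hypothetical jar name
        main_class="com.example.WordCount",     # hypothetical main class
        other_params="-D mapreduce.job.queuename=default",
        command_line_params="/journey/words.txt /journey/out/mr",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Building a list of arguments rather than one string keeps values that contain spaces intact until the final quoting step.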

Python Program

| Parameter | Description |
|---|---|
| Program type | Select Python language. |
| Main jar package | The Python jar package for running MapReduce. |
| Other parameters | Supports the -D, -mapper, -reducer, -input, and -output options. User-defined parameters can be used in the values; see the example below the table. |
| User-defined parameter | A local user-defined parameter for MapReduce. Its value replaces the matching ${variable} placeholder in the script. |

Example for Other parameters:

  • -mapper "mapper.py 1" -file mapper.py -reducer reducer.py -file reducer.py -input /journey/words.txt -output /journey/out/mr/${currentTimeMillis}
  • The value "mapper.py 1" after -mapper consists of two parameters: the first is mapper.py and the second is 1.
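
To make the quoting of the -mapper value concrete, the following sketch tokenizes an option string with Python's standard shlex module, which follows POSIX shell quoting rules. It only illustrates how the quoted value stays together as one option value and then splits into two parameters; the function name split_mapper_command is hypothetical, and the sketch does not claim to reproduce how Hadoop or DolphinScheduler parse the string.

```python
import shlex


def split_mapper_command(options):
    """Return the parameters contained in the value that follows -mapper.

    Hypothetical helper: the option string is tokenized with shell-style
    quoting, then the -mapper value is split again into its parameters.
    """
    tokens = shlex.split(options)
    mapper_value = tokens[tokens.index("-mapper") + 1]
    return shlex.split(mapper_value)


if __name__ == "__main__":
    options = (
        '-mapper "mapper.py 1" -file mapper.py '
        "-reducer reducer.py -file reducer.py "
        "-input /journey/words.txt "
        "-output /journey/out/mr/${currentTimeMillis}"
    )
    print(split_mapper_command(options))
    # prints: ['mapper.py', '1']
```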

Task Example

Execute the WordCount Program

This example is a common introductory MapReduce application that counts the number of occurrences of each word in the input text.
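
For readers new to the pattern, here is a minimal in-memory sketch of the word count logic in plain Python. It models the map, shuffle, and reduce steps with ordinary functions and does not use Hadoop; all function names are hypothetical, and splitting words on whitespace is an assumption.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts)
                for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello world", "hello mapreduce"]))
    # prints: {'hello': 2, 'world': 1, 'mapreduce': 1}
```

In a real job the map and reduce steps run on different machines and the framework performs the grouping; the sketch only shows the data flow.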

Configure the MapReduce Environment in DolphinScheduler

To use the MapReduce task type in a production environment, first configure the required environment. The configuration file is bin/env/dolphinscheduler_env.sh.

[Image: mr_configure]

Upload the Main Package

When using the MapReduce task node, you need to upload the jar package to be executed through the Resource Centre. See the Resource Centre documentation for details.

After finishing the Resource Centre configuration, upload the required target files directly by dragging and dropping them.

[Image: resource_upload]

Configure MapReduce Nodes

Configure the required content according to the parameter descriptions above.

[Image: demo-mr-simple]