Amazon EMR

Overview

The Amazon EMR task type operates EMR clusters on AWS and runs computing tasks on them. In the background, the task uses aws-java-sdk to convert the JSON parameters into a request object and submit it to AWS. Two program types are currently supported: RUN_JOB_FLOW and ADD_JOB_FLOW_STEPS.

Create Task

  • Click Project Management -> Project Name -> Workflow Definition, then click the Create Workflow button to enter the DAG editing page.
  • Drag the AmazonEMR task from the toolbar onto the artboard to complete the creation.

Task Parameters

Parameter | Description
Program Type | Select the program type. RUN_JOB_FLOW requires jobFlowDefineJson; ADD_JOB_FLOW_STEPS requires stepsDefineJson.
jobFlowDefineJson | JSON corresponding to the RunJobFlowRequest object. For details, refer to API_RunJobFlow_Examples.
stepsDefineJson | JSON corresponding to the AddJobFlowStepsRequest object. For details, refer to API_AddJobFlowSteps_Examples.
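
The pairing between program type and JSON parameter described above can be sketched as a small check. This is an illustrative Python sketch, not DolphinScheduler's own implementation (which is written in Java); the names `REQUIRED_FIELD` and `required_definition` are hypothetical.

```python
import json

# Which JSON parameter each program type needs, per the parameter descriptions.
REQUIRED_FIELD = {
    "RUN_JOB_FLOW": "jobFlowDefineJson",
    "ADD_JOB_FLOW_STEPS": "stepsDefineJson",
}


def required_definition(program_type: str, params: dict) -> dict:
    """Return the parsed JSON definition that the given program type needs.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    if program_type not in REQUIRED_FIELD:
        raise ValueError(f"unsupported program type: {program_type}")
    field = REQUIRED_FIELD[program_type]
    raw = params.get(field)
    if not raw:
        raise ValueError(f"{field} must be filled in for {program_type}")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on invalid JSON.
    return json.loads(raw)
```

Running a check like this before saving the task would catch a mismatched program type and JSON parameter early, rather than at submission time.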

Task Example

Create an EMR Cluster and Run Steps

This example shows how to create an EMR task node of type RUN_JOB_FLOW. Using SparkPi as an example, the task creates an EMR cluster and runs the SparkPi sample program on it.

jobFlowDefineJson example

    {
      "Name": "SparkPi",
      "ReleaseLabel": "emr-5.34.0",
      "Applications": [
        {
          "Name": "Spark"
        }
      ],
      "Instances": {
        "InstanceGroups": [
          {
            "Name": "Primary node",
            "InstanceRole": "MASTER",
            "InstanceType": "m4.xlarge",
            "InstanceCount": 1
          }
        ],
        "KeepJobFlowAliveWhenNoSteps": false,
        "TerminationProtected": false
      },
      "Steps": [
        {
          "Name": "calculate_pi",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "/usr/lib/spark/bin/run-example",
              "SparkPi",
              "15"
            ]
          }
        }
      ],
      "JobFlowRole": "EMR_EC2_DefaultRole",
      "ServiceRole": "EMR_DefaultRole"
    }
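
Before pasting a definition like this into the task, it can help to confirm that it parses and carries the fields the example relies on. The following is a minimal Python sketch under stated assumptions: `check_job_flow` is a hypothetical helper, and the fields it checks (`Name`, `Instances`, and `HadoopJarStep.Jar` on each step) are a small subset chosen for illustration, not a full validation of RunJobFlowRequest.

```python
import json


def check_job_flow(definition_json: str) -> list:
    """Return a list of problems found in a jobFlowDefineJson string (empty if none)."""
    definition = json.loads(definition_json)
    problems = []
    # Assumption: a usable definition has at least a cluster name and an Instances block.
    for key in ("Name", "Instances"):
        if key not in definition:
            problems.append(f"missing top-level field: {key}")
    # Each step in the example runs a JAR, so flag steps without one.
    for index, step in enumerate(definition.get("Steps", [])):
        jar = step.get("HadoopJarStep", {}).get("Jar")
        if not jar:
            problems.append(f"Steps[{index}] has no HadoopJarStep.Jar")
    return problems
```

An empty list means the sketch found nothing to flag; AWS may still reject the request for reasons this check does not cover, such as missing IAM roles.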

Add a Step to a Running EMR Cluster

This example shows how to create an EMR task node of type ADD_JOB_FLOW_STEPS. Using SparkPi as an example, the task adds a SparkPi sample program as a step to a running EMR cluster, which is identified by JobFlowId.

stepsDefineJson example

    {
      "JobFlowId": "j-3V628TKAERHP8",
      "Steps": [
        {
          "Name": "calculate_pi",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "/usr/lib/spark/bin/run-example",
              "SparkPi",
              "15"
            ]
          }
        }
      ]
    }
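
When the cluster ID or step arguments change between runs, the same JSON can be generated instead of edited by hand. The sketch below is one way to do that in Python; `build_steps_define_json` is a hypothetical helper, and it fixes `ActionOnFailure` to `CONTINUE` and the JAR to `command-runner.jar` only because the example above uses those values.

```python
import json


def build_steps_define_json(job_flow_id: str, step_name: str, args: list) -> str:
    """Build a stepsDefineJson string containing one command-runner.jar step."""
    request = {
        "JobFlowId": job_flow_id,
        "Steps": [
            {
                "Name": step_name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": list(args),
                },
            }
        ],
    }
    return json.dumps(request, indent=2)


# Reproduces the SparkPi example for the sample cluster ID.
print(build_steps_define_json(
    "j-3V628TKAERHP8",
    "calculate_pi",
    ["/usr/lib/spark/bin/run-example", "SparkPi", "15"],
))
```

The printed text can be pasted into the stepsDefineJson field as is.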

Notice

  • Failover has not been implemented for the EMR task type. At present, DolphinScheduler only supports failover for YARN tasks; other task types, such as EMR and Kubernetes (k8s) tasks, are not supported yet.
  • A task definition supports only a single step in stepsDefineJson, which makes it easier to track the task state reliably.
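
The single-step restriction can be checked before a task is saved. The following Python sketch assumes the restriction means exactly one entry in the `Steps` array; `single_step` is a hypothetical helper, not part of DolphinScheduler.

```python
import json


def single_step(steps_define_json: str) -> dict:
    """Return the only step in a stepsDefineJson string, or raise ValueError."""
    request = json.loads(steps_define_json)
    steps = request.get("Steps", [])
    # Assumption: "a single step" means exactly one entry in the Steps array.
    if len(steps) != 1:
        raise ValueError(f"expected exactly 1 step, found {len(steps)}")
    return steps[0]
```

To run several steps against the same cluster under this restriction, each step would go into its own ADD_JOB_FLOW_STEPS task node.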