Amazon EMR

Overview

Amazon EMR task type, for creating EMR clusters on AWS and running computing tasks. Using aws-java-sdk in the background code, to transfer JSON parameters to RunJobFlowRequest object and submit to AWS.

Create Task

  • Click Project Management -> Project Name -> Workflow Definition, click the Create Workflow button to enter the DAG editing page.
  • Drag AmazonEMR task from the toolbar to the artboard to complete the creation.

Task Parameters

ParameterDescription
Node nameThe node name in a workflow definition is unique.
Run flagIdentifies whether this node schedules normally, if it does not need to execute, select the prohibition execution.
DescriptionDescribe the function of the node.
Task priorityWhen the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order.
Worker groupingAssign tasks to the machines of the worker group to execute. If Default is selected, randomly select a worker machine for execution.
Times of failed retry attemptsThe number of times the task failed to resubmit. You can select from drop-down or fill-in a number.
Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number.
Timeout alarmCheck the timeout alarm and timeout failure. When the task runs exceed the “timeout”, an alarm email will send and the task execution will fail.
JSONJSON corresponding to the RunJobFlowRequest object, for details refer to API_RunJobFlow_Examples.

JSON example

  1. {
  2. "Name": "SparkPi",
  3. "ReleaseLabel": "emr-5.34.0",
  4. "Applications": [
  5. {
  6. "Name": "Spark"
  7. }
  8. ],
  9. "Instances": {
  10. "InstanceGroups": [
  11. {
  12. "Name": "Primary node",
  13. "InstanceRole": "MASTER",
  14. "InstanceType": "m4.xlarge",
  15. "InstanceCount": 1
  16. }
  17. ],
  18. "KeepJobFlowAliveWhenNoSteps": false,
  19. "TerminationProtected": false
  20. },
  21. "Steps": [
  22. {
  23. "Name": "calculate_pi",
  24. "ActionOnFailure": "CONTINUE",
  25. "HadoopJarStep": {
  26. "Jar": "command-runner.jar",
  27. "Args": [
  28. "/usr/lib/spark/bin/run-example",
  29. "SparkPi",
  30. "15"
  31. ]
  32. }
  33. }
  34. ],
  35. "JobFlowRole": "EMR_EC2_DefaultRole",
  36. "ServiceRole": "EMR_DefaultRole"
  37. }