Hop Conf - The Hop command line configuration tool

Usage

Hop Conf is a command line tool for managing Hop's configuration, including projects and lifecycle environments. Run the `hop-conf.sh` script with the `-h` flag (`./hop-conf.sh -h`) to display the available options:

  Usage: <main class> [-h] [-ec] [-ed] [-el] [-em] [-ey] [-pc] [-pd] [-pl] [-pm]
        [-pn] [-py] [-aza=<account>] [-azi=<blockIncrement>]
        [-azk=<key>] [-cfg=<configFile>]
        [-dc=<defaultProjectConfigFile>] [-de=<defaultEnvironment>]
        [-dp=<defaultProject>] [-dv=<describeVariable>]
        [-e=<environmentName>] [-ep=<environmentProject>]
        [-eu=<environmentPurpose>] [-fj=<fatJarFilename>]
        [-gck=<serviceAccountKeyFile>] [-gdc=<credentialsFile>]
        [-gdt=<tokensFolder>] [-p=<projectName>]
        [-pa=<projectMetadataBaseFolder>]
        [-pb=<projectDataSetsCsvFolder>] [-pf=<projectConfigFile>]
        [-ph=<projectHome>] [-pp=<projectCompany>]
        [-pr=<projectParent>] [-ps=<projectDescription>]
        [-pt=<projectDepartment>] [-pu=<projectUnitTestsBasePath>]
        [-px=<projectEnforceExecutionInHome>]
        [-sj=<standardProjectsFolder>]
        [-sp=<standardParentProject>] [-sv=<setVariable>]
        [-xm=<metadataJsonFilename>] [-cfd=<configDescribeVariables>
        [,<configDescribeVariables>...]]...
        [-cfv=<configSetVariables>[,<configSetVariables>...]]...
        [-eg=<environmentConfigFiles>[,
        <environmentConfigFiles>...]]... [-pv=<projectVariables>[,
        <projectVariables>...]]...
  -aza, --azure-account=<account>
        The account to use for the Azure VFS
  -azi, --azure-block-increment=<blockIncrement>
        The block increment size for new files on Azure,
        multiples of 512 only.
  -azk, --azure-key=<key>
        The key to use for the Azure VFS
  -cfd, --config-file-describe-variables=<configDescribeVariables>[,
      <configDescribeVariables>...]
        A list of variable=description combinations separated by
        a comma
  -cfg, --config-file=<configFile>
        Specify the configuration JSON file to manage
  -cfv, --config-file-set-variables=<configSetVariables>[,
      <configSetVariables>...]
        A list of variable=value combinations separated by a
        comma
  -dc, --default-projects-folder=<defaultProjectConfigFile>
        The standard project configuration filename proposed
        when creating projects
  -de, --default-environment=<defaultEnvironment>
        The name of the default environment to use when none is
        specified
  -dp, --default-project=<defaultProject>
        The name of the default project to use when none is
        specified
  -dv, --describe-variable=<describeVariable>
        Describe a variable, use format VARIABLE=Description
  -e, --environment=<environmentName>
        The name of the lifecycle environment to manage
  -ec, --environment-create
        Create a new project lifecycle environment. Also specify
        its name, purpose, the project name and the
        configuration files.
  -ed, --environment-delete
        Delete a lifecycle environment
  -eg, --environment-config-files=<environmentConfigFiles>[,
      <environmentConfigFiles>...]
        A list of configuration files for this lifecycle
        environment, comma separated
  -el, --environments-list
        List the defined lifecycle environments
  -em, --environment-modify
        Modify a lifecycle environment
  -ep, --environment-project=<environmentProject>
        The project for the environment
  -eu, --environment-purpose=<environmentPurpose>
        The purpose of the environment: Development, Testing,
        Production, CI, ...
  -ey, --environment-mandatory
        Make it mandatory to reference an environment
  -fj, --generate-fat-jar=<fatJarFilename>
        Specify the filename of the fat jar to generate from
        your current software installation
  -gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
        Configure the path to a Google Cloud service account
        JSON key file
  -gdc, --google-drive-credentials-file=<credentialsFile>
        Configure the path to a Google Drive credentials JSON
        file
  -gdt, --google-drive-tokens-folder=<tokensFolder>
        Configure the path to a Google Drive tokens folder
  -h, --help   Displays this help message and quits.
  -p, --project=<projectName>
        The name of the project to manage
  -pa, --project-metadata-base=<projectMetadataBaseFolder>
        The metadata base folder (relative to home)
  -pb, --project-datasets-base=<projectDataSetsCsvFolder>
        The data sets CSV folder (relative to home)
  -pc, --project-create   Create a new project. Also specify the name and its home
  -pd, --project-delete   Delete a project
  -pf, --project-config-file=<projectConfigFile>
        The configuration file relative to the home folder. The
        default value is project-config.json
  -ph, --project-home=<projectHome>
        The home directory of the project
  -pl, -projects-list   List the defined projects
  -pm, --project-modify   Modify a project
  -pn, --projects-enabled
        Enable or disable the projects plugin
  -pp, --project-company=<projectCompany>
        The company
  -pr, --project-parent=<projectParent>
        The name of the parent project to inherit metadata and
        variables from
  -ps, --project-description=<projectDescription>
        The description of the project
  -pt, --project-department=<projectDepartment>
        The department
  -pu, --project-unit-tests-base=<projectUnitTestsBasePath>
        The unit tests base folder (relative to home)
  -pv, --project-variables=<projectVariables>[,<projectVariables>...]
        A list of variable=value combinations separated by a
        comma
  -px, --project-enforce-execution=<projectEnforceExecutionInHome>
        Validate before execution that a workflow or pipeline is
        located in the project home folder or a sub-folder
        (true/false).
  -py, --project-mandatory
        Make it mandatory to reference a project
  -sj, --standard-projects-folder=<standardProjectsFolder>
        GUI: The standard projects folder proposed when creating
        projects
  -sp, --standard-parent-project=<standardParentProject>
        The name of the standard project to use as a parent when
        creating new projects
  -sv, --set-variable=<setVariable>
        Set a variable, use format VAR=Value
  -xm, --export-metadata=<metadataJsonFilename>
        Export project metadata to a single JSON file which you
        can specify with this option. Also specify the -p
        option.

The available options are listed below:

Table 1. Hop-conf Options

| Short Option | Extended Option | Description |
|---|---|---|
| `-h` | `--help` | Displays this help message and quits. |
| `-ec` | `--environment-create` | Create an environment. Also specify its name, purpose, the project name and the configuration files. |
| `-ed` | `--environment-delete` | Delete an environment |
| `-el` | `--environments-list` | List the defined environments |
| `-em` | `--environment-modify` | Modify an environment |
| `-pc` | `--project-create` | Create a new project. Also specify the name and its home |
| `-pd` | `--project-delete` | Delete a project |
| `-pl` | `-projects-list` | List the defined projects |
| `-pm` | `--project-modify` | Modify a project |
| `-dv` | `--describe-variable=<describeVariable>` | Describe a variable, use format VARIABLE=Description |
| `-e` | `--environment=<environmentName>` | The name of the environment to manage |
| `-ep` | `--environment-project=<environmentProject>` | The project for the environment |
| `-eu` | `--environment-purpose=<environmentPurpose>` | The purpose of the environment: Development, Testing, Production, CI, ... |
| `-fj` | `--generate-fat-jar=<fatJarFilename>` | Specify the filename of the fat jar to generate from your current software installation |
| `-xm` | `--export-metadata=<metadataJsonFilename>` | Export project metadata to a single JSON file which you can specify with this option. Also specify the -p option to know which metadata to export. |
| `-p` | `--project=<projectName>` | The project name |
| `-pa` | `--project-metadata-base=<projectMetadataBaseFolder>` | The metadata base folder (relative to home) |
| `-pb` | `--project-datasets-base=<projectDataSetsCsvFolder>` | The data sets CSV folder (relative to home) |
| `-pf` | `--project-config-file=<projectConfigFile>` | The configuration file relative to the home folder. The default value is project-config.json |
| `-ph` | `--project-home=<projectHome>` | The home directory of the project |
| `-pp` | `--project-company=<projectCompany>` | The company |
| `-ps` | `--project-description=<projectDescription>` | The description of the project |
| `-pt` | `--project-department=<projectDepartment>` | The department |
| `-pu` | `--project-unit-tests-base=<projectUnitTestsBasePath>` | The unit tests base folder (relative to home) |
| `-px` | `--project-enforce-execution=<projectEnforceExecutionInHome>` | Validate before execution that a workflow or pipeline is located in the project home folder or a sub-folder (true/false) |
| `-sv` | `--set-variable=<setVariable>` | Set a variable, use format VAR=Value. `-sv` can also be used to unset a variable by specifying a variable without a value, e.g. `-sv=myvar=` |
| `-cfg` | `--config-file=<configFile>` | Specify the configuration JSON file to manage |
| `-cfd` | `--config-file-describe-variables=<configDescribeVariables>[,<configDescribeVariables>...]` | A list of variable=description combinations separated by a comma |
| `-cfv` | `--config-file-set-variables=<configSetVariables>[,<configSetVariables>...]` | A list of variable=value combinations separated by a comma |
| `-eg` | `--environment-config-files=<environmentConfigFiles>[,<environmentConfigFiles>...]` | A list of configuration files for this lifecycle environment, comma separated |
| `-pv` | `--project-variables=<projectVariables>[,<projectVariables>...]` | A list of variable=value combinations separated by a comma |
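
As an illustration of the `-sv` option listed above, the following sketch sets a variable and then unsets it again by leaving the value empty. The variable name `MY_VAR` and its value are placeholders. The sketch is a dry run by default: it only prints the commands, and the `HOP_CONF` override is an assumption of this example, not a feature of Hop itself.

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set HOP_CONF="sh ./hop-conf.sh" to run them against a real installation.
HOP_CONF="${HOP_CONF:-echo sh hop-conf.sh}"

# Set a variable (MY_VAR and some-value are placeholders).
$HOP_CONF -sv=MY_VAR=some-value

# Unset it again by passing the variable without a value.
$HOP_CONF -sv=MY_VAR=
```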

Project Usage and Configuration

Configuration on the command line

The `hop-conf` script offers many options to edit environment definitions.

Creating an environment

  $ sh hop-conf.sh \
      --environment-create \
      --environment hop2 \
      --environment-project hop2 \
      --environment-purpose=Development \
      --environment-config-files=/home/user/projects/hop2-conf.json
  Creating environment 'hop2'
  Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
  2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
  Created empty environment configuration file : /home/user/projects/hop2-conf.json
  hop2
  Purpose: Development
  Configuration files:
  Project name: hop2
  Config file: /home/user/projects/hop2-conf.json

As you can see from the log, an empty file was created to set variables in:

  { }
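
The same call can be scripted when a project needs several lifecycle environments. Below is a minimal sketch, assuming hypothetical environment names (`hop2-dev`, `hop2-test`, `hop2-prod`) and one configuration file per environment in `/home/user/projects`. It uses only the options shown above and prints the commands instead of running them unless `HOP_CONF` is overridden (an assumption of this example).

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set HOP_CONF="sh ./hop-conf.sh" to run them against a real installation.
HOP_CONF="${HOP_CONF:-echo sh hop-conf.sh}"

PROJECT="hop2"                    # project the environments belong to
CONFIG_DIR="/home/user/projects"  # folder holding the environment config files

create_environment() {
  # $1 = environment name (hypothetical), $2 = purpose
  $HOP_CONF \
    --environment-create \
    --environment "$1" \
    --environment-project "$PROJECT" \
    --environment-purpose="$2" \
    --environment-config-files="$CONFIG_DIR/$1-conf.json"
}

create_environment hop2-dev Development
create_environment hop2-test Testing
create_environment hop2-prod Production
```

Keeping one configuration file per environment means the same variable names can carry different values in development, testing and production.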

Setting variables in an environment

This command adds two variables to the environment configuration file:

  $ sh hop-conf.sh --config-file /home/user/projects/hop2-conf.json --config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd
  Configuration file '/home/user/projects/hop2-conf.json' was modified.

If we look at the file `hop2-conf.json` we’ll see that the variables were added:

  {
    "variables" : [ {
      "name" : "DB_HOSTNAME",
      "value" : "localhost",
      "description" : ""
    }, {
      "name" : "DB_PASSWORD",
      "value" : "abcd",
      "description" : ""
    } ]
  }

You can also add descriptions for the variables with the `--config-file-describe-variables` option, which takes a comma-separated list of variable=description combinations. Run `hop-conf.sh -h` to see all available options.
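
To double-check which variables a configuration file defines, the names can be pulled out with standard tools. The sketch below is an illustration only: the helper `list_variable_names` is hypothetical, and it assumes the file is laid out with one `"name"` entry per line, as in the listing above. A JSON-aware tool would be more robust for other layouts.

```shell
#!/bin/sh
# Hypothetical helper: print one variable name per line.
# Assumes each "name" : "..." pair sits on its own line.
list_variable_names() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Demo on a temporary copy of the file contents shown above.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
{
  "variables" : [ {
    "name" : "DB_HOSTNAME",
    "value" : "localhost",
    "description" : ""
  }, {
    "name" : "DB_PASSWORD",
    "value" : "abcd",
    "description" : ""
  } ]
}
EOF
list_variable_names "$demo_file"
rm -f "$demo_file"
```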

Deleting an environment

The following deletes an environment from the Hop configuration file:

  $ sh hop-conf.sh --environment-delete --environment hop2
  Lifecycle environment 'hop2' was deleted from Hop configuration file <path-to-hop>/config/hop-config.json

Projects Plugin configuration

There are various options to configure the behavior of the `Projects` plugin itself. The Hop configuration file `hop-config.json` contains the following options:

  {
    "projectMandatory" : true,
    "environmentMandatory" : false,
    "defaultProject" : "default",
    "defaultEnvironment" : null,
    "standardParentProject" : "default",
    "standardProjectsFolder" : "/home/matt/test-stuff/"
  }

| Option | Description | hop-conf option |
|---|---|---|
| projectMandatory | This will prevent anyone from using hop-run without specifying a project | `--project-mandatory` |
| environmentMandatory | This will prevent anyone from using hop-run without specifying an environment | `--environment-mandatory` |
| defaultProject | The default project to use when none is specified | `--default-project` |
| defaultEnvironment | The default environment to use when none is specified | `--default-environment` |
| standardParentProject | The standard parent project to propose when creating a new project | `--standard-parent-project` |
| standardProjectsFolder | The folder to which you’ll browse by default in the GUI when creating new projects | `--standard-projects-folder` |
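
As a sketch, the non-default values from the `hop-config.json` listing above could be set from the command line as follows. It assumes the options can be combined in a single call; if that does not hold for your version, pass them one per invocation. The function name `configure_projects_plugin` and the `HOP_CONF` override are assumptions of this example, which prints the command instead of running it by default.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set HOP_CONF="sh ./hop-conf.sh" to run it against a real installation.
HOP_CONF="${HOP_CONF:-echo sh hop-conf.sh}"

configure_projects_plugin() {
  # Values mirror the hop-config.json listing above.
  $HOP_CONF \
    --project-mandatory \
    --default-project=default \
    --standard-parent-project=default \
    --standard-projects-folder=/home/matt/test-stuff/
}

configure_projects_plugin
```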

Running Projects and Pipelines

You can specify an environment or a project when executing a pipeline or a workflow. Doing so automatically configures metadata and variables without much fuss.

The easiest example is shown by executing the “complex” pipeline from the Apache Beam examples:

  $ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
  2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
  2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
  2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
  2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
  2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
  2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
  2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
  2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
  2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
  2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
  2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
  2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
  2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
  2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
  2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
  2021/02/01 16:52:34 - General - Beam pipeline execution has finished.

To execute an Apache Beam pipeline a lot of information and metadata is needed. Let’s dive into a few fun information tidbits:

  • By referencing the `samples` project, Hop knows where the project is located (`config/projects/samples`)

  • Since we know the location of the project, we can specify pipelines and workflows with a relative path

  • The project knows where its metadata is stored (`config/projects/samples/metadata`), so it knows where to find the `Direct` pipeline run configuration (`config/projects/samples/metadata/pipeline-run-configuration/Direct.json`)

  • This run configuration defines its own pipeline engine specific variables, in this case the output folder: `DATA_OUTPUT=${PROJECT_HOME}/beam/output/`

  • The output of the samples is therefore written to `config/projects/samples/beam/output`
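
The path resolution described in the list above can be mimicked with plain shell variable substitution. This is only an illustration of how the locations combine, not Hop's own resolver; the variable `PIPELINE` is a name introduced for this sketch.

```shell
#!/bin/sh
# Project home as reported in the log above.
PROJECT_HOME="config/projects/samples"

# Relative path passed with --file (PIPELINE is a name used only in this sketch).
PIPELINE="beam/pipelines/complex.hpl"

# Variable defined by the Direct run configuration.
DATA_OUTPUT="${PROJECT_HOME}/beam/output/"

echo "pipeline file : ${PROJECT_HOME}/${PIPELINE}"
echo "run config    : ${PROJECT_HOME}/metadata/pipeline-run-configuration/Direct.json"
echo "output folder : ${DATA_OUTPUT}"
```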

To reference an environment, pass `-e` or `--environment` when executing. The only difference is that a number of extra environment variables are set during execution.
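
A run that references an environment might look like the sketch below. It reuses the `hop2` environment created earlier; the file name `my-pipeline.hpl` and the helper `run_in_environment` are hypothetical, and the sketch assumes `--environment` can be combined with `--file` and `--runconfig` in the same way as `--project`. By default the command is printed rather than executed.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set HOP_RUN="sh ./hop-run.sh" to run it against a real installation.
HOP_RUN="${HOP_RUN:-echo sh hop-run.sh}"

run_in_environment() {
  # $1 = environment name, $2 = pipeline or workflow file, $3 = run configuration
  $HOP_RUN --environment "$1" --file "$2" --runconfig "$3"
}

# my-pipeline.hpl is a placeholder file name.
run_in_environment hop2 my-pipeline.hpl Direct
```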

Cloud Storage Configuration

Hop Conf can be used to configure your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts for use with Hop through VFS.

Amazon Web Services S3

N/A

Azure

Set the account, the block increment size for new files and your Azure key:

  -aza, --azure-account=<account>
        The account to use for the Azure VFS
  -azi, --azure-block-increment=<blockIncrement>
        The block increment size for new files on Azure,
        multiples of 512 only.
  -azk, --azure-key=<key>
        The key to use for the Azure VFS
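
Since the block increment only accepts multiples of 512, it can be worth checking the value before handing it to `hop-conf`. Below is a minimal sketch with hypothetical helper names (`is_valid_block_increment`, `configure_azure`) and placeholder credentials. It assumes the three options can be passed in one call and prints the command instead of running it by default.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set HOP_CONF="sh ./hop-conf.sh" to run it against a real installation.
HOP_CONF="${HOP_CONF:-echo sh hop-conf.sh}"

is_valid_block_increment() {
  # Reject empty values, leading zeros and anything that is not all digits.
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;
  esac
  # Accept only multiples of 512.
  [ $(( $1 % 512 )) -eq 0 ]
}

configure_azure() {
  # $1 = account, $2 = key, $3 = block increment
  if ! is_valid_block_increment "$3"; then
    echo "Block increment must be a multiple of 512: $3" >&2
    return 1
  fi
  $HOP_CONF --azure-account="$1" --azure-key="$2" --azure-block-increment="$3"
}

# Account name and key are placeholders.
configure_azure my-account my-secret-key 4096
```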

Google

Google Cloud Storage

Set the path to your Google Cloud service account JSON key file:

  -gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
        Configure the path to a Google Cloud service account JSON key file

Google Drive

Set the path to your Google Drive credentials JSON file or Google Drive tokens folder:

  -gdc, --google-drive-credentials-file=<credentialsFile>
        Configure the path to a Google Drive credentials JSON
        file
  -gdt, --google-drive-tokens-folder=<tokensFolder>
        Configure the path to a Google Drive tokens folder
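
The Google options above all take file system paths, so a small wrapper can verify that the paths exist before they are written to the configuration. This is a sketch only: the helper names `configure_gcs` and `configure_google_drive` are hypothetical, passing both Drive options in one call is an assumption, and the command is printed rather than executed by default.

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set HOP_CONF="sh ./hop-conf.sh" to run them against a real installation.
HOP_CONF="${HOP_CONF:-echo sh hop-conf.sh}"

configure_gcs() {
  # $1 = path to the service account JSON key file
  [ -r "$1" ] || { echo "Key file not readable: $1" >&2; return 1; }
  $HOP_CONF --google-cloud-service-account-key-file="$1"
}

configure_google_drive() {
  # $1 = credentials JSON file, $2 = tokens folder
  [ -r "$1" ] || { echo "Credentials file not readable: $1" >&2; return 1; }
  [ -d "$2" ] || { echo "Tokens folder not found: $2" >&2; return 1; }
  $HOP_CONF --google-drive-credentials-file="$1" --google-drive-tokens-folder="$2"
}

# Usage (paths are placeholders):
#   configure_gcs /path/to/service-account-key.json
#   configure_google_drive /path/to/credentials.json /path/to/tokens
```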