Create a Pipeline

How pipelines work

Pipelines are the fundamental building blocks of your Hop projects.

Pipelines perform the heavy lifting: they read data from a variety of sources, perform a number of operations (combine, clean, enrich, transform, etc.) and write the data out to some target platform. A pipeline executes all of these operations in a predefined order and in parallel.

In the image below, a very simple pipeline reads data from a database, adds a message to the data and sends out an email. All of these operations are executed in a predefined order (read from the database, add the message, send the mail) and in parallel. To see how the pipeline executes these transforms, imagine our database table or query contains thousands of rows. The pipeline starts reading results from the query and passes them on to the ‘Add message’ transform. Once the message has been added, the Mail transform sends a mail. All of these transforms operate in parallel, so the Mail transform will already be sending mails while the ‘Table input’ transform is still reading records from the table or query.

Hop - Simple Pipeline
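The streaming behaviour described above can be sketched in a few lines of Python. This is only an illustration and not Hop's actual engine: the function names, the row layout and the buffer size are assumptions. Each transform runs in its own thread, and each hop is modelled as a small bounded queue, so the last transform starts working long before the first one has finished reading.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def table_input(out_hop, row_count):
    """Stand-in for 'Table input': emits rows one by one."""
    for i in range(row_count):
        out_hop.put({"id": i})
    out_hop.put(DONE)


def add_message(in_hop, out_hop):
    """Stand-in for 'Add message': enriches each row as soon as it arrives."""
    while (row := in_hop.get()) is not DONE:
        out_hop.put({**row, "message": f"Hello, customer {row['id']}"})
    out_hop.put(DONE)


def mail(in_hop, sent):
    """Stand-in for 'Mail': records the rows it would have mailed."""
    while (row := in_hop.get()) is not DONE:
        sent.append(row)


def run_pipeline(row_count=1000, buffer_size=10):
    # Each hop is a small bounded buffer between two transforms.
    hop1 = queue.Queue(maxsize=buffer_size)
    hop2 = queue.Queue(maxsize=buffer_size)
    sent = []
    threads = [
        threading.Thread(target=table_input, args=(hop1, row_count)),
        threading.Thread(target=add_message, args=(hop1, hop2)),
        threading.Thread(target=mail, args=(hop2, sent)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

Because the hop buffers hold only a few rows, the reader cannot run far ahead of the other transforms: all three must be active at the same time for the pipeline to complete.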

Concepts

Pipelines consist of transforms connected by hops. In the mail example ‘Table input’, ‘Add message’ and ‘Mail’ are all transforms.

  • transforms are the basic operations in your pipeline. A pipeline typically consists of a lot of transforms that are chained together by hops. Transforms are granular, in the sense that each transform is designed and optimized to perform one and only one task. Although one transform by itself may not offer spectacular functionality, the combination of all transforms in a pipeline is what makes your pipelines powerful.

  • hops link transforms together. When a transform finishes processing the data set it received, that data set is passed to the next transform through a hop. Hops are uni-directional (data can’t flow backwards). Hops only buffer and pass data around. The hop itself is transform-agnostic: it doesn’t know anything about the transforms it passes data from or to. Some transforms can conditionally read from or write to a number of other transforms, but this is a transform-specific configuration. The hop is unaware of it. Hops can be disabled by clicking on them, or through right-click → disable.
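As a rough mental model of these two concepts, a pipeline can be pictured as a directed graph: transforms are the nodes and hops are the directed edges, each of which can be enabled or disabled. The sketch below is a hypothetical data structure for illustration only. The class and method names are assumptions and do not reflect Hop's internal classes or its file format.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    source: str           # transform the data comes from
    target: str           # transform the data goes to (one direction only)
    enabled: bool = True  # a disabled hop passes no data


@dataclass
class Pipeline:
    transforms: list = field(default_factory=list)
    hops: list = field(default_factory=list)

    def add_transform(self, name):
        self.transforms.append(name)

    def add_hop(self, source, target):
        # A hop can only link transforms that exist in the pipeline.
        for name in (source, target):
            if name not in self.transforms:
                raise ValueError(f"unknown transform: {name}")
        self.hops.append(Hop(source, target))

    def next_transforms(self, name):
        """Transforms reachable from `name` over enabled hops only."""
        return [h.target for h in self.hops if h.source == name and h.enabled]


pipeline = Pipeline()
for name in ("Table input", "Add message", "Mail"):
    pipeline.add_transform(name)
pipeline.add_hop("Table input", "Add message")
pipeline.add_hop("Add message", "Mail")
```

Note that the `Hop` class stores nothing but the two endpoints and a flag, which mirrors the idea that a hop is unaware of what the transforms on either side actually do.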

Create a pipeline

Create a new pipeline through the work item dialog. You’ll be presented with the dialog shown below.

Hop - New Pipeline

When you are finished with your pipeline, save it. This can be done via the File menu, the icons, or by pressing Ctrl+S or Command+S. For new pipelines, a file browser is displayed so you can navigate to the location where you want to store the file.

Add transforms to your pipeline

Click anywhere in the pipeline canvas, the area where you’ll see the image below.

Hop - Click Anywhere

Upon clicking, you’ll be presented with the dialog shown below. The search box at the top of this dialog searches on transform name, tags (TODO), etc. Once you’ve found the transform you’re looking for, click on it to add it to your pipeline. As an alternative to clicking, you can navigate with the arrow keys and press Enter. Repeat this step now or whenever you want to add more transforms to your pipeline. Once you’ve added a transform to your pipeline, you can drag it to reposition it.

Check the list of transforms to add to your pipeline for more details.

Hop - Add Transform

Add a ‘Generate Rows’ and an ‘Add Sequence’ transform, and your pipeline should look like the one below.

Hop - Add two transforms
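To get a feel for what this two-transform pipeline produces once the transforms are connected, here is a minimal Python sketch that imitates it with generators. The function names, the `limit`, `start` and `increment` parameters, and the `id` field name are assumptions chosen for the illustration. They are not taken from the transforms' actual configuration dialogs.

```python
def generate_rows(limit, fields):
    """Emit `limit` identical rows, each a copy of the constant `fields`."""
    for _ in range(limit):
        yield dict(fields)


def add_sequence(rows, field_name="id", start=1, increment=1):
    """Add an incrementing counter field to every row that passes through."""
    value = start
    for row in rows:
        yield {**row, field_name: value}
        value += increment


# Chain the two stand-in transforms, the way a hop would connect them.
rows = list(add_sequence(generate_rows(3, {"greeting": "hello"})))
```

Each generated row picks up the next sequence value as it flows through, so the three rows end up with `id` values 1, 2 and 3.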

A transform can be configured by single-clicking on it. The menu shown below is displayed, with options that depend on the transform you clicked.

Hop - transforms

| Action | Description |
|---|---|
| Detach transform | Detach the transform from the pipeline. |
| Edit the transform | Edit the transform’s metadata. |
| Copy transform to clipboard | Copies selected items to clipboard. |
| Create hop | Creates a new hop between two transforms. |
| Set the number of transforms | Starts several instances of a transform in parallel. |
| Preview output | Allows you to preview the results of the transform. |
| Debug output | |
| Show the fields entering this transform | Shows metadata, like the field name and type, for fields coming into the transform. |
| Show the fields exiting this transform | Shows metadata, like the field name and type, for fields coming out of the transform. |
| Distribute rows | In case of more than one hop, the data is distributed between the next transforms. |
| Copy rows | In case of more than one hop, the data is copied to the next transforms. |
| Specify transform partitioning | Specify how rows of data need to be grouped into partitions, allowing parallel execution where similar rows need to end up on the same transform copy. |
| Edit transform description | Add a description to the transform. |
| Transform error handling | Set the error handling for the transform. Not available for all transforms. |
| Delete this transform | Delete selected transform from the canvas. |
| Edit Custom Logging | Edit the custom log settings for this transform. This will change the log level used for this transform. |
| Clear Custom Logging | Clear custom log settings. This will clear the log level used for this transform. |
| Sniff output | Take a look at 50 rows coming out of this transform. This will show a real-time table with a continuous output of the selected transform. |
| Set input data set | Defines which data to use instead of the active input transform. Applies to the selected unit test. |
| Clear input data set | Remove a defined data set from the selected unit test. |
| Set golden data set | The input to this transform is taken and compared to the golden data set you are selecting. The transform itself is not executed during testing. |
| Clear golden data set | Remove a defined golden data set from this transform unit test. |
| Create data set | Create an empty data set with the output fields of this transform. |
| Write rows to data set | Run the current pipeline and write the data to a data set. |
| Remove from test | When this unit test is run, do not include this transform. |
| Include in test | When this unit test is run, include this transform. |
| Bypass in test | When this unit test is run, bypass this transform (replace with a dummy). |
| Remove bypass in test | Do not bypass this transform in the current pipeline during testing. |
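The difference between the ‘Distribute rows’, ‘Copy rows’ and ‘Specify transform partitioning’ actions is easiest to see side by side. The sketch below is a simplified illustration, not Hop's implementation: it assumes round-robin distribution and a CRC32-based hash for partitioning, and all function names are hypothetical.

```python
import zlib
from itertools import cycle


def distribute_rows(rows, target_count):
    """Distribute: every row goes to exactly one target (round-robin assumed)."""
    targets = [[] for _ in range(target_count)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def copy_rows(rows, target_count):
    """Copy: every target receives its own copy of every row."""
    rows = list(rows)
    return [list(rows) for _ in range(target_count)]


def partition_rows(rows, key, copy_count):
    """Partition: rows with the same key value land on the same transform copy."""
    copies = [[] for _ in range(copy_count)]
    for row in rows:
        # A stable hash of the key decides which copy handles the row.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % copy_count
        copies[index].append(row)
    return copies
```

With distribution, each target sees only part of the data. With copying, each target sees all of it. With partitioning, the data is split as well, but rows that share a key value are guaranteed to be processed by the same copy, which matters for operations such as grouping.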

Add a Hop between transforms

There are a number of ways to create a hop:

  • shift-drag: hold down the shift key on your keyboard, click on a transform and, while holding down your primary mouse button, drag to the second transform. Release the primary mouse button and the shift key.

  • scroll-drag: scroll-click on a transform and, while holding down your mouse’s scroll button, drag to the second transform. Release the scroll button.

  • click on a transform in your pipeline to open the ‘click anywhere’ dialog. Click the ‘Create hop’ button and select the transform you want to create the hop to.

Hop - Create Hop

Some transforms result in different types of hops.

| Hop | Description |
|---|---|
| Result is TRUE | Specifies that the transform will be executed only when the result from the previous transform is true. |
| Result is FALSE | Specifies that the transform will be executed only when the result from the previous transform is false. |
| Main output of transform | The default hop between two transforms. |
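One way to picture the TRUE and FALSE hop types is a transform that evaluates a condition for each row and sends the row down one of two outgoing hops. The function below is a hypothetical sketch of that idea. It is not the code of any particular Hop transform, and the name `filter_rows` is an assumption.

```python
def filter_rows(rows, condition):
    """Route each row to the TRUE hop or the FALSE hop based on `condition`."""
    true_hop, false_hop = [], []
    for row in rows:
        (true_hop if condition(row) else false_hop).append(row)
    return true_hop, false_hop


orders = [{"id": 1, "amount": 250}, {"id": 2, "amount": 40}, {"id": 3, "amount": 990}]
large, small = filter_rows(orders, lambda row: row["amount"] >= 100)
```

The transform connected through the TRUE hop would only ever see the rows in `large`, and the one connected through the FALSE hop would only see the rows in `small`.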

Pipeline properties

Pipeline properties are a collection of properties that describe the pipeline and configure its behavior.

The properties dialog can be opened by double clicking on the pipeline canvas.

The following properties can be configured:

  • Pipeline

  • Parameters

  • Monitoring

Pipeline properties

The Pipeline tab allows you to specify general properties about the pipeline including:

| Property | Description |
|---|---|
| Pipeline name | The name of the pipeline. |
| Synchronize name with filename | If this option is enabled, the filename and pipeline name are synchronized. |
| Pipeline filename | The filename of the pipeline. |
| Description | Short description of the pipeline. |
| Extended description | Long extended description of the pipeline. |
| Status | Draft or production status. |
| Version | Description of the version. |
| Created by | Displays the original creator of the pipeline. |
| Created at | Displays the date and time when the pipeline was created. |
| Last modified by | Displays the last user that modified the pipeline. |
| Last modified at | Displays the date and time when the pipeline was last modified. |

The parameters tab allows you to specify parameters specific for the pipeline. Parameters are defined by a name, a default value and a description.

Parameters properties
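Since a parameter is defined by a name, a default value and a description, its behaviour at run time can be sketched as a simple lookup: a value supplied for the run wins, otherwise the default applies. The code below is an illustrative model only. The `Parameter` class, the `resolve_parameters` function and the example parameter names are assumptions, not part of Hop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    default: str
    description: str = ""


def resolve_parameters(definitions, overrides=None):
    """Return the effective value of every defined parameter."""
    overrides = overrides or {}
    # Reject values for parameters the pipeline never defined.
    unknown = set(overrides) - {p.name for p in definitions}
    if unknown:
        raise KeyError(f"undefined parameters: {sorted(unknown)}")
    return {p.name: overrides.get(p.name, p.default) for p in definitions}


definitions = [
    Parameter("INPUT_DIR", "/tmp/input", "Folder to read files from"),
    Parameter("ROW_LIMIT", "1000", "Maximum number of rows to process"),
]
values = resolve_parameters(definitions, {"ROW_LIMIT": "50"})
```

In this example `ROW_LIMIT` takes the supplied value while `INPUT_DIR` falls back to its default.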

The monitoring tab allows you to configure performance monitoring for the pipeline.

Monitoring properties

The options to set in this tab are:

| Property | Description | Type |
|---|---|---|
| Enable transform performance monitoring | Enable performance monitoring for the transforms in this pipeline. | boolean |
| Transform performance measurement interval (ms) | The interval, in milliseconds, at which performance is measured for the transforms in this pipeline. | integer |
| Maximum number of snapshots in memory | The number of performance monitoring snapshots to keep in memory for the transforms in this pipeline. | integer |
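The three monitoring options interact in a straightforward way: monitoring must be enabled, a snapshot is taken at most once per interval, and only the most recent snapshots are kept in memory. The sketch below models that interaction under stated assumptions. The class names, the default values and the snapshot contents are all hypothetical and are not taken from Hop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MonitoringSettings:
    enabled: bool = False      # assumed default
    interval_ms: int = 1000    # assumed default
    max_snapshots: int = 100   # assumed default


class SnapshotStore:
    def __init__(self, settings):
        self.settings = settings
        # A bounded deque silently drops the oldest snapshot when full.
        self.snapshots = deque(maxlen=settings.max_snapshots)
        self._last_ms = None

    def record(self, now_ms, rows_processed):
        """Store a snapshot if monitoring is on and the interval has elapsed."""
        if not self.settings.enabled:
            return False
        if self._last_ms is not None and now_ms - self._last_ms < self.settings.interval_ms:
            return False
        self.snapshots.append((now_ms, rows_processed))
        self._last_ms = now_ms
        return True
```

A smaller interval gives finer-grained measurements, and a larger snapshot limit keeps more history, both at the cost of extra memory and bookkeeping.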