join() function

The join() function merges two or more input streams whose values are equal on a set of common columns into a single output stream. Null values are not considered equal when comparing column values. The resulting schema is the union of the input schemas. The resulting group key is the union of the input group keys.

*Function type: Transformation*
*Output data type: Record*

    join(tables: {key1: table1, key2: table2}, on: ["_time", "_field"], method: "inner")
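The matching rule described above can be modelled outside Flux. The following is a minimal Python sketch, not the Flux implementation: `inner_join` and the rows-as-dicts representation are hypothetical names chosen for illustration, and `None` stands in for a null value. It assumes both inputs have the same columns, so every column outside `on` receives a table suffix.

```python
def inner_join(tables, on):
    """Model an inner join over exactly two named lists of row dicts."""
    # Unpacking fails unless exactly two inputs are given.
    (left_name, left_rows), (right_name, right_rows) = tables.items()
    joined = []
    for left in left_rows:
        left_key = tuple(left.get(col) for col in on)
        if None in left_key:
            continue  # a null join value is never equal to anything
        for right in right_rows:
            right_key = tuple(right.get(col) for col in on)
            if None in right_key or left_key != right_key:
                continue
            # Join columns appear once, under their original names.
            row = {col: left[col] for col in on}
            # Assumption: all other columns exist in both inputs,
            # so each one is renamed to <column>_<table>.
            row.update({f"{c}_{left_name}": v for c, v in left.items() if c not in on})
            row.update({f"{c}_{right_name}": v for c, v in right.items() if c not in on})
            joined.append(row)
    return joined


sf = [
    {"_time": 1, "_field": "temp", "_value": 70},
    {"_time": None, "_field": "temp", "_value": 71},  # null _time
]
ny = [
    {"_time": 1, "_field": "temp", "_value": 55},
    {"_time": None, "_field": "temp", "_value": 54},  # null _time
]

result = inner_join({"sf": sf, "ny": ny}, on=["_time", "_field"])
print(result)
```

Only the rows with `_time` equal to 1 are paired. The two rows with a null `_time` are dropped even though their nulls "match", which mirrors the rule that null values are not considered equal.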

Output schema

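The column schema of the output stream is the union of the input schemas, and the output group key is likewise the union of the input group keys. Columns that are present in both inputs but not listed in on are renamed using the pattern <column>_<table>, where <table> is the key given in the tables parameter, to prevent ambiguity in joined tables.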

Example:

If you have two streams of data, data_1 and data_2, with the following group keys:

data_1: [_time, _field]
data_2: [_time, _field]

And join them with:

    join(tables: {d1: data_1, d2: data_2}, on: ["_time"])

The resulting group keys for all tables will be: [_time, _field_d1, _field_d2]
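The renaming in this example can be traced with a small Python sketch. It is an illustration under stated assumptions rather than Flux code: `joined_group_key` is a hypothetical helper, and it assumes every group-key column outside `on` is present in both inputs and therefore gets suffixed.

```python
def joined_group_key(group_keys, on):
    """Return the output group key for named input group keys."""
    output = []
    # Join columns keep their names and appear only once.
    for key in group_keys.values():
        for col in key:
            if col in on and col not in output:
                output.append(col)
    # Remaining group-key columns are suffixed with their table key.
    for table, key in group_keys.items():
        for col in key:
            if col not in on:
                output.append(f"{col}_{table}")
    return output


key = joined_group_key(
    {"d1": ["_time", "_field"], "d2": ["_time", "_field"]},
    on=["_time"],
)
print(key)  # ['_time', '_field_d1', '_field_d2']
```

Joining on both `_time` and `_field` instead would leave the group key as `['_time', '_field']`, since neither column would need a suffix.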

Parameters

tables

The map of streams to be joined. Required

*Data type: Record*

join() currently supports only two input streams.

on

The list of columns on which to join. Required

*Data type: Array of strings*

method

The method used to join. Defaults to "inner".

*Data type: String*

Possible Values:
  • inner

Examples

Example join with sample data

Given the following two streams of data:

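**SF_Temp**

| _time | _field | _value |
|-------|--------|--------|
| 0001  | "temp" | 70     |
| 0002  | "temp" | 75     |
| 0003  | "temp" | 72     |

**NY_Temp**

| _time | _field | _value |
|-------|--------|--------|
| 0001  | "temp" | 55     |
| 0002  | "temp" | 56     |
| 0003  | "temp" | 55     |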

And the following join query:

    join(
        tables: {sf: SF_Temp, ny: NY_Temp},
        on: ["_time", "_field"]
    )

The output will be:

| _time | _field | _value_ny | _value_sf |
|-------|--------|-----------|-----------|
| 0001  | "temp" | 55        | 70        |
| 0002  | "temp" | 56        | 75        |
| 0003  | "temp" | 55        | 72        |

Cross-measurement join

    // System CPU usage over the last 15 minutes
    data_1 = from(bucket: "example-bucket")
        |> range(start: -15m)
        |> filter(fn: (r) =>
            r._measurement == "cpu" and
            r._field == "usage_system"
        )

    // Memory usage over the same window
    data_2 = from(bucket: "example-bucket")
        |> range(start: -15m)
        |> filter(fn: (r) =>
            r._measurement == "mem" and
            r._field == "used_percent"
        )

    // Pair CPU and memory rows that share a timestamp and host
    join(
        tables: {d1: data_1, d2: data_2},
        on: ["_time", "host"]
    )

join() versus union()

join() creates new rows based on common values in one or more specified columns. Output rows also contain the differing values from each of the joined streams. union() does not modify data in rows, but unifies separate streams of tables into a single stream of tables and groups rows of data based on existing group keys.

Given two streams of tables, t1 and t2, the results of join() and union() are illustrated below:

[Figure 1: join() versus union()]
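The contrast can also be sketched in a few lines of Python. This is a simplified model, not Flux: `union` and `join_on_time` are hypothetical helpers, the sample rows are invented for illustration, and group keys are ignored so that only the row-level difference is shown.

```python
t1 = [{"_time": 1, "_value": 10}, {"_time": 2, "_value": 20}]
t2 = [{"_time": 1, "_value": 30}, {"_time": 3, "_value": 40}]


def union(*streams):
    """Concatenate rows from all streams without modifying them."""
    return [dict(row) for stream in streams for row in stream]


def join_on_time(left, right):
    """Build new, wider rows from pairs that share a _time value."""
    return [
        {"_time": l["_time"], "_value_t1": l["_value"], "_value_t2": r["_value"]}
        for l in left
        for r in right
        if l["_time"] == r["_time"]
    ]


unioned = union(t1, t2)
joined = join_on_time(t1, t2)

print(len(unioned))  # 4
print(joined)        # [{'_time': 1, '_value_t1': 10, '_value_t2': 30}]
```

The union keeps all four input rows as they were, while the join produces a single new row because only `_time` 1 occurs in both inputs.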
