JOIN

It lets you combine multiple data sources (subqueries or tables) by equality of values in the specified columns or expressions (the JOIN keys).

Syntax

  1. SELECT ... FROM table_1
  2. -- The first JOIN step:
  3. <Join_Type> JOIN table_2 <Join_Condition>
  4. -- The left-hand selection -- entries in table_1
  5. -- The right-hand selection -- entries in table_2
  6. -- The next JOIN step:
  7. <Join_Type> JOIN table_n <Join_Condition>
  8. -- The left-hand selection is the JOIN result in the previous step
  9. -- The right-hand selection are entries in table_n
  10. -- JOIN can include the following steps
  11. ...
  12. WHERE ...

JOIN - 图1

At each JOIN step, rules are used to establish correspondences between rows in the left-hand and right-hand data selections, creating a new selection that includes every combination of rows that meet the JOIN conditions.

Warning

Since columns in YQL are identified by their names, and you can’t have two columns with the same name in the selection, SELECT * FROM ... JOIN ... can’t be executed if there are columns with identical names in the joined tables.

Types of join (Join_Type)

  • INNER (default): Rows from the joined selections that match no row on the other side won’t be included in the result.
  • LEFT: If there’s no value in the right-hand selection, it adds a row to the result with column values from the left-hand selection, using NULL in columns from the right-hand selection
  • RIGHT: If there’s no value in the left-hand selection, it adds the row to the result, including column values from the right-hand selection, but using NULL in columns from the left-hand selection
  • FULL = LEFT + RIGHT
  • LEFT/RIGHT SEMI: One side of the query is a whitelist of keys, its values are not available. The result includes columns from one table only, no cartesian product is created.
  • LEFT/RIGHT ONLY: Subtracting the sets by keys (blacklist). It’s almost equivalent to adding IS NULL to the key on the opposite side in the regular LEFT/RIGHT JOIN, but with no access to values, same as in SEMI JOIN.
  • CROSS: A full cartesian product of two tables without specifying key columns and no explicit ON/USING.
  • EXCLUSION: Both sides minus the intersection.

JOIN - 图2

Note

NULL is a special value to denote nothing. Hence, NULL values on both sides are NOT treated as equal to each other. This eliminates ambiguity in some types of JOIN and avoids a giant Cartesian product otherwise created.

Conditions for joining (Join_Condition)

For CROSS JOIN, no join condition is specified. The result includes the Cartesian product of the left-hand and right-hand selection, that is, combines everything with everything. The number of rows in the result selection is the product of the number of rows in the left-hand and right-hand selections.

For any other JOIN types, specify the condition using one of the two methods:

  1. USING (column_name). Used if both the left-hand and right-hand selections share a column whose equality of values is a join condition.
  2. ON (equality_conditions). Lets you set a condition of equality for column values or expressions over columns in the left-hand and right-hand selections or use several such conditions combined by and.

Examples:

  1. SELECT a.value as a_value, b.value as b_value
  2. FROM a_table AS a
  3. FULL JOIN b_table AS b USING (key);

JOIN - 图3

  1. SELECT a.value as a_value, b.value as b_value
  2. FROM a_table AS a
  3. FULL JOIN b_table AS b ON a.key = b.key;

JOIN - 图4

  1. SELECT a.value as a_value, b.value as b_value, c.column2
  2. FROM a_table AS a
  3. CROSS JOIN b_table AS b
  4. LEFT JOIN c_table AS c ON c.ref = a.key and c.column1 = b.value;

JOIN - 图5