6.1. SELECT - 6.1.4. Joins - 《Firebird 2.5 Language Reference》

6.1.4. Joins

6.1.4. Joins

Joins combine data from two sources into a single set. This is done on a row-by-row basis and usually involves checking a join condition in order to determine which rows should be merged and appear in the resulting dataset. There are several types (INNER, OUTER) and classes (qualified, natural, etc.) of joins, each with its own syntax and rules.

Since joins can be chained, the datasets involved in a join may themselves be joined sets.

Syntax

SELECT
   ...
   FROM <source>
   [<joins>]
   [...]
<source> ::=
  {   table
    | view
    | selectable-stored-procedure [(<args>)]
    | <derived-table>
    | <common-table-expression>
  } [[AS] alias]
<joins> ::= <join> [<join> ...]
<join> ::=
    [<join-type>] JOIN <source> <join-condition>
  | NATURAL [<join-type>] JOIN <source>
  | {CROSS JOIN | ,} <source>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
<join-condition> ::= ON <condition> | USING (<column-list>)

Table 65. Arguments for `JOIN` Clauses
Argument	Description
table	Name of a table
view	name of a view
selectable-stored-procedure	Name of a selectable stored procedure
args	Selectable stored procedure input parameter[s]
derived-table	Reference, by name, to a derived table
common-table-expression	Reference, by name, to a common table expression (CTE)
alias	An alias for a data source (table, view, procedure, CTE, derived table)
condition	Join condition (criterion)
column-list	The list of columns used for an equi-join

Inner vs. outer joins

A join always combines data rows from two sets (usually referred to as the left set and the right set). By default, only rows that meet the join condition (i.e., that match at least one row in the other set when the join condition is applied) make it into the result set. This default type of join is called an inner join. Suppose we have the following two tables:

Table A
ID	S
87	Just some text
235	Silence

Table B
CODE	X
-23	56.7735
87	416.0

If we join these tables like this:

select *
  from A
  join B on A.id = B.code;

then the result set will be:

ID	S	CODE	X
87	Just some text	87	416.0

The first row of A has been joined with the second row of B because together they met the condition “A.id = B.code”. The other rows from the source tables have no match in the opposite set and are therefore not included in the join. Remember, this is an INNER join. We can make that fact explicit by writing:

select *
  from A
  inner join B on A.id = B.code;

However, since INNER is the default, this is rarely done.

It is perfectly possible that a row in the left set matches several rows from the right set or vice versa. In that case, all those combinations are included, and we can get results like:

ID	S	CODE	X
87	Just some text	87	416.0
87	Just some text	87	-1.0
-23	Don’t know	-23	56.7735
-23	Still don’t know	-23	56.7735
-23	I give up	-23	56.7735

Sometimes we want (or need) all the rows of one or both of the sources to appear in the joined set, regardless of whether they match a record in the other source. This is where outer joins come in. A LEFT outer join includes all the records from the left set, but only matching records from the right set. In a RIGHT outer join it’s the other way around. FULL outer joins include all the records from both sets. In all outer joins, the “holes” (the places where an included source record doesn’t have a match in the other set) are filled up with NULLs.

In order to make an outer join, you must specify LEFT, RIGHT or FULL, optionally followed by the keyword OUTER.

Below are the results of the various outer joins when applied to our original tables A and B:

select *
  from A
  left [outer] join B on A.id = B.code;

ID	S	CODE	X
87	Just some text	87	416.0
235	Silence	<null>	<null>

select *
  from A
  right [outer] join B on A.id = B.code

ID	S	CODE	X
<null>	<null>	-23	56.7735
87	Just some text	87	416.0

select *
  from A
  full [outer] join B on A.id = B.code

ID	S	CODE	X
<null>	<null>	-23	56.7735
87	Just some text	87	416.0
235	Silence	<null>	<null>

Qualified joins

Qualified joins specify conditions for the combining of rows. This happens either explicitly in an ON clause or implicitly in a USING clause.

Syntax

<qualified-join> ::= [<join-type>] JOIN <source> <join-condition>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
<join-condition> ::= ON <condition> | USING (<column-list>)

Explicit-condition joins

Most qualified joins have an ON clause, with an explicit condition that can be any valid boolean expression but usually involves some comparison between the two sources involved.

Quite often, the condition is an equality test (or a number of ANDed equality tests) using the “=” operator. Joins like these are called equi-joins. (The examples in the section on inner and outer joins were al equi-joins.)

Examples of joins with an explicit condition:

/* Select all Detroit customers who made a purchase
   in 2013, along with the purchase details: */
select * from customers c
  join sales s on s.cust_id = c.id
  where c.city = 'Detroit' and s.year = 2013;

/* Same as above, but include non-buying customers: */
select * from customers c
  left join sales s on s.cust_id = c.id
  where c.city = 'Detroit' and s.year = 2013;

/* For each man, select the women who are taller than he.
   Men for whom no such woman exists are not included. */
select m.fullname as man, f.fullname as woman
  from males m
  join females f on f.height > m.height;

/* Select all pupils with their class and mentor.
   Pupils without a mentor are also included.
   Pupils without a class are not included. */
select p.firstname, p.middlename, p.lastname,
       c.name, m.name
  from pupils p
  join classes c on c.id = p.class
  left join mentors m on m.id = p.mentor;

Named columns joins

Equi-joins often compare columns that have the same name in both tables. If this is the case, we can also use the second type of qualified join: the named columns join.

Named columns joins are not supported in Dialect 1 databases.

Named columns joins have a USING clause which states just the column names. So instead of this:

select * from flotsam f
  join jetsam j
  on f.sea = j.sea
  and f.ship = j.ship;

we can also write:

select * from flotsam
  join jetsam using (sea, ship)

which is considerably shorter. The result set is a little different though — at least when using “SELECT *”:

The explicit-condition join — with the ON clause — will contain each of the columns SEA and SHIP twice: once from table FLOTSAM, and once from table JETSAM. Obviously, they will have the same values.
The named columns join — with the USING clause — will contain these columns only once.

If you want all the columns in the result set of the named columns join, set up your query like this:

select f.*, j.*
  from flotsam f
  join jetsam j using (sea, ship);

This will give you the exact same result set as the explicit-condition join.

For an OUTER named columns join, there’s an additional twist when using “SELECT *” or an unqualified column name from the USING list:

If a row from one source set doesn’t have a match in the other but must still be included because of the LEFT, RIGHT or FULL directive, the merged column in the joined set gets the non-NULL value. That is fair enough, but now you can’t tell whether this value came from the left set, the right set, or both. This can be especially deceiving when the value came from the right hand set, because “*” always shows combined columns in the left hand part — even in the case of a RIGHT join.

Whether this is a problem or not depends on the situation. If it is, use the “a.*, b.*” approach shown above, with a and b the names or aliases of the two sources. Or better yet, avoid “*” altogether in your serious queries and qualify all column names in joined sets. This has the additional benefit that it forces you to think about which data you want to retrieve and where from.

It is your responsibility to make sure that the column names in the USING list are of compatible types between the two sources. If the types are compatible but not equal, the engine converts them to the type with the broadest range of values before comparing the values. This will also be the data type of the merged column that shows up in the result set if “SELECT *” or the unqualified column name is used. Qualified columns on the other hand will always retain their original data type.

Natural joins

Taking the idea of the named columns join a step further, a natural join performs an automatic equi-join on all the columns that have the same name in the left and right table. The data types of these columns must be compatible.

Natural joins are not supported in Dialect 1 databases.

Syntax

<natural-join> ::= NATURAL [<join-type>] JOIN <source>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]

Given these two tables:

create table TA (
  a bigint,
  s varchar(12),
  ins_date date
);

create table TB (
  a bigint,
  descr varchar(12),
  x float,
  ins_date date
);

a natural join on TA and TB would involve the columns a and ins_date, and the following two statements would have the same effect:

select * from TA
  natural join TB;

select * from TA
  join TB using (a, ins_date);

Like all joins, natural joins are inner joins by default, but you can turn them into outer joins by specifying LEFT, RIGHT or FULL before the JOIN keyword.

Caution: if there are no columns with the same name in the two source relations, a CROSS JOIN is performed. We’ll get to this type of join in a minute.

A Note on Equality

This note about equality and inequality operators applies everywhere in Firebird’s SQL language, not just in JOIN conditions.

The “=” operator, which is explicitly used in many conditional joins and implicitly in named column joins and natural joins, only matches values to values. According to the SQL standard, NULL is not a value and hence two NULLs are neither equal nor unequal to one another. If you need NULLs to match each other in a join, use the IS NOT DISTINCT FROM operator. This operator returns true if the operands have the same value or if they are both NULL.

select *
  from A join B
  on A.id is not distinct from B.code;

Likewise, in the — extremely rare — cases where you want to join on inequality, use IS DISTINCT FROM, not “<>”, if you want NULL to be considered different from any value and two NULLs considered equal:

select *
  from A join B
  on A.id is distinct from B.code;

Cross joins

A cross join produces the full set product of the two data sources. This means that it successfully matches every row in the left source to every row in the right source.

Syntax

<cross-join> ::= {CROSS JOIN | ,} <source>

Please notice that the comma syntax is deprecated! It is only supported to keep legacy code working and may disappear in some future version.

Cross-joining two sets is equivalent to joining them on a tautology (a condition that is always true). The following two statements have the same effect:

select * from TA
  cross join TB;

select * from TA
  join TB on 1 = 1;

Cross joins are inner joins, because they only include matching records – it just so happens that every record matches! An outer cross join, if it existed, wouldn’t add anything to the result, because what outer joins add are non-matching records, and these don’t exist in cross joins.

Cross joins are seldom useful, except if you want to list all the possible combinations of two or more variables. Suppose you are selling a product that comes in different sizes, different colors and different materials. If these variables are each listed in a table of their own, this query would return all the combinations:

select m.name, s.size, c.name
  from materials m
  cross join sizes s
  cross join colors c;

Ambiguous field names in joins

Firebird rejects unqualified field names in a query if these field names exist in more than one dataset involved in a join. This is even true for inner equi-joins where the field name figures in the ON clause like this:

select a, b, c
  from TA
  join TB on TA.a = TB.a;

There is one exception to this rule: with named columns joins and natural joins, the unqualified field name of a column taking part in the matching process may be used legally and refers to the merged column of the same name. For named columns joins, these are the columns listed in the USING clause. For natural joins, they are the columns that have the same name in both relations. But please notice again that, especially in outer joins, plain colname isn’t always the same as left.colname or right.colname. Types may differ, and one of the qualified columns may be NULL while the other isn’t. In that case, the value in the merged, unqualified column may mask the fact that one of the source values is absent.

Joins with stored procedures

If a join is performed with a stored procedure that is not correlated with other data streams via input parameters, there are no oddities. If correlation is involved, an unpleasant quirk reveals itself. The problem is that the optimizer denies itself any way to determine the interrelationships of the input parameters of the procedure from the fields in the other streams:

SELECT *
FROM MY_TAB
JOIN MY_PROC(MY_TAB.F) ON 1 = 1;

Here, the procedure will be executed before a single record has been retrieved from the table, MY_TAB. The isc_no_cur_rec error error (no current record for fetch operation) is raised, interrupting the execution.

The solution is to use syntax that specifies the join order explicitly:

SELECT *
FROM MY_TAB
LEFT JOIN MY_PROC(MY_TAB.F) ON 1 = 1;

This forces the table to be read before the procedure and everything works correctly.

This quirk has been recognised as a bug in the optimizer and will be fixed in the next version of Firebird.