Multi-table Join Queries

In many scenarios, you need to use one query to get data from multiple tables. You can use the JOIN statement to combine the data from two or more tables.

Before you start

Make sure you have already Deployed standalone MatrixOne.

Preparation

  1. Download the dataset:

    1. https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip
  2. Create the database and tables:

    1. create database d1;
    2. use d1;
    3. CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
    4. N_NAME CHAR(25) NOT NULL,
    5. N_REGIONKEY INTEGER NOT NULL,
    6. N_COMMENT VARCHAR(152),
    7. PRIMARY KEY (N_NATIONKEY));
    8. CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
    9. R_NAME CHAR(25) NOT NULL,
    10. R_COMMENT VARCHAR(152),
    11. PRIMARY KEY (R_REGIONKEY));
    12. CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
    13. P_NAME VARCHAR(55) NOT NULL,
    14. P_MFGR CHAR(25) NOT NULL,
    15. P_BRAND CHAR(10) NOT NULL,
    16. P_TYPE VARCHAR(25) NOT NULL,
    17. P_SIZE INTEGER NOT NULL,
    18. P_CONTAINER CHAR(10) NOT NULL,
    19. P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    20. P_COMMENT VARCHAR(23) NOT NULL,
    21. PRIMARY KEY (P_PARTKEY));
    22. CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
    23. S_NAME CHAR(25) NOT NULL,
    24. S_ADDRESS VARCHAR(40) NOT NULL,
    25. S_NATIONKEY INTEGER NOT NULL,
    26. S_PHONE CHAR(15) NOT NULL,
    27. S_ACCTBAL DECIMAL(15,2) NOT NULL,
    28. S_COMMENT VARCHAR(101) NOT NULL,
    29. PRIMARY KEY (S_SUPPKEY));
    30. CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
    31. PS_SUPPKEY INTEGER NOT NULL,
    32. PS_AVAILQTY INTEGER NOT NULL,
    33. PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    34. PS_COMMENT VARCHAR(199) NOT NULL,
    35. PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));
    36. CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
    37. C_NAME VARCHAR(25) NOT NULL,
    38. C_ADDRESS VARCHAR(40) NOT NULL,
    39. C_NATIONKEY INTEGER NOT NULL,
    40. C_PHONE CHAR(15) NOT NULL,
    41. C_ACCTBAL DECIMAL(15,2) NOT NULL,
    42. C_MKTSEGMENT CHAR(10) NOT NULL,
    43. C_COMMENT VARCHAR(117) NOT NULL,
    44. PRIMARY KEY (C_CUSTKEY));
    45. CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL,
    46. O_CUSTKEY INTEGER NOT NULL,
    47. O_ORDERSTATUS CHAR(1) NOT NULL,
    48. O_TOTALPRICE DECIMAL(15,2) NOT NULL,
    49. O_ORDERDATE DATE NOT NULL,
    50. O_ORDERPRIORITY CHAR(15) NOT NULL,
    51. O_CLERK CHAR(15) NOT NULL,
    52. O_SHIPPRIORITY INTEGER NOT NULL,
    53. O_COMMENT VARCHAR(79) NOT NULL,
    54. PRIMARY KEY (O_ORDERKEY));
    55. CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL,
    56. L_PARTKEY INTEGER NOT NULL,
    57. L_SUPPKEY INTEGER NOT NULL,
    58. L_LINENUMBER INTEGER NOT NULL,
    59. L_QUANTITY DECIMAL(15,2) NOT NULL,
    60. L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    61. L_DISCOUNT DECIMAL(15,2) NOT NULL,
    62. L_TAX DECIMAL(15,2) NOT NULL,
    63. L_RETURNFLAG CHAR(1) NOT NULL,
    64. L_LINESTATUS CHAR(1) NOT NULL,
    65. L_SHIPDATE DATE NOT NULL,
    66. L_COMMITDATE DATE NOT NULL,
    67. L_RECEIPTDATE DATE NOT NULL,
    68. L_SHIPINSTRUCT CHAR(25) NOT NULL,
    69. L_SHIPMODE CHAR(10) NOT NULL,
    70. L_COMMENT VARCHAR(44) NOT NULL,
    71. PRIMARY KEY (L_ORDERKEY, L_LINENUMBER));
  3. Load data into the created tables:

    1. load data infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    2. load data infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    3. load data infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    4. load data infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    5. load data infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    6. load data infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    7. load data infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    8. load data infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

Then you can query data in MatrixOne with the created table.

Join Types

INNER JOIN

The join result of an inner join returns only rows that match the join condition.

StatementImage
SELECT FROM TableA A INNER JOIN TableB B ON A.Key=B.Keyinnerjoin

There are two ways of writing an inner join that are completely equivalent in results:

  1. mysql> SELECT
  2. l_orderkey,
  3. SUM(l_extendedprice * (1 - l_discount)) AS revenue,
  4. o_orderdate,
  5. o_shippriority
  6. FROM
  7. CUSTOMER,
  8. ORDERS,
  9. LINEITEM
  10. WHERE
  11. c_mktsegment = 'BUILDING'
  12. AND c_custkey = o_custkey
  13. AND l_orderkey = o_orderkey
  14. AND o_orderdate < DATE '1995-03-15'
  15. AND l_shipdate > DATE '1995-03-15'
  16. GROUP BY l_orderkey , o_orderdate , o_shippriority
  17. ORDER BY revenue DESC , o_orderdate
  18. LIMIT 10;
  19. +------------+---------------------+-------------+----------------+
  20. | l_orderkey | revenue | o_orderdate | o_shippriority |
  21. +------------+---------------------+-------------+----------------+
  22. | 2456423 | 406181.011100000000 | 1995-03-05 | 0 |
  23. | 3459808 | 405838.698900000000 | 1995-03-04 | 0 |
  24. | 492164 | 390324.061000000000 | 1995-02-19 | 0 |
  25. | 1188320 | 384537.935900000000 | 1995-03-09 | 0 |
  26. | 2435712 | 378673.055800000000 | 1995-02-26 | 0 |
  27. | 4878020 | 378376.795200000000 | 1995-03-12 | 0 |
  28. | 5521732 | 375153.921500000000 | 1995-03-13 | 0 |
  29. | 2628192 | 373133.309400000000 | 1995-02-22 | 0 |
  30. | 993600 | 371407.459500000000 | 1995-03-05 | 0 |
  31. | 2300070 | 367371.145200000000 | 1995-03-13 | 0 |
  32. +------------+---------------------+-------------+----------------+
  33. 10 rows in set (0.20 sec)

Write as Join, the syntax is as follows:

  1. mysql> SELECT
  2. l_orderkey,
  3. SUM(l_extendedprice * (1 - l_discount)) AS revenue,
  4. o_orderdate,
  5. o_shippriority
  6. FROM
  7. CUSTOMER
  8. join ORDERS on c_custkey = o_custkey
  9. join LINEITEM on l_orderkey = o_orderkey
  10. WHERE
  11. c_mktsegment = 'BUILDING'
  12. AND o_orderdate < DATE '1995-03-15'
  13. AND l_shipdate > DATE '1995-03-15'
  14. GROUP BY l_orderkey , o_orderdate , o_shippriority
  15. ORDER BY revenue DESC , o_orderdate
  16. LIMIT 10;
  17. +------------+---------------------+-------------+----------------+
  18. | l_orderkey | revenue | o_orderdate | o_shippriority |
  19. +------------+---------------------+-------------+----------------+
  20. | 2456423 | 406181.011100000000 | 1995-03-05 | 0 |
  21. | 3459808 | 405838.698900000000 | 1995-03-04 | 0 |
  22. | 492164 | 390324.061000000000 | 1995-02-19 | 0 |
  23. | 1188320 | 384537.935900000000 | 1995-03-09 | 0 |
  24. | 2435712 | 378673.055800000000 | 1995-02-26 | 0 |
  25. | 4878020 | 378376.795200000000 | 1995-03-12 | 0 |
  26. | 5521732 | 375153.921500000000 | 1995-03-13 | 0 |
  27. | 2628192 | 373133.309400000000 | 1995-02-22 | 0 |
  28. | 993600 | 371407.459500000000 | 1995-03-05 | 0 |
  29. | 2300070 | 367371.145200000000 | 1995-03-13 | 0 |
  30. +------------+---------------------+-------------+----------------+
  31. 10 rows in set (0.20 sec)

LEFT JOIN and RIGHT JOIN

Outer joins are further divided into left join and right join, and equivalent semantics can be achieved between the two:

  • LEFT JOIN

The LEFT JOIN returns all the rows in the left table and the values ​​in the right table that match the join condition. If no rows are matched in the right table, it will be filled with NULL.

StatementImage
SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Keyleftjoin
SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Key WHERE B.Key IS NULLleftjoinwhere
  • RIGHT JOIN

A RIGHT JOIN returns all the records in the right table and the values ​​in the left table that match the join condition. If there is no matching value, it is filled with NULL.

StatementImage
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Keyleftjoinwhere
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Key WHERE A.Key IS NULLleftjoinwhere

The example is as below:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 147457 | 16 |
  12. | 147458 | 7 |
  13. | 147459 | 0 |
  14. | 147460 | 16 |
  15. | 147461 | 7 |
  16. | 147462 | 0 |
  17. | 147463 | 14 |
  18. | 147464 | 11 |
  19. | 147465 | 0 |
  20. | 147466 | 17 |
  21. +-----------+---------+
  22. 10 rows in set (0.93 sec)

Or:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. ORDERS
  5. RIGHT OUTER JOIN CUSTOMER ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 147457 | 16 |
  12. | 147458 | 7 |
  13. | 147459 | 0 |
  14. | 147460 | 16 |
  15. | 147461 | 7 |
  16. | 147462 | 0 |
  17. | 147463 | 14 |
  18. | 147464 | 11 |
  19. | 147465 | 0 |
  20. | 147466 | 17 |
  21. +-----------+---------+
  22. 10 rows in set (0.93 sec)

FULL JOIN

A full join is the union of left and right outer joins. The join table contains all records from the joined tables or is filled with NULL if a matching record is missing.

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. FULL JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 1 | 6 |
  12. | 2 | 7 |
  13. | 4 | 20 |
  14. | 5 | 4 |
  15. | 7 | 16 |
  16. | 8 | 13 |
  17. | 10 | 20 |
  18. | 11 | 13 |
  19. | 13 | 18 |
  20. | 14 | 9 |
  21. +-----------+---------+
  22. 10 rows in set (0.77 sec)

The full join can also be rewritten to obtain the same semantics:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey
  8. UNION
  9. SELECT
  10. c_custkey, COUNT(o_orderkey) AS c_count
  11. FROM
  12. CUSTOMER
  13. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  14. AND o_comment NOT LIKE '%special%requests%')
  15. WHERE c_custkey IS NULL
  16. GROUP BY c_custkey
  17. limit 10;
  18. +-----------+---------+
  19. | c_custkey | c_count |
  20. +-----------+---------+
  21. | 147457 | 16 |
  22. | 147458 | 7 |
  23. | 147459 | 0 |
  24. | 147460 | 16 |
  25. | 147461 | 7 |
  26. | 147462 | 0 |
  27. | 147463 | 14 |
  28. | 147464 | 11 |
  29. | 147465 | 0 |
  30. | 147466 | 17 |
  31. +-----------+---------+
  32. 10 rows in set (1.09 sec)

Implicit join

Before the JOIN statement that explicitly declared a join was added to the SQL standard, it was possible to join two or more tables in a SQL statement using the FROM t1, t2 clause, and specify the conditions for the join using the WHERE t1.id = t2.id clause. You can understand it as an implicit join, which uses the inner join to join tables.