多表连接查询

一些使用数据库的场景中,需要一个查询当中使用到多张表的数据,你可以通过 JOIN 语句将两张或多张表的数据组合在一起。

开始前准备

你需要确认在开始之前,已经完成了以下任务:

已完成单机部署 MatrixOne

数据准备

  1. 下载数据集:

    1. https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/tpch/tpch-1g.zip
  2. 创建数据库和数据表:

    1. create database d1;
    2. use d1;
    3. CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
    4. N_NAME CHAR(25) NOT NULL,
    5. N_REGIONKEY INTEGER NOT NULL,
    6. N_COMMENT VARCHAR(152),
    7. PRIMARY KEY (N_NATIONKEY));
    8. CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
    9. R_NAME CHAR(25) NOT NULL,
    10. R_COMMENT VARCHAR(152),
    11. PRIMARY KEY (R_REGIONKEY));
    12. CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
    13. P_NAME VARCHAR(55) NOT NULL,
    14. P_MFGR CHAR(25) NOT NULL,
    15. P_BRAND CHAR(10) NOT NULL,
    16. P_TYPE VARCHAR(25) NOT NULL,
    17. P_SIZE INTEGER NOT NULL,
    18. P_CONTAINER CHAR(10) NOT NULL,
    19. P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    20. P_COMMENT VARCHAR(23) NOT NULL,
    21. PRIMARY KEY (P_PARTKEY));
    22. CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
    23. S_NAME CHAR(25) NOT NULL,
    24. S_ADDRESS VARCHAR(40) NOT NULL,
    25. S_NATIONKEY INTEGER NOT NULL,
    26. S_PHONE CHAR(15) NOT NULL,
    27. S_ACCTBAL DECIMAL(15,2) NOT NULL,
    28. S_COMMENT VARCHAR(101) NOT NULL,
    29. PRIMARY KEY (S_SUPPKEY));
    30. CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
    31. PS_SUPPKEY INTEGER NOT NULL,
    32. PS_AVAILQTY INTEGER NOT NULL,
    33. PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    34. PS_COMMENT VARCHAR(199) NOT NULL,
    35. PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY));
    36. CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
    37. C_NAME VARCHAR(25) NOT NULL,
    38. C_ADDRESS VARCHAR(40) NOT NULL,
    39. C_NATIONKEY INTEGER NOT NULL,
    40. C_PHONE CHAR(15) NOT NULL,
    41. C_ACCTBAL DECIMAL(15,2) NOT NULL,
    42. C_MKTSEGMENT CHAR(10) NOT NULL,
    43. C_COMMENT VARCHAR(117) NOT NULL,
    44. PRIMARY KEY (C_CUSTKEY));
    45. CREATE TABLE ORDERS ( O_ORDERKEY BIGINT NOT NULL,
    46. O_CUSTKEY INTEGER NOT NULL,
    47. O_ORDERSTATUS CHAR(1) NOT NULL,
    48. O_TOTALPRICE DECIMAL(15,2) NOT NULL,
    49. O_ORDERDATE DATE NOT NULL,
    50. O_ORDERPRIORITY CHAR(15) NOT NULL,
    51. O_CLERK CHAR(15) NOT NULL,
    52. O_SHIPPRIORITY INTEGER NOT NULL,
    53. O_COMMENT VARCHAR(79) NOT NULL,
    54. PRIMARY KEY (O_ORDERKEY));
    55. CREATE TABLE LINEITEM ( L_ORDERKEY BIGINT NOT NULL,
    56. L_PARTKEY INTEGER NOT NULL,
    57. L_SUPPKEY INTEGER NOT NULL,
    58. L_LINENUMBER INTEGER NOT NULL,
    59. L_QUANTITY DECIMAL(15,2) NOT NULL,
    60. L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    61. L_DISCOUNT DECIMAL(15,2) NOT NULL,
    62. L_TAX DECIMAL(15,2) NOT NULL,
    63. L_RETURNFLAG CHAR(1) NOT NULL,
    64. L_LINESTATUS CHAR(1) NOT NULL,
    65. L_SHIPDATE DATE NOT NULL,
    66. L_COMMITDATE DATE NOT NULL,
    67. L_RECEIPTDATE DATE NOT NULL,
    68. L_SHIPINSTRUCT CHAR(25) NOT NULL,
    69. L_SHIPMODE CHAR(10) NOT NULL,
    70. L_COMMENT VARCHAR(44) NOT NULL,
    71. PRIMARY KEY (L_ORDERKEY, L_LINENUMBER));
  3. 把数据导入到数据表中:

    1. load data infile '/YOUR_TPCH_DATA_PATH/nation.tbl' into table NATION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    2. load data infile '/YOUR_TPCH_DATA_PATH/region.tbl' into table REGION FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    3. load data infile '/YOUR_TPCH_DATA_PATH/part.tbl' into table PART FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    4. load data infile '/YOUR_TPCH_DATA_PATH/supplier.tbl' into table SUPPLIER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    5. load data infile '/YOUR_TPCH_DATA_PATH/partsupp.tbl' into table PARTSUPP FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    6. load data infile '/YOUR_TPCH_DATA_PATH/orders.tbl' into table ORDERS FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    7. load data infile '/YOUR_TPCH_DATA_PATH/customer.tbl' into table CUSTOMER FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
    8. load data infile '/YOUR_TPCH_DATA_PATH/lineitem.tbl' into table LINEITEM FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

现在你可以使用这些数据进行查询。

Join 类型

内连接

内连接的连接结果只返回匹配连接条件的行。

语法图示
SELECT FROM TableA A INNER JOIN TableB B ON A.Key=B.Keyinnerjoin

内连接有两种书写方式,在结果上是完全等价的:

  1. mysql> SELECT
  2. l_orderkey,
  3. SUM(l_extendedprice * (1 - l_discount)) AS revenue,
  4. o_orderdate,
  5. o_shippriority
  6. FROM
  7. CUSTOMER,
  8. ORDERS,
  9. LINEITEM
  10. WHERE
  11. c_mktsegment = 'BUILDING'
  12. AND c_custkey = o_custkey
  13. AND l_orderkey = o_orderkey
  14. AND o_orderdate < DATE '1995-03-15'
  15. AND l_shipdate > DATE '1995-03-15'
  16. GROUP BY l_orderkey , o_orderdate , o_shippriority
  17. ORDER BY revenue DESC , o_orderdate
  18. LIMIT 10;
  19. +------------+---------------------+-------------+----------------+
  20. | l_orderkey | revenue | o_orderdate | o_shippriority |
  21. +------------+---------------------+-------------+----------------+
  22. | 2456423 | 406181.011100000000 | 1995-03-05 | 0 |
  23. | 3459808 | 405838.698900000000 | 1995-03-04 | 0 |
  24. | 492164 | 390324.061000000000 | 1995-02-19 | 0 |
  25. | 1188320 | 384537.935900000000 | 1995-03-09 | 0 |
  26. | 2435712 | 378673.055800000000 | 1995-02-26 | 0 |
  27. | 4878020 | 378376.795200000000 | 1995-03-12 | 0 |
  28. | 5521732 | 375153.921500000000 | 1995-03-13 | 0 |
  29. | 2628192 | 373133.309400000000 | 1995-02-22 | 0 |
  30. | 993600 | 371407.459500000000 | 1995-03-05 | 0 |
  31. | 2300070 | 367371.145200000000 | 1995-03-13 | 0 |
  32. +------------+---------------------+-------------+----------------+
  33. 10 rows in set (0.20 sec)

写成 Join 的形式,语法如下:

  1. mysql> SELECT
  2. l_orderkey,
  3. SUM(l_extendedprice * (1 - l_discount)) AS revenue,
  4. o_orderdate,
  5. o_shippriority
  6. FROM
  7. CUSTOMER
  8. join ORDERS on c_custkey = o_custkey
  9. join LINEITEM on l_orderkey = o_orderkey
  10. WHERE
  11. c_mktsegment = 'BUILDING'
  12. AND o_orderdate < DATE '1995-03-15'
  13. AND l_shipdate > DATE '1995-03-15'
  14. GROUP BY l_orderkey , o_orderdate , o_shippriority
  15. ORDER BY revenue DESC , o_orderdate
  16. LIMIT 10;
  17. +------------+---------------------+-------------+----------------+
  18. | l_orderkey | revenue | o_orderdate | o_shippriority |
  19. +------------+---------------------+-------------+----------------+
  20. | 2456423 | 406181.011100000000 | 1995-03-05 | 0 |
  21. | 3459808 | 405838.698900000000 | 1995-03-04 | 0 |
  22. | 492164 | 390324.061000000000 | 1995-02-19 | 0 |
  23. | 1188320 | 384537.935900000000 | 1995-03-09 | 0 |
  24. | 2435712 | 378673.055800000000 | 1995-02-26 | 0 |
  25. | 4878020 | 378376.795200000000 | 1995-03-12 | 0 |
  26. | 5521732 | 375153.921500000000 | 1995-03-13 | 0 |
  27. | 2628192 | 373133.309400000000 | 1995-02-22 | 0 |
  28. | 993600 | 371407.459500000000 | 1995-03-05 | 0 |
  29. | 2300070 | 367371.145200000000 | 1995-03-13 | 0 |
  30. +------------+---------------------+-------------+----------------+
  31. 10 rows in set (0.20 sec)

外连接

外连接又分为左连接右连接,两者之间是可以实现等价语义的:

  • LEFT JOIN

左外连接会返回左表中的所有数据行,以及右表当中能够匹配连接条件的值,如果在右表当中没有找到能够匹配的行,则使用 NULL 填充。

语法图示
SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Keyleftjoin
SELECT FROM TableA A LEFT JOIN TableB B ON A.Key=B.Key WHERE B.Key IS NULLleftjoinwhere
  • RIGHT JOIN

右外连接返回右表中的所有记录,以及左表当中能够匹配连接条件的值,没有匹配的值则使用 NULL 填充。

语法图示
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Keyleftjoinwhere
SELECT FROM TableA A RIGHT JOIN TableB B ON A.Key=B.Key WHERE A.Key IS NULLleftjoinwhere

语句示例如下:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 147457 | 16 |
  12. | 147458 | 7 |
  13. | 147459 | 0 |
  14. | 147460 | 16 |
  15. | 147461 | 7 |
  16. | 147462 | 0 |
  17. | 147463 | 14 |
  18. | 147464 | 11 |
  19. | 147465 | 0 |
  20. | 147466 | 17 |
  21. +-----------+---------+
  22. 10 rows in set (0.93 sec)

或者:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. ORDERS
  5. RIGHT OUTER JOIN CUSTOMER ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 147457 | 16 |
  12. | 147458 | 7 |
  13. | 147459 | 0 |
  14. | 147460 | 16 |
  15. | 147461 | 7 |
  16. | 147462 | 0 |
  17. | 147463 | 14 |
  18. | 147464 | 11 |
  19. | 147465 | 0 |
  20. | 147466 | 17 |
  21. +-----------+---------+
  22. 10 rows in set (0.93 sec)

全连接

全连接是左右外连接的并集。连接表包含被连接的表的所有记录,如果缺少匹配的记录,即以 NULL 填充。

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. FULL JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey limit 10;
  8. +-----------+---------+
  9. | c_custkey | c_count |
  10. +-----------+---------+
  11. | 1 | 6 |
  12. | 2 | 7 |
  13. | 4 | 20 |
  14. | 5 | 4 |
  15. | 7 | 16 |
  16. | 8 | 13 |
  17. | 10 | 20 |
  18. | 11 | 13 |
  19. | 13 | 18 |
  20. | 14 | 9 |
  21. +-----------+---------+
  22. 10 rows in set (0.77 sec)

全连接同样可以通过改写的方式获得相同的语义:

  1. SELECT
  2. c_custkey, COUNT(o_orderkey) AS c_count
  3. FROM
  4. CUSTOMER
  5. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  6. AND o_comment NOT LIKE '%special%requests%')
  7. GROUP BY c_custkey
  8. UNION
  9. SELECT
  10. c_custkey, COUNT(o_orderkey) AS c_count
  11. FROM
  12. CUSTOMER
  13. LEFT OUTER JOIN ORDERS ON (c_custkey = o_custkey
  14. AND o_comment NOT LIKE '%special%requests%')
  15. WHERE c_custkey IS NULL
  16. GROUP BY c_custkey
  17. limit 10;
  18. +-----------+---------+
  19. | c_custkey | c_count |
  20. +-----------+---------+
  21. | 147457 | 16 |
  22. | 147458 | 7 |
  23. | 147459 | 0 |
  24. | 147460 | 16 |
  25. | 147461 | 7 |
  26. | 147462 | 0 |
  27. | 147463 | 14 |
  28. | 147464 | 11 |
  29. | 147465 | 0 |
  30. | 147466 | 17 |
  31. +-----------+---------+
  32. 10 rows in set (1.09 sec)

隐式连接

在 SQL 语句当中,除了使用 JOIN,也可以通过 FROM t1, t2 子句来连接两张或多张表,通过 WHERE t1.id = t2.id 子句来指定连接的条件。