简介

    数据库中的集合操作可以把多个查询的结果组合成一个结果集。集合操作主要包含:

    • UNION

    • INTERSECT

    • EXCEPT/MINUS

    说明

    注意: 在Oceanbase中,我们同时支持EXCEPT与MINUS,这两者的语义和语法都是相同的。

    对与UNION来说用户可以指定UNION的属性为ALL和DISTINCT/UNIQUE。分别代表集合可重复,和集合不可重复。所有的集合操作默认的属性是DISTINCT。在Oceanbase中,DISTINCT和UNIQUE的语义语法是相同的。而其他的几种集合操作是不能指定属性的。

    执行计划

    所有的集合操作在逻辑计划层面的展示都是[UNION|INTERSECT|EXCEPT] [ALL|DISTINCT]的形式。带有distinct关键字时,需要从输出的结果中去掉重复的行,目前OceanBase只支持基于排序的集合distinct去重,在计划层面,排序操作会下压到各分支内部进行。

    1. OceanBase_114 (root@test)> explain select * from t1 union select * from t1\G
    2. *************************** 1. row ***************************
    3. Query Plan: =======================================
    4. |ID|OPERATOR |NAME|EST. ROWS|COST|
    5. ---------------------------------------
    6. |0 |UNION DISTINCT| |2000 |4861|
    7. |1 | SORT | |1000 |2048|
    8. |2 | TABLE SCAN |t1 |1000 |498 |
    9. |3 | SORT | |1000 |2048|
    10. |4 | TABLE SCAN |t1 |1000 |498 |
    11. =======================================
    12. Outputs & filters:
    13. -------------------------------------
    14. 0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
    15. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    16. 2 - output([t1.c1], [t1.c2]), filter(nil),
    17. access([t1.c1], [t1.c2]), partitions(p0)
    18. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    19. 4 - output([t1.c1], [t1.c2]), filter(nil),
    20. access([t1.c1], [t1.c2]), partitions(p0)
    21. 1 row in set (0.02 sec)
    22. OceanBase_114 (root@test)> explain select * from t1 intersect select * from t1\G
    23. *************************** 1. row ***************************
    24. Query Plan: ===========================================
    25. |ID|OPERATOR |NAME|EST. ROWS|COST|
    26. -------------------------------------------
    27. |0 |INTERSECT DISTINCT| |1000 |4723|
    28. |1 | SORT | |1000 |2048|
    29. |2 | TABLE SCAN |t1 |1000 |498 |
    30. |3 | SORT | |1000 |2048|
    31. |4 | TABLE SCAN |t1 |1000 |498 |
    32. ===========================================
    33. Outputs & filters:
    34. -------------------------------------
    35. 0 - output([INTERSECT(t1.c1, t1.c1)], [INTERSECT(t1.c2, t1.c2)]), filter(nil)
    36. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    37. 2 - output([t1.c1], [t1.c2]), filter(nil),
    38. access([t1.c1], [t1.c2]), partitions(p0)
    39. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    40. 4 - output([t1.c1], [t1.c2]), filter(nil),
    41. access([t1.c1], [t1.c2]), partitions(p0)
    42. 1 row in set (0.02 sec)
    43. OceanBase_114 (root@test)> explain select * from t1 minus select * from t1\G
    44. *************************** 1. row ***************************
    45. Query Plan: ========================================
    46. |ID|OPERATOR |NAME|EST. ROWS|COST|
    47. ----------------------------------------
    48. |0 |EXCEPT DISTINCT| |1000 |4723|
    49. |1 | SORT | |1000 |2048|
    50. |2 | TABLE SCAN |t1 |1000 |498 |
    51. |3 | SORT | |1000 |2048|
    52. |4 | TABLE SCAN |t1 |1000 |498 |
    53. ========================================
    54. Outputs & filters:
    55. -------------------------------------
    56. 0 - output([EXCEPT(t1.c1, t1.c1)], [EXCEPT(t1.c2, t1.c2)]), filter(nil)
    57. 1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    58. 2 - output([t1.c1], [t1.c2]), filter(nil),
    59. access([t1.c1], [t1.c2]), partitions(p0)
    60. 3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
    61. 4 - output([t1.c1], [t1.c2]), filter(nil),
    62. access([t1.c1], [t1.c2]), partitions(p0)
    63. 1 row in set (0.01 sec)

    而带有ALL属性的集合操作不要求下层有序。

    1. OceanBase_114 (root@test)> explain select * from t1 union all select * from t1\G
    2. *************************** 1. row ***************************
    3. Query Plan: ====================================
    4. |ID|OPERATOR |NAME|EST. ROWS|COST|
    5. ------------------------------------
    6. |0 |UNION ALL | |2000 |1761|
    7. |1 | TABLE SCAN|t1 |1000 |498 |
    8. |2 | TABLE SCAN|t1 |1000 |498 |
    9. ====================================
    10. Outputs & filters:
    11. -------------------------------------
    12. 0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
    13. 1 - output([t1.c1], [t1.c2]), filter(nil),
    14. access([t1.c1], [t1.c2]), partitions(p0)
    15. 2 - output([t1.c1], [t1.c2]), filter(nil),
    16. access([t1.c1], [t1.c2]), partitions(p0)
    17. 1 row in set (0.02 sec)

    当集合操作两边出现类型不一致的时候,SQL层会为UNION左右两边的表达式分配CAST转换,这个操作在EXPLAIN中是可见的。

    1. OceanBase_114 (root@test)> create table t1 (c1 int, c2 varchar(5));
    2. Query OK, 0 rows affected (0.12 sec)
    3. OceanBase_114 (root@test)> explain select c1 from t1 union select c2 from t1\G
    4. *************************** 1. row ***************************
    5. Query Plan: ========================================
    6. |ID|OPERATOR |NAME|EST. ROWS|COST |
    7. ----------------------------------------
    8. |0 |UNION DISTINCT| |2000 |11986|
    9. |1 | SORT | |1000 |5664 |
    10. |2 | TABLE SCAN |t1 |1000 |476 |
    11. |3 | SORT | |1000 |5664 |
    12. |4 | TABLE SCAN |t1 |1000 |476 |
    13. ========================================
    14. Outputs & filters:
    15. -------------------------------------
    16. 0 - output([UNION(cast(t1.c1, VARCHAR(11)), cast(t1.c2, VARCHAR(11)))]), filter(nil)
    17. 1 - output([cast(t1.c1, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c1, VARCHAR(11)), ASC])
    18. 2 - output([cast(t1.c1, VARCHAR(11))]), filter(nil),
    19. access([t1.c1]), partitions(p0)
    20. 3 - output([cast(t1.c2, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c2, VARCHAR(11)), ASC])
    21. 4 - output([cast(t1.c2, VARCHAR(11))]), filter(nil),
    22. access([t1.c2]), partitions(p0)
    23. 1 row in set (0.01 sec)