Common Table Expressions

Common table expressions (CTEs) were introduced by the SQL:1999 standard. Databases that currently support CTEs include Teradata, DB2, Firebird, Microsoft SQL Server, Oracle (with recursion since 11g release 2), PostgreSQL (since 8.4), MariaDB (since 10.2), SQLite (since 3.8.3), HyperSQL, H2 (experimental), and MySQL 8.0.

The CTE syntax is as follows:

    WITH [RECURSIVE] with_query [, ...]
    SELECT ...

    with_query:
        query_name [ (column_name [, ...]) ] AS (SELECT ...)
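
As a minimal sketch of this syntax, the snippet below runs one non-recursive and one recursive CTE through Python's standard-library sqlite3 module (SQLite supports CTEs since 3.8.3). The CTE names `squares` and `counter` are made up for illustration and do not appear elsewhere in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-recursive CTE: query_name (column_name, ...) AS (SELECT ...)
non_recursive = conn.execute(
    """
    WITH squares(n, sq) AS (
        SELECT 1, 1 UNION ALL SELECT 2, 4 UNION ALL SELECT 3, 9
    )
    SELECT n, sq FROM squares ORDER BY n
    """
).fetchall()

# Recursive CTE: the SELECT after UNION ALL refers to the CTE itself
recursive = conn.execute(
    """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter ORDER BY n
    """
).fetchall()

print(non_recursive)
print(recursive)
```

The `WHERE n < 5` condition is what stops the recursion; without a terminating condition a recursive CTE keeps producing rows.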

The following diagrams are from MariaDB.

Non-recursive CTEs
screenshot.png

Recursive CTEs
screenshot.png

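Using CTEs

  • CTEs make statements more concise

For example, the following two statements have the same meaning, but the CTE version is shorter and easier to read than the version built from nested subqueries.

1) Using nested subqueries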

    SELECT MAX(txt), MIN(txt)
    FROM
    (
      SELECT CONCAT(cte2.txt, cte3.txt) AS txt
      FROM
      (
        SELECT CONCAT(cte1.txt, 'is a ') AS txt
        FROM
        (
          SELECT 'This ' AS txt
        ) AS cte1
      ) AS cte2,
      (
        SELECT 'nice query' AS txt
        UNION
        SELECT 'query that rocks'
        UNION
        SELECT 'query'
      ) AS cte3
    ) AS cte4;

2) Using CTEs

    WITH cte1(txt) AS (SELECT 'This '),
         cte2(txt) AS (SELECT CONCAT(cte1.txt, 'is a ') FROM cte1),
         cte3(txt) AS (SELECT 'nice query' UNION
                       SELECT 'query that rocks' UNION
                       SELECT 'query'),
         cte4(txt) AS (SELECT CONCAT(cte2.txt, cte3.txt) FROM cte2, cte3)
    SELECT MAX(txt), MIN(txt) FROM cte4;
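
To check that the two forms really are equivalent, here is a small sketch that runs both through Python's sqlite3 module and compares the results. It is an adaptation rather than the statements above verbatim: it assumes SQLite, so `CONCAT(a, b)` is replaced by the `||` operator, and the variable names `nested_sql` and `cte_sql` are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Nested-subquery form (CONCAT replaced by || for SQLite)
nested_sql = """
SELECT MAX(txt), MIN(txt)
FROM (
    SELECT cte2.txt || cte3.txt AS txt
    FROM (
        SELECT cte1.txt || 'is a ' AS txt
        FROM (SELECT 'This ' AS txt) AS cte1
    ) AS cte2,
    (
        SELECT 'nice query' AS txt
        UNION SELECT 'query that rocks'
        UNION SELECT 'query'
    ) AS cte3
) AS cte4
"""

# CTE form: each intermediate result gets a name and is defined once
cte_sql = """
WITH cte1(txt) AS (SELECT 'This '),
     cte2(txt) AS (SELECT cte1.txt || 'is a ' FROM cte1),
     cte3(txt) AS (SELECT 'nice query' UNION
                   SELECT 'query that rocks' UNION
                   SELECT 'query'),
     cte4(txt) AS (SELECT cte2.txt || cte3.txt FROM cte2, cte3)
SELECT MAX(txt), MIN(txt) FROM cte4
"""

nested_result = conn.execute(nested_sql).fetchone()
cte_result = conn.execute(cte_sql).fetchone()

print(nested_result)
print(cte_result)
```

Both statements return the same pair: the largest and smallest of the three concatenated strings under SQLite's default binary string comparison.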

  • CTEs can be used for tree queries

    [Figure: tree]
    Initialize this tree:

    create table t1(id int, value char(10), parent_id int);
    insert into t1 values(1, 'A', NULL);
    insert into t1 values(2, 'B', 1);
    insert into t1 values(3, 'C', 1);
    insert into t1 values(4, 'D', 1);
    insert into t1 values(5, 'E', 2);
    insert into t1 values(6, 'F', 2);
    insert into t1 values(7, 'G', 4);
    insert into t1 values(8, 'H', 6);

1) Level-order traversal

    with recursive cte as (
      select id, value, 0 as level from t1 where parent_id is null
      union all
      select t1.id, t1.value, cte.level+1 from cte join t1 on t1.parent_id=cte.id)
    select * from cte;

    +------+-------+-------+
    | id   | value | level |
    +------+-------+-------+
    |    1 | A     |     0 |
    |    2 | B     |     1 |
    |    3 | C     |     1 |
    |    4 | D     |     1 |
    |    5 | E     |     2 |
    |    6 | F     |     2 |
    |    7 | G     |     2 |
    |    8 | H     |     3 |
    +------+-------+-------+

2) Depth-first traversal

    with recursive cte as (
      select id, value, 0 as level, CAST(id AS CHAR(200)) AS path from t1 where parent_id is null
      union all
      select t1.id, t1.value, cte.level+1, CONCAT(cte.path, ',', t1.id) from cte join t1 on t1.parent_id=cte.id)
    select * from cte order by path;

    +------+-------+-------+---------+
    | id   | value | level | path    |
    +------+-------+-------+---------+
    |    1 | A     |     0 | 1       |
    |    2 | B     |     1 | 1,2     |
    |    5 | E     |     2 | 1,2,5   |
    |    6 | F     |     2 | 1,2,6   |
    |    8 | H     |     3 | 1,2,6,8 |
    |    3 | C     |     1 | 1,3     |
    |    4 | D     |     1 | 1,4     |
    |    7 | G     |     2 | 1,4,7   |
    +------+-------+-------+---------+
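
The two traversals can also be reproduced with Python's sqlite3 module. This is a sketch under a few assumptions: it targets SQLite rather than MySQL or MariaDB, so `CAST(... AS CHAR(200))` becomes `CAST(... AS TEXT)` and `CONCAT` becomes `||`, and the level-order query adds an explicit `ORDER BY level, id` instead of relying on the order in which the engine happens to return rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(id INT, value CHAR(10), parent_id INT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "A", None), (2, "B", 1), (3, "C", 1), (4, "D", 1),
        (5, "E", 2), (6, "F", 2), (7, "G", 4), (8, "H", 6),
    ],
)

# Level-order: carry the depth along and sort by it
level_order = conn.execute(
    """
    WITH RECURSIVE cte AS (
        SELECT id, value, 0 AS level FROM t1 WHERE parent_id IS NULL
        UNION ALL
        SELECT t1.id, t1.value, cte.level + 1
        FROM cte JOIN t1 ON t1.parent_id = cte.id
    )
    SELECT id, value, level FROM cte ORDER BY level, id
    """
).fetchall()

# Depth-first: carry the path from the root along and sort by it
depth_first = conn.execute(
    """
    WITH RECURSIVE cte AS (
        SELECT id, value, 0 AS level, CAST(id AS TEXT) AS path
        FROM t1 WHERE parent_id IS NULL
        UNION ALL
        SELECT t1.id, t1.value, cte.level + 1, cte.path || ',' || t1.id
        FROM cte JOIN t1 ON t1.parent_id = cte.id
    )
    SELECT id, value, level, path FROM cte ORDER BY path
    """
).fetchall()

print([row[1] for row in level_order])
print([row[1] for row in depth_first])
```

Sorting by the textual path gives a depth-first order here only because every id is a single digit. With ids of 10 or more, the path components would need zero-padding (or a different sort key), since '1,10' sorts before '1,2' as a string.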

Oracle

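Oracle has supported CTEs since 9.2, but only non-recursive WITH; full CTE support did not arrive until Oracle 11.2. Oracle did, however, already support tree queries through CONNECT BY, and recursive WITH statements and CONNECT BY statements can be converted into each other. Some conversion examples can be found here.

An Oracle recursive WITH statement does not need the RECURSIVE keyword; Oracle detects automatically whether the query is recursive.

Oracle also supports CTE-related hints: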

    WITH dept_count AS (
      SELECT /*+ MATERIALIZE */ deptno, COUNT(*) AS dept_count
      FROM emp
      GROUP BY deptno)
    SELECT ...

    WITH dept_count AS (
      SELECT /*+ INLINE */ deptno, COUNT(*) AS dept_count
      FROM emp
      GROUP BY deptno)
    SELECT ...

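"MATERIALIZE" tells the optimizer to create a global temporary table that holds the CTE result, so that every reference to the CTE reads the temporary table directly. "INLINE" means the CTE query is expanded and evaluated at each reference.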

PostgreSQL

PostgreSQL has supported CTEs since 8.4. PostgreSQL also extends CTEs: since 9.1, the query inside a CTE may be a DML statement, for example:

    create table t1 (c1 int, c2 char(10));
    insert into t1 values(1,'a'),(2,'b');
    select * from t1;
     c1 | c2
    ----+----
      1 | a
      2 | b

    WITH cte AS (
      UPDATE t1 SET c1 = c1 * 2 WHERE c1 = 1
      RETURNING *
    )
    SELECT * FROM cte;  -- returns the updated row
     c1 | c2
    ----+------------
      2 | a

    truncate table t1;
    insert into t1 values(1,'a'),(2,'b');
    WITH cte AS (
      UPDATE t1 SET c1 = c1 * 2 WHERE c1 = 1
      RETURNING *
    )
    SELECT * FROM t1;  -- returns the original values, as they were before the update
     c1 | c2
    ----+------------
      1 | a
      2 | b

    truncate table t1;
    insert into t1 values(1,'a'),(2,'b');
    WITH cte AS (
      DELETE FROM t1
      WHERE c1 = 1
      RETURNING *
    )
    SELECT * FROM cte;  -- returns the deleted row
     c1 | c2
    ----+------------
      1 | a

    truncate table t1;
    insert into t1 values(1,'a'),(2,'b');
    WITH cte AS (
      DELETE FROM t1
      WHERE c1 = 1
      RETURNING *
    )
    SELECT * FROM t1;  -- returns the original values, as they were before the delete
     c1 | c2
    ----+------------
      1 | a
      2 | b
    (2 rows)

MariaDB

MariaDB supports CTEs starting with 10.2: non-recursive CTEs arrived in 10.2.1 and recursive CTEs in 10.2.2. At the time of writing, the current GA version is 10.1.

MySQL

MySQL supports full CTEs starting with 8.0. At the time of writing, MySQL 8.0 is still in development, with no RC yet, so GA is still some way off.

AliSQL

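AliSQL has ported the non-recursive CTE implementation from MariaDB 10.2; the feature will be released soon.

The implementation is outlined below through the main functions involved in the source code:

// Resolve references to WITH tables during parsing
find_table_def_in_with_clauses

// Check dependencies, e.g. a WITH table name must not be defined twice
With_clause::check_dependencies

// Clone one copy of the definition for each reference
With_element::clone_parsed_spec

// Replace the column names with those specified for the WITH table
With_element::rename_columns_of_derived_unit

In this implementation, a CTE that is referenced several times is also resolved several times. This version of CTEs therefore simplifies the SQL but brings no real gain in efficiency.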

    select count(*) from t1 where c2 !='z';
    +----------+
    | count(*) |
    +----------+
    |    65536 |
    +----------+
    1 row in set (0.25 sec)

    -- Judging by the execution time, three full table scans were performed
    with t as (select count(*) from t1 where c2 !='z')
    select * from t union select * from t union select * from t;
    +----------+
    | count(*) |
    +----------+
    |    65536 |
    +----------+
    1 row in set (0.59 sec)

    select count(*) from t1 where c2 !='z'
    union
    select count(*) from t1 where c2 !='z'
    union
    select count(*) from t1 where c2 !='z';
    +----------+
    | count(*) |
    +----------+
    |    65536 |
    +----------+
    1 row in set (0.57 sec)

    explain with t as (select count(*) from t1 where c2 !='z')
        -> select * from t union select * from t union select * from t;
    +------+-----------------+--------------+------+---------------+------+---------+------+-------+-------------+
    | id   | select_type     | table        | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
    +------+-----------------+--------------+------+---------------+------+---------+------+-------+-------------+
    |    1 | PRIMARY         | <derived2>   | ALL  | NULL          | NULL | NULL    | NULL | 65536 |             |
    |    2 | SUBQUERY        | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    |    3 | RECURSIVE UNION | <derived5>   | ALL  | NULL          | NULL | NULL    | NULL | 65536 |             |
    |    5 | SUBQUERY        | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    |    4 | RECURSIVE UNION | <derived6>   | ALL  | NULL          | NULL | NULL    | NULL | 65536 |             |
    |    6 | SUBQUERY        | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    | NULL | UNION RESULT    | <union1,3,4> | ALL  | NULL          | NULL | NULL    | NULL |  NULL |             |
    +------+-----------------+--------------+------+---------------+------+---------+------+-------+-------------+
    7 rows in set (0.00 sec)

    explain select count(*) from t1 where c2 !='z'
    union
    select count(*) from t1 where c2 !='z'
    union
    select count(*) from t1 where c2 !='z';
    +------+--------------+--------------+------+---------------+------+---------+------+-------+-------------+
    | id   | select_type  | table        | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
    +------+--------------+--------------+------+---------------+------+---------+------+-------+-------------+
    |    1 | PRIMARY      | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    |    2 | UNION        | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    |    3 | UNION        | t1           | ALL  | NULL          | NULL | NULL    | NULL | 65536 | Using where |
    | NULL | UNION RESULT | <union1,2,3> | ALL  | NULL          | NULL | NULL    | NULL |  NULL |             |
    +------+--------------+--------------+------+---------------+------+---------+------+-------+-------------+
    4 rows in set (0.00 sec)
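
The effect of cloning the definition for each reference can be illustrated with a toy rewrite. The sketch below is not the AliSQL or MariaDB code, which works on parsed structures rather than on text. The helper `inline_cte` is hypothetical and simply substitutes a fresh copy of the CTE definition for every reference, which makes it visible why three references lead to three scans of t1. A small three-row table in SQLite stands in for the 65536-row table used above.

```python
import re
import sqlite3


def inline_cte(name, definition, body):
    """Hypothetical helper: give every reference to `name` its own copy of `definition`."""
    counter = 0

    def clone(match):
        nonlocal counter
        counter += 1
        # Each reference becomes an independent derived table
        return f"({definition}) AS {name}_{counter}"

    return re.sub(rf"\b{re.escape(name)}\b", clone, body)


definition = "select count(*) from t1 where c2 != 'z'"
body = "select * from t union select * from t union select * from t"
inlined = inline_cte("t", definition, body)

# Tiny stand-in table, only to show that both forms return the same rows
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (c1 int, c2 char(10))")
conn.executemany("insert into t1 values (?, ?)", [(1, "a"), (2, "b"), (3, "z")])

with_result = conn.execute(f"with t as ({definition}) {body}").fetchall()
inlined_result = conn.execute(inlined).fetchall()

print(inlined)
print(with_result, inlined_result)
```

The rewritten statement contains the definition three times, once per reference, and returns the same rows as the WITH form. An engine that materializes the CTE instead would evaluate the definition once and let all three references read the stored result.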

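The following execution plan is from MySQL 8.0, which is reported to scan the table only once for the CTE form (note that the statement explained below is the plain UNION form):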

    mysql> explain select count(*) from t1 where c2 !='z' union select count(*) from t1 where c2 !='z' union select count(*) from t1 where c2 !='z';
    +----+--------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
    | id | select_type  | table        | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
    +----+--------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
    |  1 | PRIMARY      | t1           | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 62836 |    90.00 | Using where     |
    |  2 | UNION        | t1           | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 62836 |    90.00 | Using where     |
    |  3 | UNION        | t1           | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 62836 |    90.00 | Using where     |
    | NULL | UNION RESULT | <union1,2,3> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  NULL |     NULL | Using temporary |
    +----+--------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
    4 rows in set, 1 warning (0.00 sec)

The following execution plan is from PostgreSQL 9.4, which scans the table only once:

    postgres=# explain with t as (select count(*) from t1 where c2 !='z')
    postgres-# select * from t union select * from t union select * from t;
     HashAggregate  (cost=391366.28..391366.31 rows=3 width=8)
       Group Key: t.count
       CTE t
         ->  Aggregate  (cost=391366.17..391366.18 rows=1 width=0)
               ->  Seq Scan on t1  (cost=0.00..384392.81 rows=2789345 width=0)
                     Filter: ((c2)::text <> 'z'::text)
       ->  Append  (cost=0.00..0.09 rows=3 width=8)
             ->  CTE Scan on t  (cost=0.00..0.02 rows=1 width=8)
             ->  CTE Scan on t t_1  (cost=0.00..0.02 rows=1 width=8)
             ->  CTE Scan on t t_2  (cost=0.00..0.02 rows=1 width=8)

AliSQL still has room for improvement here.