WITH (Common Table Expressions)

Description

Common Table Expression (CTE) is a temporary result set defined within the scope of a single statement, valid only during the execution of the query. It can self-reference and can be referenced multiple times within the same query. Its purpose is simplifying complex queries, enhance code readability, and improve maintainability. A CTE can be seen as a temporary view that exists only for the query’s execution and is not visible to external queries.

Once defined, a CTE can be referenced like a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement.

The WITH clause is used to specify Common Table Expressions, and the WITH clause can include one or more comma-separated clauses. Each clause provides a subquery that generates a result set and associates a name with the subquery.

Use Cases:

  • CTEs can reuse the same subquery in multiple places, avoiding redundant logic.
  • They can simplify recursive queries, such as querying tree-structured data.
  • Complex queries can be broken down into smaller parts using CTEs, making the query logic clearer and more understandable.

Common Table Expressions are divided into two types: non-recursive and recursive:

  • Non-recursive CTE refers to an expression in which the CTE does not reference itself. It is used to build a one-time temporary result set and does not involve recursion.

  • Recursive CTE refers to an expression in which the CTE references itself. It handles data with recursive structures, such as trees or graphs. A recursive CTE includes a base query (initial condition) in its definition, then performs recursive operations on the result of that base query until a stop condition is met.

Non-recursive CTE

Syntax

  1. WITH <query_name> AS (
  2. <query_definition>
  3. )
  4. SELECT ... FROM <query_name>;

Explanations

  • <query_name>: Specifies the temporary name assigned to the CTE result set. It can be any valid identifier, similar to a table or column name.

  • <query_definition>: The query statement defines the CTE result set. It can be any valid SELECT query used to create the result set of the CTE.

  • SELECT ... FROM <query_name>: This is the query executed on the CTE, where you can use the name of the CTE.

Recursive CTE

Syntax

  1. WITH RECURSIVE <query_name> AS (
  2. <query_definition>
  3. )
  4. SELECT ... FROM <query_name>;

Explanations

  • WITH RECURSIVE: Indicates that this is a recursive CTE.

  • <query_name>: Specifies the temporary name assigned to the result set of the recursive CTE. It can be any valid identifier, similar to a table or column name.

  • <query_definition>: This consists of two parts in the context of a recursive CTE:

    • Initial part: Defines the recursion’s initial condition and result set.
    • Recursive function: Defines how to recursively generate the next round of the result set from the initial result set.
  • SELECT ... FROM <query_name>: Use the name of the recursive CTE to query the recursive CTE.

Guidelines for Using Recursive CTEs

Anchor and Recursive Members

A recursive common table expression (CTE) must consist of at least two query definitions: an anchor member and a recursive member. The anchor member should come before the first recursive member, and you can define multiple anchor and recursive members. All CTE query definitions are considered anchor members unless they reference the CTE itself.

Suppose you have a table named Employee that contains employee information, including fields like EmployeeID, Name, and ManagerID, representing the employee’s ID, name, and ID of their manager. You can use a recursive CTE to query the hierarchical relationship between employees and subordinates.

Assuming the table data is as follows:

EmployeeIDNameManagerID
1AliceNULL
2Bob1
3Charlie1
4David2
5Eve2
6Frank3

Here’s an example of using a recursive CTE to query the hierarchical relationship between employees and their subordinates:

  1. WITH RECURSIVE EmployeeHierarchy AS (
  2. -- Anchor member: Find top-level employees
  3. SELECT EmployeeID, Name, ManagerID, 0 AS Level
  4. FROM Employee
  5. WHERE ManagerID IS NULL
  6. UNION ALL
  7. -- Recursive member: Recursively query subordinate employees
  8. SELECT e.EmployeeID, e.Name, e.ManagerID, eh.Level + 1
  9. FROM Employee AS e
  10. JOIN EmployeeHierarchy AS eh ON e.ManagerID = eh.EmployeeID
  11. )
  12. SELECT Name, Level
  13. FROM EmployeeHierarchy;

In the above example:

  • The anchor member selects top-level employees (with ManagerID as NULL) and sets their level (Level) to 0.
  • The recursive member queries subordinate employees based on the previous round’s results (EmployeeHierarchy), incrementing the level.
  • The final query uses SELECT to retrieve employee names and levels from the recursive CTE.

Executing this query will provide information about the hierarchical relationship between employees and their subordinates. Both anchor and recursive members together form the structure of a recursive query. On the other hand, a non-recursive CTE is used to create a temporary result set with a single query definition, and you only need to reference this CTE in your query without concerning anchor and recursive members.

Operators and Statement Requirements
  • Set Operators: Anchor members must be combined using set operators (such as UNION ALL, UNION, INTERSECT, or EXCEPT). Only UNION ALL is allowed between the last anchor member and the first recursive member, as well as when combining multiple recursive members.

  • Column Matching: The number of columns in anchor and recursive members must be the same.

  • Data Types: Columns in the recursive member must have the same data types as the corresponding columns in the anchor member.

  • FROM Clause: The FROM clause of a recursive member can only reference the CTE expression_name once.

  • Unsupported Features: Certain features are not allowed in the CTE_query_definition of a recursive member, including:

    • Using the SELECT DISTINCT keyword for distinct queries.
    • Using GROUP BY to group results.
    • Using HAVING to filter results after grouping.
    • Scalar aggregation applies an aggregate function (like SUM, AVG, etc.) to a set of rows and returns a single value.
    • Outer join operations like LEFT, RIGHT, and OUTER JOIN (though INNER JOIN is allowed).
    • Subqueries.

Examples

  • Non-recursive CTE example:
  1. CREATE TABLE employees (
  2. id INT PRIMARY KEY,
  3. name VARCHAR(50),
  4. salary INT
  5. );
  6. INSERT INTO employees (id, name, salary) VALUES
  7. (1, 'Alice', 50000),
  8. (2, 'Bob', 60000),
  9. (3, 'Charlie', 75000),
  10. (4, 'David', 55000),
  11. (5, 'Eve', 80000);
  12. -- Query employees whose salary is higher than the average salary
  13. mysql> WITH avg_salary AS (
  14. SELECT AVG(salary) AS avg_salary FROM employees)
  15. SELECT name, salary
  16. FROM employees
  17. JOIN avg_salary ON salary > avg_salary.avg_salary;
  18. +---------+--------+
  19. | name | salary |
  20. +---------+--------+
  21. | Charlie | 75000 |
  22. | Eve | 80000 |
  23. +---------+--------+
  24. 2 rows in set (0.00 sec)
  • Recursive CTE example:
  1. CREATE TABLE employees_hierarchy (
  2. id INT PRIMARY KEY,
  3. name VARCHAR(50),
  4. manager_id INT
  5. );
  6. INSERT INTO employees_hierarchy (id, name, manager_id) VALUES
  7. (1, 'Alice', NULL),
  8. (2, 'Bob', 1),
  9. (3, 'Charlie', 1),
  10. (4, 'David', 2),
  11. (5, 'Eve', 2),
  12. (6, 'Frank', 3);
  13. -- Query an employee and all his employees
  14. mysql> WITH RECURSIVE employee_hierarchy_cte (id, name, manager_id, level) AS (
  15. SELECT id, name, manager_id, 0
  16. FROM employees_hierarchy
  17. WHERE name = 'Alice'
  18. UNION ALL
  19. SELECT e.id, e.name, e.manager_id, eh.level + 1
  20. FROM employees_hierarchy AS e
  21. JOIN employee_hierarchy_cte AS eh ON e.manager_id = eh.id
  22. )
  23. SELECT name, level
  24. FROM employee_hierarchy_cte;
  25. +---------+-------+
  26. | name | level |
  27. +---------+-------+
  28. | Alice | 0 |
  29. | Bob | 1 |
  30. | Charlie | 1 |
  31. | David | 2 |
  32. | Eve | 2 |
  33. | Frank | 2 |
  34. +---------+-------+
  35. 6 rows in set (0.00 sec)