Materialized Views

There are several different ways to exploit materialized views in Calcite.

Materialized views maintained by Calcite

For details, see the lattices documentation.

Expose materialized views to Calcite

Some Calcite adapters as well as projects that rely on Calcite have their own notion of materialized views.

For example, Apache Cassandra allows the user to define materialized views based on existing tables which are automatically maintained.The Cassandra adapter automatically exposes these materialized views to Calcite.

Another example is Apache Hive. When a materialized view is created in Hive, the user can specify whether the view may be used in query optimization. If the user chooses to do so, the materialized view will be registered with Calcite.

By registering materialized views in Calcite, the optimizer has the opportunity to automatically rewrite queries to use these views.

View-based query rewriting

View-based query rewriting aims to take an input query which can be answered using a preexisting view and rewrite the query to make use of the view.Currently Calcite has two implementations of view-based query rewriting.

Substitution via rules transformation

The first approach is based on view substitution.SubstitutionVisitor and its extension MaterializedViewSubstitutionVisitor aim to substitute part of the relational algebra tree with an equivalent expression which makes use of a materialized view. The scan over the materialized view and the materialized view definition plan are registered with the planner. Afterwards, transformation rules that try to unify expressions in the plan are triggered. Expressions do not need to be equivalent to be replaced: the visitor might add a residual predicate on top of the expression if needed.

The following example is taken from the documentation of SubstitutionVisitor:

  • Query: SELECT a, c FROM t WHERE x = 5 AND b = 4
  • Target (materialized view definition): SELECT a, b, c FROM t WHERE x = 5
  • Result: SELECT a, c FROM mv WHERE b = 4

Note that result uses the materialized view table mv and a simplified condition b = 4.

While this approach can accomplish a large number of rewritings, it has some limitations. Since the rule relies on transformation rules to create the equivalence between expressions in the query and the materialized view, it might need to enumerate exhaustively all possible equivalent rewritings for a given expression to find a materialized view substitution. However, this is not scalable in the presence of complexviews, e.g., views with an arbitrary number of join operators.

Rewriting using plan structural information

In turn, an alternative rule that attempts to match queries to views by extracting some structural information about the expression to replace has been proposed.

AbstractMaterializedViewRule builds on the ideas presented in [GL01] and introduces some additional extensions.The rule can rewrite expressions containing arbitrary chains of Join, Filter, and Project operators.Additionally, the rule can rewrite expressions rooted at an Aggregate operator, rolling aggregations up if necessary. In turn, it can also produce rewritings using Union operators if the query can be partially answered from a view.

To produce a larger number of rewritings, the rule relies on information exposed as constraints defined over the database tables, e.g., foreign keys, primary keys, unique keys or not null.

Rewriting coverage

Let us illustrate with some examples the coverage of the view rewriting algorithm implemented in AbstractMaterializedViewRule. The examples are based on the following database schema.

  1. CREATE TABLE depts(
  2. deptno INT NOT NULL,
  3. deptname VARCHAR(20),
  4. PRIMARY KEY (deptno)
  5. );
  6. CREATE TABLE locations(
  7. locationid INT NOT NULL,
  8. state CHAR(2),
  9. PRIMARY KEY (locationid)
  10. );
  11. CREATE TABLE emps(
  12. empid INT NOT NULL,
  13. deptno INT NOT NULL,
  14. locationid INT NOT NULL,
  15. empname VARCHAR(20) NOT NULL,
  16. salary DECIMAL (18, 2),
  17. PRIMARY KEY (empid),
  18. FOREIGN KEY (deptno) REFERENCES depts(deptno),
  19. FOREIGN KEY (locationid) REFERENCES locations(locationid)
  20. );
Join rewriting

The rewriting can handle different join orders in the query and the view definition. In addition, the rule tries to detect when a compensation predicate could be used to produce a rewriting using a view.

  • Query:
  1. SELECT empid
  2. FROM depts
  3. JOIN (
  4. SELECT empid, deptno
  5. FROM emps
  6. WHERE empid = 1) AS subq
  7. ON depts.deptno = subq.deptno
  • Materialized view definition:
  1. SELECT empid
  2. FROM emps
  3. JOIN depts USING (deptno)
  • Rewriting:
  1. SELECT empid
  2. FROM mv
  3. WHERE empid = 1
Aggregate rewriting
  • Query:
  1. SELECT deptno
  2. FROM emps
  3. WHERE deptno > 10
  4. GROUP BY deptno
  • Materialized view definition:
  1. SELECT empid, deptno
  2. FROM emps
  3. WHERE deptno > 5
  4. GROUP BY empid, deptno
  • Rewriting:
  1. SELECT deptno
  2. FROM mv
  3. WHERE deptno > 10
  4. GROUP BY deptno
Aggregate rewriting (with aggregation rollup)
  • Query:
  1. SELECT deptno, COUNT(*) AS c, SUM(salary) AS s
  2. FROM emps
  3. GROUP BY deptno
  • Materialized view definition:
  1. SELECT empid, deptno, COUNT(*) AS c, SUM(salary) AS s
  2. FROM emps
  3. GROUP BY empid, deptno
  • Rewriting:
  1. SELECT deptno, SUM(c), SUM(s)
  2. FROM mv
  3. GROUP BY deptno
Query partial rewriting

Through the declared constraints, the rule can detect joins that only append columns without altering the tuples multiplicity and produce correct rewritings.

  • Query:
  1. SELECT deptno, COUNT(*)
  2. FROM emps
  3. GROUP BY deptno
  • Materialized view definition:
  1. SELECT empid, depts.deptno, COUNT(*) AS c, SUM(salary) AS s
  2. FROM emps
  3. JOIN depts USING (deptno)
  4. GROUP BY empid, depts.deptno
  • Rewriting:
  1. SELECT deptno, SUM(c)
  2. FROM mv
  3. GROUP BY deptno
View partial rewriting
  • Query:
  1. SELECT deptname, state, SUM(salary) AS s
  2. FROM emps
  3. JOIN depts ON emps.deptno = depts.deptno
  4. JOIN locations ON emps.locationid = locations.locationid
  5. GROUP BY deptname, state
  • Materialized view definition:
  1. SELECT empid, deptno, state, SUM(salary) AS s
  2. FROM emps
  3. JOIN locations ON emps.locationid = locations.locationid
  4. GROUP BY empid, deptno, state
  • Rewriting:
  1. SELECT deptname, state, SUM(s)
  2. FROM mv
  3. JOIN depts ON mv.deptno = depts.deptno
  4. GROUP BY deptname, state
Union rewriting
  • Query:
  1. SELECT empid, deptname
  2. FROM emps
  3. JOIN depts ON emps.deptno = depts.deptno
  4. WHERE salary > 10000
  • Materialized view definition:
  1. SELECT empid, deptname
  2. FROM emps
  3. JOIN depts ON emps.deptno = depts.deptno
  4. WHERE salary > 12000
  • Rewriting:
  1. SELECT empid, deptname
  2. FROM mv
  3. UNION ALL
  4. SELECT empid, deptname
  5. FROM emps
  6. JOIN depts ON emps.deptno = depts.deptno
  7. WHERE salary > 10000 AND salary <= 12000
Union rewriting with aggregate
  • Query:
  1. SELECT empid, deptname, SUM(salary) AS s
  2. FROM emps
  3. JOIN depts ON emps.deptno = depts.deptno
  4. WHERE salary > 10000
  5. GROUP BY empid, deptname
  • Materialized view definition:
  1. SELECT empid, deptname, SUM(salary) AS s
  2. FROM emps
  3. JOIN depts ON emps.deptno = depts.deptno
  4. WHERE salary > 12000
  5. GROUP BY empid, deptname
  • Rewriting:
  1. SELECT empid, deptname, SUM(s)
  2. FROM (
  3. SELECT empid, deptname, s
  4. FROM mv
  5. UNION ALL
  6. SELECT empid, deptname, SUM(salary) AS s
  7. FROM emps
  8. JOIN depts ON emps.deptno = depts.deptno
  9. WHERE salary > 10000 AND salary <= 12000
  10. GROUP BY empid, deptname) AS subq
  11. GROUP BY empid, deptname
Limitations

This rule still presents some limitations. In particular, the rewriting rule attempts to match all views against each query. We plan to implement more refined filtering techniques such as those described in [GL01].

References