分区裁剪

分区裁剪是只有当目标表为分区表时,才可以进行的一种优化方式。分区裁剪通过分析查询语句中的过滤条件,只选择可能满足条件的分区,不扫描匹配不上的分区,进而显著地减少计算的数据量。

例如:

  1. CREATE TABLE t1 (
  2. id INT NOT NULL PRIMARY KEY,
  3. pad VARCHAR(100)
  4. )
  5. PARTITION BY RANGE COLUMNS(id) (
  6. PARTITION p0 VALUES LESS THAN (100),
  7. PARTITION p1 VALUES LESS THAN (200),
  8. PARTITION p2 VALUES LESS THAN (MAXVALUE)
  9. );
  10. INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3');
  11. EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120;
  1. +----------------------------+---------+-----------+------------------------+------------------------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +----------------------------+---------+-----------+------------------------+------------------------------------------------+
  4. | PartitionUnion_8 | 80.00 | root | | |
  5. | ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 |
  6. | └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo |
  7. | └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 |
  8. | └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo |
  9. +----------------------------+---------+-----------+------------------------+------------------------------------------------+
  10. 5 rows in set (0.00 sec)

分区裁剪的使用场景

分区表有 Range 分区和 hash 分区两种形式,分区裁剪对两种分区表也有不同的使用场景。

分区裁剪在 Hash 分区表上的应用

Hash 分区表上可以使用分区裁剪的场景

只有等值比较的查询条件能够支持 Hash 分区表的裁剪。

  1. create table t (x int) partition by hash(x) partitions 4;
  2. explain select * from t where x = 1;
  1. +-------------------------+----------+-----------+-----------------------+--------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +-------------------------+----------+-----------+-----------------------+--------------------------------+
  4. | TableReader_8 | 10.00 | root | | data:Selection_7 |
  5. | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) |
  6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
  7. +-------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,由条件 x = 1 可以知道所有结果均在一个分区上。数值 1 在经过 Hash 后,可以确定其在分区 p1 中。因此只需要扫描分区 p1 ,而无需访问一定不会出现相关结果的 p2p3p4 分区。从执行计划来看,其中只出现了一个 TableFullScan 算子,且在 access object 中指定了 p1 分区,确认 partition pruning 生效了。

Hash 分区表上不能使用分区裁剪的场景

场景一

不能确定查询结果只在一个分区上的条件:如 in, between, > < >= <= 等查询条件,不能使用分区裁剪的优化。

  1. create table t (x int) partition by hash(x) partitions 4;
  2. explain select * from t where x > 2;
  1. +------------------------------+----------+-----------+-----------------------+--------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +------------------------------+----------+-----------+-----------------------+--------------------------------+
  4. | Union_10 | 13333.33 | root | | |
  5. | ├─TableReader_13 | 3333.33 | root | | data:Selection_12 |
  6. | └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
  7. | └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
  8. | ├─TableReader_16 | 3333.33 | root | | data:Selection_15 |
  9. | └─Selection_15 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
  10. | └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
  11. | ├─TableReader_19 | 3333.33 | root | | data:Selection_18 |
  12. | └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
  13. | └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
  14. | └─TableReader_22 | 3333.33 | root | | data:Selection_21 |
  15. | └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
  16. | └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo |
  17. +------------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,x > 2 条件无法确定对应的 Hash Partition,所以不能使用分区裁剪。

场景二

由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

  1. create table t (x int) partition by hash(x) partitions 4;
  2. explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x < 2);
  1. +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
  4. | Projection_13 | 9990.00 | root | | test.t2.x |
  5. | └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] |
  6. | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 |
  7. | └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) |
  8. | └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
  9. | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) |
  10. | └─MaxOneRow_20 | 1.00 | root | | |
  11. | └─Union_21 | 2.00 | root | | |
  12. | ├─TableReader_24 | 2.00 | root | | data:Selection_23 |
  13. | └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
  14. | └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
  15. | └─TableReader_27 | 2.00 | root | | data:Selection_26 |
  16. | └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
  17. | └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
  18. +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+

这个查询每从 t2 读取一行,都会去分区表 t1 上进行查询,理论上这时会满足 t1.x = val 的过滤条件,但实际上由于分区裁剪只作用于查询计划生成阶段,而不是执行阶段,因而不会做裁剪。

分区裁剪在 Range 分区表上的应用

Range 分区表上可以使用分区裁剪的场景

场景一

等值比较的查询条件可以使用分区裁剪。

  1. create table t (x int) partition by range (x) (
  2. partition p0 values less than (5),
  3. partition p1 values less than (10),
  4. partition p2 values less than (15)
  5. );
  6. explain select * from t where x = 3;
  1. +-------------------------+----------+-----------+-----------------------+--------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +-------------------------+----------+-----------+-----------------------+--------------------------------+
  4. | TableReader_8 | 10.00 | root | | data:Selection_7 |
  5. | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) |
  6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
  7. +-------------------------+----------+-----------+-----------------------+--------------------------------+

使用 in 条件的等值比较查询条件也可以使用分区裁剪。

  1. create table t (x int) partition by range (x) (
  2. partition p0 values less than (5),
  3. partition p1 values less than (10),
  4. partition p2 values less than (15)
  5. );
  6. explain select * from t where x in(1,13);
  1. +-----------------------------+----------+-----------+-----------------------+--------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +-----------------------------+----------+-----------+-----------------------+--------------------------------+
  4. | Union_8 | 40.00 | root | | |
  5. | ├─TableReader_11 | 20.00 | root | | data:Selection_10 |
  6. | └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
  7. | └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
  8. | └─TableReader_14 | 20.00 | root | | data:Selection_13 |
  9. | └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
  10. | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
  11. +-----------------------------+----------+-----------+-----------------------+--------------------------------+

在这条 SQL 中,由条件 x in(1,13) 可以知道所有结果只会分布在几个分区上。经过分析,发现所有 x = 1 的记录都在分区 p0 上, 所有 x = 13 的记录都在分区 p2 上,因此只需要访问 p0p2 这两个分区,

场景二

区间比较的查询条件如 between, > < = >= <= 可以使用分区裁剪。

  1. create table t (x int) partition by range (x) (
  2. partition p0 values less than (5),
  3. partition p1 values less than (10),
  4. partition p2 values less than (15)
  5. );
  6. explain select * from t where x between 7 and 14;
  1. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
  4. | Union_8 | 500.00 | root | | |
  5. | ├─TableReader_11 | 250.00 | root | | data:Selection_10 |
  6. | └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
  7. | └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
  8. | └─TableReader_14 | 250.00 | root | | data:Selection_13 |
  9. | └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
  10. | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
  11. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
场景三

分区表达式为 fn(col) 的简单形式,查询条件是 > < = >= <= 之一,且 fn 是单调函数,可以使用分区裁剪。

关于 fn 函数,对于任意 x y,如果 x > y,则 fn(x) > fn(y),那么这种是严格递增的单调函数。非严格递增的单调函数也可以符合分区裁剪要求,只要函数 fn 满足:对于任意 x y,如果 x > y,则 fn(x) >= fn(y)。理论上,所有满足单调条件(严格或者非严格)的函数都支持分区裁剪。目前,TiDB 支持的单调函数如下:

  1. unix_timestamp
  2. to_days

例如,分区表达式是 fn(col) 形式,fn 为我们支持的单调函数 to_days,就可以使用分区裁剪:

  1. create table t (id datetime) partition by range (to_days(id)) (
  2. partition p0 values less than (to_days('2020-04-01')),
  3. partition p1 values less than (to_days('2020-05-01')));
  4. explain select * from t where id > '2020-04-18';
  1. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+
  4. | TableReader_8 | 3333.33 | root | | data:Selection_7 |
  5. | └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) |
  6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
  7. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+

Range 分区表上不能使用分区裁剪的场景

由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

  1. create table t1 (x int) partition by range (x) (
  2. partition p0 values less than (5),
  3. partition p1 values less than (10));
  4. create table t2 (x int);
  5. explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);
  1. +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
  2. | id | estRows | task | access object | operator info |
  3. +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
  4. | Projection_13 | 9990.00 | root | | test.t2.x |
  5. | └─Apply_15 | 9990.00 | root | | CARTESIAN inner join, other cond:lt(test.t2.x, test.t1.x) |
  6. | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 |
  7. | └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) |
  8. | └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
  9. | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) |
  10. | └─MaxOneRow_20 | 1.00 | root | | |
  11. | └─Union_21 | 2.00 | root | | |
  12. | ├─TableReader_24 | 2.00 | root | | data:Selection_23 |
  13. | └─Selection_23 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) |
  14. | └─TableFullScan_22 | 2.50 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
  15. | └─TableReader_27 | 2.00 | root | | data:Selection_26 |
  16. | └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) |
  17. | └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
  18. +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
  19. 14 rows in set (0.00 sec)

这个查询每从 t2 读取一行,都会去分区表 t1 上进行查询,理论上这时会满足 t1.x > val 的过滤条件,但实际上由于分区裁剪只作用于查询计划生成阶段,而不是执行阶段,因而不会做裁剪。