案例:调整查询重写GUC参数rewrite_rule

rewrite_rule包含了多个查询重写规则:magicset、partialpush、uniquecheck、disablerep、intargetlist、predpush。下面简要说明一下其中重要的几个规则的使用场景:

目标列子查询提升参数intargetlist

通过将目标列中子查询提升,转为JOIN,往往可以极大提升查询性能。举例如下查询:

  1. postgres=# set rewrite_rule='none';
  2. SET
  3. postgres=# create table t1(c1 int,c2 int);
  4. CREATE TABLE
  5. postgres=# create table t2(c1 int,c2 int);
  6. CREATE TABLE
  7. postgres=# explain (verbose on, costs off) select c1,(select avg(c2) from t2 where t2.c2=t1.c2) from t1 where t1.c1<100 order by t1.c2;
  8. QUERY PLAN
  9. -----------------------------------------------
  10. Sort
  11. Output: t1.c1, ((SubPlan 1)), t1.c2
  12. Sort Key: t1.c2
  13. -> Seq Scan on public.t1
  14. Output: t1.c1, (SubPlan 1), t1.c2
  15. Filter: (t1.c1 < 100)
  16. SubPlan 1
  17. -> Aggregate
  18. Output: avg(t2.c2)
  19. -> Seq Scan on public.t2
  20. Output: t2.c1, t2.c2
  21. Filter: (t2.c2 = t1.c2)
  22. (12 rows)

由于目标列中的相关子查询(select avg(c2) from t2 where t2.c2=t1.c2)无法提升的缘故,导致每扫描t1的一行数据,就会触发子查询的一次执行,效率低下。如果打开intargetlist参数会把子查询提升转为JOIN,来提升查询的性能:

  1. postgres=# set rewrite_rule='intargetlist';
  2. SET
  3. postgres=# explain (verbose on, costs off) select c1,(select avg(c2) from t2 where t2.c2=t1.c2) from t1 where t1.c1<100 order by t1.c2;
  4. QUERY PLAN
  5. -----------------------------------------------
  6. Sort
  7. Output: t1.c1, (avg(t2.c2)), t1.c2
  8. Sort Key: t1.c2
  9. -> Hash Left Join
  10. Output: t1.c1, (avg(t2.c2)), t1.c2
  11. Hash Cond: (t1.c2 = t2.c2)
  12. -> Seq Scan on public.t1
  13. Output: t1.c1, t1.c2
  14. Filter: (t1.c1 < 100)
  15. -> Hash
  16. Output: (avg(t2.c2)), t2.c2
  17. -> HashAggregate
  18. Output: avg(t2.c2), t2.c2
  19. Group By Key: t2.c2
  20. -> Seq Scan on public.t2
  21. Output: t2.c2
  22. (16 rows)

提升无agg的子查询uniquecheck

子链接提升需要保证对于每个条件只有一行输出,对于有agg的子查询可以自动提升,对于无agg的子查询如:

select t1.c1 from t1 where t1.c1 = (select t2.c1 from t2 where t1.c1=t2.c2) ;

重写为:

select t1.c1 from t1 join (select t2.c1 from t2 where t2.c1 is not null group by t2.c1(unique check)) tt(c1) on tt.c1=t1.c1;

为了保证语义等价,子查询tt必须保证对于每个group by t2.c1只能有一行输出。打开uniquecheck查询重写参数保证可以提升并且等价,如果在运行时输出了多于一行的数据,就会报错。

  1. postgres=# set rewrite_rule='uniquecheck';
  2. SET
  3. postgres=# explain verbose select t1.c1 from t1 where t1.c1 = (select t2.c1 from t2 where t1.c1=t2.c1);
  4. QUERY PLAN
  5. -------------------------------------------------------------------------------------
  6. Hash Join (cost=43.36..104.40 rows=2149 distinct=[200, 200] width=4)
  7. Output: t1.c1
  8. Hash Cond: (t1.c1 = subquery."?column?")
  9. -> Seq Scan on public.t1 (cost=0.00..31.49 rows=2149 width=4)
  10. Output: t1.c1, t1.c2
  11. -> Hash (cost=40.86..40.86 rows=200 width=8)
  12. Output: subquery."?column?", subquery.c1
  13. -> Subquery Scan on subquery (cost=36.86..40.86 rows=200 width=8)
  14. Output: subquery."?column?", subquery.c1
  15. -> HashAggregate (cost=36.86..38.86 rows=200 width=4)
  16. Output: t2.c1, t2.c1
  17. Group By Key: t2.c1
  18. Filter: (t2.c1 IS NOT NULL)
  19. Unique Check Required
  20. -> Seq Scan on public.t2 (cost=0.00..31.49 rows=2149 width=4)
  21. Output: t2.c1
  22. (16 rows)

注意:因为分组group by t2.c1 unique check发生在过滤条件tt.c1=t1.c1之前,可能导致原来不报错的查询重写之后报错。举例:

有t1,t2表,其中的数据为:

  1. postgres=# select * from t1 order by c2;
  2. c1 | c2
  3. ----+----
  4. 1 | 1
  5. 2 | 2
  6. 3 | 3
  7. (3 rows)
  8. postgres=# select * from t2 order by c2;
  9. c1 | c2
  10. ----+----
  11. 1 | 1
  12. 2 | 2
  13. 3 | 3
  14. 4 | 4
  15. 4 | 4
  16. 5 | 5
  17. (6 rows)

分别关闭和打开uniquecheck参数对比,打开之后报错。

  1. postgres=# select t1.c1 from t1 where t1.c1 = (select t2.c1 from t2 where t1.c1=t2.c2) ;
  2. c1
  3. ----
  4. 1
  5. 2
  6. 3
  7. (3 rows)
  8. postgres=# set rewrite_rule='uniquecheck';
  9. SET
  10. postgres=# select t1.c1 from t1 where t1.c1 = (select t2.c1 from t2 where t1.c1=t2.c2) ;
  11. ERROR: more than one row returned by a subquery used as an expression