背景知识

数据库优化器需要具备逻辑推理能力,而且越强越好,为什么呢? 举一些例子, 通过已知的一个人讲的是真话,推理另一个人讲的一定是真话或一定是假话。

例子1: 假设预先提供了 a > 10 是真话 可以推理出 a < 1 一定是假话 例子2: 假设预先提供了 a > 10 是真话 无法推理出 a < 100 一定是真话或假话 例子3: 假设预先提供了 a 是空 是真话 可以推理出 a 不是空 一定是假话 例子4: 假设预先提供了 a <>100 是真话 可以推理出 a =100 一定是假话 例子5: 假设预先提供了 a >100 是真话 可以推理出 a >1 一定是真话 例子6: 假设预先提供了 a 的坐标位置在中国 是真话 可以推理出 a 的坐标位置在浙江杭州 一定是真话 例子7: 假设预先提供了 平面中 坐标A和坐标(1,100)的距离小于100 是真话 是否推理出 坐标A和坐标(100,100)的距离小于1000 一定是真话或假话?

总结一下以上逻辑推理,首先要提供已知真假的一个表达式,然后推理另一个表达式的真假。推理可以得出的结论是真、或者假、或者不知道真假。 对于推理出来的结果一定是真或者一定是假的情况,数据库可以利用它来减少后期的处理。

PostgreSQL 逻辑推理的例子

逻辑推理能力体现在优化器生成查询树之前。例如:

  1. create table tab(id int check (id >=0), info text, crt_time timestamp);
  2. select * from tab where id<0;

以上已知为真的表达式是id>=0,通过这个表达式能推理出SQL中给出的表达式 id<0 一定是假。 优化器在执行这条SQL时,可以省去扫描表然后再过滤id<0的行,而是构造结构,并直接返回0条记录。 执行计划如下:

  1. digoal=# create table ta(id int check (id >=0), info text, crt_time timestamp);
  2. CREATE TABLE
  3. digoal=# explain select * from ta where id=-1;
  4. QUERY PLAN
  5. ----------------------------------------------------
  6. Seq Scan on ta (cost=0.00..24.12 rows=6 width=44)
  7. Filter: (id = '-1'::integer)
  8. (2 rows)

以上查询貌似并没有优化,还是扫描了表,原因是constraint_exclusion参数默认值对UNION ALL和分区表开启这种逻辑推理检查。 将constraint_exclusion 改为ON即可对所有表进行逻辑推理检查。

  1. digoal=# set constraint_exclusion =on;
  2. SET
  3. digoal=# explain select * from ta where id=-1; -- 现在不需要扫描表了
  4. QUERY PLAN
  5. ------------------------------------------
  6. Result (cost=0.00..0.01 rows=1 width=0)
  7. One-Time Filter: false
  8. (2 rows)
  9. digoal=# explain select * from ta where id<-1; -- 现在不需要扫描表了
  10. QUERY PLAN
  11. ------------------------------------------
  12. Result (cost=0.00..0.01 rows=1 width=0)
  13. One-Time Filter: false
  14. (2 rows)

对于无法推理出一定为假的条件,还是需要扫描表的,例如 id<>0。

  1. postgres=# explain select * from ta where id<>0;
  2. QUERY PLAN
  3. -------------------------------------------------------
  4. Seq Scan on ta (cost=0.00..24.12 rows=1124 width=44)
  5. Filter: (id <> 0)
  6. (2 rows)

对于提供的表达式与已知的表达式操作符左侧不一致的,目前PG的优化器没有做到这么智能,例如 id+1<10,id+1<0,优化器不会对这种表达式进行逻辑推理,后面我会在代码中分析这块。 ps: 这里给PG内核爱好者一个题目, 让PG支持以上这种情况的逻辑推理。

  1. postgres=# explain select * from ta where id+1<10; -- 未推理成功
  2. QUERY PLAN
  3. ------------------------------------------------------
  4. Seq Scan on ta (cost=0.00..26.95 rows=377 width=44)
  5. Filter: ((id + 1) < 10)
  6. (2 rows)
  7. postgres=# explain select * from ta where id+1<0; -- 未推理成功
  8. QUERY PLAN
  9. ------------------------------------------------------
  10. Seq Scan on ta (cost=0.00..26.95 rows=377 width=44)
  11. Filter: ((id + 1) < 0)
  12. (2 rows)

id+1<0 是可以转换为 id< 0-1的 ,对于以下表达式,PG进行了推理,原因是-操作符是一个immutable操作符,0-1可以转为常数-1从而可以进行推理。

  1. postgres=# explain select * from ta where id<0-1;
  2. QUERY PLAN
  3. ------------------------------------------
  4. Result (cost=0.00..0.01 rows=1 width=0)
  5. One-Time Filter: false
  6. (2 rows)

PostgreSQL 支持哪些逻辑推理

目前PostgreSQL数据库支持哪些逻辑推理呢?

  1. 约束中包含的表达式的操作符必须是B-tree-indexable operators(或者is null, or , is not null),也就是可以被btree索引用于检索操作符,例如<,<=,=,>,>=以及<> (<>不能直接被索引使用,但是可以转换为< OR >来使用索引);
  2. SQL语句where字句中提供的表达式,同样操作符必须是B-tree-indexable operators;
  3. SQL语句where字句中提供的表达式,操作符左侧的操作数必须与约束中的操作数完全一致。

例如约束为(check mod(id,4) = 0),SQL where字句提供的表达式则必须为 mod(id,4) op? ? 这种形式才会进行推理。 又如约束为(check id*100 > 1000),SQL where字句提供的表达式则必须为 id*100 op? ? 这种形式才会进行推理。 又如约束为(check id+10 between 1000 and 10000),SQL where字句提供的表达式则必须为 id+10 op? ? 这种形式才会进行推理。( PostgreSQL 的 between and 会转换为>= and <=,属于B-tree-indexable operators ) 又如约束为(check id between 1000 and 10000),SQL where字句提供的表达式则必须为 id op? ? 这种形式才会进行推理。

重要的事情说三遍,btree, btree, btree。

例子: 约束为is [not] null类型

  1. postgres=# create table tt1(id int check (id is null));
  2. CREATE TABLE
  3. postgres=# explain select * from tt1 where id=1;
  4. QUERY PLAN
  5. ------------------------------------------
  6. Result (cost=0.00..0.01 rows=1 width=0)
  7. One-Time Filter: false
  8. (2 rows)
  9. postgres=# explain select * from tt1 where id is null;
  10. QUERY PLAN
  11. -----------------------------------------------------
  12. Seq Scan on tt1 (cost=0.00..35.50 rows=13 width=4)
  13. Filter: (id IS NULL)
  14. (2 rows)
  15. postgres=# explain select * from tt1 where id is not null;
  16. QUERY PLAN
  17. ------------------------------------------
  18. Result (cost=0.00..0.01 rows=1 width=0)
  19. One-Time Filter: false
  20. (2 rows)

约束为 mod(id,4) = 0,=为B-tree-indexable operators

  1. postgres=# create table tt2( id int check(mod(id,4) = 0));
  2. CREATE TABLE
  3. postgres=# explain select * from tt2 where id=1;
  4. QUERY PLAN
  5. -----------------------------------------------------
  6. Seq Scan on tt2 (cost=0.00..41.88 rows=13 width=4)
  7. Filter: (id = 1)
  8. (2 rows)
  9. -- 要让PG进行逻辑推理,WHERE中必须包含mod(id,4)表达式,并且由于modimmutable函数,mod(1,4)可以转换为常数,因此以下SQL相当于
  10. explain select * from tt2 where mod(id,4)=1 and id=1; 这样才可以被逻辑推理。
  11. postgres=# explain select * from tt2 where mod(id,4)=mod(1,4) and id=1;
  12. QUERY PLAN
  13. ------------------------------------------
  14. Result (cost=0.00..0.01 rows=1 width=0)
  15. One-Time Filter: false
  16. (2 rows)

约束为 id*100 > 1000,>为B-tree-indexable operators

  1. postgres=# create table tt3( id int check(id*100 > 1000));
  2. CREATE TABLE
  3. postgres=# explain select * from tt3 where id=1;
  4. QUERY PLAN
  5. -----------------------------------------------------
  6. Seq Scan on tt3 (cost=0.00..41.88 rows=13 width=4)
  7. Filter: (id = 1)
  8. (2 rows)
  9. -- 要让PG进行逻辑推理,WHERE中必须包含id*100表达式,并且*是immutable操作符,所以1*100可以替换为常数。从而进行逻辑推理。
  10. postgres=# explain select * from tt3 where id=1 and id*100=1*100;
  11. QUERY PLAN
  12. ------------------------------------------
  13. Result (cost=0.00..0.01 rows=1 width=0)
  14. One-Time Filter: false
  15. (2 rows)

约束为 id+10 between 1000 and 10000,between and 自动转换为>=和and <=。并且WHERE中必须包含id+10表达式,同时>=或<=是B-tree-indexable operators。

  1. postgres=# create table tt4( id int check(id+10 between 1000 and 10000));
  2. CREATE TABLE
  3. postgres=# explain select * from tt4 where id=1;
  4. QUERY PLAN
  5. -----------------------------------------------------
  6. Seq Scan on tt4 (cost=0.00..41.88 rows=13 width=4)
  7. Filter: (id = 1)
  8. (2 rows)
  9. postgres=# explain select * from tt4 where id=1 and id+10=1+10; -- +是immutable操作符1+10将转换为11常数。
  10. QUERY PLAN
  11. ------------------------------------------
  12. Result (cost=0.00..0.01 rows=1 width=0)
  13. One-Time Filter: false
  14. (2 rows)

约束为 check id between 1000 and 10000

  1. postgres=# create table tt5( id int check(id between 1000 and 10000));
  2. CREATE TABLE
  3. postgres=# explain select * from tt5 where id=1;
  4. QUERY PLAN
  5. ------------------------------------------
  6. Result (cost=0.00..0.01 rows=1 width=0)
  7. One-Time Filter: false
  8. (2 rows)
  9. postgres=# explain select * from tt5 where id+1=1;
  10. QUERY PLAN
  11. -----------------------------------------------------
  12. Seq Scan on tt5 (cost=0.00..48.25 rows=13 width=4)
  13. Filter: ((id + 1) = 1)
  14. (2 rows)
  15. postgres=# explain select * from tt5 where 1=id;
  16. QUERY PLAN
  17. ------------------------------------------
  18. Result (cost=0.00..0.01 rows=1 width=0)
  19. One-Time Filter: false
  20. (2 rows)
  21. postgres=# explain select * from tt5 where 1>id;
  22. QUERY PLAN
  23. ------------------------------------------
  24. Result (cost=0.00..0.01 rows=1 width=0)
  25. One-Time Filter: false
  26. (2 rows)
  27. postgres=# explain select * from tt5 where 1<id;
  28. QUERY PLAN
  29. ------------------------------------------------------
  30. Seq Scan on tt5 (cost=0.00..41.88 rows=850 width=4)
  31. Filter: (1 < id)
  32. (2 rows)

PostgreSQL 是如何实现逻辑推理的

PostgreSQL数据库是如何实现这些逻辑推理的呢?

上面的例子,都转换成了?1 op ?2,其中 ?1 是一个表达式或字段,?2是一个常数。但是,数据库是怎么通过一个条件的真伪判断另一个条件的真伪呢?还是回到一个例子: check id > 100,推理 id > 1 是真是假?可以通过比较两个常数来决定,100 >= 1 为真则说明 id>1为真。 为什么要比较这两个常数呢?因为这是优化器排除对表的扫描的一种手段,这时还没有到需要用到id值的阶段。所以此时优化器只能通过常数来推理。

目前PG只实现了对btree索引可以用到的操作符的逻辑推理,使用了两张映射表来描述推理关系。 一张表BT_implic_table 用来推理一定为真,另一张表BT_refute_table 用来推理一定为假。 例如: 已知 ATTR given_op CONST1 为真 如果 CONST2 test_op CONST1 为真 则推理得出 ATTR target_op CONST2 一定为真 其中 test_op = BT_implic_table[given_op-1][target_op-1] 就是通过BT_implic_table 映射表取出的操作符。

已知 ATTR given_op CONST1 为真 如果 CONST2 test_op CONST1 为假 则推理得出 ATTR target_op CONST2 一定为假 其中 test_op = BT_refute_table[given_op-1][target_op-1] 就是通过BT_refute_table 映射表取出的操作符。 代码:

  1. /*
  2. * Define an "operator implication table" for btree operators ("strategies"),
  3. * and a similar table for refutation.
  4. *
  5. * The strategy numbers defined by btree indexes (see access/skey.h) are:
  6. * (1) < (2) <= (3) = (4) >= (5) >
  7. * and in addition we use (6) to represent <>. <> is not a btree-indexable
  8. * operator, but we assume here that if an equality operator of a btree
  9. * opfamily has a negator operator, the negator behaves as <> for the opfamily.
  10. * (This convention is also known to get_op_btree_interpretation().)
  11. *
  12. * The interpretation of:
  13. *
  14. * test_op = BT_implic_table[given_op-1][target_op-1]
  15. *
  16. * where test_op, given_op and target_op are strategy numbers (from 1 to 6)
  17. * of btree operators, is as follows:
  18. *
  19. * If you know, for some ATTR, that "ATTR given_op CONST1" is true, and you
  20. * want to determine whether "ATTR target_op CONST2" must also be true, then
  21. * you can use "CONST2 test_op CONST1" as a test. If this test returns true,
  22. * then the target expression must be true; if the test returns false, then
  23. * the target expression may be false.
  24. *
  25. * For example, if clause is "Quantity > 10" and pred is "Quantity > 5"
  26. * then we test "5 <= 10" which evals to true, so clause implies pred.
  27. *
  28. * Similarly, the interpretation of a BT_refute_table entry is:
  29. *
  30. * If you know, for some ATTR, that "ATTR given_op CONST1" is true, and you
  31. * want to determine whether "ATTR target_op CONST2" must be false, then
  32. * you can use "CONST2 test_op CONST1" as a test. If this test returns true,
  33. * then the target expression must be false; if the test returns false, then
  34. * the target expression may be true.
  35. *
  36. * For example, if clause is "Quantity > 10" and pred is "Quantity < 5"
  37. * then we test "5 <= 10" which evals to true, so clause refutes pred.
  38. *
  39. * An entry where test_op == 0 means the implication cannot be determined.
  40. */
  41. #define BTLT BTLessStrategyNumber
  42. #define BTLE BTLessEqualStrategyNumber
  43. #define BTEQ BTEqualStrategyNumber
  44. #define BTGE BTGreaterEqualStrategyNumber
  45. #define BTGT BTGreaterStrategyNumber
  46. #define BTNE ROWCOMPARE_NE
  47. static const StrategyNumber BT_implic_table[6][6] = {
  48. /*
  49. * The target operator:
  50. *
  51. * LT LE EQ GE GT NE
  52. */
  53. {BTGE, BTGE, 0, 0, 0, BTGE}, /* LT */
  54. {BTGT, BTGE, 0, 0, 0, BTGT}, /* LE */
  55. {BTGT, BTGE, BTEQ, BTLE, BTLT, BTNE}, /* EQ */
  56. {0, 0, 0, BTLE, BTLT, BTLT}, /* GE */
  57. {0, 0, 0, BTLE, BTLE, BTLE}, /* GT */
  58. {0, 0, 0, 0, 0, BTEQ} /* NE */
  59. };
  60. static const StrategyNumber BT_refute_table[6][6] = {
  61. /*
  62. * The target operator:
  63. *
  64. * LT LE EQ GE GT NE
  65. */
  66. {0, 0, BTGE, BTGE, BTGE, 0}, /* LT */
  67. {0, 0, BTGT, BTGT, BTGE, 0}, /* LE */
  68. {BTLE, BTLT, BTNE, BTGT, BTGE, BTEQ}, /* EQ */
  69. {BTLE, BTLT, BTLT, 0, 0, 0}, /* GE */
  70. {BTLE, BTLE, BTLE, 0, 0, 0}, /* GT */
  71. {0, 0, BTEQ, 0, 0, 0} /* NE */
  72. };

这两个表里面的0,表示无法推断真或假的情况。例如通过 a>100 无法推断 a>? 一定为假, 只能推断 a>? 一定为真。 通过100, ?, 以及 test_op 来推断,而test_op就是从BT_implic_table表中取出的BTLE即<=,因此判断的依据是 ? <= 100 为真则a>? 一定为真。

PostgreSQL通过get_btree_test_op 获得test_op,代码如下:

  1. get_btree_test_op
  2. /*
  3. * Look up the "test" strategy number in the implication table
  4. */
  5. if (refute_it)
  6. test_strategy = BT_refute_table[clause_strategy - 1][pred_strategy - 1];
  7. else
  8. test_strategy = BT_implic_table[clause_strategy - 1][pred_strategy - 1];
  9. if (test_strategy == 0)
  10. {
  11. /* Can't determine implication using this interpretation */
  12. continue;
  13. }
  14. /*
  15. * See if opfamily has an operator for the test strategy and the
  16. * datatypes.
  17. */
  18. if (test_strategy == BTNE)
  19. {
  20. test_op = get_opfamily_member(opfamily_id,
  21. pred_op_info->oprighttype,
  22. clause_op_info->oprighttype,
  23. BTEqualStrategyNumber);
  24. if (OidIsValid(test_op))
  25. test_op = get_negator(test_op);
  26. }
  27. else
  28. {
  29. test_op = get_opfamily_member(opfamily_id,
  30. pred_op_info->oprighttype,
  31. clause_op_info->oprighttype,
  32. test_strategy);
  33. }
  34. if (!OidIsValid(test_op))
  35. continue;
  36. return test_op;

PostgreSQL 逻辑推理的用处

那么PostgreSQL可以利用这些逻辑推理来做什么呢? 通过推断 “一定为假” 来排除哪些表不需要参与到执行计划。直接排除掉。

_1

通过推断 “一定对真” ,可以用在建立执行计划的过程中。

_

以一定为假为例,我们看看PostgreSQL优化器如何排除哪些表是不需要参与执行计划的。 constraint_exclusion参数控制的逻辑推理应用,可以看到调用栈如下:

relation_excluded_by_constraints 返回 true 表示不需要扫描这个表,返回 false 表示需要扫描这个表。简单分析一下这个函数的代码,未开启constraint_exclusion时,不进行逻辑推理。

  1. /* Skip the test if constraint exclusion is disabled for the rel */
  2. if (constraint_exclusion == CONSTRAINT_EXCLUSION_OFF ||
  3. (constraint_exclusion == CONSTRAINT_EXCLUSION_PARTITION &&
  4. !(rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
  5. (root->hasInheritedTarget &&
  6. rel->reloptkind == RELOPT_BASEREL &&
  7. rel->relid == root->parse->resultRelation))))
  8. return false;

在检查表自身的约束和SQL提供的where条件前,先检查where 条件是否有自相矛盾的。例如:

  1. id <> mod(4,3) and id = mod(4,3)
  2. postgres=# \d+ tt11
  3. Table "public.tt11"
  4. Column | Type | Modifiers | Storage | Stats target | Description
  5. --------+---------+-----------+---------+--------------+-------------
  6. id | integer | | plain | |
  7. postgres=# explain (analyze,verbose) select * from tt11 where id<>mod(4,3) and id=mod(4,3);
  8. QUERY PLAN
  9. ------------------------------------------------------------------------------------
  10. Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
  11. Output: id
  12. One-Time Filter: false
  13. Planning time: 0.051 ms
  14. Execution time: 0.012 ms
  15. (5 rows)

代码如下:

  1. /*
  2. * Check for self-contradictory restriction clauses. We dare not make
  3. * deductions with non-immutable functions, but any immutable clauses that
  4. * are self-contradictory allow us to conclude the scan is unnecessary.
  5. *
  6. * Note: strip off RestrictInfo because predicate_refuted_by() isn't
  7. * expecting to see any in its predicate argument.
  8. */
  9. safe_restrictions = NIL;
  10. foreach(lc, rel->baserestrictinfo)
  11. {
  12. RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
  13. if (!contain_mutable_functions((Node *) rinfo->clause))
  14. safe_restrictions = lappend(safe_restrictions, rinfo->clause);
  15. }
  16. if (predicate_refuted_by(safe_restrictions, safe_restrictions))
  17. return true;
  18. // 从SQL涉及的表,以及继承表中获取约束
  19. /* Only plain relations have constraints */
  20. if (rte->rtekind != RTE_RELATION || rte->inh)
  21. return false;
  22. /*
  23. * OK to fetch the constraint expressions. Include "col IS NOT NULL"
  24. * expressions for attnotnull columns, in case we can refute those.
  25. */
  26. constraint_pred = get_relation_constraints(root, rte->relid, rel, true);
  27. /*
  28. * We do not currently enforce that CHECK constraints contain only
  29. * immutable functions, so it's necessary to check here. We daren't draw
  30. * conclusions from plan-time evaluation of non-immutable functions. Since
  31. * they're ANDed, we can just ignore any mutable constraints in the list,
  32. * and reason about the rest.
  33. */
  34. safe_constraints = NIL;
  35. foreach(lc, constraint_pred)
  36. {
  37. Node *pred = (Node *) lfirst(lc);
  38. // 包含非immutable函数的表达式不加入推理判断,因为非immutable函数存在变数,不能转常量
  39. if (!contain_mutable_functions(pred))
  40. safe_constraints = lappend(safe_constraints, pred);
  41. }
  42. /*
  43. * The constraints are effectively ANDed together, so we can just try to
  44. * refute the entire collection at once. This may allow us to make proofs
  45. * that would fail if we took them individually.
  46. *
  47. * Note: we use rel->baserestrictinfo, not safe_restrictions as might seem
  48. * an obvious optimization. Some of the clauses might be OR clauses that
  49. * have volatile and nonvolatile subclauses, and it's OK to make
  50. * deductions with the nonvolatile parts.
  51. */
  52. // 检测是否一定为假,如果一定为假,则不需要扫描这个表。
  53. if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo))
  54. return true;

调用栈如下:

  1. predicate_refuted_by
  2. predicate_refuted_by_recurse
  3. predicate_refuted_by_simple_clause
  4. return btree_predicate_proof(predicate, clause, true)
  5. btree_predicate_proof@src/backend/optimizer/util/predtest.c
  6. /*
  7. * Lookup the comparison operator using the system catalogs and the
  8. * operator implication tables.
  9. */
  10. test_op = get_btree_test_op(pred_op, clause_op, refute_it);

PostgreSQL 支持逻辑推理的操作符汇总

目前PostgreSQL仅仅支持有限操作符的逻辑推理,这些操作符必须是btree-indexable operator。

  1. postgres=# select oprname,oprcode from pg_operator where oid in (select amopopr from pg_amop where amopmethod=(select oid from pg_am where amname='btree'));
  2. oprname | oprcode
  3. ---------+--------------------------
  4. = | int48eq
  5. < | int48lt
  6. > | int48gt
  7. <= | int48le
  8. >= | int48ge
  9. < | boollt
  10. > | boolgt
  11. = | booleq
  12. <= | boolle
  13. >= | boolge
  14. = | chareq
  15. = | nameeq
  16. = | int2eq
  17. < | int2lt
  18. = | int4eq
  19. < | int4lt
  20. = | texteq
  21. = | tideq
  22. < | tidlt
  23. > | tidgt
  24. <= | tidle
  25. >= | tidge
  26. = | int8eq
  27. < | int8lt
  28. > | int8gt
  29. <= | int8le
  30. >= | int8ge
  31. = | int84eq
  32. < | int84lt
  33. > | int84gt
  34. <= | int84le
  35. >= | int84ge
  36. > | int2gt
  37. > | int4gt
  38. <= | int2le
  39. <= | int4le
  40. >= | int2ge
  41. >= | int4ge
  42. = | int24eq
  43. = | int42eq
  44. < | int24lt
  45. < | int42lt
  46. > | int24gt
  47. > | int42gt
  48. <= | int24le
  49. <= | int42le
  50. >= | int24ge
  51. >= | int42ge
  52. = | abstimeeq
  53. < | abstimelt
  54. > | abstimegt
  55. <= | abstimele
  56. >= | abstimege
  57. = | reltimeeq
  58. < | reltimelt
  59. > | reltimegt
  60. <= | reltimele
  61. >= | reltimege
  62. = | oideq
  63. < | oidlt
  64. > | oidgt
  65. <= | oidle
  66. >= | oidge
  67. < | oidvectorlt
  68. > | oidvectorgt
  69. <= | oidvectorle
  70. >= | oidvectorge
  71. = | oidvectoreq
  72. = | float4eq
  73. < | float4lt
  74. > | float4gt
  75. <= | float4le
  76. >= | float4ge
  77. < | charlt
  78. <= | charle
  79. > | chargt
  80. >= | charge
  81. < | namelt
  82. <= | namele
  83. > | namegt
  84. >= | namege
  85. < | text_lt
  86. <= | text_le
  87. > | text_gt
  88. >= | text_ge
  89. = | float8eq
  90. < | float8lt
  91. <= | float8le
  92. > | float8gt
  93. >= | float8ge
  94. = | tintervaleq
  95. < | tintervallt
  96. > | tintervalgt
  97. <= | tintervalle
  98. >= | tintervalge
  99. = | cash_eq
  100. < | cash_lt
  101. > | cash_gt
  102. <= | cash_le
  103. >= | cash_ge
  104. = | bpchareq
  105. < | bpcharlt
  106. <= | bpcharle
  107. > | bpchargt
  108. >= | bpcharge
  109. = | array_eq
  110. < | array_lt
  111. > | array_gt
  112. <= | array_le
  113. >= | array_ge
  114. = | date_eq
  115. < | date_lt
  116. <= | date_le
  117. > | date_gt
  118. >= | date_ge
  119. = | time_eq
  120. < | time_lt
  121. <= | time_le
  122. > | time_gt
  123. >= | time_ge
  124. = | timetz_eq
  125. < | timetz_lt
  126. <= | timetz_le
  127. > | timetz_gt
  128. >= | timetz_ge
  129. = | float48eq
  130. < | float48lt
  131. > | float48gt
  132. <= | float48le
  133. >= | float48ge
  134. = | float84eq
  135. < | float84lt
  136. > | float84gt
  137. <= | float84le
  138. >= | float84ge
  139. = | timestamptz_eq
  140. < | timestamptz_lt
  141. <= | timestamptz_le
  142. > | timestamptz_gt
  143. >= | timestamptz_ge
  144. = | interval_eq
  145. < | interval_lt
  146. <= | interval_le
  147. > | interval_gt
  148. >= | interval_ge
  149. = | macaddr_eq
  150. < | macaddr_lt
  151. <= | macaddr_le
  152. > | macaddr_gt
  153. >= | macaddr_ge
  154. = | network_eq
  155. < | network_lt
  156. <= | network_le
  157. > | network_gt
  158. >= | network_ge
  159. = | numeric_eq
  160. < | numeric_lt
  161. <= | numeric_le
  162. > | numeric_gt
  163. >= | numeric_ge
  164. = | biteq
  165. < | bitlt
  166. > | bitgt
  167. <= | bitle
  168. >= | bitge
  169. = | varbiteq
  170. < | varbitlt
  171. > | varbitgt
  172. <= | varbitle
  173. >= | varbitge
  174. = | int28eq
  175. < | int28lt
  176. > | int28gt
  177. <= | int28le
  178. >= | int28ge
  179. = | int82eq
  180. < | int82lt
  181. > | int82gt
  182. <= | int82le
  183. >= | int82ge
  184. = | byteaeq
  185. < | bytealt
  186. <= | byteale
  187. > | byteagt
  188. >= | byteage
  189. = | timestamp_eq
  190. < | timestamp_lt
  191. <= | timestamp_le
  192. > | timestamp_gt
  193. >= | timestamp_ge
  194. ~<~ | text_pattern_lt
  195. ~<=~ | text_pattern_le
  196. ~>=~ | text_pattern_ge
  197. ~>~ | text_pattern_gt
  198. ~<~ | bpchar_pattern_lt
  199. ~<=~ | bpchar_pattern_le
  200. ~>=~ | bpchar_pattern_ge
  201. ~>~ | bpchar_pattern_gt
  202. < | date_lt_timestamp
  203. <= | date_le_timestamp
  204. = | date_eq_timestamp
  205. >= | date_ge_timestamp
  206. > | date_gt_timestamp
  207. < | date_lt_timestamptz
  208. <= | date_le_timestamptz
  209. = | date_eq_timestamptz
  210. >= | date_ge_timestamptz
  211. > | date_gt_timestamptz
  212. < | timestamp_lt_date
  213. <= | timestamp_le_date
  214. = | timestamp_eq_date
  215. >= | timestamp_ge_date
  216. > | timestamp_gt_date
  217. < | timestamptz_lt_date
  218. <= | timestamptz_le_date
  219. = | timestamptz_eq_date
  220. >= | timestamptz_ge_date
  221. > | timestamptz_gt_date
  222. < | timestamp_lt_timestamptz
  223. <= | timestamp_le_timestamptz
  224. = | timestamp_eq_timestamptz
  225. >= | timestamp_ge_timestamptz
  226. > | timestamp_gt_timestamptz
  227. < | timestamptz_lt_timestamp
  228. <= | timestamptz_le_timestamp
  229. = | timestamptz_eq_timestamp
  230. >= | timestamptz_ge_timestamp
  231. > | timestamptz_gt_timestamp
  232. = | uuid_eq
  233. < | uuid_lt
  234. > | uuid_gt
  235. <= | uuid_le
  236. >= | uuid_ge
  237. = | pg_lsn_eq
  238. < | pg_lsn_lt
  239. > | pg_lsn_gt
  240. <= | pg_lsn_le
  241. >= | pg_lsn_ge
  242. = | enum_eq
  243. < | enum_lt
  244. > | enum_gt
  245. <= | enum_le
  246. >= | enum_ge
  247. < | tsvector_lt
  248. <= | tsvector_le
  249. = | tsvector_eq
  250. >= | tsvector_ge
  251. > | tsvector_gt
  252. < | tsquery_lt
  253. <= | tsquery_le
  254. = | tsquery_eq
  255. >= | tsquery_ge
  256. > | tsquery_gt
  257. = | record_eq
  258. < | record_lt
  259. > | record_gt
  260. <= | record_le
  261. >= | record_ge
  262. *= | record_image_eq
  263. *< | record_image_lt
  264. *> | record_image_gt
  265. *<= | record_image_le
  266. *>= | record_image_ge
  267. = | range_eq
  268. < | range_lt
  269. <= | range_le
  270. >= | range_ge
  271. > | range_gt
  272. = | jsonb_eq
  273. < | jsonb_lt
  274. > | jsonb_gt
  275. <= | jsonb_le
  276. >= | jsonb_ge
  277. (273 rows)

PostgreSQL 不能进行逻辑推理的场景及优化思路

除此以外的操作符,不参与逻辑推理。 例如:我们知道geo严格在坐标10,0的左边,肯定能推理出它不可能在11,0的右边,正常情况下是可以排除对这个表的扫描的。但是由于«,»不是btree operator,所以不参与推理。

  1. postgres=# create table tt13(id int, geo point check(geo << point '(10,0)'));
  2. CREATE TABLE
  3. postgres=# explain select * from tt13 where geo >> point '(11,0)';
  4. QUERY PLAN
  5. --------------------------------------------------------
  6. Seq Scan on tt13 (cost=0.00..31.25 rows=170 width=20)
  7. Filter: (geo >> '(11,0)'::point)
  8. (2 rows)

这种逻辑推理在分区表的应用中尤为突出,例如: 用户规划了一批分区表,按照ID取模分区。

  1. postgres=# create table p(id int, info text);
  2. CREATE TABLE
  3. postgres=# create table t0(id int check(abs(mod(id,4))=0), info text);
  4. CREATE TABLE
  5. postgres=# create table t1(id int check(abs(mod(id,4))=1), info text);
  6. CREATE TABLE
  7. postgres=# create table t2(id int check(abs(mod(id,4))=2), info text);
  8. CREATE TABLE
  9. postgres=# create table t3(id int check(abs(mod(id,4))=3), info text);
  10. CREATE TABLE
  11. postgres=# alter table t0 inherit p;
  12. ALTER TABLE
  13. postgres=# alter table t1 inherit p;
  14. ALTER TABLE
  15. postgres=# alter table t2 inherit p;
  16. ALTER TABLE
  17. postgres=# alter table t3 inherit p;
  18. ALTER TABLE
  19. postgres=# explain select * from p where id=0; -- id=0 abs(mod(id,4)) = 0,1,2,3由于操作数不一致,不会进行推理。
  20. QUERY PLAN
  21. ----------------------------------------------------------
  22. Append (cost=0.00..103.50 rows=25 width=36)
  23. -> Seq Scan on p (cost=0.00..0.00 rows=1 width=36)
  24. Filter: (id = 0)
  25. -> Seq Scan on t0 (cost=0.00..25.88 rows=6 width=36)
  26. Filter: (id = 0)
  27. -> Seq Scan on t1 (cost=0.00..25.88 rows=6 width=36)
  28. Filter: (id = 0)
  29. -> Seq Scan on t2 (cost=0.00..25.88 rows=6 width=36)
  30. Filter: (id = 0)
  31. -> Seq Scan on t3 (cost=0.00..25.88 rows=6 width=36)
  32. Filter: (id = 0)
  33. (11 rows)
  34. postgres=# explain select * from p where id=0 and abs(mod(id,4)) = abs(mod(0,4)); -- 所以必须带上与约束一致的操作数
  35. QUERY PLAN
  36. ----------------------------------------------------------
  37. Append (cost=0.00..35.40 rows=2 width=36)
  38. -> Seq Scan on p (cost=0.00..0.00 rows=1 width=36)
  39. Filter: ((id = 0) AND (abs(mod(id, 4)) = 0))
  40. -> Seq Scan on t0 (cost=0.00..35.40 rows=1 width=36)
  41. Filter: ((id = 0) AND (abs(mod(id, 4)) = 0))
  42. (5 rows)

如果我们使用的是范围分区,就不存在以上的问题。因为约束中的操作数和WHERE子句中的操作数可以做到一致。

从以上的例子可以了解到,PostgreSQL优化器的逻辑推理能力还可以加强。 只要能推理出一定为假的,就可以被优化器用于排除表。例如一些几何类型的操作符,数组类型的操作符等等。

参考

  1. 分区字段的分区方法,这种方法对应的函数或操作符必须是immutable的,同时尽量以字段加btree operator来分区,方便写SQL,如果做不到,那么SQL中必须带上原样的表达式,同时代入,例如 abs(mod(id,4)) = abs(mod(?,4)) ;
  2. Partitioning and Constraint Exclusion

    1. The following caveats apply to constraint exclusion:
    2. Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.
    3. Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don't need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators.
    4. All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably. Partitioning using these techniques will work well with up to perhaps a hundred partitions; don't try to use many thousands of partitions.
  3. constraint_exclusion

    1. constraint_exclusion (enum)
    2. Controls the query planner's use of table constraints to optimize queries. The allowed values of constraint_exclusion are on (examine constraints for all tables), off (never examine constraints), and partition (examine constraints only for inheritance child tables and UNION ALL subqueries). partition is the default setting. It is often used with inheritance and partitioned tables to improve performance.