分布式执行计划和调度 - 并行查询执行 - 《OceanBase v2.2.50 官方教程》

启用分区表并行查询
启用非分区表并行查询
启用多表并行查询
关闭并行查询
并行执行相关的系统视图

并行执行（Parallel Execution）是将一个较大的任务切分为多个较小的任务，启动多个线程或者进程来并行地处理这些小任务，通过利用更多的 CPU 与 IO 资源来缩短操作的响应时间。

并行执行分为并行查询（Parallel Query），并行 DDL（Parallel DDL），并行 DML（Parallel DML）。目前 OceanBase 仅支持并行查询，并行 DDL 与并行 DML 还未支持。

启动并行查询的方式有两种：

通过 parallel hint 指定并行度（dop）的方式启动并行查询。
针对查询分区数大于1的分区表的查询会自动启动并行查询。

启用分区表并行查询

针对分区表的查询，如果查询的目标分区数大于 1，系统会自动启用并行查询，dop 的值由系统默认指定为 1。

如下例所示，创建一个分区表 ptable，对 ptable 进行全表数据的 scan 操作，通过 explain 查看生成的执行计划。

create table ptable(c1 int , c2 int) partition by hash(c1) partitions 16;
explain select * from ptable\G;
=======================================================
|ID|OPERATOR               |NAME    |EST. ROWS|COST   |
-------------------------------------------------------
|0 |EXCHANGE IN DISTR      |        |1600000  |1246946|
|1 | EXCHANGE OUT DISTR    |:EX10000|1600000  |1095490|
|2 |  PX PARTITION ITERATOR|        |1600000  |1095490|
|3 |   TABLE SCAN          |ptable  |1600000  |1095490|
=======================================================
Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter(nil)
  1 - output([ptable.c1], [ptable.c2]), filter(nil), dop=1
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

通过执行计划可以看出，分区表默认的并行查询的 dop 为1。如果 OB 集群一共有 3 个 OBServer，表 ptable 的 16 个分区分散在 3 个 OBServer中，那么每一个 OBServer 都会启动一个工作线程（worker thread）来执行分区数据的 scan 工作，一共需要启动 3 个工作线程来执行表的 scan 工作。

针对分区表，添加 parallel hint，启动并行查询，并指定 dop，可以看到生成的执行计划。

explain select /*+ parallel(8) */ * from ptable\G;
=======================================================
|ID|OPERATOR               |NAME    |EST. ROWS|COST   |
-------------------------------------------------------
|0 |EXCHANGE IN DISTR      |        |1600000  |1246946|
|1 | EXCHANGE OUT DISTR    |:EX10000|1600000  |1095490|
|2 |  PX PARTITION ITERATOR|        |1600000  |1095490|
|3 |   TABLE SCAN          |ptable  |1600000  |1095490|
=======================================================
Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter(nil)
  1 - output([ptable.c1], [ptable.c2]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

通过执行计划可以看出，并行查询的 dop 为 8。如果查询分区所在的 OBServer 的个数小于等于 dop，那么工作线程（总个数等于 dop）会按照一定的策略分配到涉及的 OBServer 上；如果查询分区所在的 OBServer 的个数大于 dop，那么每一个 OBServer 都会至少启动一个工作线程，一共需要启动的工作线程的数目会大于 dop。例如，当 dop=8，如果 16 个分区均匀的分布在 4 台 OBServer 节点上，那么每一个 OBServer 上都会启动 2 个工作线程来 scan 其上对应的分区（一共启动 8 个工作线程）；如果 16 个分区分布在 16 台 OBServer 节点上（每一个节点一个分区），那么每一台 OBServer 上都会启动 1 个工作线程来 scan 其上对应的分区（一共启动 16 个工作线程）。

如果针对分区表的查询，查询分区数目小于等于1，系统不会启动并行查询。如下例所示，对 ptable 的查询添加一个过滤条件 c1=1。

explain select * from ptable where c1 = 1\G;
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|ptable|990      |85222|
======================================
Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter([ptable.c1 = 1]),
      access([ptable.c1], [ptable.c2]), partitions(p1)

通过计划可以看出，查询的目标分区个数为 1，系统没有启动并行查询。如果希望针对一个分区的查询也能够进行并行执行，就只能通过添加 parallel hint 的方式进行分区内并行查询，可以看到如下的执行计划。

explain select /*+ parallel(8) */ * from ptable where c1 = 1\G;
=================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST |
-------------------------------------------------
|0 |EXCHANGE IN DISTR  |        |990      |85316|
|1 | EXCHANGE OUT DISTR|:EX10000|990      |85222|
|2 |  PX BLOCK ITERATOR|        |990      |85222|
|3 |   TABLE SCAN      |ptable  |990      |85222|
=================================================
Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter(nil)
  1 - output([ptable.c1], [ptable.c2]), filter(nil), dop=8
  2 - output([ptable.c1], [ptable.c2]), filter(nil)
  3 - output([ptable.c1], [ptable.c2]), filter([ptable.c1 = 1]),
      access([ptable.c1], [ptable.c2]), partitions(p1)

注意：

如果希望在查询分区数等于 1 的情况下，能够采用 hint 的方式进行分区内并行查询，需要对应的 dop 的值大于等于 2。
如果 dop 的值为空或者小于 2 将不启动并行查询。

启用非分区表并行查询

非分区表本质上是只有 1 个分区的分区表，因此针对非分区表的查询，只能通过添加 parallel hint 的方式启动分区内并行查询，否则不会启动并行查询。

如下例所示，创建一个非分区表 stable，对 stable 进行全表数据的 scan 操作，可以看到生成的执行计划。

create table stable(c1 int, c2 int);
explain select * from stable\G;
======================================
|ID|OPERATOR  |NAME  |EST. ROWS|COST |
--------------------------------------
|0 |TABLE SCAN|stable|100000   |68478|
======================================
Outputs & filters:
-------------------------------------
  0 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

通过执行计划可以看出，非分区表不使用 hint 的情况下，不会启动并行查询。

针对非分区表，添加 parallel hint 启动分区内并行查询，并指定 dop（大于等于2），可以看到生成的执行计划。

explain select /*+ parallel(4)*/  * from stable\G;
=================================================
|ID|OPERATOR           |NAME    |EST. ROWS|COST |
-------------------------------------------------
|0 |EXCHANGE IN DISTR  |        |100000   |77944|
|1 | EXCHANGE OUT DISTR|:EX10000|100000   |68478|
|2 |  PX BLOCK ITERATOR|        |100000   |68478|
|3 |   TABLE SCAN      |stable  |100000   |68478|
=================================================
Outputs & filters:
-------------------------------------
  0 - output([stable.c1], [stable.c2]), filter(nil)
  1 - output([stable.c1], [stable.c2]), filter(nil), dop=4
  2 - output([stable.c1], [stable.c2]), filter(nil)
  3 - output([stable.c1], [stable.c2]), filter(nil),
      access([stable.c1], [stable.c2]), partitions(p0)

启用多表并行查询

在查询中，多表 join 查询最为常见。

如下例所示，首先创建两张分区表 p1table，p2table：

create table p1table(c1 int ,c2 int) partition by hash(c1) partitions 2;
create table p2table(c1 int ,c2 int) partition by hash(c1) partitions 4;

查询 p1table 与 p2table 的 join 结果，join 条件是 p1table.c1=p2table.c2 ，执行计划如下：

explain select  * from p1table p1 join p2table p2 on p1.c1=p2.c2\G
====================================================================
|ID|OPERATOR                          |NAME    |EST. ROWS|COST     |
--------------------------------------------------------------------
|0 |EXCHANGE IN DISTR                 |        |784080000|614282633|
|1 | EXCHANGE OUT DISTR               |:EX10001|784080000|465840503|
|2 |  HASH JOIN                       |        |784080000|465840503|
|3 |   EXCHANGE IN DISTR              |        |200000   |155887   |
|4 |    EXCHANGE OUT DISTR (BROADCAST)|:EX10000|200000   |136955   |
|5 |     PX PARTITION ITERATOR        |        |200000   |136955   |
|6 |      TABLE SCAN                  |p1      |200000   |136955   |
|7 |   PX PARTITION ITERATOR          |        |400000   |273873   |
|8 |    TABLE SCAN                    |p2      |400000   |273873   |
====================================================================
Outputs & filters:
-------------------------------------
  0 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil)
  1 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p1.c1], [p1.c2]), filter(nil)
  4 - output([p1.c1], [p1.c2]), filter(nil), dop=1
  5 - output([p1.c1], [p1.c2]), filter(nil)
  6 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])
  7 - output([p2.c1], [p2.c2]), filter(nil)
  8 - output([p2.c1], [p2.c2]), filter(nil),
      access([p2.c1], [p2.c2]), partitions(p[0-3])

默认情况下针对 p1table 与 p2table（两张表需要查询的分区数都大于 1）都会采用并行查询，默认的 dop 为 1。同样，也可以通过使用 parallel hint 的方式来改变并行度。

如下例所示，改变 join 的条件为 p1table.c1=p2table.c2 和 p2table.c1=1，这样针对 p2table 仅仅会选择单个分区，执行计划如下所示：

explain select  * from p1table p1 join p2table p2 on p1.c1=p2.c2 and p2.c1=1\G;
=============================================================
|ID|OPERATOR                     |NAME    |EST. ROWS|COST   |
-------------------------------------------------------------
|0 |EXCHANGE IN DISTR            |        |1940598  |1807515|
|1 | EXCHANGE OUT DISTR          |:EX10001|1940598  |1440121|
|2 |  HASH JOIN                  |        |1940598  |1440121|
|3 |   EXCHANGE IN DISTR         |        |990      |85316  |
|4 |    EXCHANGE OUT DISTR (PKEY)|:EX10000|990      |85222  |
|5 |     TABLE SCAN              |p2      |990      |85222  |
|6 |   PX PARTITION ITERATOR     |        |200000   |136955 |
|7 |    TABLE SCAN               |p1      |200000   |136955 |
=============================================================
Outputs & filters:
-------------------------------------
  0 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil)
  1 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil), dop=1
  2 - output([p1.c1], [p1.c2], [p2.c1], [p2.c2]), filter(nil),
      equal_conds([p1.c1 = p2.c2]), other_conds(nil)
  3 - output([p2.c1], [p2.c2]), filter(nil)
  4 - (#keys=1, [p2.c2]), output([p2.c1], [p2.c2]), filter(nil), dop=1
  5 - output([p2.c1], [p2.c2]), filter([p2.c1 = 1]),
      access([p2.c1], [p2.c2]), partitions(p1)
  6 - output([p1.c1], [p1.c2]), filter(nil)
  7 - output([p1.c1], [p1.c2]), filter(nil),
      access([p1.c1], [p1.c2]), partitions(p[0-1])

通过计划可以看出，p2table 仅仅需要 scan 一个分区，在默认情况下不进行并行查询；p1table 需要 scan 两个分区，默认情况下进行并行查询。同样，也可以通过添加 parallel hint 的方式改变并行度，并且使得 p2table 针对一个分区的查询变为分区内并行查询。

关闭并行查询

分区表在查询的时候会自动启动并行查询（查询分区个数大于1），如果不想启动并行查询，可以使用添加 no_use_px hint 来关闭并行查询。例如，针对分区表 ptable，添加 no_use_px hint 来关闭并行查询，可以看到生成的执行计划如下所示，可以看出对 ptable 表的 scan 没有进行并行查询。

explain select /*+ no_use_px */ * from ptable\G;
=================================================
|ID|OPERATOR           |NAME  |EST. ROWS|COST   |
-------------------------------------------------
|0 |EXCHANGE IN DISTR  |      |1600000  |1246946|
|1 | EXCHANGE OUT DISTR|      |1600000  |1095490|
|2 |  TABLE SCAN       |ptable|1600000  |1095490|
=================================================
Outputs & filters:
-------------------------------------
  0 - output([ptable.c1], [ptable.c2]), filter(nil)
  1 - output([ptable.c1], [ptable.c2]), filter(nil)
  2 - output([ptable.c1], [ptable.c2]), filter(nil),
      access([ptable.c1], [ptable.c2]), partitions(p[0-15])

并行执行相关的系统视图

OceanBase 数据库提供了系统视图 gv$sql_audit/v$sql_audit 来查看并行执行的运行状态以及一些统计信息。

gv$sql_audit/v$sql_audit 包含字段较多，其中与并行执行相关的字段为：qc_id，dfo_id，sqc_id，worker_id。