Distinct 优化

本文档介绍可用于 DISTINCT 的优化,包括简单 DISTINCT 和聚合函数 DISTINCT 的优化。

简单 DISTINCT

通常简单的 DISTINCT 会被优化成 GROUP BY 来执行。例如:

  1. mysql> explain select DISTINCT a from t;
  2. +--------------------------+---------+-----------+---------------+-------------------------------------------------------+
  3. | id | estRows | task | access object | operator info |
  4. +--------------------------+---------+-----------+---------------+-------------------------------------------------------+
  5. | HashAgg_6 | 2.40 | root | | group by:test.t.a, funcs:firstrow(test.t.a)->test.t.a |
  6. | └─TableReader_11 | 3.00 | root | | data:TableFullScan_10 |
  7. | └─TableFullScan_10 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
  8. +--------------------------+---------+-----------+---------------+-------------------------------------------------------+
  9. 3 rows in set (0.00 sec)

聚合函数 DISTINCT

通常来说,带有 DISTINCT 的聚合函数会单线程的在 TiDB 侧执行。使用系统变量 tidb_opt_distinct_agg_push_down 或者 TiDB 的配置项 distinct-agg-push-down 控制优化器是否执行带有 DISTINCT 的聚合函数(比如 select count(distinct a) from t)下推到 Coprocessor 的优化操作。

在以下示例中,tidb_opt_distinct_agg_push_down 开启前,TiDB 需要从 TiKV 读取所有数据,并在 TiDB 侧执行 disctincttidb_opt_distinct_agg_push_down 开启后,distinct a 被下推到了 Coprocessor,在 HashAgg_5 里新增了一个 group bytest.t.a

  1. mysql> desc select count(distinct a) from test.t;
  2. +-------------------------+----------+-----------+---------------+------------------------------------------+
  3. | id | estRows | task | access object | operator info |
  4. +-------------------------+----------+-----------+---------------+------------------------------------------+
  5. | StreamAgg_6 | 1.00 | root | | funcs:count(distinct test.t.a)->Column#4 |
  6. | └─TableReader_10 | 10000.00 | root | | data:TableFullScan_9 |
  7. | └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
  8. +-------------------------+----------+-----------+---------------+------------------------------------------+
  9. 3 rows in set (0.01 sec)
  10. mysql> set session tidb_opt_distinct_agg_push_down = 1;
  11. Query OK, 0 rows affected (0.00 sec)
  12. mysql> desc select count(distinct a) from test.t;
  13. +---------------------------+----------+-----------+---------------+------------------------------------------+
  14. | id | estRows | task | access object | operator info |
  15. +---------------------------+----------+-----------+---------------+------------------------------------------+
  16. | HashAgg_8 | 1.00 | root | | funcs:count(distinct test.t.a)->Column#3 |
  17. | └─TableReader_9 | 1.00 | root | | data:HashAgg_5 |
  18. | └─HashAgg_5 | 1.00 | cop[tikv] | | group by:test.t.a, |
  19. | └─TableFullScan_7 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
  20. +---------------------------+----------+-----------+---------------+------------------------------------------+
  21. 4 rows in set (0.00 sec)