TPC-H Benchmark

TPC-H是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成。查询和填充数据库的数据具有广泛的行业相关性。这个基准测试演示了检查大量数据、执行高度复杂的查询并回答关键业务问题的决策支持系统。TPC-H报告的性能指标称为TPC-H每小时复合查询性能指标(QphH@Size),反映了系统处理查询能力的多个方面。这些方面包括执行查询时所选择的数据库大小,由单个流提交查询时的查询处理能力,以及由多个并发用户提交查询时的查询吞吐量。

本文档主要介绍 Doris 在 TPC-H 100G 测试集上的性能表现。

注1:包括 TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议用户使用实际业务数据进行进一步的测试。

注2:本文档涉及的操作都在 CentOS 7.x 上进行测试。

在 TPC-H 标准测试数据集上的 22 个查询上,我们基于 Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 及 Apache Doris 0.15.0 RC04 版本进行了对别测试, Apache Doris 1.2.0-rc01上相对 Apache Doris 1.1.3 整体性能提升了将近 3 倍,相对于 Apache Doris 0.15.0 RC04 ,性能提升了将近 11 倍 。

image-20220614114351241

1. 硬件环境

硬件配置说明
机器数量4 台腾讯云主机(1个FE,3个BE)
CPUIntel Xeon(Cascade Lake) Platinum 8269CY 16核 (2.5 GHz/3.2 GHz)
内存64G
网络带宽5Gbps
磁盘ESSD云硬盘

2. 软件环境

  • Doris部署 3BE 1FE;
  • 内核版本:Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • 操作系统版本:CentOS 7.8
  • Doris 软件版本: Apache Doris 1.2.0-rc01、 Apache Doris 1.1.3 、 Apache Doris 0.15.0 RC04
  • JDK:openjdk version “11.0.14” 2022-01-18

3. 测试数据量

整个测试模拟生成 TPCH 100G 的数据分别导入到 Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 及 Apache Doris 0.15.0 RC04 版本进行测试,下面是表的相关说明及数据量。

TPC-H表名行数导入后大小备注
REGION5400KB区域表
NATION257.714 KB国家表
SUPPLIER100万85.528 MB供应商表
PART2000万752.330 MB零部件表
PARTSUPP8000万4.375 GB零部件供应表
CUSTOMER1500万1.317 GB客户表
ORDERS1.5亿6.301 GB订单表
LINEITEM6亿20.882 GB订单明细表

4. 测试SQL

TPCH 22 个测试查询语句 : TPCH-Query-SQL

注意:

以上 SQL 中的以下四个参数在 Apache Doris 0.15.0 RC04 中不存在,在执行的时候,去掉:

  1. 1. enable_vectorized_engine=true,
  2. 2. batch_size=4096,
  3. 3. disable_join_reorder=false
  4. 4. enable_projection=true

5. 测试结果

这里我们使用 Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 及 Apache Doris 0.15.0 RC04 版本进行对比测试,测试结果如下:

QueryApache Doris 1.2.0-rc01 (s)Apache Doris 1.1.3 (s)Apache Doris 0.15.0 RC04 (s)
Q12.123.7528.63
Q20.204.227.88
Q30.622.649.39
Q40.611.59.3
Q51.052.154.11
Q60.080.190.43
Q70.581.041.61
Q80.721.7550.35
Q93.617.9416.34
Q101.261.415.21
Q110.150.351.72
Q120.210.575.39
Q132.628.1520.88
Q140.160.3
Q150.300.661.86
Q160.380.791.32
Q170.651.5126.67
Q182.283.36411.77
Q190.200.8291.71
Q200.212.775.2
Q211.174.4710.34
Q220.460.93.22
合计19.6451.253223.33

结果说明

  • 测试结果对应的数据集为scale 100, 约6亿条。
  • 测试环境配置为用户常用配置,云服务器4台,16核 64G SSD,1 FE 3 BE 部署。
  • 选用用户常见配置测试以降低用户选型评估成本,但整个测试过程中不会消耗如此多的硬件资源。
  • Apache Doris 0.15 RC04 在 TPC-H 测试中 Q14 执行失败,无法完成查询。

6. 环境准备

请先参照 官方文档 进行 Doris 的安装部署,以获得一个正常运行中的 Doris 集群(至少包含 1 FE 1 BE,推荐 1 FE 3 BE)。

7. 数据准备

7.1 下载安装 TPC-H 数据生成工具

执行以下脚本下载并编译 tpch-tools 工具。

  1. sh build-tpch-dbgen.sh

安装成功后,将在 TPC-H_Tools_v3.0.0/ 目录下生成 dbgen 二进制文件。

7.2 生成 TPC-H 测试集

执行以下脚本生成 TPC-H 数据集:

  1. sh gen-tpch-data.sh

注1:通过 sh gen-tpch-data.sh -h 查看脚本帮助。

注2:数据会以 .tbl 为后缀生成在 tpch-data/ 目录下。文件总大小约100GB。生成时间可能在数分钟到1小时不等。

注3:默认生成 100G 的标准测试数据集

7.3 建表

7.3.1 准备 doris-cluster.conf 文件

在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 doris-cluster.conf 文件中。

文件位置在 ${DORIS_HOME}/tools/tpch-tools/conf/ 目录下。

文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:

  1. # Any of FE host
  2. export FE_HOST='127.0.0.1'
  3. # http_port in fe.conf
  4. export FE_HTTP_PORT=8030
  5. # query_port in fe.conf
  6. export FE_QUERY_PORT=9030
  7. # Doris username
  8. export USER='root'
  9. # Doris password
  10. export PASSWORD=''
  11. # The database where TPC-H tables located
  12. export DB='tpch1'

7.3.2 执行以下脚本生成创建 TPC-H 表

  1. sh create-tpch-tables.sh

或者复制 create-tpch-tables.sql 中的建表语句,在 Doris 中执行。

7.4 导入数据

通过下面的命令执行数据导入:

  1. sh ./load-tpch-data.sh

7.5 检查导入数据

执行下面的 SQL 语句检查导入的数据与上面的数据量是一致。

  1. select count(*) from lineitem;
  2. select count(*) from orders;
  3. select count(*) from partsupp;
  4. select count(*) from part;
  5. select count(*) from customer;
  6. select count(*) from supplier;
  7. select count(*) from nation;
  8. select count(*) from region;
  9. select count(*) from revenue0;

7.6 查询测试

7.6.1 执行查询脚本

执行上面的测试 SQL 或者 执行下面的命令

  1. ./run-tpch-queries.sh

注意:

  1. 目前Doris的查询优化器和统计信息功能还不完善,所以我们在TPC-H中重写了一些查询以适应Doris的执行框架,但不影响结果的正确性

  2. Doris 新的查询优化器将在后续的版本中发布

  3. 执行查询之前设置 set mem_exec_limit=8G

7.6.2 单个 SQL 执行

下面是测试时使用的 SQL 语句,你也可以从代码库里获取最新的 SQL 。最新测试查询语句地址:TPC-H 测试查询语句

  1. --Q1
  2. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=false) */
  3. l_returnflag,
  4. l_linestatus,
  5. sum(l_quantity) as sum_qty,
  6. sum(l_extendedprice) as sum_base_price,
  7. sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  8. sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  9. avg(l_quantity) as avg_qty,
  10. avg(l_extendedprice) as avg_price,
  11. avg(l_discount) as avg_disc,
  12. count(*) as count_order
  13. from
  14. lineitem
  15. where
  16. l_shipdate <= date '1998-12-01' - interval '90' day
  17. group by
  18. l_returnflag,
  19. l_linestatus
  20. order by
  21. l_returnflag,
  22. l_linestatus;
  23. --Q2
  24. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=false, enable_projection=true) */
  25. s_acctbal,
  26. s_name,
  27. n_name,
  28. p_partkey,
  29. p_mfgr,
  30. s_address,
  31. s_phone,
  32. s_comment
  33. from
  34. partsupp join
  35. (
  36. select
  37. ps_partkey as a_partkey,
  38. min(ps_supplycost) as a_min
  39. from
  40. partsupp,
  41. part,
  42. supplier,
  43. nation,
  44. region
  45. where
  46. p_partkey = ps_partkey
  47. and s_suppkey = ps_suppkey
  48. and s_nationkey = n_nationkey
  49. and n_regionkey = r_regionkey
  50. and r_name = 'EUROPE'
  51. and p_size = 15
  52. and p_type like '%BRASS'
  53. group by a_partkey
  54. ) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
  55. part,
  56. supplier,
  57. nation,
  58. region
  59. where
  60. p_partkey = ps_partkey
  61. and s_suppkey = ps_suppkey
  62. and p_size = 15
  63. and p_type like '%BRASS'
  64. and s_nationkey = n_nationkey
  65. and n_regionkey = r_regionkey
  66. and r_name = 'EUROPE'
  67. order by
  68. s_acctbal desc,
  69. n_name,
  70. s_name,
  71. p_partkey
  72. limit 100;
  73. --Q3
  74. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=false, enable_projection=true, runtime_filter_wait_time_ms=10000) */
  75. l_orderkey,
  76. sum(l_extendedprice * (1 - l_discount)) as revenue,
  77. o_orderdate,
  78. o_shippriority
  79. from
  80. (
  81. select l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey from
  82. lineitem join orders
  83. where l_orderkey = o_orderkey
  84. and o_orderdate < date '1995-03-15'
  85. and l_shipdate > date '1995-03-15'
  86. ) t1 join customer c
  87. on c.c_custkey = t1.o_custkey
  88. where c_mktsegment = 'BUILDING'
  89. group by
  90. l_orderkey,
  91. o_orderdate,
  92. o_shippriority
  93. order by
  94. revenue desc,
  95. o_orderdate
  96. limit 10;
  97. --Q4
  98. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=false, enable_projection=true) */
  99. o_orderpriority,
  100. count(*) as order_count
  101. from
  102. (
  103. select
  104. *
  105. from
  106. lineitem
  107. where l_commitdate < l_receiptdate
  108. ) t1
  109. right semi join orders
  110. on t1.l_orderkey = o_orderkey
  111. where
  112. o_orderdate >= date '1993-07-01'
  113. and o_orderdate < date '1993-07-01' + interval '3' month
  114. group by
  115. o_orderpriority
  116. order by
  117. o_orderpriority;
  118. --Q5
  119. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=true) */
  120. n_name,
  121. sum(l_extendedprice * (1 - l_discount)) as revenue
  122. from
  123. customer,
  124. orders,
  125. lineitem,
  126. supplier,
  127. nation,
  128. region
  129. where
  130. c_custkey = o_custkey
  131. and l_orderkey = o_orderkey
  132. and l_suppkey = s_suppkey
  133. and c_nationkey = s_nationkey
  134. and s_nationkey = n_nationkey
  135. and n_regionkey = r_regionkey
  136. and r_name = 'ASIA'
  137. and o_orderdate >= date '1994-01-01'
  138. and o_orderdate < date '1994-01-01' + interval '1' year
  139. group by
  140. n_name
  141. order by
  142. revenue desc;
  143. --Q6
  144. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=true) */
  145. sum(l_extendedprice * l_discount) as revenue
  146. from
  147. lineitem
  148. where
  149. l_shipdate >= date '1994-01-01'
  150. and l_shipdate < date '1994-01-01' + interval '1' year
  151. and l_discount between .06 - 0.01 and .06 + 0.01
  152. and l_quantity < 24;
  153. --Q7
  154. select /*+SET_VAR(exec_mem_limit=458589934592, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=true) */
  155. supp_nation,
  156. cust_nation,
  157. l_year,
  158. sum(volume) as revenue
  159. from
  160. (
  161. select
  162. n1.n_name as supp_nation,
  163. n2.n_name as cust_nation,
  164. extract(year from l_shipdate) as l_year,
  165. l_extendedprice * (1 - l_discount) as volume
  166. from
  167. supplier,
  168. lineitem,
  169. orders,
  170. customer,
  171. nation n1,
  172. nation n2
  173. where
  174. s_suppkey = l_suppkey
  175. and o_orderkey = l_orderkey
  176. and c_custkey = o_custkey
  177. and s_nationkey = n1.n_nationkey
  178. and c_nationkey = n2.n_nationkey
  179. and (
  180. (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
  181. or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
  182. )
  183. and l_shipdate between date '1995-01-01' and date '1996-12-31'
  184. ) as shipping
  185. group by
  186. supp_nation,
  187. cust_nation,
  188. l_year
  189. order by
  190. supp_nation,
  191. cust_nation,
  192. l_year;
  193. --Q8
  194. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=false, enable_projection=true) */
  195. o_year,
  196. sum(case
  197. when nation = 'BRAZIL' then volume
  198. else 0
  199. end) / sum(volume) as mkt_share
  200. from
  201. (
  202. select
  203. extract(year from o_orderdate) as o_year,
  204. l_extendedprice * (1 - l_discount) as volume,
  205. n2.n_name as nation
  206. from
  207. lineitem,
  208. orders,
  209. customer,
  210. supplier,
  211. part,
  212. nation n1,
  213. nation n2,
  214. region
  215. where
  216. p_partkey = l_partkey
  217. and s_suppkey = l_suppkey
  218. and l_orderkey = o_orderkey
  219. and o_custkey = c_custkey
  220. and c_nationkey = n1.n_nationkey
  221. and n1.n_regionkey = r_regionkey
  222. and r_name = 'AMERICA'
  223. and s_nationkey = n2.n_nationkey
  224. and o_orderdate between date '1995-01-01' and date '1996-12-31'
  225. and p_type = 'ECONOMY ANODIZED STEEL'
  226. ) as all_nations
  227. group by
  228. o_year
  229. order by
  230. o_year;
  231. --Q9
  232. select/*+SET_VAR(exec_mem_limit=37179869184, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=true, enable_remove_no_conjuncts_runtime_filter_policy=true, runtime_filter_wait_time_ms=100000) */
  233. nation,
  234. o_year,
  235. sum(amount) as sum_profit
  236. from
  237. (
  238. select
  239. n_name as nation,
  240. extract(year from o_orderdate) as o_year,
  241. l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
  242. from
  243. lineitem join orders on o_orderkey = l_orderkey
  244. join[shuffle] part on p_partkey = l_partkey
  245. join[shuffle] partsupp on ps_partkey = l_partkey
  246. join[shuffle] supplier on s_suppkey = l_suppkey
  247. join[broadcast] nation on s_nationkey = n_nationkey
  248. where
  249. ps_suppkey = l_suppkey and
  250. p_name like '%green%'
  251. ) as profit
  252. group by
  253. nation,
  254. o_year
  255. order by
  256. nation,
  257. o_year desc;
  258. --Q10
  259. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=false, enable_projection=true) */
  260. c_custkey,
  261. c_name,
  262. sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
  263. c_acctbal,
  264. n_name,
  265. c_address,
  266. c_phone,
  267. c_comment
  268. from
  269. customer,
  270. (
  271. select o_custkey,l_extendedprice,l_discount from lineitem, orders
  272. where l_orderkey = o_orderkey
  273. and o_orderdate >= date '1993-10-01'
  274. and o_orderdate < date '1993-10-01' + interval '3' month
  275. and l_returnflag = 'R'
  276. ) t1,
  277. nation
  278. where
  279. c_custkey = t1.o_custkey
  280. and c_nationkey = n_nationkey
  281. group by
  282. c_custkey,
  283. c_name,
  284. c_acctbal,
  285. c_phone,
  286. n_name,
  287. c_address,
  288. c_comment
  289. order by
  290. revenue desc
  291. limit 20;
  292. --Q11
  293. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=true, enable_projection=true) */
  294. ps_partkey,
  295. sum(ps_supplycost * ps_availqty) as value
  296. from
  297. partsupp,
  298. (
  299. select s_suppkey
  300. from supplier, nation
  301. where s_nationkey = n_nationkey and n_name = 'GERMANY'
  302. ) B
  303. where
  304. ps_suppkey = B.s_suppkey
  305. group by
  306. ps_partkey having
  307. sum(ps_supplycost * ps_availqty) > (
  308. select
  309. sum(ps_supplycost * ps_availqty) * 0.000002
  310. from
  311. partsupp,
  312. (select s_suppkey
  313. from supplier, nation
  314. where s_nationkey = n_nationkey and n_name = 'GERMANY'
  315. ) A
  316. where
  317. ps_suppkey = A.s_suppkey
  318. )
  319. order by
  320. value desc;
  321. --Q12
  322. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=true, enable_projection=true) */
  323. l_shipmode,
  324. sum(case
  325. when o_orderpriority = '1-URGENT'
  326. or o_orderpriority = '2-HIGH'
  327. then 1
  328. else 0
  329. end) as high_line_count,
  330. sum(case
  331. when o_orderpriority <> '1-URGENT'
  332. and o_orderpriority <> '2-HIGH'
  333. then 1
  334. else 0
  335. end) as low_line_count
  336. from
  337. orders,
  338. lineitem
  339. where
  340. o_orderkey = l_orderkey
  341. and l_shipmode in ('MAIL', 'SHIP')
  342. and l_commitdate < l_receiptdate
  343. and l_shipdate < l_commitdate
  344. and l_receiptdate >= date '1994-01-01'
  345. and l_receiptdate < date '1994-01-01' + interval '1' year
  346. group by
  347. l_shipmode
  348. order by
  349. l_shipmode;
  350. --Q13
  351. select /*+SET_VAR(exec_mem_limit=45899345920, parallel_fragment_exec_instance_num=16, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=true, enable_projection=true) */
  352. c_count,
  353. count(*) as custdist
  354. from
  355. (
  356. select
  357. c_custkey,
  358. count(o_orderkey) as c_count
  359. from
  360. orders right outer join customer on
  361. c_custkey = o_custkey
  362. and o_comment not like '%special%requests%'
  363. group by
  364. c_custkey
  365. ) as c_orders
  366. group by
  367. c_count
  368. order by
  369. custdist desc,
  370. c_count desc;
  371. --Q14
  372. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=true, enable_projection=true, runtime_filter_mode=OFF) */
  373. 100.00 * sum(case
  374. when p_type like 'PROMO%'
  375. then l_extendedprice * (1 - l_discount)
  376. else 0
  377. end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
  378. from
  379. part,
  380. lineitem
  381. where
  382. l_partkey = p_partkey
  383. and l_shipdate >= date '1995-09-01'
  384. and l_shipdate < date '1995-09-01' + interval '1' month;
  385. --Q15
  386. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=true, enable_projection=true) */
  387. s_suppkey,
  388. s_name,
  389. s_address,
  390. s_phone,
  391. total_revenue
  392. from
  393. supplier,
  394. revenue0
  395. where
  396. s_suppkey = supplier_no
  397. and total_revenue = (
  398. select
  399. max(total_revenue)
  400. from
  401. revenue0
  402. )
  403. order by
  404. s_suppkey;
  405. --Q16
  406. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=true, enable_projection=true) */
  407. p_brand,
  408. p_type,
  409. p_size,
  410. count(distinct ps_suppkey) as supplier_cnt
  411. from
  412. partsupp,
  413. part
  414. where
  415. p_partkey = ps_partkey
  416. and p_brand <> 'Brand#45'
  417. and p_type not like 'MEDIUM POLISHED%'
  418. and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
  419. and ps_suppkey not in (
  420. select
  421. s_suppkey
  422. from
  423. supplier
  424. where
  425. s_comment like '%Customer%Complaints%'
  426. )
  427. group by
  428. p_brand,
  429. p_type,
  430. p_size
  431. order by
  432. supplier_cnt desc,
  433. p_brand,
  434. p_type,
  435. p_size;
  436. --Q17
  437. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=true, enable_projection=true) */
  438. sum(l_extendedprice) / 7.0 as avg_yearly
  439. from
  440. lineitem join [broadcast]
  441. part p1 on p1.p_partkey = l_partkey
  442. where
  443. p1.p_brand = 'Brand#23'
  444. and p1.p_container = 'MED BOX'
  445. and l_quantity < (
  446. select
  447. 0.2 * avg(l_quantity)
  448. from
  449. lineitem join [broadcast]
  450. part p2 on p2.p_partkey = l_partkey
  451. where
  452. l_partkey = p1.p_partkey
  453. and p2.p_brand = 'Brand#23'
  454. and p2.p_container = 'MED BOX'
  455. );
  456. --Q18
  457. select /*+SET_VAR(exec_mem_limit=45899345920, parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=true, enable_projection=true) */
  458. c_name,
  459. c_custkey,
  460. t3.o_orderkey,
  461. t3.o_orderdate,
  462. t3.o_totalprice,
  463. sum(t3.l_quantity)
  464. from
  465. customer join
  466. (
  467. select * from
  468. lineitem join
  469. (
  470. select * from
  471. orders left semi join
  472. (
  473. select
  474. l_orderkey
  475. from
  476. lineitem
  477. group by
  478. l_orderkey having sum(l_quantity) > 300
  479. ) t1
  480. on o_orderkey = t1.l_orderkey
  481. ) t2
  482. on t2.o_orderkey = l_orderkey
  483. ) t3
  484. on c_custkey = t3.o_custkey
  485. group by
  486. c_name,
  487. c_custkey,
  488. t3.o_orderkey,
  489. t3.o_orderdate,
  490. t3.o_totalprice
  491. order by
  492. t3.o_totalprice desc,
  493. t3.o_orderdate
  494. limit 100;
  495. --Q19
  496. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false, enable_projection=true) */
  497. sum(l_extendedprice* (1 - l_discount)) as revenue
  498. from
  499. lineitem,
  500. part
  501. where
  502. (
  503. p_partkey = l_partkey
  504. and p_brand = 'Brand#12'
  505. and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
  506. and l_quantity >= 1 and l_quantity <= 1 + 10
  507. and p_size between 1 and 5
  508. and l_shipmode in ('AIR', 'AIR REG')
  509. and l_shipinstruct = 'DELIVER IN PERSON'
  510. )
  511. or
  512. (
  513. p_partkey = l_partkey
  514. and p_brand = 'Brand#23'
  515. and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
  516. and l_quantity >= 10 and l_quantity <= 10 + 10
  517. and p_size between 1 and 10
  518. and l_shipmode in ('AIR', 'AIR REG')
  519. and l_shipinstruct = 'DELIVER IN PERSON'
  520. )
  521. or
  522. (
  523. p_partkey = l_partkey
  524. and p_brand = 'Brand#34'
  525. and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
  526. and l_quantity >= 20 and l_quantity <= 20 + 10
  527. and p_size between 1 and 15
  528. and l_shipmode in ('AIR', 'AIR REG')
  529. and l_shipinstruct = 'DELIVER IN PERSON'
  530. );
  531. --Q20
  532. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=true, enable_projection=true, runtime_bloom_filter_size=551943) */
  533. s_name, s_address from
  534. supplier left semi join
  535. (
  536. select * from
  537. (
  538. select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
  539. from lineitem
  540. where l_shipdate >= date '1994-01-01'
  541. and l_shipdate < date '1994-01-01' + interval '1' year
  542. group by l_partkey,l_suppkey
  543. ) t2 join
  544. (
  545. select ps_partkey, ps_suppkey, ps_availqty
  546. from partsupp left semi join part
  547. on ps_partkey = p_partkey and p_name like 'forest%'
  548. ) t1
  549. on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
  550. and t1.ps_availqty > t2.l_q
  551. ) t3
  552. on s_suppkey = t3.ps_suppkey
  553. join nation
  554. where s_nationkey = n_nationkey
  555. and n_name = 'CANADA'
  556. order by s_name;
  557. --Q21
  558. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=true, enable_cost_based_join_reorder=true, enable_projection=true) */
  559. s_name, count(*) as numwait
  560. from
  561. lineitem l2 right semi join
  562. (
  563. select * from
  564. lineitem l3 right anti join
  565. (
  566. select * from
  567. orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus = 'F'
  568. join
  569. (
  570. select * from
  571. supplier join nation
  572. where s_nationkey = n_nationkey
  573. and n_name = 'SAUDI ARABIA'
  574. ) t1
  575. where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
  576. ) t2
  577. on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and l3.l_receiptdate > l3.l_commitdate
  578. ) t3
  579. on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey
  580. group by
  581. t3.s_name
  582. order by
  583. numwait desc,
  584. t3.s_name
  585. limit 100;
  586. --Q22
  587. with tmp as (select
  588. avg(c_acctbal) as av
  589. from
  590. customer
  591. where
  592. c_acctbal > 0.00
  593. and substring(c_phone, 1, 2) in
  594. ('13', '31', '23', '29', '30', '18', '17'))
  595. select /*+SET_VAR(exec_mem_limit=8589934592, parallel_fragment_exec_instance_num=4,runtime_bloom_filter_size=4194304) */
  596. cntrycode,
  597. count(*) as numcust,
  598. sum(c_acctbal) as totacctbal
  599. from
  600. (
  601. select
  602. substring(c_phone, 1, 2) as cntrycode,
  603. c_acctbal
  604. from
  605. orders right anti join customer c on o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av
  606. where
  607. substring(c_phone, 1, 2) in
  608. ('13', '31', '23', '29', '30', '18', '17')
  609. ) as custsale
  610. group by
  611. cntrycode
  612. order by
  613. cntrycode;