Import Analysis

Doris provides a graphical command to help users analyze a specific import more easily. This article describes how to use this feature.

This function is currently only for Broker Load analysis.

Import plan tree

If you don’t know much about Doris’ query plan tree, please read the previous article DORIS/best practices/query analysis.

The execution process of a Broker Load request is also based on Doris’ query framework. A Broker Load job will be split into multiple subtasks based on the number of DATA INFILE clauses in the import request. Each subtask can be regarded as an independent import execution plan. An import plan consists of only one Fragment, which is composed as follows:

  1. ┌────────────────┐
  2. OlapTableSink
  3. └────────────────┘
  4. ┌────────────────┐
  5. BrokerScanNode
  6. └────────────────┘

BrokerScanNode is mainly responsible for reading the source data and sending it to OlapTableSink, and OlapTableSink is responsible for sending data to the corresponding node according to the partition and bucketing rules, and the corresponding node is responsible for the actual data writing.

The Fragment of the import execution plan will be divided into one or more Instances according to the number of imported source files, the number of BE nodes and other parameters. Each Instance is responsible for part of the data import.

The execution plans of multiple subtasks are executed concurrently, and multiple instances of an execution plan are also executed in parallel.

View import Profile

The user can open the session variable is_report_success with the following command:

  1. SET is_report_success=true;

Then submit a Broker Load import request and wait until the import execution completes. Doris will generate a Profile for this import. Profile contains the execution details of importing each subtask and Instance, which helps us analyze import bottlenecks.

Viewing profiles of unsuccessful import jobs is currently not supported.

We can get the Profile list first with the following command:

  1. mysql> show load profile "/";
  2. +---------+------+-----------+------+------------+- --------------------+---------------------------------+------- ----+------------+
  3. | QueryId | User | DefaultDb | SQL | QueryType | StartTime | EndTime | TotalTime | QueryState |
  4. +---------+------+-----------+------+------------+- --------------------+---------------------------------+------- ----+------------+
  5. | 10441 | N/A | N/A | N/A | Load | 2021-04-10 22:15:37 | 2021-04-10 22:18:54 | 3m17s | N/A |
  6. +---------+------+-----------+------+------------+- --------------------+---------------------------------+------- ----+------------+

This command will list all currently saved import profiles. Each line corresponds to one import. where the QueryId column is the ID of the import job. This ID can also be viewed through the SHOW LOAD statement. We can select the QueryId corresponding to the Profile we want to see to see the specific situation.

Viewing a Profile is divided into 3 steps:

  1. View the subtask overview

    View an overview of subtasks with imported jobs by running the following command:

  1. mysql> show load profile "/10441";
  2. +-----------------------------------+------------+
  3. | TaskId | ActiveTime |
  4. +-----------------------------------+------------+
  5. | 980014623046410a-88e260f0c43031f1 | 3m14s |
  6. +-----------------------------------+------------+

As shown in the figure above, it means that the import job 10441 has a total of one subtask, in which ActiveTime indicates the execution time of the longest instance in this subtask.

  1. View the Instance overview of the specified subtask

    When we find that a subtask takes a long time, we can further check the execution time of each instance of the subtask:

  1. mysql> show load profile "/10441/980014623046410a-88e260f0c43031f1";
  2. +-----------------------------------+------------- -----+------------+
  3. | Instances | Host | ActiveTime |
  4. +-----------------------------------+------------- -----+------------+
  5. | 980014623046410a-88e260f0c43031f2 | 10.81.85.89:9067 | 3m7s |
  6. | 980014623046410a-88e260f0c43031f3 | 10.81.85.89:9067 | 3m6s |
  7. | 980014623046410a-88e260f0c43031f4 | 10.81.85.89:9067 | 3m10s |
  8. | 980014623046410a-88e260f0c43031f5 | 10.81.85.89:9067 | 3m14s |
  9. +-----------------------------------+------------- -----+------------+

This shows the time-consuming of four instances of the subtask 980014623046410a-88e260f0c43031f1, and also shows the execution node where the instance is located.

  1. View the specific Instance

    We can continue to view the detailed profile of each operator on a specific Instance:

  1. mysql> show load profile "/10441/980014623046410a-88e260f0c43031f1/980014623046410a-88e260f0c43031f5"\G
  2. **************************** 1. row ******************** ******
  3. Instance:
  4. ┌-----------------------------------------┐
  5. │[-1: OlapTableSink]
  6. │(Active: 2m17s, non-child: 70.91)
  7. - Counters:
  8. - CloseWaitTime: 1m53s
  9. - ConvertBatchTime: 0ns
  10. - MaxAddBatchExecTime: 1m46s
  11. - NonBlockingSendTime: 3m11s
  12. - NumberBatchAdded: 782
  13. - NumberNodeChannels: 1
  14. - OpenTime: 743.822us
  15. - RowsFiltered: 0
  16. - RowsRead: 1.599729M (1599729)
  17. - RowsReturned: 1.599729M (1599729)│
  18. - SendDataTime: 11s761ms
  19. - TotalAddBatchExecTime: 1m46s
  20. - ValidateDataTime: 9s802ms
  21. └-----------------------------------------┘
  22. ┌------------------------------------------------- ----┐
  23. │[0: BROKER_SCAN_NODE]
  24. │(Active: 56s537ms, non-child: 29.06)
  25. - Counters:
  26. - BytesDecompressed: 0.00
  27. - BytesRead: 5.77 GB
  28. - DecompressTime: 0ns
  29. - FileReadTime: 34s263ms
  30. - MaterializeTupleTime(*): 45s54ms
  31. - NumDiskAccess: 0
  32. - PeakMemoryUsage: 33.03 MB
  33. - RowsRead: 1.599729M (1599729)
  34. - RowsReturned: 1.599729M (1599729)
  35. - RowsReturnedRate: 28.295K /sec
  36. - TotalRawReadTime(*): 1m20s
  37. - TotalReadThroughput: 30.39858627319336 MB/sec
  38. - WaitScannerTime: 56s528ms
  39. └------------------------------------------------- ----┘

The figure above shows the specific profiles of each operator of Instance 980014623046410a-88e260f0c43031f5 in subtask 980014623046410a-88e260f0c43031f1.

Through the above three steps, we can gradually check the execution bottleneck of an import task.