MULTI-LOAD

Name

MULTI LOAD

Description

Users submit multiple import jobs over the HTTP protocol. Multi Load ensures that these import jobs take effect atomically.

  Syntax:
      curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_start?label=xxx
      curl --location-trusted -u user:passwd -T data.file http://host:port/api/{db}/{table1}/_load?label=xxx\&sub_label=yyy
      curl --location-trusted -u user:passwd -T data.file http://host:port/api/{db}/{table2}/_load?label=xxx\&sub_label=zzz
      curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_commit?label=xxx
      curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_abort?label=xxx
      curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_desc?label=xxx
  'MULTI LOAD' builds on 'MINI LOAD' and lets users import into multiple tables at the same time. The specific commands are shown above.
      '/api/{db}/_multi_start' starts a multi-table import task.
      '/api/{db}/{table}/_load' adds a table to be imported to an import task. The main difference from 'MINI LOAD' is that the 'sub_label' parameter must be passed in.
      '/api/{db}/_multi_commit' submits the entire multi-table import task, which is then processed in the background.
      '/api/{db}/_multi_abort' aborts a multi-table import task.
      '/api/{db}/_multi_desc' displays the number of jobs submitted by a multi-table import task.
  Description of the HTTP protocol:
      Authorization: Doris currently uses HTTP Basic authentication, so the username and password must be specified when importing.
          This method passes the password in clear text, since we are currently in an intranet environment...
      Expect: Requests sent to Doris must carry the 'Expect' header with the value '100-continue'.
          Why? Because the request is redirected, and the redirect should happen before the data content is transmitted.
          This avoids transmitting the data multiple times and thereby improves efficiency.
      Content-Length: Requests sent to Doris must carry the 'Content-Length' header. If less content is sent than
          'Content-Length' specifies, Doris considers the transmission faulty and the task submission fails.
          NOTE: If more data is sent than 'Content-Length' specifies, Doris reads and imports only
          'Content-Length' bytes of content.
  Parameter description:
      user: If the user is in the default_cluster, this is the user_name. Otherwise it is user_name@cluster_name.
      label: Specifies the label of this batch of imports, which is used for later job status queries and so on.
          This parameter is required.
      sub_label: Specifies the sub-version number inside a multi-table import task. For multi-table imports, this parameter is required.
      columns: Describes the corresponding column names in the import file.
          If it is not passed in, the order of the columns in the file is assumed to match the order in which the table was created.
          Columns are given as a comma-separated list, for example: columns=k1,k2,k3,k4
      column_separator: Specifies the separator between columns. The default is '\t'.
          NOTE: URL encoding is required. For example, to specify '\t' as the separator,
          pass in 'column_separator=%09'.
      max_filter_ratio: Specifies the maximum ratio of non-conforming data that may be filtered out. The default is 0, meaning no filtering is allowed.
          A custom value is specified like 'max_filter_ratio=0.2', which means a 20% error rate is allowed.
          This parameter takes effect when passed to '_multi_start'.
  NOTE:
      1. This import method currently completes the import on a single machine, so it is not suitable for importing large amounts of data.
         It is recommended that the amount of imported data not exceed 1GB.
      2. Currently it is not possible to submit multiple files using `curl -T "{file1, file2}"`, because curl splits them into
         multiple requests. Multiple requests cannot share a label, so this form cannot be used.
      3. curl can be used to import data into Doris in a streaming-like way, but the real import only happens after the
         streaming ends, and the amount of data imported this way cannot be too large.
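
The URL-encoding requirement for 'column_separator' can be sketched with a small bash helper. This is only an illustration under stated assumptions: the function names `urlencode` and `multi_load_url` are hypothetical (they are not part of Doris or curl), and the encoder is only meant for single-byte ASCII separators such as '\t' or ','.

```shell
#!/usr/bin/env bash

# Percent-encode every character that is not an RFC 3986 unreserved character.
# Hypothetical helper; intended for short ASCII strings such as separators.
urlencode() {
    local LC_ALL=C
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            # "'$c" makes printf use the numeric code of the character.
            *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
        esac
    done
    printf '%s' "$out"
}

# Build the '_load' URL for one table of a multi-table import task.
# Usage: multi_load_url BASE DB TABLE LABEL SUB_LABEL [COLUMN_SEPARATOR]
multi_load_url() {
    local base="$1" db="$2" table="$3" label="$4" sub_label="$5"
    local url="${base}/api/${db}/${table}/_load?label=${label}&sub_label=${sub_label}"
    if [ "$#" -ge 6 ]; then
        url+="&column_separator=$(urlencode "$6")"
    fi
    printf '%s\n' "$url"
}
```

For example, `urlencode $'\t'` prints `%09`. The generated URL can be passed to curl inside double quotes, in which case the `&` needs no backslash: `curl --location-trusted -u user:passwd -T data.file "$(multi_load_url http://host:port testDb testTbl1 123 1 $'\t')"`.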

Example

  1. Import the data in the local file 'testData1' into the table 'testTbl1' in the database 'testDb', and
     import the data in 'testData2' into the table 'testTbl2' in 'testDb' (the user is in the default_cluster)
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
      curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
      curl --location-trusted -u root -T testData2 http://host:port/api/testDb/testTbl2/_load?label=123\&sub_label=2
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_commit?label=123
  2. Abort a multi-table import midway (the user is in the default_cluster)
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
      curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_abort?label=123
  3. Check how much content a multi-table import has submitted (the user is in the default_cluster)
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
      curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
      curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_desc?label=123
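
The start, load, and commit-or-abort sequence from the examples above could be wrapped in one bash function along the following lines. This is a minimal sketch, not an official client: the names `multi_load`, `DORIS_URL`, `DORIS_USER`, and `DORIS_CURL` are hypothetical, and it assumes a failed request shows up in curl's exit status (via `--fail`). It does not inspect the response body, which a real script would likely need to do.

```shell
#!/usr/bin/env bash

# Hypothetical settings; adjust them to your deployment.
DORIS_URL="${DORIS_URL:-http://host:port}"
DORIS_USER="${DORIS_USER:-user:passwd}"
# Command used to issue requests; set it to "echo curl" for a dry run.
DORIS_CURL="${DORIS_CURL:-curl --fail --silent --show-error --location-trusted}"

# Usage: multi_load DB LABEL TABLE=FILE [TABLE=FILE ...]
multi_load() {
    local db="$1" label="$2"
    shift 2
    local api="${DORIS_URL}/api/${db}" pair table file sub=0

    # Start the multi-table import task.
    $DORIS_CURL -u "$DORIS_USER" -XPOST "${api}/_multi_start?label=${label}" || return 1

    for pair in "$@"; do
        table="${pair%%=*}"
        file="${pair#*=}"
        sub=$((sub + 1))   # sub_label values 1, 2, 3, ... as in the examples
        if ! $DORIS_CURL -u "$DORIS_USER" -T "$file" \
                "${api}/${table}/_load?label=${label}&sub_label=${sub}"; then
            # One upload failed: abort the task so nothing is committed.
            $DORIS_CURL -u "$DORIS_USER" -XPOST "${api}/_multi_abort?label=${label}"
            return 1
        fi
    done

    # All uploads succeeded: commit the whole task.
    $DORIS_CURL -u "$DORIS_USER" -XPOST "${api}/_multi_commit?label=${label}"
}
```

A call such as `multi_load testDb 123 testTbl1=testData1 testTbl2=testData2` mirrors example 1. With `DORIS_CURL="echo curl"` the function only prints the commands it would run, which is a convenient way to check the URLs before touching a cluster.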

Keywords

  MULTI, MINI, LOAD

Best Practice