Import local data

Stream Load is used to import local files into Doris.

Unlike the submission methods of other commands, Stream Load communicates with Doris through the HTTP protocol.

The HOST:PORT involved in this method should be the HTTP protocol port.

  • BE’s HTTP protocol port, the default is 8040.
  • FE’s HTTP protocol port, the default is 8030. However, it must be ensured that the network of the machine where the client is located can connect to the machine where the BE is located.

In this document, we use the curl command as an example to demonstrate how to import data.

At the end of the document, we give a code example of importing data using Java

Import Data

The request body of Stream Load is as follows:

  1. PUT /api/{db}/{table}/_stream_load
  1. Create a table

    Use the CREATE TABLE command to create a table in the demo to store the data to be imported. For the specific import method, please refer to the CREATE TABLE command manual. An example is as follows:

    1. CREATE TABLE IF NOT EXISTS load_local_file_test
    2. (
    3. id INT,
    4. age TINYINT,
    5. name VARCHAR(50)
    6. )
    7. unique key(id)
    8. DISTRIBUTED BY HASH(id) BUCKETS 3;
  2. Import data

    Execute the following curl command to import the local file:

    1. curl -u user:passwd -H "label:load_local_file_test" -T /path/to/local/demo.txt http://host:port/api/demo/load_local_file_test/_stream_load
    • user:passwd is the user created in Doris. The initial user is admin/root, and the password is blank in the initial state.
    • host:port is the HTTP protocol port of BE, the default is 8040, which can be viewed on the Doris cluster WEB UI page.
    • label: Label can be specified in the Header to uniquely identify this import task.

    For more advanced operations of the Stream Load command, see Stream Load Command documentation.

  3. Wait for the import result

    The Stream Load command is a synchronous command, and a successful return indicates that the import is successful. If the imported data is large, a longer waiting time may be required. Examples are as follows:

    1. {
    2. "TxnId": 1003,
    3. "Label": "load_local_file_test",
    4. "Status": "Success",
    5. "Message": "OK",
    6. "NumberTotalRows": 1000000,
    7. "NumberLoadedRows": 1000000,
    8. "NumberFilteredRows": 1,
    9. "NumberUnselectedRows": 0,
    10. "LoadBytes": 40888898,
    11. "LoadTimeMs": 2144,
    12. "BeginTxnTimeMs": 1,
    13. "StreamLoadPutTimeMs": 2,
    14. "ReadDataTimeMs": 325,
    15. "WriteDataTimeMs": 1933,
    16. "CommitAndPublishTimeMs": 106,
    17. "ErrorURL": "http://192.168.1.1:8042/api/_load_error_log?file=__shard_0/error_log_insert_stmt_db18266d4d9b4ee5-abb00ddd64bdf005_db18266d4d9b4ee5_abb00ddd64bdf005"
    18. }
    • The status of the Status field is Success, which means the import is successful.
    • For details of other fields, please refer to the Stream Load command documentation.

Import suggestion

  • Stream Load can only import local files.
  • It is recommended to limit the amount of data for an import request to 1 - 2 GB. If you have a large number of local files, you can submit them concurrently in batches.

Java code example

Here is a simple JAVA example to execute Stream Load:

  1. package demo.doris;
  2. import org.apache.commons.codec.binary.Base64;
  3. import org.apache.http.HttpHeaders;
  4. import org.apache.http.client.methods.CloseableHttpResponse;
  5. import org.apache.http.client.methods.HttpPut;
  6. import org.apache.http.entity.FileEntity;
  7. import org.apache.http.impl.client.CloseableHttpClient;
  8. import org.apache.http.impl.client.DefaultRedirectStrategy;
  9. import org.apache.http.impl.client.HttpClientBuilder;
  10. import org.apache.http.impl.client.HttpClients;
  11. import org.apache.http.util.EntityUtils;
  12. import java.io.File;
  13. import java.io.IOException;
  14. import java.nio.charset.StandardCharsets;
  15. /*
  16. This is an example of Doris Stream Load, which requires dependencies
  17. <dependency>
  18. <groupId>org.apache.httpcomponents</groupId>
  19. <artifactId>httpclient</artifactId>
  20. <version>4.5.13</version>
  21. </dependency>
  22. */
  23. public class DorisStreamLoader {
  24. //You can choose to fill in the FE address and the http_port of the FE, but the connectivity between the client and the BE node must be guaranteed.
  25. private final static String HOST = "your_host";
  26. private final static int PORT = 8040;
  27. private final static String DATABASE = "db1"; // database to import
  28. private final static String TABLE = "tbl1"; // table to import
  29. private final static String USER = "root"; // Doris username
  30. private final static String PASSWD = ""; // Doris password
  31. private final static String LOAD_FILE_NAME = "/path/to/1.txt"; // local file path to import
  32. private final static String loadUrl = String.format("http://%s:%s/api/%s/%s/_stream_load",
  33. HOST, PORT, DATABASE, TABLE);
  34. private final static HttpClientBuilder httpClientBuilder = HttpClients
  35. .custom()
  36. .setRedirectStrategy(new DefaultRedirectStrategy() {
  37. @Override
  38. protected boolean isRedirectable(String method) {
  39. // If the connection target is FE, you need to handle 307 redirect.
  40. return true;
  41. }
  42. });
  43. public void load(File file) throws Exception {
  44. try (CloseableHttpClient client = httpClientBuilder.build()) {
  45. HttpPut put = new HttpPut(loadUrl);
  46. put.setHeader(HttpHeaders.EXPECT, "100-continue");
  47. put.setHeader(HttpHeaders.AUTHORIZATION, basicAuthHeader(USER, PASSWD));
  48. // You can set stream load related properties in Header, here we set label and column_separator.
  49. put.setHeader("label","label1");
  50. put.setHeader("column_separator",",");
  51. // Set the import file.
  52. // StringEntity can also be used here to transfer arbitrary data.
  53. FileEntity entity = new FileEntity(file);
  54. put.setEntity(entity);
  55. try (CloseableHttpResponse response = client.execute(put)) {
  56. String loadResult = "";
  57. if (response.getEntity() != null) {
  58. loadResult = EntityUtils.toString(response.getEntity());
  59. }
  60. final int statusCode = response.getStatusLine().getStatusCode();
  61. if (statusCode != 200) {
  62. throw new IOException(
  63. String.format("Stream load failed. status: %s load result: %s", statusCode, loadResult));
  64. }
  65. System.out.println("Get load result: " + loadResult);
  66. }
  67. }
  68. }
  69. private String basicAuthHeader(String username, String password) {
  70. final String tobeEncode = username + ":" + password;
  71. byte[] encoded = Base64.encodeBase64(tobeEncode.getBytes(StandardCharsets.UTF_8));
  72. return "Basic " + new String(encoded);
  73. }
  74. public static void main(String[] args) throws Exception{
  75. DorisStreamLoader loader = new DorisStreamLoader();
  76. File file = new File(LOAD_FILE_NAME);
  77. loader.load(file);
  78. }
  79. }

Note: The version of http client here is 4.5.13

org.apache.httpcomponentshttpclient4.5.13