S3 Load

Starting from version 0.14, Doris supports importing data directly, through the S3 protocol, from online storage systems that support that protocol.

This document mainly describes how to import data stored in AWS S3. Data can also be imported from other object storage systems that support the S3 protocol, such as Baidu Cloud’s BOS, Alibaba Cloud’s OSS, and Tencent Cloud’s COS.

Applicable scenarios

  • The source data is in a storage system accessible through the S3 protocol, such as S3 or BOS.
  • The data volume ranges from tens to hundreds of GB.

Preparing

  1. Standard AK and SK. First, find or regenerate your AWS access keys. The generation method is under My Security Credentials in the AWS console (figure: AK_SK). Select Create New Access Key, and be sure to save the generated AK and SK.
  2. Prepare REGION and ENDPOINT. REGION can be selected when creating the bucket, or viewed in the bucket list. ENDPOINT can be looked up by REGION on the AWS Documentation page.

For other cloud storage systems, the corresponding S3-compatible information can be found in their own documentation.

Start Loading

The syntax is the same as Broker Load; just replace WITH BROKER broker_name () with:

  WITH S3
  (
      "AWS_ENDPOINT" = "AWS_ENDPOINT",
      "AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
      "AWS_SECRET_KEY" = "AWS_SECRET_KEY",
      "AWS_REGION" = "AWS_REGION"
  )
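As an illustration of how the placeholders might be filled in, the sketch below assumes a bucket created in the us-east-1 region. AWS regional S3 endpoints generally follow the pattern s3.<region>.amazonaws.com. The region and the key placeholders here are assumptions for illustration, not values from this document. Check the documentation for your Doris version for the exact endpoint form it accepts.

```sql
-- Assumed example: bucket located in us-east-1.
-- Replace the angle-bracket placeholders with your own AK and SK.
WITH S3
(
    "AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>",
    "AWS_REGION" = "us-east-1"
)
```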

Example:

  LOAD LABEL example_db.example_label_1
  (
      DATA INFILE("s3://your_bucket_name/your_file.txt")
      INTO TABLE load_test
      COLUMNS TERMINATED BY ","
  )
  WITH S3
  (
      "AWS_ENDPOINT" = "AWS_ENDPOINT",
      "AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
      "AWS_SECRET_KEY" = "AWS_SECRET_KEY",
      "AWS_REGION" = "AWS_REGION"
  )
  PROPERTIES
  (
      "timeout" = "3600"
  );
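Since the statement follows Broker Load, it is reasonable to assume the job runs asynchronously, as Broker Load jobs do: the LOAD statement returns once the job is submitted, not when the data is loaded. A minimal sketch of checking the job afterwards with SHOW LOAD is given below. The label your_label is a hypothetical placeholder for whatever label was used when submitting the job.

```sql
-- Look up the job by the label it was submitted with (placeholder label).
-- The State column reports progress, for example PENDING, LOADING,
-- FINISHED, or CANCELLED.
SHOW LOAD FROM example_db WHERE LABEL = "your_label";
```

If the job ends in the CANCELLED state, the other columns of the same result row are the first place to look for the failure reason.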

FAQ

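The S3 SDK uses virtual-hosted style by default, in which the bucket name is part of the host name. However, some object storage systems may not have virtual-hosted style access enabled, or may not support it. In that case, add the use_path_style parameter to force path-style access, in which the bucket name is the first segment of the URL path: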

  WITH S3
  (
      "AWS_ENDPOINT" = "AWS_ENDPOINT",
      "AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
      "AWS_SECRET_KEY" = "AWS_SECRET_KEY",
      "AWS_REGION" = "AWS_REGION",
      "use_path_style" = "true"
  )