Import data from S3-compatible object storage

Overview

S3 (Simple Storage Service) is Amazon's object storage service. S3-compatible object storage can hold data of almost any type and size, and it serves workloads such as data lakes, cloud-native applications, and mobile apps. If you are unfamiliar with S3, see the introductory material in the AWS documentation.

AWS S3 has been remarkably successful for over a decade and has become the de facto standard for object storage. As a result, almost every mainstream public cloud vendor provides an S3-compatible object storage service.

MatrixOne supports loading files from S3-compatible object storage services into databases. It supports AWS S3 and the mainstream cloud vendors in China (Alibaba Cloud OSS and Tencent Cloud COS).

MatrixOne offers two methods for importing data from S3-compatible object storage:

  • Use LOAD DATA with an s3option clause to load the file into MatrixOne. This method copies the data into MatrixOne, and all subsequent queries run inside MatrixOne.
  • Create an external table with an s3option clause that maps to an S3 file, and query the external table directly. This method reads the data through the S3-compatible object storage service, so every query incurs network latency.

Method 1: LOAD DATA

Syntax

    LOAD DATA URL s3option{"endpoint"='<string>', "access_key_id"='<string>', "secret_access_key"='<string>', "bucket"='<string>', "filepath"='<string>', "region"='<string>', "compression"='<string>'}
    INTO TABLE tbl_name
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES TERMINATED BY 'string']
    [IGNORE number {LINES | ROWS}]

Parameter Description

| Parameter | Description |
| --- | --- |
| endpoint | URL of the object storage service endpoint, for example s3.us-west-2.amazonaws.com |
| access_key_id | Access key ID |
| secret_access_key | Secret access key |
| bucket | S3 bucket to access |
| filepath | Relative file path. Wildcard patterns are supported, for example /files/*.csv. |
| region | Region of the object storage service |
| compression | Compression format of the S3 files. Supported values are "auto", "none", "gzip", "bz2", and "lz4". An empty value or "none" indicates uncompressed files. |

The other parameters are identical to those of an ordinary LOAD DATA statement; see LOAD DATA for more details.

Statement Examples:

    -- Load a csv file from AWS S3, us-east-1 region, test-load-mo bucket, without compression
    LOAD DATA URL s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-load-mo', "filepath"='test.csv', "region"='us-east-1', "compression"='none'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

    -- Load all csv files from Alibaba Cloud OSS, Shanghai region, test-load-data bucket, without compression
    LOAD DATA URL s3option{"endpoint"='oss-cn-shanghai.aliyuncs.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-load-data', "filepath"='/test/*.csv', "region"='oss-cn-shanghai', "compression"='none'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

    -- Load a csv file from Tencent Cloud COS, Shanghai region, test-1252279971 bucket, with bz2 compression
    LOAD DATA URL s3option{"endpoint"='cos.ap-shanghai.myqcloud.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-1252279971', "filepath"='test.csv.bz2', "region"='ap-shanghai', "compression"='bz2'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';
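
The optional IGNORE clause from the syntax above can be combined with any of these statements, for instance to skip a header row. The following is a minimal, untested sketch: the bucket name example-bucket and the file path /data/sales.csv.gz are hypothetical placeholders, and it assumes the file is a gzip-compressed CSV with exactly one header line.

```sql
-- Hypothetical bucket and file path; replace with your own values.
-- Assumes a gzip-compressed CSV whose first line is a header row.
LOAD DATA URL s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='example-bucket', "filepath"='/data/sales.csv.gz', "region"='us-east-1', "compression"='gzip'}
INTO TABLE t1
FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header line
```

The clause order here follows the pattern used elsewhere in this document, where the IGNORE clause comes after the field and line options.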

Tutorial: Load a file from AWS S3

This tutorial walks you through loading a .csv file from AWS S3. It assumes that you already have an AWS account and that your data file is ready in S3. If not, sign up and upload your data file first; the official AWS S3 tutorial explains how. The process for Alibaba Cloud OSS and Tencent Cloud COS is similar.

Note

For privacy, the code examples in this tutorial replace account information such as access_key_id and secret_access_key with placeholders. The steps show the overall procedure; substitute your own data and account information when you follow them.

  1. Download the data file char_varchar_1.csv. In AWS, go to S3 > Buckets, create a bucket named test-loading with public access, and upload the file.

  2. Get or create your AWS access key. Go to Your Account Name > Security Credentials, then use an existing access key or create a new one.

    You can obtain the access key ID and secret access key from the downloaded credentials file or from the Security Credentials page.

  3. Launch the MySQL client and create a table in MatrixOne, for example:

    create database db;
    use db;
    drop table if exists t1;
    create table t1(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225));
  4. Import the file into MatrixOne:

    LOAD DATA URL s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='char_varchar_1.csv', "region"='us-east-1', "compression"='none'} INTO TABLE t1;
  5. After the import is successful, you can run SQL statements to check the result of imported data:

    mysql> select * from t1;
    +-----------+-----------+-----------+-----------+
    | col1      | col2      | col3      | col4      |
    +-----------+-----------+-----------+-----------+
    | a         | b         | c         | d         |
    | a         | b         | c         | d         |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | aa,aa     | bb,bb     | cc,cc     | dd,dd     |
    | aa,       | bb,       | cc,       | dd,       |
    | aa,,,aa   | bb,,,bb   | cc,,,cc   | dd,,,dd   |
    | aa',',,aa | bb',',,bb | cc',',,cc | dd',',,dd |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa",aa    | bb",bb    | cc",cc    | dd",dd    |
    | aa"",aa   | bb"",bb   | cc"",cc   | dd"",dd   |
    |           |           |           |           |
    |           |           |           |           |
    | NULL      | NULL      | NULL      | NULL      |
    |           |           |           |           |
    | "         | "         | "         | "         |
    | ""        | ""        | ""        | ""        |
    +-----------+-----------+-----------+-----------+
    21 rows in set (0.03 sec)
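
As an optional sanity check, a couple of follow-up queries can help confirm that the load matches the source file. This is a minimal sketch that assumes the table t1 created in this tutorial; the counts you see depend on your own file.

```sql
-- Count the loaded rows and compare with the number of records in the CSV file.
select count(*) from t1;

-- List the rows whose first column was loaded as NULL.
select * from t1 where col1 is null;
```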

Method 2: Map an S3 file to an external table

Syntax

    create external table t(...) URL s3option{"endpoint"='<string>', "access_key_id"='<string>', "secret_access_key"='<string>', "bucket"='<string>', "filepath"='<string>', "region"='<string>', "compression"='<string>'}
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES TERMINATED BY 'string']
    [IGNORE number {LINES | ROWS}];

Note

MatrixOne supports only SELECT on external tables. DELETE, INSERT, and UPDATE are not supported.

Parameter Description

| Parameter | Description |
| --- | --- |
| endpoint | URL of the object storage service endpoint, for example s3.us-west-2.amazonaws.com |
| access_key_id | Access key ID |
| secret_access_key | Secret access key |
| bucket | S3 bucket to access |
| filepath | Relative file path. Wildcard patterns are supported, for example /files/*.csv. |
| region | Region of the object storage service |
| compression | Compression format of the S3 files. Supported values are "auto", "none", "gzip", "bz2", and "lz4". An empty value or "none" indicates uncompressed files. |

The other parameters are identical to those of an ordinary LOAD DATA statement; see LOAD DATA for more details.

For more information about external tables, see CREATE EXTERNAL TABLE.

Statement Examples:

    -- Create an external table for a .csv file from AWS S3
    create external table t1(col1 char(225)) url s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='test.csv', "region"='us-east-1', "compression"='none'} fields terminated by ',' enclosed by '\"' lines terminated by '\n';

    -- Create an external table for a bzip2-compressed .csv file from Tencent Cloud COS, skipping the first line
    create external table t1(col1 char(225)) url s3option{"endpoint"='cos.ap-shanghai.myqcloud.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-1252279971', "filepath"='test.csv.bz2', "region"='ap-shanghai', "compression"='bz2'} fields terminated by ',' enclosed by '\"' lines terminated by '\n' ignore 1 lines;
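
Once an external table exists, it can be read with ordinary SELECT statements while the data stays in object storage. The following is a minimal sketch, assuming one of the external tables above has been created as t1; the filter value 'a' is a hypothetical placeholder.

```sql
-- Filter rows from the external table; the file is read from object storage at query time.
select col1 from t1 where col1 = 'a';

-- Aggregates are also plain SELECT statements.
select count(*) from t1;
```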

Tutorial: Create an external table from an S3 file

This tutorial walks you through creating an external table from a .csv file in AWS S3.

Note

For privacy, the code examples in this tutorial replace account information such as access_key_id and secret_access_key with placeholders. The steps show the overall procedure; substitute your own data and account information when you follow them.

  1. Download the data file char_varchar_1.csv. In AWS, go to S3 > Buckets, create a bucket named test-loading with public access, and upload the file.

  2. Get or create your AWS access key. Go to Your Account Name > Security Credentials, then use an existing access key or create a new one.

    You can obtain the access key ID and secret access key from the downloaded credentials file or from the Security Credentials page.

  3. Launch the MySQL client and map the S3 file to an external table:

    create database db;
    use db;
    drop table if exists t1;
    create external table t1(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225)) url s3option{"endpoint"='s3.us-east-1.amazonaws.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-loading', "filepath"='char_varchar_1.csv', "region"='us-east-1', "compression"='none'} fields terminated by ',' enclosed by '\"' lines terminated by '\n';
  4. After the external table is created, query it to check the data. The query is noticeably slower than on a local table (1.32 sec here, compared with 0.03 sec for the table loaded with LOAD DATA), because the file is read from S3 each time.

    mysql> select * from t1;
    +-----------+-----------+-----------+-----------+
    | col1      | col2      | col3      | col4      |
    +-----------+-----------+-----------+-----------+
    | a         | b         | c         | d         |
    | a         | b         | c         | d         |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | 'a'       | 'b'       | 'c'       | 'd'       |
    | aa,aa     | bb,bb     | cc,cc     | dd,dd     |
    | aa,       | bb,       | cc,       | dd,       |
    | aa,,,aa   | bb,,,bb   | cc,,,cc   | dd,,,dd   |
    | aa',',,aa | bb',',,bb | cc',',,cc | dd',',,dd |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa"aa     | bb"bb     | cc"cc     | dd"dd     |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa""aa    | bb""bb    | cc""cc    | dd""dd    |
    | aa",aa    | bb",bb    | cc",cc    | dd",dd    |
    | aa"",aa   | bb"",bb   | cc"",cc   | dd"",dd   |
    |           |           |           |           |
    |           |           |           |           |
    | NULL      | NULL      | NULL      | NULL      |
    |           |           |           |           |
    | "         | "         | "         | "         |
    | ""        | ""        | ""        | ""        |
    +-----------+-----------+-----------+-----------+
    21 rows in set (1.32 sec)
  5. (Optional) To copy the external table data into a regular table in MatrixOne, use the following SQL statements.

    Create a new table t2 in MatrixOne:

    create table t2(col1 char(225), col2 varchar(225), col3 text, col4 varchar(225));

    Import the data from the external table t1 into t2:

    insert into t2 select * from t1;
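
After the copy, t2 is an ordinary MatrixOne table. Comparing row counts is one simple way to check that nothing was lost. This is a sketch that assumes the tables t1 and t2 from the steps above.

```sql
-- Row count of the external table (read from S3).
select count(*) from t1;

-- Row count of the local copy; the two counts should match.
select count(*) from t2;
```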

Constraints

  1. MatrixOne supports loading only .csv files from S3-compatible object storage.
  2. When loading multiple files with a wildcard path, MatrixOne currently has known issues with patterns that lack a parent directory, such as *.csv. Use a path that includes a parent directory, such as /test/*.csv.
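
To illustrate the second constraint, the sketch below contrasts the two path forms. It reuses the placeholder endpoint, bucket, and table from the Alibaba Cloud OSS example earlier in this document; adapt the values to your own environment.

```sql
-- Wildcard under a parent directory: the form this constraint recommends.
LOAD DATA URL s3option{"endpoint"='oss-cn-shanghai.aliyuncs.com', "access_key_id"='XXXXXX', "secret_access_key"='XXXXXX', "bucket"='test-load-data', "filepath"='/test/*.csv', "region"='oss-cn-shanghai', "compression"='none'} INTO TABLE t1 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

-- Wildcard without a parent directory is affected by the known issue; avoid it for now:
--   "filepath"='*.csv'
```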