Load csv format data

This document will guide you on how to import large amounts of .csv format data to MatrixOne.

Syntax

  • Scenario 1: The data file is in the same machine with the MatrixOne server.
  1. LOAD DATA
  2. INFILE 'file_name'
  3. INTO TABLE tbl_name
  4. [{FIELDS | COLUMNS}
  5. [TERMINATED BY 'string']
  6. [[OPTIONALLY] ENCLOSED BY 'char']
  7. ]
  8. [LINES
  9. [STARTING BY 'string']
  10. [TERMINATED BY 'string']
  11. ]
  12. [IGNORE number {LINES | ROWS}]
  13. [PARALLEL {'TRUE' | 'FALSE'}]
  • Scenario 2: The data file is in separate machines with the MatrixOne server.
  1. LOAD DATA LOCAL
  2. INFILE 'file_name'
  3. INTO TABLE tbl_name
  4. [{FIELDS | COLUMNS}
  5. [TERMINATED BY 'string']
  6. [[OPTIONALLY] ENCLOSED BY 'char']
  7. ]
  8. [LINES
  9. [STARTING BY 'string']
  10. [TERMINATED BY 'string']
  11. ]
  12. [IGNORE number {LINES | ROWS}]
  13. [PARALLEL {'TRUE' | 'FALSE'}]

Before you start

Make sure you have already Deployed and Launched standalone MatrixOne.

Using the Load data command in MySQL Client

You can use Load Data to import data from big data files.

This section will describe how to import a .csv file.

Note: A csv(comma-separated values) file is a delimited text file that uses a comma to separate values.

Steps

The data file is in the same machine with the MatrixOne server

  1. Before executing Load Data in MatrixOne, the table needs to be created in advance. For now, the data file is required to be at the same machine with MatrixOne server, a file transfer is necessary if they are in separate machines.

  2. Launch the MySQL Client in the MatrixOne local server for accessing the local file system.

    1. mysql -h 127.0.0.1 -P 6001 -uroot -p111

    Info

    The login account in the above code snippet is the initial account; please change the initial password after logging in to MatrixOne; see Password Management.

  3. Execute LOAD DATA with the corresponding file path in MySQL client.

    1. mysql> LOAD DATA INFILE '/tmp/xxx.csv'
    2. INTO TABLE xxxxxx
    3. FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\r\n";

The data file is in separate machines with the MatrixOne server

  1. Before executing LOAD DATA LOCAL in MatrixOne, the table needs to be created in advance.

  2. Launch the MySQL Client in the MatrixOne local server for accessing the local file system.

    1. mysql -h <mo-host-ip> -P 6001 -udump -p111 --local-infile
  3. Execute LOAD DATA LOCAL with the corresponding file path in MySQL client.

    1. mysql> LOAD DATA LOCAL INFILE '/tmp/xxx.csv'
    2. INTO TABLE table_name
    3. FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\r\n";

Example using Load data with docker version

If you install MatrixOne by docker, the file system is inside the docker image by default. To work with local directory, you need to bind a local directory to the container. In the following example, the local file system path ~/tmp/docker_loaddata_demo/ is binded to the MatrixOne docker image, with a mapping to the /ssb-dbgen-path path inside the docker. We will walk you through the whole process of loading data with MatrixOne 0.6.0 docker version in this example.

  1. Download the dataset file and store the data in ~/tmp/docker_loaddata_demo/:

    1. cd ~/tmp/docker_loaddata_demo/
    2. wget https://community-shared-data-1308875761.cos.ap-beijing.myqcloud.com/lineorder_flat.tar.bz2
  2. Unzip the dataset:

    1. tar -jxvf lineorder_flat.tar.bz2
  3. Use Docker to launch MatrixOne, and mount the directory ~/tmp/docker_loaddata_demo/ that stores data files to a directory in the container. The container directory is /sb-dbgen-path as an example:

    1. sudo docker run --name matrixone --privileged -d -p 6001:6001 -v ~/tmp/docker_loaddata_demo/:/ssb-dbgen-path:rw matrixorigin/matrixone:0.6.0
  4. Connect to MatrixOne server:

    1. mysql -h 127.0.0.1 -P 6001 -uroot -p111

    Note: If your data file is on a different machine from the MatrixOne server, that is, the data file is on the client machine you are using, then you need to use the command line to connect to the MatrixOne service host: mysql -h <mo-host -ip> -P 6001 -udump -p111 --local-infile; and the imported command line needs to use LOAD DATA LOCAL INFILE syntax.

  5. Create lineorder_flat tables in MatrixOne, and import the dataset into MatriOne:

    1. mysql> create database if not exists ssb;
    2. mysql> use ssb;
    3. mysql> drop table if exists lineorder_flat;
    4. mysql> CREATE TABLE lineorder_flat(
    5. LO_ORDERKEY bigint key,
    6. LO_LINENUMBER int,
    7. LO_CUSTKEY int,
    8. LO_PARTKEY int,
    9. LO_SUPPKEY int,
    10. LO_ORDERDATE date,
    11. LO_ORDERPRIORITY char(15),
    12. LO_SHIPPRIORITY tinyint,
    13. LO_QUANTITY double,
    14. LO_EXTENDEDPRICE double,
    15. LO_ORDTOTALPRICE double,
    16. LO_DISCOUNT double,
    17. LO_REVENUE int unsigned,
    18. LO_SUPPLYCOST int unsigned,
    19. LO_TAX double,
    20. LO_COMMITDATE date,
    21. LO_SHIPMODE char(10),
    22. C_NAME varchar(25),
    23. C_ADDRESS varchar(25),
    24. C_CITY char(10),
    25. C_NATION char(15),
    26. C_REGION char(12),
    27. C_PHONE char(15),
    28. C_MKTSEGMENT char(10),
    29. S_NAME char(25),
    30. S_ADDRESS varchar(25),
    31. S_CITY char(10),
    32. S_NATION char(15),
    33. S_REGION char(12),
    34. S_PHONE char(15),
    35. P_NAME varchar(22),
    36. P_MFGR char(6),
    37. P_CATEGORY char(7),
    38. P_BRAND char(9),
    39. P_COLOR varchar(11),
    40. P_TYPE varchar(25),
    41. P_SIZE int,
    42. P_CONTAINER char(10)
    43. );
    44. mysql> load data infile '/ssb-dbgen-path/lineorder_flat.tbl' into table lineorder_flat FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
  6. After the import is successful, you can run SQL statements to check the rows of imported data:

    1. select count(*) from lineorder_flat;
    2. /*
    3. expected results:
    4. */
    5. +----------+
    6. | count(*) |
    7. +----------+
    8. | 10272594 |
    9. +----------+

Constraints

The loaded csv file format supports JSON. However, you need to ensure that the JSON does not contain field termination symbols. If the JSON does contain field termination symbols, wrap the JSON with double quotation marks. For example:

  • Right csv file example: "{"a":1, "b":2}", 2
  • Wrong csv file example: {"a":1, "b":2}, 2