Writing Data to MatrixOne Using DataX

Overview

This article explains using the DataX tool to write data to offline MatrixOne databases.

DataX is an open-source heterogeneous data source offline synchronization tool developed by Alibaba. It provides stable and efficient data synchronization functions to achieve efficient data synchronization between various heterogeneous data sources.

DataX divides the synchronization of different data sources into two main components: Reader (read data source) and Writer (write to the target data source). The DataX framework theoretically supports data synchronization work for any data source type.

MatrixOne is highly compatible with MySQL 8.0. However, since the MySQL Writer plugin with DataX is adapted to the MySQL 5.1 JDBC driver, the community has separately modified the MatrixOneWriter plugin based on the MySQL 8.0 driver to improve compatibility. The MatrixOneWriter plugin implements the functionality of writing data to the target table in the MatrixOne database. In the underlying implementation, MatrixOneWriter connects to the remote MatrixOne database via JDBC and executes the corresponding insert into ... SQL statements to write data to MatrixOne. It also supports batch commits for performance optimization.

MatrixOneWriter uses DataX to retrieve generated protocol data from the Reader and generates the corresponding insert into ... statements based on your configured writeMode. In the event of primary key or uniqueness index conflicts, conflicting rows are excluded, and writing continues. For performance optimization, we use the PreparedStatement + Batch method and set the rewriteBatchedStatements=true option to buffer data to the thread context buffer. The write request is triggered only when the data volume in the buffer reaches the specified threshold.

DataX

Note

To execute the entire task, you must have permission to execute insert into .... Whether other permissions are required depends on the preSql and postSql in your task configuration.

MatrixOneWriter mainly aims at ETL development engineers who use MatrixOneWriter to import data from data warehouses into MatrixOne. At the same time, MatrixOneWriter can also serve as a data migration tool for users such as DBAs.

Before you start

Before using DataX to write data to MatrixOne, you need to complete the installation of the following software:

Steps

Create a MatrixOne Table

Connect to MatrixOne using the MySQL Client and create a test table in MatrixOne:

  1. CREATE DATABASE mo_demo;
  2. USE mo_demo;
  3. CREATE TABLE m_user(
  4. M_ID INT NOT NULL,
  5. M_NAME CHAR(25) NOT NULL
  6. );

Configure the Data Source

In this example, we use data generated in memory as the data source:

  1. "reader": {
  2. "name": "streamreader",
  3. "parameter": {
  4. "column" : [ # You can write multiple columns
  5. {
  6. "value": 20210106, # Represents the value of this column
  7. "type": "long" # Represents the type of this column
  8. },
  9. {
  10. "value": "matrixone",
  11. "type": "string"
  12. }
  13. ],
  14. "sliceRecordCount": 1000 # Indicates how many times to print
  15. }
  16. }

Write the Job Configuration File

Use the following command to view the configuration template:

  1. python datax.py -r {YOUR_READER} -w matrixonewriter

Write the job configuration file stream2matrixone.json:

  1. {
  2. "job": {
  3. "setting": {
  4. "speed": {
  5. "channel": 1
  6. }
  7. },
  8. "content": [
  9. {
  10. "reader": {
  11. "name": "streamreader",
  12. "parameter": {
  13. "column" : [
  14. {
  15. "value": 20210106,
  16. "type": "long"
  17. },
  18. {
  19. "value": "matrixone",
  20. "type": "string"
  21. }
  22. ],
  23. "sliceRecordCount": 1000
  24. }
  25. },
  26. "writer": {
  27. "name": "matrixonewriter",
  28. "parameter": {
  29. "writeMode": "insert",
  30. "username": "root",
  31. "password": "111",
  32. "column": [
  33. "M_ID",
  34. "M_NAME"
  35. ],
  36. "preSql": [
  37. "delete from m_user"
  38. ],
  39. "connection": [
  40. {
  41. "jdbcUrl": "jdbc:mysql://127.0.0.1:6001/mo_demo",
  42. "table": [
  43. "m_user"
  44. ]
  45. }
  46. ]
  47. }
  48. }
  49. }
  50. ]
  51. }
  52. }

Start DataX

Execute the following command to start DataX:

  1. $ cd {YOUR_DATAX_DIR_BIN}
  2. $ python datax.py stream2matrixone.json

View the Results

Connect to MatrixOne using the MySQL Client and use select to query the inserted results. The 1000 records in memory have been successfully written to MatrixOne.

  1. mysql> select * from m_user limit 5;
  2. +----------+-----------+
  3. | m_id | m_name |
  4. +----------+-----------+
  5. | 20210106 | matrixone |
  6. | 20210106 | matrixone |
  7. | 20210106 | matrixone |
  8. | 20210106 | matrixone |
  9. | 20210106 | matrixone |
  10. +----------+-----------+
  11. 5 rows in set (0.01 sec)
  12. mysql> select count(*) from m_user limit 5;
  13. +----------+
  14. | count(*) |
  15. +----------+
  16. | 1000 |
  17. +----------+
  18. 1 row in set (0.00 sec)

Parameter Descriptions

Here are some commonly used parameters for MatrixOneWriter:

Parameter NameParameter DescriptionMandatoryDefault Value
jdbcUrlJDBC connection information for the target database. DataX will append some attributes to the provided jdbcUrl during runtime, such as yearIsDateType=false&zeroDateTimeBehavior=CONVERT_TO_NULL&rewriteBatchedStatements=true&tinyInt1isBit=false&serverTimezone=Asia/Shanghai.YesNone
usernameUsername for the target database.YesNone
passwordPassword for the target database.YesNone
tableName of the target table. Supports writing to one or more tables. If configuring multiple tables, make sure their structures are consistent.YesNone
columnFields in the target table that must be written with data, separated by commas. For example: “column”: [“id”,”name”,”age”]. To write all columns, you can use , for example: “column”: [““].YesNone
preSqlStandard SQL statements to be executed before writing data to the target table.NoNone
postSqlStandard SQL statements to be executed after writing data to the target table.NoNone
writeModeControls the SQL statements used when writing data to the target table. You can choose insert or update.insert or updateinsert
batchSizeSize of records for batch submission. This can significantly reduce network interactions between DataX and MatrixOne, improving overall throughput. However, setting it too large may cause DataX to run out of memory.No1024

Type Conversion

MatrixOneWriter supports most MatrixOne data types, but a few types still need to be supported, so you need to pay special attention to your data types.

Here is a list of type conversions that MatrixOneWriter performs for MatrixOne data types:

DataX Internal TypeMatrixOne Data Type
Longint, tinyint, smallint, bigint
Doublefloat, double, decimal
Stringvarchar, char, text
Datedate, datetime, timestamp, time
Booleanbool
Bytesblob

Additional References

  • MatrixOne is compatible with the MySQL protocol. MatrixOneWriter is a modified version of the MySQL Writer with adjustments for JDBC driver versions. You can still use the MySQL Writer to write to MatrixOne.

  • To add the MatrixOne Writer in DataX, you need to download matrixonewriter.zip and unzip it into the plugin/writer/ directory in the root directory of your DataX project.

Ask and Questions

Q: During runtime, I encountered the error “Configuration information error, the configuration file you provided /{YOUR_MATRIXONE_WRITER_PATH}/plugin.json does not exist.” What should I do?

A: DataX attempts to find the plugin.json file by searching for similar folders when it starts. If the matrixonewriter.zip file also exists in the same directory, DataX will try to find it in .../datax/plugin/writer/matrixonewriter.zip/plugin.json. In the MacOS environment, DataX will also attempt to see it in .../datax/plugin/writer/.DS_Store/plugin.json. In this case, you need to delete these extra files or folders.