Dynamic Table

SinceVersion 2.0.0

A dynamic schema table is a special kind of table which schema expands automatically with the import procedure. Currently, this feature is mainly used for importing semi-structured data such as JSON. Because JSON is self-describing, we can extract the schema information from the original document and infer the final type information. This special table can reduce manual schema change operations and easily import semi-structured data and automatically expand its schema.

Terminology

  • Schema change, changing the structure of the table, such as adding columns, reducing columns, changing column types
  • Static column, column specified during table creation, such as partition columns, primary key columns
  • Dynamic column, columns automatically recognized and added during import

Create dynamic table

  1. CREATE DATABASE test_dynamic_table;
  2. -- Create table and specify static column types, import will automatically convert to the type of static column
  3. -- Choose random bucketing
  4. CREATE TABLE IF NOT EXISTS test_dynamic_table (
  5. qid bigint,
  6. `answers.date` array<datetime>,
  7. `title` string,
  8. ... -- ... Identifying a table as a dynamic table and its syntax for dynamic tables.
  9. )
  10. DUPLICATE KEY(`qid`)
  11. DISTRIBUTED BY RANDOM BUCKETS 5
  12. properties("replication_num" = "1");
  13. -- Three Columns are added to the table by default, and their types are specified
  14. mysql> DESC test_dynamic_table;
  15. +--------------+-----------------+------+-------+---------+-------+
  16. | Field | Type | Null | Key | Default | Extra |
  17. +--------------+-----------------+------+-------+---------+-------+
  18. | qid | BIGINT | Yes | true | NULL | |
  19. | answers.date | ARRAY<DATETIME> | Yes | false | NULL | NONE |
  20. | user | TEXT | Yes | false | NULL | NONE |
  21. +--------------+-----------------+------+-------+---------+-------+
  22. 3 rows in set (0.00 sec)

Importing data

  1. -- example1.json
  2. '{
  3. "title": "Display Progress Bar at the Time of Processing",
  4. "qid": "1000000",
  5. "answers": [
  6. {"date": "2009-06-16T09:55:57.320", "user": "Micha\u0142 Niklas (22595)"},
  7. {"date": "2009-06-17T12:34:22.643", "user": "Jack Njiri (77153)"}
  8. ],
  9. "tag": ["vb6", "progress-bar"],
  10. "user": "Jash",
  11. "creationdate": "2009-06-16T07:28:42.770"
  12. }'
  13. curl -X PUT -T example1.json --location-trusted -u root: -H "read_json_by_line:false" -H "format:json" http://127.0.0.1:8147/api/regression_test_dynamic_table/test_dynamic_table/_stream_load
  14. -- Added five new columns: `title`, `answers.user`, `tag`, `title`, `creationdate`
  15. -- The types of the three columns: `qid`, `answers.date`, `user` remain the same as with the table was created
  16. -- The default value of the new array type is an empty array []
  17. mysql> DESC test_dynamic_table;
  18. +--------------+-----------------+------+-------+---------+-------+
  19. | Field | Type | Null | Key | Default | Extra |
  20. +--------------+-----------------+------+-------+---------+-------+
  21. | qid | BIGINT | Yes | true | NULL | |
  22. | answers.date | ARRAY<DATETIME> | Yes | false | NULL | NONE |
  23. | title | TEXT | Yes | false | NULL | NONE |
  24. | answers.user | ARRAY<TEXT> | No | false | [] | NONE |
  25. | tag | ARRAY<TEXT> | No | false | [] | NONE |
  26. | user | TEXT | Yes | false | NULL | NONE |
  27. | creationdate | TEXT | Yes | false | NULL | NONE |
  28. | date | TEXT | Yes | false | NULL | NONE |
  29. +--------------+-----------------+------+-------+---------+-------+
  30. -- Batch import data
  31. -- Specifying -H "read_json_by_line:true", parsing JSON line by line
  32. curl -X PUT -T example_batch.json --location-trusted -u root: -H "read_json_by_line:true" -H "format:json" http://127.0.0.1:8147/api/regression_test_dynamic_table/test_dynamic_table/_stream_load
  33. -- Specifying -H "strip_outer_array:true", parsing the entire file as a JSON array, each element in the array is the same, more efficient parsing way
  34. curl -X PUT -T example_batch_array.json --location-trusted -u root: -H "strip_outer_array:true" -H "format:json" http://127.0.0.1:8147/api/regression_test_dynamic_table/test_dynamic_table/_stream_load

For a dynamic table, you can also use S3load or Routine load, with similar usage.

Adding Index to Dynamic Columns

  1. -- Create an inverted index on the title column, using English parsing.
  2. CREATE INDEX title_idx ON test_dynamic_table (`title`) using inverted PROPERTIES("parser"="english")

Type conflict resolution

In the first batch import, the unified type will be automatically inferred and used as the final Column type, so it is recommended to keep the Column type consistent, for example:

  1. {"id" : 123}
  2. {"id" : "123"}
  3. -- The type will finally be inferred as Text type, and if {"id" : 123} is imported later, the type will automatically be converted to String type
  4. For types that cannot be unified, such as:
  5. {"id" : [123]}
  6. {"id" : 123}
  7. -- Importing will result in an error."