CREATE TABLE

description

Syntax

  1. CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [database.]table_name
  2. (column_definition1[, column_definition2, ...])
  3. [ENGINE = [olap|mysql|broker]]
  4. [key_desc]
  5. [COMMENT "table comment"]
  6. [partition_desc]
  7. [distribution_desc]
  8. [PROPERTIES ("key"="value", ...)]
  9. [BROKER PROPERTIES ("key"="value", ...)];
  1. column_definition

    Syntax:

    col_name col_type [agg_type] [NULL | NOT NULL] [DEFAULT “default_value”]

    Explain: col_name: Name of column col_type: Type of column

    1. TINYINT(1 Byte)
    2. Range: -2^7 + 1 ~ 2^7 - 1
    3. SMALLINT(2 Bytes)
    4. Range: -2^15 + 1 ~ 2^15 - 1
    5. INT(4 Bytes)
    6. Range: -2^31 + 1 ~ 2^31 - 1
    7. BIGINT(8 Bytes)
    8. Range: -2^63 + 1 ~ 2^63 - 1
    9. LARGEINT(16 Bytes)
    10. Range: -2^127 + 1 ~ 2^127 - 1
    11. FLOAT(4 Bytes)
    12. Support scientific notation
    13. DOUBLE(12 Bytes)
    14. Support scientific notation
    15. DECIMAL[(precision, scale)] (16 Bytes)
    16. Default is DECIMAL(10, 0)
    17. precision: 1 ~ 27
    18. scale: 0 ~ 9
    19. integer part: 1 ~ 18
    20. fractional part: 0 ~ 9
    21. Not support scientific notation
    22. DATE(3 Bytes)
    23. Range: 1900-01-01 ~ 9999-12-31
    24. DATETIME(8 Bytes)
    25. Range: 1900-01-01 00:00:00 ~ 9999-12-31 23:59:59
    26. CHAR[(length)]
    27. Fixed length string. Range: 1 ~ 255. Default: 1
    28. VARCHAR[(length)]
    29. Variable length string. Range: 1 ~ 65533
    30. HLL (1~16385 Bytes)
    31. HLLL tpye, No need to specify length.
    32. This type can only be queried by hll_union_agg, hll_cardinality, hll_hash functions.

    agg_type: Aggregation type. If not specified, the column is key column. Otherwise, the column is value column.

    1. * SUMMAXMINREPLACE
    2. * HLL_UNION: Only for HLL type
    3. * REPLACE_IF_NOT_NULL: The meaning of this aggregation type is that substitution will occur if and only if the newly imported data is a non-null value. If the newly imported data is null, Doris will still retain the original value. Note: if NOT NULL is specified in the REPLACE_IF_NOT_NULL column when the user creates the table, Doris will convert it to NULL and will not report an error to the user. Users can leverage this aggregate type to achieve importing some of columns.
    4. * BITMAP_UNION: Only for BITMAP type

    Allow NULL: Default is NOT NULL. NULL value should be represented as \N in load source file.

    Notice:
    The origin value of BITMAP_UNION column should be TINYINT, SMALLINT, INT.

  2. ENGINE type

    Default is olap. Options are: olap, mysql, broker

    1. For mysql, properties should include:

      1. PROPERTIES (
      2. "host" = "mysql_server_host",
      3. "port" = "mysql_server_port",
      4. "user" = "your_user_name",
      5. "password" = "your_password",
      6. "database" = "database_name",
      7. "table" = "table_name"
      8. )
  1. Notice:
  2. ```
  3. "table_name" is the real table name in MySQL database.
  4. table_name in CREATE TABLE stmt is table is Doris. They can be different or same.
  5. MySQL table created in Doris is for accessing data in MySQL database.
  6. Doris does not maintain and store any data from MySQL table.
  7. ```
  8. 2. For broker, properties should include:
  9. ```
  10. PROPERTIES (
  11. "broker_name" = "broker_name",
  12. "path" = "file_path1[,file_path2]",
  13. "column_separator" = "value_separator"
  14. "line_delimiter" = "value_delimiter"
  15. )
  16. ```
  17. ```
  18. BROKER PROPERTIES(
  19. "username" = "name",
  20. "password" = "password"
  21. )
  22. ```
  23. For different broker, the broker properties are different
  24. Notice:
  25. ```
  26. Files name in "path" is separated by ",". If file name includes ",", use "%2c" instead. If file name includes "%", use "%25" instead.
  27. Support CSV and Parquet. Support GZ, BZ2, LZ4, LZO(LZOP)
  28. ```
  1. key_desc

    Syntax: key_type(k1[,k2 …])

    Explain:

    1. Data is orderd by specified key columns. And has different behaviors for different key desc.
    2. AGGREGATE KEY:
    3. value columns will be aggregated is key columns are same.
    4. UNIQUE KEY:
    5. The new incoming rows will replace the old rows if key columns are same.
    6. DUPLICATE KEY:
    7. All incoming rows will be saved.

    NOTICE: Except for AGGREGATE KEY, no need to specify aggregation type for value columns.

  2. partition_desc Partition has two ways to use:

    1. LESS THAN

    Syntex:

    PARTITION BY RANGE (k1, k2, …) ( PARTITION partition_name1 VALUES LESS THAN MAXVALUE|(“value1”, “value2”, …), PARTITION partition_name2 VALUES LESS THAN MAXVALUE|(“value1”, “value2”, …) … )

    1. ```
    2. Explain:

    1) Partition name only support [A-z0-9_] 2) Partition key column’s type should be:

    1. TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME

    3) The range is [closed, open). And the lower bound of first partition is MIN VALUE of specifed column type. 4) NULL values should be save in partition which includes MIN VALUE. 5) Support multi partition columns, the the default partition value is MIN VALUE.

    1. 2Fixed Range Syntex:
    1. PARTITION BY RANGE (k1, k2, k3, ...)
    2. (
    3. PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)),
    4. PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, ))
    5. "k3-upper1-2", ...
    6. )

    ```

    Explain: 1)The Fixed Range is more flexible than the LESS THAN, and the left and right intervals are completely determined by the user. 2)Others are consistent with LESS THAN.

  3. distribution_desc

    1. Hash Syntax: DISTRIBUTED BY HASH (k1[,k2 …]) [BUCKETS num] Explain: The default buckets is 10.
  4. PROPERTIES

    1. If ENGINE type is olap. User can specify storage medium, cooldown time and replication number:

      1. PROPERTIES (
      2. "storage_medium" = "[SSD|HDD]",
      3. ["storage_cooldown_time" = "yyyy-MM-dd HH:mm:ss"],
      4. ["replication_num" = "3"]
      5. )

      storage_medium: SSD or HDD storage_cooldown_time: If storage_medium is SSD, data will be automatically moved to HDD when timeout. Default is 7 days. Format: “yyyy-MM-dd HH:mm:ss” replication_num: Replication number of a partition. Default is 3.

      If table is not range partitions. This property takes on Table level. Or it will takes on Partition level.

      User can specify different properties for different partition by ADD PARTITION or MODIFY PARTITION statements.

    2. If Engine type is olap, user can set bloom filter index for column.

      Bloom filter index will be used when query contains IN or EQUAL. Bloom filter index support key columns with type except TINYINT FLOAT DOUBLE, also support value with REPLACE aggregation type.

      1. PROPERTIES (
      2. "bloom_filter_columns"="k1,k2,k3"
      3. )
    3. For Colocation Join:

      1. PROPERTIES (
      2. "colocate_with"="table1"
      3. )

example

  1. 1. Create an olap table, distributed by hash, with aggregation type.
  1. CREATE TABLE example_db.table_hash
  2. (
  3. k1 TINYINT,
  4. k2 DECIMAL(10, 2) DEFAULT "10.5",
  5. v1 CHAR(10) REPLACE,
  6. v2 INT SUM
  7. )
  8. ENGINE=olap
  9. AGGREGATE KEY(k1, k2)
  10. COMMENT "my first doris table"
  11. DISTRIBUTED BY HASH(k1) BUCKETS 32
  12. PROPERTIES ("storage_type"="column");
  13. ```
  1. Create an olap table, distributed by hash, with aggregation type. Also set storage medium and cooldown time.

    ``` CREATE TABLE example_db.table_hash ( k1 BIGINT, k2 LARGEINT, v1 VARCHAR(2048) REPLACE, v2 SMALLINT SUM DEFAULT “10” ) ENGINE=olap UNIQUE KEY(k1, k2) DISTRIBUTED BY HASH (k1, k2) BUCKETS 32 PROPERTIES( “storage_type”=”column”, “storage_medium” = “SSD”, “storage_cooldown_time” = “2015-06-04 00:00:00” );

  2. Create an olap table, with range partitioned, distributed by hash. 1) LESS THAN

    1. CREATE TABLE example_db.table_range
    2. (
    3. k1 DATE,
    4. k2 INT,
    5. k3 SMALLINT,
    6. v1 VARCHAR(2048),
    7. v2 DATETIME DEFAULT "2014-02-04 15:36:00"
    8. )
    9. ENGINE=olap
    10. DUPLICATE KEY(k1, k2, k3)
    11. PARTITION BY RANGE (k1)
    12. (
    13. PARTITION p1 VALUES LESS THAN ("2014-01-01"),
    14. PARTITION p2 VALUES LESS THAN ("2014-06-01"),
    15. PARTITION p3 VALUES LESS THAN ("2014-12-01")
    16. )
    17. DISTRIBUTED BY HASH(k2) BUCKETS 32
    18. PROPERTIES(
    19. "storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00"
    20. );

    Explain: This statement will create 3 partitions:

    1. ( { MIN }, {"2014-01-01"} )
    2. [ {"2014-01-01"}, {"2014-06-01"} )
    3. [ {"2014-06-01"}, {"2014-12-01"} )

    Data outside these ranges will not be loaded.

2) Fixed Range CREATE TABLE table_range ( k1 DATE, k2 INT, k3 SMALLINT, v1 VARCHAR(2048), v2 DATETIME DEFAULT “2014-02-04 15:36:00” ) ENGINE=olap DUPLICATE KEY(k1, k2, k3) PARTITION BY RANGE (k1, k2, k3) ( PARTITION p1 VALUES [(“2014-01-01”, “10”, “200”), (“2014-01-01”, “20”, “300”)), PARTITION p2 VALUES [(“2014-06-01”, “100”, “200”), (“2014-07-01”, “100”, “300”)) ) DISTRIBUTED BY HASH(k2) BUCKETS 32 PROPERTIES( “storage_medium” = “SSD” );

  1. Create a mysql table

    1. CREATE TABLE example_db.table_mysql
    2. (
    3. k1 DATE,
    4. k2 INT,
    5. k3 SMALLINT,
    6. k4 VARCHAR(2048),
    7. k5 DATETIME
    8. )
    9. ENGINE=mysql
    10. PROPERTIES
    11. (
    12. "host" = "127.0.0.1",
    13. "port" = "8239",
    14. "user" = "mysql_user",
    15. "password" = "mysql_passwd",
    16. "database" = "mysql_db_test",
    17. "table" = "mysql_table_test"
    18. );
  2. Create a broker table, with file on HDFS, line delimit by “|”, column separated by “\n”

    1. CREATE EXTERNAL TABLE example_db.table_broker (
    2. k1 DATE,
    3. k2 INT,
    4. k3 SMALLINT,
    5. k4 VARCHAR(2048),
    6. k5 DATETIME
    7. )
    8. ENGINE=broker
    9. PROPERTIES (
    10. "broker_name" = "hdfs",
    11. "path" = "hdfs://hdfs_host:hdfs_port/data1,hdfs://hdfs_host:hdfs_port/data2,hdfs://hdfs_host:hdfs_port/data3%2c4",
    12. "column_separator" = "|",
    13. "line_delimiter" = "\n"
    14. )
    15. BROKER PROPERTIES (
    16. "username" = "hdfs_user",
    17. "password" = "hdfs_password"
    18. );
  3. Create table will HLL column

    1. CREATE TABLE example_db.example_table
    2. (
    3. k1 TINYINT,
    4. k2 DECIMAL(10, 2) DEFAULT "10.5",
    5. v1 HLL HLL_UNION,
    6. v2 HLL HLL_UNION
    7. )
    8. ENGINE=olap
    9. AGGREGATE KEY(k1, k2)
    10. DISTRIBUTED BY HASH(k1) BUCKETS 32;
  4. Create a table will BITMAP_UNION column

    1. CREATE TABLE example_db.example_table
    2. (
    3. k1 TINYINT,
    4. k2 DECIMAL(10, 2) DEFAULT "10.5",
    5. v1 VARCHAR(0) BITMAP_UNION, // 注意: bitmap_union的varchar长度需要指定为0
    6. v2 VARCHAR(0) BITMAP_UNION
    7. )
    8. ENGINE=olap
    9. AGGREGATE KEY(k1, k2)
    10. DISTRIBUTED BY HASH(k1) BUCKETS 32;
  5. Create 2 colocate join table.

    1. CREATE TABLE `t1` (
    2. `id` int(11) COMMENT "",
    3. `value` varchar(8) COMMENT ""
    4. ) ENGINE=OLAP
    5. DUPLICATE KEY(`id`)
    6. DISTRIBUTED BY HASH(`id`) BUCKETS 10
    7. PROPERTIES (
    8. "colocate_with" = "group1"
    9. );
    10. CREATE TABLE `t2` (
    11. `id` int(11) COMMENT "",
    12. `value` varchar(8) COMMENT ""
    13. ) ENGINE=OLAP
    14. DUPLICATE KEY(`id`)
    15. DISTRIBUTED BY HASH(`id`) BUCKETS 10
    16. PROPERTIES (
    17. "colocate_with" = "group1"
    18. );
  6. Create a broker table, with file on BOS.

    1. CREATE EXTERNAL TABLE example_db.table_broker (
    2. k1 DATE
    3. )
    4. ENGINE=broker
    5. PROPERTIES (
    6. "broker_name" = "bos",
    7. "path" = "bos://my_bucket/input/file",
    8. )
    9. BROKER PROPERTIES (
    10. "bos_endpoint" = "http://bj.bcebos.com",
    11. "bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    12. "bos_secret_accesskey"="yyyyyyyyyyyyyyyyyyyy"
    13. );

    ```

keyword

  1. CREATE,TABLE