CREATE TABLE

Synopsis

The CREATE TABLE statement is used to create a new table in a keyspace. It defines the table name, column names and types, primary key, and table properties.

Syntax

Diagram

create_table

[Figure 1: create_table syntax diagram]

table_schema

[Figure 2: table_schema syntax diagram]

table_properties

[Figure 3: table_properties syntax diagram]

Grammar

  create_table ::= CREATE TABLE [ IF NOT EXISTS ] table_name
                       '(' table_element [ ',' table_element ...] ')'
                       [ table_option ];

  table_element ::= table_column | table_constraints

  table_column ::= column_name column_type [ column_constraint ...]

  column_constraint ::= PRIMARY KEY | STATIC

  table_constraints ::= PRIMARY KEY '(' partition_key_column_list clustering_key_column_list ')'

  partition_key_column_list ::= '(' column_name [ ',' column_name ...] ')' | column_name

  clustering_key_column_list ::= [ ',' column_name ...]

  table_option ::= WITH table_property [ AND table_property ...]

  table_property ::= { property_name = property_literal
                       | CLUSTERING ORDER BY '(' column_ordering_property [ ',' column_ordering_property ...] ')'
                       | COMPACT STORAGE }

  column_ordering_property ::= column_name [ ASC | DESC ]

Where

  • table_name, column_name, and property_name are identifiers (table_name may be qualified with a keyspace name).
  • property_literal is a literal of boolean, text, or map data type; the examples on this page also use integer literals (for example, default_time_to_live = 5).

Semantics

  • An error is raised if table_name already exists in the associated keyspace unless the IF NOT EXISTS option is used.
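
As a minimal sketch of this rule, the statements below assume a hypothetical table named users_demo in the current keyspace (the name is illustrative and not part of the original examples). Repeating a plain CREATE TABLE would raise an error, whereas the IF NOT EXISTS form is expected to succeed and leave the existing table unchanged.

```sql
-- Hypothetical table name, used only for illustration.
CREATE TABLE IF NOT EXISTS users_demo(user_id INT PRIMARY KEY, full_name TEXT);

-- Repeating the statement does not raise an error,
-- because IF NOT EXISTS skips creation when the table already exists.
CREATE TABLE IF NOT EXISTS users_demo(user_id INT PRIMARY KEY, full_name TEXT);
```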

PRIMARY KEY

  • The primary key must be defined either as a column_constraint or as a table_constraints entry, but not both.
  • Each row in a table is uniquely identified by its primary key.
  • Primary key columns are either partitioning columns or clustering columns (described below).
  • If the primary key is set as a column constraint, then that column is the partitioning column and there are no clustering columns.
  • If the primary key is set as a table constraint, then:
    • The partitioning columns are given by the first entry in the primary key list: the nested column list (if given), otherwise the first column.
    • The clustering columns are the remaining columns in the primary key list (if any).
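
To illustrate the non-nested form of the table constraint, here is a small sketch. The table and column names are hypothetical. Because the first entry in the primary key list is a plain column rather than a nested list, site_id alone should be the partitioning column, and view_ts and url the clustering columns.

```sql
-- Hypothetical table: the first entry of the primary key list (site_id)
-- is the partitioning column; view_ts and url are clustering columns.
CREATE TABLE page_views(site_id INT,
                        view_ts TIMESTAMP,
                        url TEXT,
                        referrer TEXT,
                        PRIMARY KEY(site_id, view_ts, url));
```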

PARTITION KEY

  • The partition key is required and determines how rows are split into partitions.
  • Rows that share the same partition key form a partition and will be colocated on the same replica node.

CLUSTERING KEY

  • The clustering key is optional and defines an ordering for rows within a partition.
  • The default ordering is ascending (ASC); it can be set to ascending or descending for each clustering column using the CLUSTERING ORDER BY table property.
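
The per-column ordering can be sketched as follows, using a hypothetical event_log table with two clustering columns. Within each device_id partition, rows would be ordered by ts descending, and rows with equal ts by seq ascending.

```sql
-- Hypothetical table: newest events first within a partition,
-- with ties on ts ordered by ascending seq.
CREATE TABLE event_log(device_id INT,
                       ts TIMESTAMP,
                       seq INT,
                       payload TEXT,
                       PRIMARY KEY((device_id), ts, seq))
    WITH CLUSTERING ORDER BY (ts DESC, seq ASC);
```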

STATIC COLUMNS

  • Columns declared as STATIC will share the same value for all rows within a partition (i.e. rows having the same partition key).
  • Columns in the primary key cannot be static.
  • A table without clustering columns cannot have static columns (without clustering columns the primary key and the partition key are identical so static columns would be the same as regular columns).

TABLE PROPERTIES

  • The CLUSTERING ORDER BY property can be used to set the ordering for each clustering column individually (default is ASC).
  • The default_time_to_live property sets the default expiration time (TTL) in seconds for a table. The expiration time can be overridden by setting TTL for individual rows. The default value is 0 and means rows do not expire.
  • The transactions property specifies if distributed transactions are enabled in the table. To enable distributed transactions, use transactions = { 'enabled' : true }.
  • The tablets property specifies the number of tablets to use for the table. This is useful for two data center (2DC) deployments. See the example "Create CDC table specifying number of tablets" below.
  • Use AND to combine multiple table properties.
  • The other CQL table properties are allowed in the syntax but are currently ignored internally (have no effect).
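
The relationship between the table-level default and a row-level TTL can be sketched as below. The session_tokens table is hypothetical, and the sketch assumes the standard CQL USING TTL clause on INSERT is the mechanism for setting a TTL on an individual row.

```sql
-- Hypothetical table: rows expire after 3600 seconds by default.
CREATE TABLE session_tokens(user_id INT PRIMARY KEY, token TEXT)
    WITH default_time_to_live = 3600;

-- No TTL given, so the table default (3600 seconds) applies.
INSERT INTO session_tokens(user_id, token) VALUES (1, 'abc');

-- Assumption: USING TTL overrides the table default for this row only (60 seconds).
INSERT INTO session_tokens(user_id, token) VALUES (2, 'def') USING TTL 60;
```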

Examples

Use column constraint to define primary key

‘user_id’ is the partitioning column and there are no clustering columns.

  cqlsh:example> CREATE TABLE users(user_id INT PRIMARY KEY, full_name TEXT);

Use table constraint to define primary key

‘supplier_id’ and ‘device_id’ are the partitioning columns and ‘model_year’ is the clustering column.

  cqlsh:example> CREATE TABLE devices(supplier_id INT,
                                      device_id INT,
                                      model_year INT,
                                      device_name TEXT,
                                      PRIMARY KEY((supplier_id, device_id), model_year));

Use column constraint to define a static column

‘supplier_name’ is a static column, so all rows that share a ‘supplier_id’ see the value written most recently.

  cqlsh:example> CREATE TABLE items(supplier_id INT,
                                    item_id INT,
                                    supplier_name TEXT STATIC,
                                    item_name TEXT,
                                    PRIMARY KEY((supplier_id), item_id));

  cqlsh:example> INSERT INTO items(supplier_id, item_id, supplier_name, item_name)
                 VALUES (1, 1, 'Unknown', 'Wrought Anvil');

  cqlsh:example> INSERT INTO items(supplier_id, item_id, supplier_name, item_name)
                 VALUES (1, 2, 'Acme Corporation', 'Giant Rubber Band');

  cqlsh:example> SELECT * FROM items;

   supplier_id | item_id | supplier_name    | item_name
  -------------+---------+------------------+-------------------
             1 |       1 | Acme Corporation |     Wrought Anvil
             1 |       2 | Acme Corporation | Giant Rubber Band

Use table property to define the order (ascending or descending) for clustering columns

Timestamp column ‘ts’ will be stored in descending order (latest values first).

  cqlsh:example> CREATE TABLE user_actions(user_id INT,
                                           ts TIMESTAMP,
                                           action TEXT,
                                           PRIMARY KEY((user_id), ts))
                     WITH CLUSTERING ORDER BY (ts DESC);

  cqlsh:example> INSERT INTO user_actions(user_id, ts, action) VALUES (1, '2000-12-2 12:30:15', 'log in');

  cqlsh:example> INSERT INTO user_actions(user_id, ts, action) VALUES (1, '2000-12-2 12:30:25', 'change password');

  cqlsh:example> INSERT INTO user_actions(user_id, ts, action) VALUES (1, '2000-12-2 12:30:35', 'log out');

  cqlsh:example> SELECT * FROM user_actions;

   user_id | ts                              | action
  ---------+---------------------------------+-----------------
         1 | 2000-12-02 19:30:35.000000+0000 |         log out
         1 | 2000-12-02 19:30:25.000000+0000 | change password
         1 | 2000-12-02 19:30:15.000000+0000 |          log in

Use table property to define the default expiration time for rows

Rows in ‘sensor_data’ expire 5 seconds after they are inserted.

  cqlsh:example> CREATE TABLE sensor_data(sensor_id INT,
                                          ts TIMESTAMP,
                                          value DOUBLE,
                                          PRIMARY KEY((sensor_id), ts))
                     WITH default_time_to_live = 5;

First insert at time T (row expires at T + 5).

  cqlsh:example> INSERT INTO sensor_data(sensor_id, ts, value) VALUES (1, '2017-10-1 11:22:31', 3.1);

Second insert 3 seconds later (row expires at T + 8).

  cqlsh:example> INSERT INTO sensor_data(sensor_id, ts, value) VALUES (2, '2017-10-1 11:22:34', 3.4);

First select 3 seconds later (at time T + 6).

  cqlsh:example> SELECT * FROM sensor_data;

   sensor_id | ts                              | value
  -----------+---------------------------------+-------
           2 | 2017-10-01 18:22:34.000000+0000 |   3.4

Second select 3 seconds later (at time T + 9).

  cqlsh:example> SELECT * FROM sensor_data;

   sensor_id | ts | value
  -----------+----+-------

Create CDC table specifying number of tablets

For two data center (2DC) deployments that require the identical number of tablets on both clusters, you can use the CREATE TABLE statement with the WITH clause to specify the number of tablets.

  cqlsh:example> CREATE TABLE tracking (id int PRIMARY KEY) WITH tablets = 10;

If you create an index for these tables, you can also specify the number of tablets for the index.

You can also use AND to add other table properties, as in this example.

  cqlsh:example> CREATE TABLE tracking (id int PRIMARY KEY) WITH tablets = 10 AND transactions = { 'enabled' : true };

See also