CREATE HIVEFORMAT TABLE

Description

The CREATE TABLE statement defines a new table using Hive format.

Syntax

  1. CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
  2. [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
  3. [ COMMENT table_comment ]
  4. [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
  5. | ( col_name1, col_name2, ... ) ]
  6. [ CLUSTERED BY ( col_name1, col_name2, ...)
  7. [ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
  8. INTO num_buckets BUCKETS ]
  9. [ ROW FORMAT row_format ]
  10. [ STORED AS file_format ]
  11. [ LOCATION path ]
  12. [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
  13. [ AS select_statement ]

Note that, the clauses between the columns definition clause and the AS SELECT clause can come in as any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.

Parameters

  • table_identifier

    Specifies a table name, which may be optionally qualified with a database name.

    Syntax: [ database_name. ] table_name

  • EXTERNAL

    Table is defined using the path provided as LOCATION, does not use default location for this table.

  • PARTITIONED BY

    Partitions are created on the table, based on the columns specified.

  • CLUSTERED BY

    Partitions created on the table will be bucketed into fixed buckets based on the column specified for bucketing.

    NOTE: Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.

  • SORTED BY

    Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause. If not specified, ASC is assumed by default.

  • INTO num_buckets BUCKETS

    Specifies buckets numbers, which is used in CLUSTERED BY clause.

  • row_format

    Specifies the row format for input and output. See HIVE FORMAT for more syntax details.

  • STORED AS

    File format for table storage, could be TEXTFILE, ORC, PARQUET, etc.

  • LOCATION

    Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.

  • COMMENT

    A string literal to describe the table.

  • TBLPROPERTIES

    A list of key-value pairs that is used to tag the table definition.

  • AS select_statement

    The table is populated using the data from the select statement.

Examples

  1. --Use hive format
  2. CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;
  3. --Use data from another table
  4. CREATE TABLE student_copy STORED AS ORC
  5. AS SELECT * FROM student;
  6. --Specify table comment and properties
  7. CREATE TABLE student (id INT, name STRING, age INT)
  8. COMMENT 'this is a comment'
  9. STORED AS ORC
  10. TBLPROPERTIES ('foo'='bar');
  11. --Specify table comment and properties with different clauses order
  12. CREATE TABLE student (id INT, name STRING, age INT)
  13. STORED AS ORC
  14. TBLPROPERTIES ('foo'='bar')
  15. COMMENT 'this is a comment';
  16. --Create partitioned table
  17. CREATE TABLE student (id INT, name STRING)
  18. PARTITIONED BY (age INT)
  19. STORED AS ORC;
  20. --Create partitioned table with different clauses order
  21. CREATE TABLE student (id INT, name STRING)
  22. STORED AS ORC
  23. PARTITIONED BY (age INT);
  24. --Use Row Format and file format
  25. CREATE TABLE student (id INT, name STRING)
  26. ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  27. STORED AS TEXTFILE;
  28. --Use complex datatype
  29. CREATE EXTERNAL TABLE family(
  30. name STRING,
  31. friends ARRAY<STRING>,
  32. children MAP<STRING, INT>,
  33. address STRUCT<street: STRING, city: STRING>
  34. )
  35. ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
  36. COLLECTION ITEMS TERMINATED BY '_'
  37. MAP KEYS TERMINATED BY ':'
  38. LINES TERMINATED BY '\n'
  39. NULL DEFINED AS 'foonull'
  40. STORED AS TEXTFILE
  41. LOCATION '/tmp/family/';
  42. --Use predefined custom SerDe
  43. CREATE TABLE avroExample
  44. ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  45. STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  46. OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  47. TBLPROPERTIES ('avro.schema.literal'='{ "namespace": "org.apache.hive",
  48. "name": "first_schema",
  49. "type": "record",
  50. "fields": [
  51. { "name":"string1", "type":"string" },
  52. { "name":"string2", "type":"string" }
  53. ] }');
  54. --Use personalized custom SerDe(we may need to `ADD JAR xxx.jar` first to ensure we can find the serde_class,
  55. --or you may run into `CLASSNOTFOUND` exception)
  56. ADD JAR /tmp/hive_serde_example.jar;
  57. CREATE EXTERNAL TABLE family (id INT, name STRING)
  58. ROW FORMAT SERDE 'com.ly.spark.serde.SerDeExample'
  59. STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
  60. OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
  61. LOCATION '/tmp/family/';
  62. --Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
  63. CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
  64. CLUSTERED BY (ID)
  65. INTO 4 BUCKETS
  66. STORED AS ORC
  67. --Use `CLUSTERED BY` clause to create bucket table with `SORTED BY`
  68. CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
  69. PARTITIONED BY (YEAR STRING)
  70. CLUSTERED BY (ID, NAME)
  71. SORTED BY (ID ASC)
  72. INTO 3 BUCKETS
  73. STORED AS PARQUET