Orc Format
Format: Serialization Schema
Format: Deserialization Schema
The Apache Orc format allows reading and writing Orc data.
Dependencies
To use the ORC format, the following dependencies are required both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
| Maven dependency | SQL Client |
|---|---|
| | Download |
How to create a table with Orc format
Here is an example of creating a table using the filesystem connector and the Orc format.
```sql
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior',
  'format' = 'orc'
)
```
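A minimal sketch of writing a row into the `user_behavior` table and reading it back from the Flink SQL Client. It assumes batch execution mode (in streaming mode the filesystem sink only commits files on checkpoints) and uses `table.dml-sync` so the `INSERT` finishes before the `SELECT` runs. The row values and the date are made up for illustration.

```sql
-- Assumes user_behavior has already been created in the same session.
-- Batch mode lets the INSERT job finish and commit its ORC files.
SET 'execution.runtime-mode' = 'batch';
-- Wait for the INSERT job to complete instead of submitting it asynchronously.
SET 'table.dml-sync' = 'true';

-- Illustrative row only; the last value fills the partition column dt.
INSERT INTO user_behavior
VALUES (1, 1001, 10, 'pv', TIMESTAMP '2024-01-01 10:00:00.000', '2024-01-01');

-- Reads back the ORC files written under the dt=2024-01-01 partition directory.
SELECT user_id, item_id, behavior, ts
FROM user_behavior
WHERE dt = '2024-01-01';
```

Filtering on `dt` lets the filesystem connector prune to a single partition directory rather than scanning every ORC file under `/tmp/user_behavior`.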
Format Options
| Option | Required | Default | Type | Description |
|---|---|---|---|---|
| format | required | (none) | String | Specify what format to use; here it should be 'orc'. |
The Orc format also supports the ORC table properties listed in Table properties. For example, you can configure orc.compress=SNAPPY to enable Snappy compression.
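As a sketch, an ORC table property can be passed alongside the other format options in the `WITH` clause, assuming the `orc.`-prefixed key is forwarded to the ORC writer as described above. The table name `user_behavior_snappy` and its path are hypothetical, and the set of accepted codecs depends on the ORC version bundled with Flink.

```sql
-- Hypothetical table: same layout as user_behavior, but written with Snappy compression.
CREATE TABLE user_behavior_snappy (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior_snappy',  -- hypothetical path
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'              -- ORC table property: compression codec
);
```

Because the property is set per table, different tables in the same job can use different codecs.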
Data Type Mapping
The Orc format's type mapping is compatible with Apache Hive. The following table lists the mapping from Flink data types to Orc types.
| Flink Data Type | Orc physical type | Orc logical type |
|---|---|---|
| CHAR | bytes | CHAR |
| VARCHAR | bytes | VARCHAR |
| STRING | bytes | STRING |
| BOOLEAN | long | BOOLEAN |
| BYTES | bytes | BINARY |
| DECIMAL | decimal | DECIMAL |
| TINYINT | long | BYTE |
| SMALLINT | long | SHORT |
| INT | long | INT |
| BIGINT | long | LONG |
| FLOAT | double | FLOAT |
| DOUBLE | double | DOUBLE |
| DATE | long | DATE |
| TIMESTAMP | timestamp | TIMESTAMP |
| ARRAY | - | LIST |
| MAP | - | MAP |
| ROW | - | STRUCT |
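To illustrate the mapping, here is a hypothetical table that declares several of the types above, including the nested ones. The comments note the Orc type each column should map to according to the table; the table name, columns, and path are illustrative only.

```sql
-- Hypothetical table: names and path are illustrative only.
CREATE TABLE orc_types_demo (
  id       BIGINT,                      -- long / LONG
  active   BOOLEAN,                     -- long / BOOLEAN
  price    DECIMAL(10, 2),              -- decimal / DECIMAL
  tags     ARRAY<STRING>,               -- LIST of STRING
  attrs    MAP<STRING, BIGINT>,         -- MAP
  address  ROW<city STRING, zip INT>,   -- STRUCT
  event_ts TIMESTAMP(3)                 -- timestamp / TIMESTAMP
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/orc_types_demo',
  'format' = 'orc'
);
```

Since the mapping is Hive-compatible, files written this way are intended to be readable by Hive tables declaring the corresponding Hive types.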