Presto
This documentation is a guide for using Paimon in Presto.
Version
Paimon currently supports Presto 0.236 and above.
Preparing Paimon Jar File
Download from master: https://paimon.apache.org/docs/master/project/download/
You can also manually build a bundled jar from the source code.
To build from the source code, clone the git repository.
Build the Presto connector plugin with the following command:
mvn clean install -DskipTests
After the packaging is complete, you can choose the corresponding connector based on your own Presto version:
| Version | Package |
|---|---|
| [0.236, 0.268) | ./paimon-presto-0.236/target/paimon-presto-0.236-0.9.0-plugin.tar.gz |
| [0.268, 0.273) | ./paimon-presto-0.268/target/paimon-presto-0.268-0.9.0-plugin.tar.gz |
| [0.273, latest] | ./paimon-presto-0.273/target/paimon-presto-0.273-0.9.0-plugin.tar.gz |
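The version ranges in the table can also be resolved in a script. The following is a minimal sketch, not part of Paimon: `version_ge` and `paimon_presto_module` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Print the plugin module (0.236, 0.268 or 0.273) that matches a Presto
# version, following the ranges in the table above.
# Both function names are hypothetical helpers, not part of Paimon.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

paimon_presto_module() {
  local v="$1"
  if version_ge "$v" 0.273; then
    echo 0.273
  elif version_ge "$v" 0.268; then
    echo 0.268
  elif version_ge "$v" 0.236; then
    echo 0.236
  else
    echo "unsupported Presto version: $v" >&2
    return 1
  fi
}
```

For instance, `paimon_presto_module 0.274` prints `0.273`, which is the module to build and install for Presto 0.274.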
Of course, we also support different versions of Hive and Hadoop. But note that we utilize Presto-shaded versions of Hive and Hadoop packages to address dependency conflicts. You can check the following two links to select the appropriate versions of Hive and Hadoop:
Both Hive 2 and 3, as well as Hadoop 2 and 3, are supported.
For example, if your Presto version is 0.274 and your Hive and Hadoop versions are 2.x, you can run:
mvn clean install -DskipTests -am -pl paimon-presto-0.273 -Dpresto.version=0.274 -Dhadoop.apache2.version=2.7.4-9 -Dhive.apache.version=1.2.2-2
Tmp Dir
Paimon unzips some jars to the temporary directory for codegen. By default, Presto uses '/tmp' as the temporary directory, but files in '/tmp' may be deleted periodically.
You can set this JVM system property when Presto starts:
-Djava.io.tmpdir=/path/to/other/tmpdir
This lets Paimon use a secure temporary directory.
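One way to make the setting persistent is to add it to Presto's `etc/jvm.config`, which holds one JVM option per line. The sketch below is an illustration under that assumption; `set_presto_tmpdir` is a hypothetical helper name, and the directory permissions are a suggestion rather than a Paimon requirement.

```shell
#!/usr/bin/env bash
# set_presto_tmpdir PRESTO_HOME TMPDIR
# Hypothetical helper: creates TMPDIR and adds -Djava.io.tmpdir to
# PRESTO_HOME/etc/jvm.config unless the option is already present.
set_presto_tmpdir() {
  local presto_home="$1" tmpdir="$2"
  local jvm_config="${presto_home}/etc/jvm.config"

  mkdir -p "$tmpdir" "${presto_home}/etc"
  chmod 700 "$tmpdir"   # assumption: only the Presto user needs access
  touch "$jvm_config"

  if grep -q '^-Djava\.io\.tmpdir=' "$jvm_config"; then
    echo "java.io.tmpdir is already set in ${jvm_config}; leaving it unchanged" >&2
    return 0
  fi

  # Make sure the existing file ends with a newline before appending.
  if [ -s "$jvm_config" ] && [ -n "$(tail -c 1 "$jvm_config")" ]; then
    echo >> "$jvm_config"
  fi
  echo "-Djava.io.tmpdir=${tmpdir}" >> "$jvm_config"
}
```

A call such as `set_presto_tmpdir "${PRESTO_HOME}" /path/to/other/tmpdir` would need to be repeated on every Presto node, followed by a restart.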
Configure Paimon Catalog
Install Paimon Connector
tar -zxf paimon-presto-${PRESTO_VERSION}/target/paimon-presto-${PRESTO_VERSION}-${PAIMON_VERSION}-plugin.tar.gz -C ${PRESTO_HOME}/plugin
Note that the variable PRESTO_VERSION is the module name and must be one of 0.236, 0.268, or 0.273.
Configuration
cd ${PRESTO_HOME}
mkdir -p etc/catalog
Then create etc/catalog/paimon.properties with the following content:
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
If you are using HDFS FileSystem, you will also need to do one more thing: choose one of the following ways to configure your HDFS:
- set environment variable HADOOP_HOME.
- set environment variable HADOOP_CONF_DIR.
- configure hadoop-conf-dir in the properties.
If you are using S3 FileSystem, you need to add paimon-s3-${PAIMON_VERSION}.jar to ${PRESTO_HOME}/plugin/paimon and additionally configure the following properties in paimon.properties:
s3.endpoint=${YOUR_ENDPOINTS}
s3.access-key=${YOUR_AK}
s3.secret-key=${YOUR_SK}
To query tables in a HiveCatalog, edit the catalog properties file:
vim etc/catalog/paimon.properties
and set the following config:
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
metastore=hive
uri=thrift://${YOUR_HIVE_METASTORE}:9083
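For scripted deployments, the same file can be written without an editor. This is a minimal sketch: `write_paimon_catalog` is a hypothetical helper name, and the property keys are the ones listed above. It overwrites any existing paimon.properties.

```shell
#!/usr/bin/env bash
# write_paimon_catalog PRESTO_HOME WAREHOUSE [METASTORE_URI]
# Hypothetical helper that writes etc/catalog/paimon.properties.
# When METASTORE_URI is given, the Hive metastore settings are added.
write_paimon_catalog() {
  local presto_home="$1" warehouse="$2" metastore_uri="${3:-}"
  local catalog_dir="${presto_home}/etc/catalog"
  local props="${catalog_dir}/paimon.properties"

  mkdir -p "$catalog_dir"
  {
    echo "connector.name=paimon"
    echo "warehouse=${warehouse}"
    if [ -n "$metastore_uri" ]; then
      echo "metastore=hive"
      echo "uri=${metastore_uri}"
    fi
  } > "$props"

  # Print the path of the file that was written.
  echo "$props"
}
```

Calling it with two arguments produces a filesystem catalog; passing a third argument such as `thrift://${YOUR_HIVE_METASTORE}:9083` produces the Hive catalog variant.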
Kerberos
When using Kerberos authentication, you can configure the principal and keytab file in the properties:
security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/presto/hdfs.keytab
Keytab files must be distributed to every node in the cluster that runs Presto.
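Distribution can be scripted in many ways; the sketch below uses `scp` and is only an illustration. `distribute_keytab` is a hypothetical helper name, the host names in the usage note are placeholders, and it assumes SSH access to each node with permission to write the target path.

```shell
#!/usr/bin/env bash
# distribute_keytab KEYTAB HOST...
# Hypothetical helper: copies KEYTAB to the same path on every host with scp.
# Set DRY_RUN=1 to print the commands instead of running them.
distribute_keytab() {
  local keytab="$1"
  shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp -p ${keytab} ${host}:${keytab}"
    else
      # -p preserves the file mode and timestamps of the keytab.
      scp -p "$keytab" "${host}:${keytab}" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 distribute_keytab /etc/presto/hdfs.keytab worker1 worker2` first shows which copies would be made before anything is transferred.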
Create Schema
CREATE SCHEMA paimon.test_db;
Create Table
CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
)
Add Column
CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);
ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;
Query
SELECT * FROM paimon.default.MyTable
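The query can also be run non-interactively through the Presto CLI. This is a sketch under assumptions: `paimon_query` is a hypothetical wrapper, the CLI is assumed to be installed on PATH as `presto`, and the coordinator address is read from a `PRESTO_SERVER` variable introduced here for illustration (defaulting to localhost:8080).

```shell
#!/usr/bin/env bash
# paimon_query SQL
# Hypothetical wrapper around the Presto CLI. Assumes the CLI is on PATH as
# `presto` and that PRESTO_SERVER holds the coordinator address; adjust both
# to match your installation.
paimon_query() {
  presto --server "${PRESTO_SERVER:-localhost:8080}" \
         --catalog paimon \
         --execute "$1"
}
```

For example, `paimon_query 'SELECT * FROM paimon.default.MyTable'` sends the statement above to the coordinator and prints the result rows.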
Presto to Paimon type mapping
This section lists all supported type conversions between Presto and Paimon. All Presto data types are available in the package com.facebook.presto.common.type.
| Presto Data Type | Paimon Data Type | Atomic Type |
|---|---|---|
| RowType | RowType | false |
| MapType | MapType | false |
| ArrayType | ArrayType | false |
| BooleanType | BooleanType | true |
| TinyintType | TinyIntType | true |
| SmallintType | SmallIntType | true |
| IntegerType | IntType | true |
| BigintType | BigIntType | true |
| RealType | FloatType | true |
| DoubleType | DoubleType | true |
| CharType(length) | CharType(length) | true |
| VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true |
| VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
| DateType | DateType | true |
| TimestampType | TimestampType | true |
| DecimalType(precision, scale) | DecimalType(precision, scale) | true |
| VarBinaryType(length) | VarBinaryType(length) | true |
| TimestampWithTimeZoneType | LocalZonedTimestampType | true |
