Iceberg

Usage

When connecting to Iceberg, Doris:

  1. Supports Iceberg V1/V2 table formats;
  2. Supports Position Delete but not Equality Delete for V2 format;
  3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.

Create Catalog

Hive Metastore Catalog

Creating this catalog works the same way as creating a Hive Catalog. A simple example follows; see Hive for more information.

  CREATE CATALOG iceberg PROPERTIES (
      'type'='hms',
      'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
      'hadoop.username' = 'hive',
      'dfs.nameservices'='your-nameservice',
      'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
      'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
      'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
      'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  );
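
Once the catalog exists, it can be queried like any other Doris catalog. The following is a minimal sketch, not a definitive workflow: the catalog name `iceberg` comes from the example above, while the database `iceberg_db` and table `iceberg_tbl` are hypothetical names used only for illustration.

```sql
-- List the catalogs known to Doris; the new catalog should appear here.
SHOW CATALOGS;

-- Switch the session to the Iceberg catalog and list its databases.
SWITCH iceberg;
SHOW DATABASES;

-- Query a table by its fully qualified name (catalog.database.table).
-- iceberg_db and iceberg_tbl are placeholder names.
SELECT * FROM iceberg.iceberg_db.iceberg_tbl LIMIT 10;
```

Using the fully qualified name avoids depending on which catalog the session is currently switched to.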

Iceberg Native Catalog

Since version: dev

Doris accesses metadata through the Iceberg API. Hive, REST, Glue, and other services can serve as the Iceberg catalog.

  • Using Iceberg Hive Catalog
  CREATE CATALOG iceberg PROPERTIES (
      'type'='iceberg',
      'iceberg.catalog.type'='hms',
      'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
      'hadoop.username' = 'hive',
      'dfs.nameservices'='your-nameservice',
      'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
      'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
      'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
      'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  );
  • Using Iceberg REST Catalog

A RESTful service acts as the server side. It implements the RESTCatalog interface of Iceberg, through which Doris obtains metadata.

  CREATE CATALOG iceberg PROPERTIES (
      'type'='iceberg',
      'iceberg.catalog.type'='rest',
      'uri' = 'http://172.21.0.1:8181'
  );

To use S3 storage, set the following properties as well:

  "AWS_ACCESS_KEY" = "ak",
  "AWS_SECRET_KEY" = "sk",
  "AWS_REGION" = "region-name",
  "AWS_ENDPOINT" = "http://endpoint-uri",
  "AWS_CREDENTIALS_PROVIDER" = "provider-class-name" -- Optional. The default credentials class is based on BasicAWSCredentials.
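
Put together, a REST catalog backed by S3 might look like the sketch below. It assumes the S3 properties go in the same PROPERTIES clause as the catalog settings; all values (`ak`, `sk`, `region-name`, the endpoint, and the REST URI) are the placeholders from the examples above and must be replaced with real ones. The optional AWS_CREDENTIALS_PROVIDER property is left out here, so the default credentials class applies.

```sql
-- Sketch: Iceberg REST catalog with S3 storage properties.
-- Every value below is a placeholder, not a working credential or address.
CREATE CATALOG iceberg PROPERTIES (
    'type'='iceberg',
    'iceberg.catalog.type'='rest',
    'uri' = 'http://172.21.0.1:8181',
    'AWS_ACCESS_KEY' = 'ak',
    'AWS_SECRET_KEY' = 'sk',
    'AWS_REGION' = 'region-name',
    'AWS_ENDPOINT' = 'http://endpoint-uri'
);
```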

Column Type Mapping

Column types are mapped the same way as in Hive Catalogs. See the relevant section in Hive.

Time Travel

Since version: dev

Doris supports reading a specified Snapshot of an Iceberg table.

Each write operation to an Iceberg table will generate a new Snapshot.

By default, a read request will only read the latest Snapshot.

To read a historical version of a table, use the FOR TIME AS OF clause with the time at which the Snapshot was generated, or the FOR VERSION AS OF clause with the Snapshot ID. For example:

SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";

SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;

You can use the iceberg_meta table function to view the Snapshot details of the specified table.
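
As a rough sketch of that workflow, the query below lists the Snapshots of a table, and the second query reads one of them. It assumes that iceberg_meta accepts a "table" property holding the fully qualified table name and a "query_type" property set to "snapshots"; check the iceberg_meta documentation for your Doris version before relying on this. The catalog `iceberg` comes from the examples above, the database `iceberg_db` is a hypothetical name, and the Snapshot ID is the illustrative value used earlier.

```sql
-- List the Snapshots of the table (assumed property names: "table", "query_type").
SELECT * FROM iceberg_meta(
    "table" = "iceberg.iceberg_db.iceberg_tbl",
    "query_type" = "snapshots"
);

-- Read the table as of one of the Snapshot IDs returned above.
-- 868895038966572 is an illustrative ID, not a real Snapshot.
SELECT * FROM iceberg.iceberg_db.iceberg_tbl FOR VERSION AS OF 868895038966572;
```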