Hive

Once Doris is connected to Hive Metastore or made compatible with Hive Metastore metadata service, it can access databases and tables in Hive and conduct queries.

Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also access these systems via Hive Catalog.

Usage

When connnecting to Hive, Doris:

  1. Supports Hive version 1/2/3;
  2. Supports both Managed Table and External Table;
  3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore;
  4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put juicefs-hadoop-x.x.x.jar in fe/lib/ and apache_hdfs_broker/lib/).

Create Catalog

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'hadoop.username' = 'hive',
  5. 'dfs.nameservices'='your-nameservice',
  6. 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
  7. 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
  8. 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
  9. 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  10. );

In addition to type and hive.metastore.uris , which are required, you can specify other parameters regarding the connection.

For example, to specify HDFS HA:

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'hadoop.username' = 'hive',
  5. 'dfs.nameservices'='your-nameservice',
  6. 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
  7. 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
  8. 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
  9. 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  10. );

To specify HDFS HA and Kerberos authentication information:

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'hive.metastore.sasl.enabled' = 'true',
  5. 'dfs.nameservices'='your-nameservice',
  6. 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
  7. 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
  8. 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
  9. 'hadoop.security.authentication' = 'kerberos',
  10. 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
  11. 'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
  12. 'hive.metastore.kerberos.principal' = 'your-hms-principal',
  13. 'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',
  14. 'yarn.resourcemanager.principal' = 'your-rm-principal/your-rm-address@YOUR.COM'
  15. );

Remember krb5.conf and keytab file should be placed at all BE nodes and FE nodes. The location of keytab file should be equal to the value of hadoop.kerberos.keytab. As default, krb5.conf should be placed at /etc/krb5.conf.

Value of hive.metastore.kerberos.principal should be same with the same name property used by HMS you are connecting to, which can be found in hive-site.xml.

To provide Hadoop KMS encrypted transmission information:

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
  5. );

Or to connect to Hive data stored in JuiceFS:

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'hadoop.username' = 'root',
  5. 'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
  6. 'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
  7. 'juicefs.meta' = 'xxx'
  8. );

In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating new Catalogs. Here is an example:

  1. # 1. Create Resource
  2. CREATE RESOURCE hms_resource PROPERTIES (
  3. 'type'='hms',
  4. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  5. 'hadoop.username' = 'hive',
  6. 'dfs.nameservices'='your-nameservice',
  7. 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
  8. 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
  9. 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
  10. 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  11. );
  12. # 2. Create Catalog and use an existing Resource. The key and value information in the followings will overwrite the corresponding information in the Resource.
  13. CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
  14. 'key' = 'value'
  15. );

You can also put the hive-site.xml file in the conf directories of FE and BE. This will enable Doris to automatically read information from hive-site.xml. The relevant information will be overwritten based on the following rules :

  • Information in Resource will overwrite that in hive-site.xml.
  • Information in CREATE CATALOG PROPERTIES will overwrite that in Resource.

Hive Versions

Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can specify a certain Hive version when creating Catalogs, for example:

  1. CREATE CATALOG hive PROPERTIES (
  2. 'type'='hms',
  3. 'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
  4. 'hive.version' = '1.1.0'
  5. );

Column Type Mapping

This is applicable for Hive/Iceberge/Hudi.

HMS TypeDoris TypeComment
booleanboolean
tinyinttinyint
smallintsmallint
intint
bigintbigint
datedate
timestampdatetime
floatfloat
doubledouble
charchar
varcharvarchar
decimaldecimal
array<type>array<type>Support nested array, such as array<array<int>>
otherunsupported