Elasticsearch Connector

Overview

The Elasticsearch Connector allows access to Elasticsearch data from openLooKeng. This document describes how to set up the Elasticsearch Connector to run SQL queries against Elasticsearch.

Note: Using Elasticsearch 6.0.0 or later is highly recommended.

Configuration

To configure the Elasticsearch connector, create a catalog properties file etc/catalog/elasticsearch.properties with the following contents, replacing the properties as appropriate:

    connector.name=elasticsearch
    elasticsearch.default-schema-name=default
    elasticsearch.table-description-directory=etc/elasticsearch/
    elasticsearch.scroll-size=1000
    elasticsearch.scroll-timeout=1m
    elasticsearch.request-timeout=2s
    elasticsearch.max-request-retries=5
    elasticsearch.max-request-retry-time=10s

Configuration Properties

The following configuration properties are available:

Property Name                              Description
elasticsearch.default-schema-name          Default schema name for tables.
elasticsearch.table-description-directory  Directory containing JSON table description files.
elasticsearch.scroll-size                  Maximum number of hits to be returned with each Elasticsearch scroll request.
elasticsearch.scroll-timeout               Timeout for keeping the search context alive for scroll requests.
elasticsearch.max-hits                     Maximum number of hits a single Elasticsearch request can fetch.
elasticsearch.request-timeout              Timeout for Elasticsearch requests.
elasticsearch.max-request-retries          Maximum number of Elasticsearch request retries.
elasticsearch.max-request-retry-time       Use exponential backoff starting at 1s up to the value specified by this configuration when retrying failed requests.

elasticsearch.default-schema-name

Defines the schema that will contain all tables defined without a qualifying schema name.

This property is optional; the default is default.

elasticsearch.table-description-directory

Specifies a path under the openLooKeng deployment directory that contains one or more JSON files with table descriptions. File names must end with .json.

This property is optional; the default is etc/elasticsearch.
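
To see which files would qualify as table descriptions, a small check like the following can help. It is only a sketch: find_table_descriptions is a hypothetical helper, not part of the connector, and looking at the top level of the directory only (no recursion) is an assumption, since the text above does not say how subdirectories are handled.

```python
from pathlib import Path


def find_table_descriptions(directory):
    """Return the .json files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    # Only regular files whose name ends with .json are kept.
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == ".json")


if __name__ == "__main__":
    # "etc/elasticsearch" is the documented default, relative to the deployment directory.
    for path in find_table_descriptions("etc/elasticsearch"):
        print(path)
```

Run from the deployment directory, this prints one path per candidate file and nothing if the directory is missing or empty.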

elasticsearch.scroll-size

This property defines the maximum number of hits that can be returned with each Elasticsearch scroll request.

This property is optional; the default is 1000.
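
As a rough way to reason about this setting, the sketch below estimates how many scroll pages are needed to read a given number of hits. scroll_pages_needed is a hypothetical helper, and it counts only pages that carry hits; any extra request the connector may issue to open or close the scroll is not modeled.

```python
def scroll_pages_needed(total_hits, scroll_size=1000):
    """Number of scroll pages needed to return `total_hits` hits."""
    if scroll_size <= 0:
        raise ValueError("scroll_size must be positive")
    if total_hits < 0:
        raise ValueError("total_hits must not be negative")
    # Integer ceiling division: a partially filled last page still costs a request.
    return (total_hits + scroll_size - 1) // scroll_size


print(scroll_pages_needed(2500))        # 3
print(scroll_pages_needed(2500, 500))   # 5
```

A larger scroll size means fewer round trips, at the cost of larger responses per request.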

elasticsearch.scroll-timeout

This property defines how long Elasticsearch keeps the search context alive for scroll requests. The value is a duration with a unit, such as 1s or 1m.

This property is optional; the default is 1s.

elasticsearch.max-hits

This property defines the maximum number of hits an Elasticsearch request can fetch.

This property is optional; the default is 1000.

elasticsearch.request-timeout

This property defines the timeout value for all Elasticsearch requests.

This property is optional; the default is 100ms.

elasticsearch.max-request-retries

This property defines the maximum number of Elasticsearch request retries.

This property is optional; the default is 5.

elasticsearch.max-request-retry-time

Use exponential backoff starting at 1s up to the value specified by this configuration when retrying failed requests.

This property is optional; the default is 10s.
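
To illustrate how the two retry settings could interact, the sketch below computes one possible wait schedule. The text above only states that the backoff is exponential, starts at 1s, and goes up to elasticsearch.max-request-retry-time; the doubling factor and the reading of that value as a cap on each individual wait are assumptions, and retry_delays is a hypothetical helper rather than connector code.

```python
def retry_delays(max_retries=5, max_retry_time=10.0, initial_delay=1.0, factor=2.0):
    """Return the wait in seconds before each retry, capped at max_retry_time."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        # Assumption: the configured maximum limits each single wait.
        delays.append(min(delay, max_retry_time))
        delay *= factor
    return delays


print(retry_delays())  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

With the defaults, the fifth wait would be 16s under pure doubling, so it is clipped to the 10s limit.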

Search Guard Authentication

The Elasticsearch connector provides additional security options to support Elasticsearch clusters that have been configured to use Search Guard.

You can configure the certificate format by setting the searchguard.ssl.certificate_format config property in the Elasticsearch catalog properties file. The allowed values for this configuration are:

Property Value  Description
NONE (default)  Do not use Search Guard Authentication.
PEM             Use X.509 PEM certificates and PKCS #8 keys.
JKS             Use Keystore and Truststore files.

If you use X.509 PEM certificates and PKCS #8 keys, the following properties must be set:

Property Name                           Description
searchguard.ssl.pemcert-filepath        Path to the X.509 node certificate chain.
searchguard.ssl.pemkey-filepath         Path to the certificate key file.
searchguard.ssl.pemkey-password         Key password. Omit this setting if the key has no password.
searchguard.ssl.pemtrustedcas-filepath  Path to the root CA(s) (PEM format).

If you use Keystore and Truststore files, the following properties must be set:

Property Name                        Description
searchguard.ssl.keystore-filepath    Path to the keystore file.
searchguard.ssl.keystore-password    Keystore password.
searchguard.ssl.truststore-filepath  Path to the truststore file.
searchguard.ssl.truststore-password  Truststore password.

searchguard.ssl.pemcert-filepath

The path to the X.509 node certificate chain. This file must be readable by the operating system user running openLooKeng.

This property is optional; the default is etc/elasticsearch/esnode.pem.

searchguard.ssl.pemkey-filepath

The path to the certificate key file. This file must be readable by the operating system user running openLooKeng.

This property is optional; the default is etc/elasticsearch/esnode-key.pem.

searchguard.ssl.pemkey-password

The key password for the key file specified by searchguard.ssl.pemkey-filepath.

This property is optional; the default is an empty string.

searchguard.ssl.pemtrustedcas-filepath

The path to the root CA(s) (PEM format). This file must be readable by the operating system user running openLooKeng.

This property is optional; the default is etc/elasticsearch/root-ca.pem.

searchguard.ssl.keystore-filepath

The path to the keystore file. This file must be readable by the operating system user running openLooKeng.

This property is optional; the default is etc/elasticsearch/keystore.jks.

searchguard.ssl.keystore-password

The keystore password for the keystore file specified by searchguard.ssl.keystore-filepath.

This property is optional; the default is an empty string.

searchguard.ssl.truststore-filepath

The path to the truststore file. This file must be readable by the operating system user running openLooKeng.

This property is optional; the default is etc/elasticsearch/truststore.jks.

searchguard.ssl.truststore-password

The truststore password for the truststore file specified by searchguard.ssl.truststore-filepath.

This property is optional; the default is an empty string.
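
Because each certificate, key, and store file must be readable by the operating system user running openLooKeng, a quick check before starting the server can save a failed launch. The following is a minimal sketch, assuming it is run as that same user from the deployment directory; unreadable_files is a hypothetical helper, and the paths are the documented PEM defaults.

```python
import os

# Default PEM paths as documented above, relative to the deployment directory.
DEFAULT_PEM_FILES = {
    "searchguard.ssl.pemcert-filepath": "etc/elasticsearch/esnode.pem",
    "searchguard.ssl.pemkey-filepath": "etc/elasticsearch/esnode-key.pem",
    "searchguard.ssl.pemtrustedcas-filepath": "etc/elasticsearch/root-ca.pem",
}


def unreadable_files(paths, base_dir="."):
    """Return the property names whose file is missing or not readable by the current user."""
    problems = []
    for prop, rel_path in paths.items():
        full = os.path.join(base_dir, rel_path)
        if not (os.path.isfile(full) and os.access(full, os.R_OK)):
            problems.append(prop)
    return problems


if __name__ == "__main__":
    for prop in unreadable_files(DEFAULT_PEM_FILES):
        print("not readable:", prop)
```

The same function works for the keystore and truststore paths by passing a dictionary with those two properties instead.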

Table Definition Files

Elasticsearch stores the data across multiple nodes and builds indices for fast retrieval. For openLooKeng, this data must be mapped into columns to allow queries against the data.

A table definition file describes a table in JSON format.

    {
        "tableName": ...,
        "schemaName": ...,
        "hostAddress": ...,
        "port": ...,
        "clusterName": ...,
        "index": ...,
        "indexExactMatch": ...,
        "type": ...,
        "columns": [
            {
                "name": ...,
                "type": ...,
                "jsonPath": ...,
                "jsonType": ...,
                "ordinalPosition": ...
            }
        ]
    }

Field            Required  Type     Description
tableName        required  string   Name of the table.
schemaName       optional  string   Schema that contains the table. If omitted, the default schema name is used.
host             required  string   Elasticsearch search node host name.
port             required  integer  Elasticsearch search node port number.
clusterName      required  string   Elasticsearch cluster name.
index            required  string   Elasticsearch index that is backing this table.
indexExactMatch  optional  boolean  If set to true, the index specified with the index property is used. Otherwise, all indices starting with the prefix specified by the index property are used.
type             required  string   Elasticsearch mapping type, which determines how documents are indexed.
columns          optional  list     List of column metadata information.
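
The effect of indexExactMatch can be pictured with a short sketch. matching_indices is a hypothetical helper and the index names are made up; the logic simply follows the description of the field, with an omitted value treated the same as false.

```python
def matching_indices(all_indices, index, index_exact_match=False):
    """Pick the indices a table would read, per the indexExactMatch description."""
    if index_exact_match:
        # Only the index with exactly this name is used.
        return [name for name in all_indices if name == index]
    # Otherwise every index whose name starts with the given prefix is used.
    return [name for name in all_indices if name.startswith(index)]


indices = ["logs", "logs-2019", "logs-2020", "metrics"]
print(matching_indices(indices, "logs", index_exact_match=True))  # ['logs']
print(matching_indices(indices, "logs"))  # ['logs', 'logs-2019', 'logs-2020']
```

Prefix matching is convenient for time-partitioned indices, while exact matching avoids accidentally pulling in indices that merely share a prefix.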

Elasticsearch Column Metadata

Optionally, column metadata can be described in the same table description JSON file with these fields:

Field            Required  Type     Description
name             optional  string   Column name of Elasticsearch field.
type             optional  string   Column type of Elasticsearch field.
jsonPath         optional  string   JSON path of Elasticsearch field.
jsonType         optional  string   JSON type of Elasticsearch field.
ordinalPosition  optional  integer  Ordinal position of the column.
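
Putting the table and column fields together, a table description file could be generated along these lines. This is a sketch under stated assumptions: every value is a hypothetical placeholder (table, host, port, cluster, type names, and the zero-based ordinal positions have not been verified against a running connector), write_table_description is a hypothetical helper, and naming the file after the table is a convention chosen here, not a documented requirement. The host key follows the JSON template ("hostAddress"), although the field table lists it as "host".

```python
import json
import os

# Every value here is a hypothetical placeholder, not taken from a real cluster.
TABLE_DESCRIPTION = {
    "tableName": "orders",
    "schemaName": "default",
    # The template spells this key "hostAddress" while the field table lists
    # "host"; this sketch follows the template, which is an assumption.
    "hostAddress": "localhost",
    "port": 9300,
    "clusterName": "elasticsearch",
    "index": "orders",
    "indexExactMatch": True,
    "type": "doc",
    "columns": [
        {
            "name": "order_id",
            "type": "bigint",
            "jsonPath": "order_id",
            "jsonType": "long",
            "ordinalPosition": 0,
        },
        {
            "name": "customer",
            "type": "varchar",
            "jsonPath": "customer",
            "jsonType": "keyword",
            "ordinalPosition": 1,
        },
    ],
}


def write_table_description(description, directory="etc/elasticsearch"):
    """Write the description as <tableName>.json and return the file path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, description["tableName"] + ".json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(description, handle, indent=2)
    return path
```

Calling write_table_description(TABLE_DESCRIPTION) from the deployment directory would create etc/elasticsearch/orders.json, which matches the default value of elasticsearch.table-description-directory.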