Apache Kafka Lookups

Lookups are an experimental feature.

To use this Apache Druid extension, make sure to include druid-lookups-cached-global and druid-kafka-extraction-namespace as extensions.

If you need lookup updates to propagate as promptly as possible, you can plug into a Kafka topic as a LookupExtractorFactory. The message key is the old value and the message body is the desired new value (both in UTF-8).

  {
    "type":"kafka",
    "kafkaTopic":"testTopic",
    "kafkaProperties":{"zookeeper.connect":"somehost:2181/kafka"}
  }

| Parameter | Description | Required | Default |
|---|---|---|---|
| kafkaTopic | The Kafka topic to read the data from | Yes | |
| kafkaProperties | Kafka consumer properties. At least "zookeeper.connect" must be specified. Only the ZooKeeper connector is supported | Yes | |
| connectTimeout | How long to wait for an initial connection | No | 0 (do not wait) |
| isOneToOne | The map is one-to-one (see Lookup DimensionSpecs) | No | false |
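
As a rough illustration of how the parameters above fit together, the following Python sketch assembles a lookup spec that also sets the two optional fields to their documented defaults. The helper name `kafka_lookup_spec` is hypothetical and not part of Druid; only the JSON field names come from the table.

```python
import json


def kafka_lookup_spec(topic, zookeeper_connect, connect_timeout=0, is_one_to_one=False):
    """Build a Kafka lookup spec as a plain dict (hypothetical helper, not a Druid API)."""
    return {
        "type": "kafka",
        "kafkaTopic": topic,
        # Only the ZooKeeper connector is supported, so this is the one required property.
        "kafkaProperties": {"zookeeper.connect": zookeeper_connect},
        # Optional fields, shown here with the defaults listed in the table.
        "connectTimeout": connect_timeout,
        "isOneToOne": is_one_to_one,
    }


spec = kafka_lookup_spec("testTopic", "somehost:2181/kafka")
print(json.dumps(spec, indent=2))
```

The printed JSON is the same shape as the example spec, with `connectTimeout` and `isOneToOne` made explicit.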

The extension kafka-extraction-namespace enables reading from a Kafka feed of key/value pairs to allow renaming of dimension values. An example use case would be to rename an ID to a human-readable format.

The consumer properties group.id and auto.offset.reset CANNOT be set in kafkaProperties, because the extension sets them to UUID.randomUUID().toString() and smallest, respectively.
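
A small pre-flight check can catch these reserved properties before a spec is submitted. The sketch below is only an illustration: `check_kafka_properties` is a hypothetical helper, and how the extension itself reacts to a reserved property is not described here.

```python
# Properties the extension sets itself, per the paragraph above.
RESERVED_PROPERTIES = ("group.id", "auto.offset.reset")


def check_kafka_properties(props):
    """Validate a kafkaProperties dict before use (hypothetical helper)."""
    reserved = sorted(key for key in props if key in RESERVED_PROPERTIES)
    if reserved:
        raise ValueError("reserved properties must not be set: " + ", ".join(reserved))
    if "zookeeper.connect" not in props:
        raise ValueError("zookeeper.connect must be specified")
    return dict(props)


checked = check_kafka_properties({"zookeeper.connect": "somehost:2181/kafka"})
print(checked)
```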

See lookups for how to configure and use lookups.

Limitations

Currently, the Kafka lookup extractor feeds the entire Kafka stream into a local cache. If you are using on-heap caching, this can easily exhaust your Java heap if the Kafka stream contains a large number of unique keys. Off-heap caching should alleviate these concerns, but there is still a limit to the quantity of data that can be stored. There is currently no eviction policy.
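
To make the memory behavior concrete, the following Python sketch models the cache as a plain dictionary that is never pruned. It is a simplified model, not the extension's implementation, and it assumes that a later message for the same key overwrites the earlier one.

```python
def apply_stream(messages):
    """Feed (key, value) messages into a cache with no eviction (simplified model)."""
    cache = {}
    for key, value in messages:
        # Assumption: a repeated key replaces its previous value.
        cache[key] = value
    return cache


stream = [("id-1", "Alice"), ("id-2", "Bob"), ("id-1", "Alicia")]
cache = apply_stream(stream)
print(len(cache), cache["id-1"])  # prints: 2 Alicia
```

Under this model the cache size tracks the number of unique keys seen, not the number of messages, so a topic with few distinct keys stays small while one with many distinct keys grows without bound.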

Testing the Kafka rename functionality

To test this setup, you can send key/value pairs to a Kafka stream via the following producer console:

  ./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic

Renames can then be published as OLD_VAL->NEW_VAL followed by a newline (Enter or Return).
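
For more than a handful of renames, the input lines can be generated rather than typed. The Python sketch below formats a mapping into OLD_VAL->NEW_VAL lines. `format_renames` is a hypothetical helper, and the rejection of keys containing the separator assumes the console producer splits each line at the first occurrence of it.

```python
SEPARATOR = "->"  # must match the key.separator passed to the console producer


def format_renames(renames):
    """Turn {old: new} into newline-terminated OLD->NEW lines (hypothetical helper)."""
    lines = []
    for old, new in renames.items():
        if SEPARATOR in old:
            # Assumption: the producer splits at the first separator, so the key must not contain it.
            raise ValueError("key contains the separator: " + repr(old))
        if "\n" in old or "\n" in new:
            raise ValueError("keys and values must not contain newlines")
        lines.append(old + SEPARATOR + new + "\n")
    return "".join(lines)


payload = format_renames({"OLD_VAL": "NEW_VAL", "42": "Answer"})
print(payload, end="")
```

The printed lines can be piped into the standard input of the producer command shown above.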