Microsoft Azure

To use this Apache Druid extension, include druid-azure-extensions in the extensions load list.

Deep Storage

Microsoft Azure Storage is another option for deep storage. This requires some additional Druid configuration.

| Property | Possible Values | Description | Default |
|---|---|---|---|
| druid.storage.type | azure | | Must be set. |
| druid.azure.account | | Azure Storage account name. | Must be set. |
| druid.azure.key | | Azure Storage account key. | Must be set. |
| druid.azure.container | | Azure Storage container name. | Must be set. |
| druid.azure.protocol | http or https | | https |
| druid.azure.maxTries | | Number of tries before canceling an Azure operation. | 3 |

See Azure Services for more information.
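As an illustration of how the deep storage properties fit together, the following Python sketch renders them as runtime.properties lines. The function name `azure_deep_storage_properties` and the account, key, and container values are hypothetical placeholders. The `druid.extensions.loadList` line is not part of the table in this section; it is included on the assumption that the extension is loaded through the standard load list, which in a real cluster usually names other extensions as well.

```python
def azure_deep_storage_properties(account, key, container,
                                  protocol="https", max_tries=3):
    """Render the Azure deep storage settings as runtime.properties lines.

    Hypothetical helper for illustration only; it is not part of Druid.
    """
    if protocol not in ("http", "https"):
        raise ValueError("druid.azure.protocol must be 'http' or 'https'")
    props = {
        # Assumption: a real load list typically contains more extensions.
        "druid.extensions.loadList": '["druid-azure-extensions"]',
        "druid.storage.type": "azure",
        "druid.azure.account": account,
        "druid.azure.key": key,
        "druid.azure.container": container,
        "druid.azure.protocol": protocol,
        "druid.azure.maxTries": str(max_tries),
    }
    return "\n".join(f"{name}={value}" for name, value in props.items())


# Placeholder values; substitute your own account, key, and container.
print(azure_deep_storage_properties("myaccount", "<account-key>", "druid-segments"))
```

Only the four properties marked "Must be set" need values from you; the protocol and retry count fall back to the defaults from the table.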

Firehose

StaticAzureBlobStoreFirehose

This firehose ingests events, similar to the StaticS3Firehose, but from an Azure Blob Store.

Data is newline-delimited, with one JSON object per line, and is parsed according to the InputRowParser configuration.

The storage account is shared with the one used for Azure deep storage functionality, but blobs can be in a different container.

As with the S3 blobstore, a blob is assumed to be gzipped if its name ends in .gz.
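The expected blob format can be sketched in a few lines of Python. This is only an illustration of the rules described above (gunzip when the name ends in .gz, then read one JSON object per line), not the firehose's actual implementation; the function name `parse_blob` is hypothetical, and the real firehose hands each line to the configured InputRowParser rather than to a JSON decoder.

```python
import gzip
import json


def parse_blob(path, payload):
    """Decode one blob's bytes into a list of JSON objects.

    Hypothetical helper: mirrors the documented format rules only.
    """
    # A name ending in .gz marks the blob as gzip-compressed.
    if path.endswith(".gz"):
        payload = gzip.decompress(payload)
    # One JSON object per line; blank lines are skipped.
    return [
        json.loads(line)
        for line in payload.decode("utf-8").splitlines()
        if line.strip()
    ]


raw = b'{"page": "a", "count": 1}\n{"page": "b", "count": 2}\n'
print(parse_blob("/path/to/your/file.json", raw))
print(parse_blob("/path/to/your/file.json.gz", gzip.compress(raw)))
```

Both calls yield the same two events, which is a quick way to check that a file is laid out the way the firehose expects before submitting a task.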

This firehose is splittable and can be used by native parallel index tasks. Since each split represents an object in this firehose, each worker task of index_parallel will read an object.

Sample spec:

  "firehose" : {
      "type" : "static-azure-blobstore",
      "blobs": [
          {
              "container": "container",
              "path": "/path/to/your/file.json"
          },
          {
              "container": "anothercontainer",
              "path": "/another/path.json"
          }
      ]
  }

This firehose supports caching and prefetching. An IndexTask can read a firehose twice when intervals or shardSpecs are not specified, and caching is useful in that case. Prefetching is preferred when scanning objects directly is slow.

| property | description | default | required? |
|---|---|---|---|
| type | This should be static-azure-blobstore. | N/A | yes |
| blobs | JSON array of Azure blobs. | N/A | yes |
| maxCacheCapacityBytes | Maximum size of the cache space in bytes. 0 means disabling cache. Cached files are not removed until the ingestion task completes. | 1073741824 | no |
| maxFetchCapacityBytes | Maximum size of the fetch space in bytes. 0 means disabling prefetch. Prefetched files are removed immediately once they are read. | 1073741824 | no |
| prefetchTriggerBytes | Threshold to trigger prefetching Azure objects. | maxFetchCapacityBytes / 2 | no |
| fetchTimeout | Timeout for fetching an Azure object. | 60000 | no |
| maxFetchRetry | Maximum retry for fetching an Azure object. | 3 | no |

Azure Blobs:

| property | description | default | required? |
|---|---|---|---|
| container | Name of the Azure container. | N/A | yes |
| path | The path where data is located. | N/A | yes |
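
To show how the required and optional properties combine, here is a minimal Python sketch that assembles a firehose spec with every default from the tables written out explicitly. The function name `static_azure_firehose` and its keyword arguments are hypothetical. The integer division used for the prefetchTriggerBytes default is an assumption about how "maxFetchCapacityBytes / 2" is rounded.

```python
import json


def static_azure_firehose(blobs,
                          max_cache_capacity_bytes=1073741824,
                          max_fetch_capacity_bytes=1073741824,
                          prefetch_trigger_bytes=None,
                          fetch_timeout=60000,
                          max_fetch_retry=3):
    """Build a static-azure-blobstore firehose spec as a dict.

    `blobs` is a list of (container, path) pairs. Hypothetical helper,
    not part of Druid.
    """
    if not blobs:
        raise ValueError("blobs is required and must not be empty")
    if prefetch_trigger_bytes is None:
        # Documented default: maxFetchCapacityBytes / 2 (rounding assumed).
        prefetch_trigger_bytes = max_fetch_capacity_bytes // 2
    return {
        "type": "static-azure-blobstore",
        "blobs": [{"container": c, "path": p} for c, p in blobs],
        "maxCacheCapacityBytes": max_cache_capacity_bytes,
        "maxFetchCapacityBytes": max_fetch_capacity_bytes,
        "prefetchTriggerBytes": prefetch_trigger_bytes,
        "fetchTimeout": fetch_timeout,
        "maxFetchRetry": max_fetch_retry,
    }


spec = static_azure_firehose([
    ("container", "/path/to/your/file.json"),
    ("anothercontainer", "/another/path.json"),
])
print(json.dumps({"firehose": spec}, indent=2))
```

Passing `max_cache_capacity_bytes=0` or `max_fetch_capacity_bytes=0` produces a spec with caching or prefetching disabled, as described in the property table.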