HLL (HyperLogLog)

Description

HLL

HLL cannot be used as a key column, and its aggregation type must be HLL_UNION when the table is created. The user does not need to specify a length or a default value; the storage length is managed internally according to the degree of data aggregation. HLL columns can only be queried or used through the matching functions hll_union_agg, hll_raw_agg, hll_cardinality, and hll_hash.

HLL provides an approximate count of distinct elements, and it performs better than COUNT DISTINCT when the amount of data is large. The error of HLL is usually around 1%, and sometimes reaches 2%.
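To make the accuracy claim and the HLL_UNION aggregation type more concrete, here is a minimal HyperLogLog sketch in Python. It illustrates the general algorithm only and is not the Doris implementation: the SHA-1-derived 64-bit hash, the precision p = 14 (16,384 registers, which gives a theoretical standard error of about 1.04 / sqrt(16384), roughly 0.8%), and the class and method names are all assumptions made for this example. The `merge` method shows why HLL values can be aggregated by union: combining two sketches is an element-wise maximum of their registers, so partial results can be merged without rereading the raw data.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; not Doris's implementation)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p                     # number of index bits (assumed precision)
        self.m = 1 << p                # number of registers: 16384 when p = 14
        self.registers = [0] * self.m

    def _hash64(self, value: str) -> int:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value: str) -> None:
        x = self._hash64(value)
        width = 64 - self.p
        idx = x >> width                       # top p bits select a register
        rest = x & ((1 << width) - 1)          # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Union of two sketches: keep the larger value in every register.
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction (linear counting) for low cardinalities.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Two sketches over overlapping device IDs, then a union of the two.
sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for i in range(100_000):
    sketch_a.add(f"device-{i}")
for i in range(50_000, 150_000):
    sketch_b.add(f"device-{i}")
sketch_a.merge(sketch_b)
print(f"estimated distinct devices: {sketch_a.cardinality():.0f} (exact: 150000)")
```

The estimate stays close to 150,000 even though the two inputs overlap by 50,000 IDs, because duplicates hash to the same register and rank and therefore do not inflate the union.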

example

  -- Query the accumulated UV per hour
  select hour, HLL_UNION_AGG(pv) over(order by hour) uv from (
      select hour, HLL_RAW_AGG(device_id) as pv
      from metric_table
      where datekey = 20200922
      group by hour order by 1
  ) final;
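The inner query collapses each hour's HLL values into a single HLL value with HLL_RAW_AGG; the outer HLL_UNION_AGG(pv) OVER (ORDER BY hour) then unions every hourly value from the first hour up to the current one, so each row is the number of distinct devices seen so far that day. The exact, non-approximate equivalent can be sketched in Python with sets; the function name `cumulative_uv` and the `(hour, device_id)` input shape are hypothetical, chosen only for illustration. The HLL query returns estimates of these same numbers.

```python
def cumulative_uv(events):
    """Exact running distinct count per hour.

    events: iterable of (hour, device_id) pairs for a single datekey
    (hypothetical input shape for this illustration).
    """
    # Group raw device IDs by hour (what HLL_RAW_AGG summarizes per hour).
    by_hour = {}
    for hour, device in events:
        by_hour.setdefault(hour, set()).add(device)

    # Running union in hour order (what the ORDER BY hour window accumulates).
    seen = set()
    result = []
    for hour in sorted(by_hour):
        seen |= by_hour[hour]
        result.append((hour, len(seen)))
    return result


events = [(0, "a"), (0, "b"), (1, "b"), (1, "c"), (2, "a"), (2, "d")]
print(cumulative_uv(events))  # [(0, 2), (1, 3), (2, 4)]
```

Device "b" in hour 1 and device "a" in hour 2 were already counted, so they do not increase the running total; HLL achieves the same deduplication approximately, without keeping the full set of IDs.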

keyword

HLL,HYPERLOGLOG