BITMAP_UNION

Create table

The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is bitmap_union.

  1. CREATE TABLE `pv_bitmap` (
  2. `dt` int (11) NULL COMMENT" ",
  3. `page` varchar (10) NULL COMMENT" ",
  4. `user_id` bitmap BITMAP_UNION NULL COMMENT" "
  5. ) ENGINE = OLAP
  6. AGGREGATE KEY (`dt`,` page`)
  7. COMMENT "OLAP"
  8. DISTRIBUTED BY HASH (`dt`) BUCKETS 2;

Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union queries

  1. ALTER TABLE pv_bitmap ADD ROLLUP pv (page, user_id);

Data Load

TO_BITMAP (expr): Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap

BITMAP_EMPTY (): Generate empty bitmap columns, used for insert or import to fill the default value

BITMAP_HASH (expr): Convert any type of column to a bitmap by hashing

Stream Load

  1. cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = to_bitmap (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
  1. cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_hash (user_id)" http: // host: 8410 / api / test / testDb / _stream_load
  1. cat data | curl --location-trusted -u user: passwd -T--H "columns: dt, page, user_id, user_id = bitmap_empty ()" http: // host: 8410 / api / test / testDb / _stream_load

Insert Into

id2’s column type is bitmap

  1. insert into bitmap_table1 select id, id2 from bitmap_table2;

id2’s column type is bitmap

  1. INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap (1000)), (1001, to_bitmap (2000));

id2’s column type is bitmap

  1. insert into bitmap_table1 select id, bitmap_union (id2) from bitmap_table2 group by id;

id2’s column type is int

  1. insert into bitmap_table1 select id, to_bitmap (id2) from table;

id2’s column type is String

  1. insert into bitmap_table1 select id, bitmap_hash (id_string) from table;

Data Query

Syntax

BITMAP_UNION (expr): Calculate the union of two Bitmaps. The return value is the new Bitmap value.

BITMAP_UNION_COUNT (expr): Calculate the cardinality of the union of two Bitmaps, equivalent to BITMAP_COUNT (BITMAP_UNION (expr)). It is recommended to use the BITMAP_UNION_COUNT function first, its performance is better than BITMAP_COUNT (BITMAP_UNION (expr)).

BITMAP_UNION_INT (expr): Count the number of different values ​​in columns of type TINYINT, SMALLINT and INT, return the sum of COUNT (DISTINCT expr) same

INTERSECT_COUNT (bitmap_column_to_count, filter_column, filter_values ​​...): The calculation satisfies filter_column The cardinality of the intersection of multiple bitmaps of the filter. bitmap_column_to_count is a column of type bitmap, filter_column is a column of varying dimensions, and filter_values ​​is a list of dimension values.

Example

The following SQL uses the pv_bitmap table above as an example:

Calculate the deduplication value for user_id:

  1. select bitmap_union_count (user_id) from pv_bitmap;
  2. select bitmap_count (bitmap_union (user_id)) from pv_bitmap;

Calculate the deduplication value of id:

  1. select bitmap_union_int (id) from pv_bitmap;

Calculate the retention of user_id:

  1. select intersect_count (user_id, page, 'meituan') as meituan_uv,
  2. intersect_count (user_id, page, 'waimai') as waimai_uv,
  3. intersect_count (user_id, page, 'meituan', 'waimai') as retention // Number of users appearing on both 'meituan' and 'waimai' pages
  4. from pv_bitmap
  5. where page in ('meituan', 'waimai');

keyword

BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT, INTERSECT_COUNT