Doris NGram BloomFilter Index

SinceVersion 2.0.0

In order to improve the like query performance, the NGram BloomFilter index was implemented, which referenced to the ClickHouse’s ngrambf skip indices;

Create Column With NGram BloomFilter Index

During create table:

  1. CREATE TABLE `table3` (
  2. `siteid` int(11) NULL DEFAULT "10" COMMENT "",
  3. `citycode` smallint(6) NULL COMMENT "",
  4. `username` varchar(100) NULL DEFAULT "" COMMENT "",
  5. INDEX idx_ngrambf (`username`) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="256") COMMENT 'username ngram_bf index'
  6. ) ENGINE=OLAP
  7. AGGREGATE KEY(`siteid`, `citycode`, `username`) COMMENT "OLAP"
  8. DISTRIBUTED BY HASH(`siteid`) BUCKETS 10
  9. PROPERTIES (
  10. "replication_num" = "1"
  11. );
  12. -- PROPERTIES("gram_size"="3", "bf_size"="1024"),indicate the number of gram and bytes of bloom filter respectively.
  13. -- the gram size set to same as the like query pattern string length. and the suitable bytes of bloom filter can be get by test, more larger more better, 256 maybe is a good start.
  14. -- Usually, if the data's cardinality is small, you can increase the bytes of bloom filter to improve the efficiency.

Show NGram BloomFilter Index

  1. show index from example_db.table3;

Drop NGram BloomFilter Index

  1. alter table example_db.table3 drop index idx_ngrambf;

Add NGram BloomFilter Index

Add NGram BloomFilter Index for old column:

  1. alter table example_db.table3 add index idx_ngrambf(username) using NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="512")comment 'username ngram_bf index'

Some notes about Doris NGram BloomFilter

  1. NGram BloomFilter only support CHAR/VARCHAR/String column.
  2. NGram BloomFilter index and BloomFilter index should be exclusive on same column
  3. The gram number and bytes of BloomFilter can be adjust and optimize. Like if gram is too small, you can increase the bytes of BloomFilter.
  4. To find some query whether use the NGram BloomFilter index, you can check the query profile.