count_min_sketch()

Produces a Count-Min Sketch in the form of an aggregate that can be passed to the approx_count function to estimate how many times a particular value has appeared in a column.

  1. count_min_sketch(
  2. values TEXT,
  3. error DOUBLE PRECISION,
  4. probability DOUBLE PRECISION,
  5. ) RETURNS CountMinSketch
warning

Experimental features could have bugs. They might not be backwards compatible, and could be removed in future releases. Use these features at your own risk, and do not use any experimental features in production.

Required arguments

NameTypeDescription
valuesTEXTColumn to aggregate
errorDOUBLE PRECISIONError tolerance in estimate, calculated relative to the number of values added to the sketch
probabilityDOUBLE PRECISIONProbability that an estimate falls outside the error bounds

Returns

ColumnTypeDescription
aggCountMinSketchAn object storing a table of counters.

Sample usage

Create a Count-Min Sketch of the stock symbols seen in your tick data. With this aggregate, you’ll then be able to estimate how often any text value appears in the symbol column.

  1. SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
  2. FROM stocks_real_time;

In this example, the first 0.01 dictates that your frequency estimates have a relative error of 1%. A relative error of 1% means that the approximate count of an item is overestimated by at most 1% of the total number of (non-NULL) rows in the column you aggregated. (The Count-Min Sketch is a biased estimator of the true frequency because it may overestimate the count of an item, but it cannot underestimate that count.)

The second 0.01 means that your estimated frequency falls outside those error bounds 1% of the time (on average).

You can then pass this aggregate into the approx_count function. Doing so gives you an estimate of how many times a given symbol appears in the symbol column.

For example, if you wanted to know approximately how many of the quotes in the tick data were for the AAPL stock, you would then do the following:

  1. WITH t AS (
  2. SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
  3. FROM stocks_real_time
  4. )
  5. SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
  6. FROM t;