topn()

Returns the most common values accumulated in a frequency aggregate or top N aggregate.

  1. topn (
  2. agg FrequencyAggregate,
  3. n INTEGER
  4. ) RETURNS topn AnyElement
  5. topn (
  6. agg TopnAggregate,
  7. n INTEGER
  8. ) RETURNS topn AnyElement

Both frequency aggregates and top N aggregates can be used to calculate topn. Top N aggregates allow you to specify the target number of values you want returned, without estimating their threshold frequency. Frequency aggregates allow you to store all values that surpass a threshold frequency. They are useful if you want to store and use frequency information, and not just calculate top N.

warning

Experimental features could have bugs. They might not be backwards compatible, and could be removed in future releases. Use these features at your own risk, and do not use any experimental features in production.

Required arguments

NameTypeDescription
aggFrequencyAggregate or TopnAggregateThe aggregate to display values for
nINTEGERThe number of values to return. Required only for frequency aggregates. For top N aggregates, defaults to target n of the aggregate itself

Optional arguments

NameTypeDescription
nINTEGERThe number of values to return. Optional only for top N aggregates, where it must be less than the target n of the aggregate itself. Defaults to the target n of the aggregate.

Returns

ColumnTypeDescription
topnAnyElementThe n most-frequent values in the aggregate.

In some cases, the function might return fewer than n values. This happens if:

  • The underlying frequency aggregate doesn’t contain n elements with the minimum frequency
  • The data isn’t skewed enough to support n values from a top N aggregate
warning

Requesting more values from a top N aggregate than it was created for will return an error. To get more values, adjust the target n in topn_agg.

Sample usage

This test uses a table of randomly generated data. The values used are the integer square roots of a random number in the range (0,400).

  1. CREATE TABLE value_test(value INTEGER);
  2. INSERT INTO value_test SELECT floor(sqrt(random() * 400)) FROM generate_series(1,100000);

This returns the 5 most common values seen in the table:

  1. SELECT toolkit_experimental.topn(
  2. toolkit_experimental.freq_agg(0.05, value),
  3. 5)
  4. FROM value_test;

The output for this query:

  1. topn
  2. ------
  3. 19
  4. 18
  5. 17
  6. 16
  7. 15