6.15. Window Functions

Window functions perform calculations across rows of the query result.They run after the HAVING clause but before the ORDER BY clause.Invoking a window function requires special syntax using the OVERclause to specify the window. A window has three components:

  • The partition specification, which separates the input rows into differentpartitions. This is analogous to how the GROUP BY clause separates rowsinto different groups for aggregate functions.
  • The ordering specification, which determines the order in which input rowswill be processed by the window function.
  • The window frame, which specifies a sliding window of rows to be processedby the function for a given row. If the frame is not specified, it defaultsto RANGE UNBOUNDED PRECEDING, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains allrows from the start of the partition up to the last peer of the current row.

For example, the following query ranks orders for each clerk by price:

  1. SELECT orderkey, clerk, totalprice,
  2. rank() OVER (PARTITION BY clerk
  3. ORDER BY totalprice DESC) AS rnk
  4. FROM orders
  5. ORDER BY clerk, rnk

Aggregate Functions

All Aggregate Functions can be used as window functions by adding the OVERclause. The aggregate function is computed for each row over the rows withinthe current row’s window frame.

For example, the following query produces a rolling sum of order pricesby day for each clerk:

  1. SELECT clerk, orderdate, orderkey, totalprice,
  2. sum(totalprice) OVER (PARTITION BY clerk
  3. ORDER BY orderdate) AS rolling_sum
  4. FROM orders
  5. ORDER BY clerk, orderdate, orderkey

Ranking Functions

  • cume_dist() → bigint
  • Returns the cumulative distribution of a value in a group of values.The result is the number of rows preceding or peer with the row in thewindow ordering of the window partition divided by the total number ofrows in the window partition. Thus, any tie values in the ordering willevaluate to the same distribution value.

  • dense_rank() → bigint

  • Returns the rank of a value in a group of values. This is similar torank(), except that tie values do not produce gaps in the sequence.

  • ntile(n) → bigint

  • Divides the rows for each window partition into n buckets rangingfrom 1 to at most n. Bucket values will differ by at most 1.If the number of rows in the partition does not divide evenly into thenumber of buckets, then the remainder values are distributed one perbucket, starting with the first bucket.

For example, with 6 rows and 4 buckets, the bucket values wouldbe as follows: 1 1 2 2 3 4

  • percent_rank() → double
  • Returns the percentage ranking of a value in group of values. The resultis (r - 1) / (n - 1) where r is the rank() of the row andn is the total number of rows in the window partition.

  • rank() → bigint

  • Returns the rank of a value in a group of values. The rank is one plusthe number of rows preceding the row that are not peer with the row.Thus, tie values in the ordering will produce gaps in the sequence.The ranking is performed for each window partition.

  • row_number() → bigint

  • Returns a unique, sequential number for each row, starting with one,according to the ordering of rows within the window partition.

Value Functions

  • firstvalue(_x) → [same as input]
  • Returns the first value of the window.

  • lastvalue(_x) → [same as input]

  • Returns the last value of the window.

  • nthvalue(_x, offset) → [same as input]

  • Returns the value at the specified offset from beginning the window.Offsets start at 1. The offset can be any scalarexpression. If the offset is null or greater than the number of values inthe window, null is returned. It is an error for the offset to be zero ornegative.

  • lead(x[, offset[, default_value]]) → [same as input]

  • Returns the value at offset rows after the current row in the window.Offsets start at 0, which is the current row. Theoffset can be any scalar expression. The default offset is 1. If theoffset is null or larger than the window, the default_value is returned,or if it is not specified null is returned.

  • lag(x[, offset[, default_value]]) → [same as input]

  • Returns the value at offset rows before the current row in the windowOffsets start at 0, which is the current row. Theoffset can be any scalar expression. The default offset is 1. If theoffset is null or larger than the window, the default_value is returned,or if it is not specified null is returned.