Window Functions

Description

Window functions are a kind of aggregation for a group of rows, referred as a window. It will return the aggregation value for each row based on the group of rows.

Syntax

  1. window_function OVER ( [ { PARTITION | DISTRIBUTE } BY colName ( [, ... ] ) ]
  2. { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
  3. [ window_frame ] )

Parameters

window_function

Hive dialect supports the following window functions:

  • Windowing functions

    • LEAD
    • LAG
    • FIRST_VALUE
    • LAST_VALUE

    Note: For FIRST_VALUE/LAST_VALUE, use parameter to control skip null values or respect null values isn’t supported yet. And they will always skip null values

  • Analytic functions

    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE
  • Aggregate Functions

    • COUNT
    • SUM
    • MIN
    • MAX
    • AVG

window_frame

It’s used to specified which row to start on and where to end it. Window frame supports the following formats:

  1. (ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
  2. (ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
  3. (ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

When ORDER BY is specified, but missing window_frame, the window frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

When both ORDER BY and window_frame are missing, the window frame defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Note: Distinct is not supported in window function yet.

Examples

  1. -- PARTITION BY with one partitioning column, no ORDER BY or window specification
  2. SELECT a, COUNT(b) OVER (PARTITION BY c) FROM t;
  3. -- PARTITION BY with two partitioning columns, no ORDER BY or window specification
  4. SELECT a, COUNT(b) OVER (PARTITION BY c, d) FROM t;
  5. -- PARTITION BY with two partitioning columns, no ORDER BY or window specification
  6. SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f) FROM t;
  7. -- PARTITION BY with partitioning, ORDER BY, and window specification
  8. SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  9. FROM t;
  10. SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
  11. FROM t;
  12. SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
  13. FROM t;
  14. SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  15. FROM t;