Window

Applies a pipeline to segments of rows, producing one output value for every input value.

  1. window rows:(range) range:(range) expanding:false rolling:0 (pipeline)

For each row, the segment over which the pipeline is applied is determined by one of:

  • rows, which takes a range of rows relative to the current row position.
    • 0 references the current row.
  • range, which takes a range of values relative to current row value.

The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the edge of the table or group.

For ease of use, there are two flags that override rows or range:

  • expanding:true is an alias for rows:..0. A sum using this window is also known as “cumulative sum”.
  • rolling:n is an alias for rows:(-n+1)..0, where n is an integer. This will include n last values, including current row. An average using this window is also knows as a Simple Moving Average.

Some examples:

ExpressionMeaning
rows:0..2current row plus two following
rows:-2..0two preceding rows plus current row
rolling:3(same as previous)
rows:-2..4two preceding rows plus current row plus four following rows
rows:..0all rows from the start of the table up to & including current row
expanding:true(same as previous)
rows:0..current row and all following rows until the end of the table
rows:..all rows, which same as not having window at all

Example

PRQL

  1. from employees
  2. group employee_id (
  3. sort month
  4. window rolling:12 (
  5. derive {trail_12_m_comp = sum paycheck}
  6. )
  7. )

SQL

  1. SELECT
  2. *,
  3. SUM(paycheck) OVER (
  4. PARTITION BY employee_id
  5. ORDER BY
  6. month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  7. ) AS trail_12_m_comp
  8. FROM
  9. employees

PRQL

  1. from orders
  2. sort day
  3. window rows:-3..3 (
  4. derive {centered_weekly_average = average value}
  5. )
  6. group {order_month} (
  7. sort day
  8. window expanding:true (
  9. derive {monthly_running_total = sum value}
  10. )
  11. )

SQL

  1. SELECT
  2. *,
  3. AVG(value) OVER (
  4. ORDER BY
  5. day ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
  6. ) AS centered_weekly_average,
  7. SUM(value) OVER (
  8. PARTITION BY order_month
  9. ORDER BY
  10. day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  11. ) AS monthly_running_total
  12. FROM
  13. orders

Rows vs Range:

PRQL

  1. from [
  2. {time_id=1, value=15},
  3. {time_id=2, value=11},
  4. {time_id=3, value=16},
  5. {time_id=4, value=9},
  6. {time_id=7, value=20},
  7. {time_id=8, value=22},
  8. ]
  9. window rows:-2..0 (
  10. sort time_id
  11. derive {sma3rows = average value}
  12. )
  13. window range:-2..0 (
  14. sort time_id
  15. derive {sma3range = average value}
  16. )

SQL

  1. WITH table_0 AS (
  2. SELECT
  3. 1 AS time_id,
  4. 15 AS value
  5. UNION
  6. ALL
  7. SELECT
  8. 2 AS time_id,
  9. 11 AS value
  10. UNION
  11. ALL
  12. SELECT
  13. 3 AS time_id,
  14. 16 AS value
  15. UNION
  16. ALL
  17. SELECT
  18. 4 AS time_id,
  19. 9 AS value
  20. UNION
  21. ALL
  22. SELECT
  23. 7 AS time_id,
  24. 20 AS value
  25. UNION
  26. ALL
  27. SELECT
  28. 8 AS time_id,
  29. 22 AS value
  30. )
  31. SELECT
  32. time_id,
  33. value,
  34. AVG(value) OVER (
  35. ORDER BY
  36. time_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  37. ) AS sma3rows,
  38. AVG(value) OVER (
  39. ORDER BY
  40. time_id RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
  41. ) AS sma3range
  42. FROM
  43. table_0
  44. ORDER BY
  45. time_id
time_idvaluesma3rowssma3range
1151515
2111313
3161414
491212
7201520
8221721

We can see that rows having time_id of 5 and 6 are missing in example data; we can say there are gaps in our time series data.

When computing SMA 3 for the fifth row (time_id==7) then:

  • “rows” will compute average on 3 rows (time_id in 3, 4, 7)
  • “range” will compute average on single row only (time_id==7)

When computing SMA 3 for the sixth row (time_id==8) then:

  • “rows” will compute average on 3 rows (time_id in 4, 7, 8)
  • “range” will compute average on 2 rows (time_id in 7, 8)

We can observe that “rows” ignores the content of the time_id, only uses its order; we can say its window operates on physical rows. On the other hand “range” looks at the content of the time_id and based on the content decides how many rows fits into window; we can say window operates on logical rows.

Windowing by default

If you use window functions without window transform, they will be applied to the whole table. Unlike in SQL, they will remain window functions and will not trigger aggregation.

PRQL

  1. from employees
  2. sort age
  3. derive {rnk = rank age}

SQL

  1. SELECT
  2. *,
  3. RANK() OVER (
  4. ORDER BY
  5. age
  6. ) AS rnk
  7. FROM
  8. employees
  9. ORDER BY
  10. age

You can also only apply group:

PRQL

  1. from employees
  2. group department (
  3. sort age
  4. derive {rnk = rank age}
  5. )

SQL

  1. SELECT
  2. *,
  3. RANK() OVER (
  4. PARTITION BY department
  5. ORDER BY
  6. age
  7. ) AS rnk
  8. FROM
  9. employees

Window functions as first class citizens

There are no limitations on where windowed expressions can be used:

PRQL

  1. from employees
  2. filter salary < (average salary)

SQL

  1. WITH table_0 AS (
  2. SELECT
  3. *,
  4. AVG(salary) OVER () AS _expr_0
  5. FROM
  6. employees
  7. )
  8. SELECT
  9. *
  10. FROM
  11. table_0
  12. WHERE
  13. salary < _expr_0