Syntax

Summary

A summary of PRQL syntax

SyntaxUsageExample
|Pipelinesfrom employees | select first_name
=Assigns & Aliasesfrom e = employees
derive total = (sum salary)
:Named args & Parametersinterp low:0 1600 sat_score
[]Listsselect [id, amount]
()Precedence & Parenthesesderive celsius = (fahrenheit - 32) / 1.8
‘’ & “”Stringsderive name = ‘Mary’
`</code></td><td><a href="$8a5bacd7e62abd36.md#quoted-identifiers">Quoted identifiers</a></td><td><code>selectfirst name`
#Comments# A comment
@Dates & Times@2021-01-01
==Expressionsfilter a == b and c != d and e > f
==Self-equality in joinjoin s=salaries [==id]
->Function definitionsfunc add a b -> a + b
+/-Sort ordersort [-amount, +date]
??Coalesceamount ?? 0

Pipes

Pipes — the connection between transforms that make up a pipeline — can be either line breaks or a pipe character (|).

In almost all situations, line-breaks pipe the result of a line’s transform into the transform on the following line. For example, the filter transform operates on the result of from employees (which is just the employees table), and the select transform operates on the result of the filter transform.

PRQL

  1. from employees
  2. filter department == "Product"
  3. select [first_name, last_name]

SQL

  1. SELECT
  2. first_name,
  3. last_name
  4. FROM
  5. employees
  6. WHERE
  7. department = 'Product'

In the place of a line-break, it’s also possible to use the | character to pipe results, such that this is equivalent:

PRQL

  1. from employees | filter department == "Product" | select [first_name, last_name]

SQL

  1. SELECT
  2. first_name,
  3. last_name
  4. FROM
  5. employees
  6. WHERE
  7. department = 'Product'

A line-break doesn’t create a pipeline in a couple of cases:

  • within a list (e.g. the derive examples below),
  • when the following line is a new statement, which starts with a keyword of func, table or from.

Lists

Lists are represented with [], and can span multiple lines. A final trailing comma is optional.

PRQL

  1. from numbers
  2. derive [x = 1, y = 2]
  3. derive [
  4. a = x,
  5. b = y
  6. ]
  7. derive [
  8. c = a,
  9. d = b,
  10. ]

SQL

  1. SELECT
  2. *,
  3. 1 AS x,
  4. 2 AS y,
  5. 1 AS a,
  6. 2 AS b,
  7. 1 AS c,
  8. 2 AS d
  9. FROM
  10. numbers

Most transforms can take either a list or a single item, so these are equivalent:

PRQL

  1. from employees
  2. select [first_name]

SQL

  1. SELECT
  2. first_name
  3. FROM
  4. employees

PRQL

  1. from employees
  2. select first_name

SQL

  1. SELECT
  2. first_name
  3. FROM
  4. employees

Expressions

PRQL is made up of expressions, like 2 + 3 or ((1 + x) * y). In the example below, note the use of expressions to calculate the alias circumference and in the filter transform.

PRQL

  1. from foo
  2. select [
  3. circumference = diameter * 3.14159,
  4. color,
  5. ]
  6. filter circumference > 10 and color != "red"

SQL

  1. WITH table_1 AS (
  2. SELECT
  3. diameter * 3.14159 AS circumference,
  4. color
  5. FROM
  6. foo
  7. )
  8. SELECT
  9. circumference,
  10. color
  11. FROM
  12. table_1 AS table_0
  13. WHERE
  14. circumference > 10
  15. AND color <> 'red'

Precedence and Parentheses

Parentheses — () — are used to give precedence to inner expressions.

Note

We realize some of the finer points here are not intuitive. We are considering approaches to make this more intuitive — even at the cost of requiring more syntax in some circumstances. And we’re planning to make the error messages much better, so the compiler is there to help out.

Parentheses are required around:

  • Any nested function call containing a pipe, either the | symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline.
  • Any function call that isn’t a single item in a list or a pipeline, like sum distance in round 0 (sum distance)1.
  • A minus sign in a function argument, like in add (-1) (-3)
  • Inner transforms for group, window, and other transforms.

Parentheses are not required around expressions which use operators but no function call, like foo + bar.

Here’s a full rundown of times this applier:

PRQL

  1. from employees
  2. # Requires parentheses, because it's contains a pipe
  3. derive is_proximate = (distance | in 0..20)
  4. # Requires parentheses, because it's a function call
  5. derive total_distance = (sum distance)
  6. # `??` doesn't require parentheses, as it's not a function call
  7. derive min_capped_distance = (min distance ?? 5)
  8. # No parentheses needed, because no function call
  9. derive travel_time = distance / 40
  10. # No inner parentheses needed around `1+1` because no function call
  11. derive distance_rounded_2_dp = (round 1+1 distance)
  12. derive [
  13. # Requires parentheses, because it contains a pipe
  14. is_far = (distance | in 100..),
  15. # The left value of the range requires parentheses,
  16. # because of the minus sign
  17. is_negative = (distance | in (-100..0)),
  18. # ...this is equivalent
  19. is_negative = (distance | in (-100)..0),
  20. # Doesn't require parentheses, because it's in a list (confusing, see footnote)!
  21. average_distance = average distance,
  22. ]
  23. # Requires parentheses because of the minus sign
  24. sort (-distance)
  25. # A list is fine too
  26. sort [-distance]

SQL

  1. SELECT
  2. *,
  3. distance BETWEEN 0 AND 20 AS is_proximate,
  4. SUM(distance) OVER () AS total_distance,
  5. MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
  6. distance / 40 AS travel_time,
  7. ROUND(distance, 2) AS distance_rounded_2_dp,
  8. distance >= 100 AS is_far,
  9. distance BETWEEN -100 AND 0,
  10. distance BETWEEN -100 AND 0 AS is_negative,
  11. AVG(distance) OVER () AS average_distance
  12. FROM
  13. employees
  14. ORDER BY
  15. distance DESC

This doesn’t work, for example (though it should provide a much better error message):

PRQL

  1. from employees
  2. derive total_distance = sum distance

Error

  1. Error:
  2. ╭─[:2:29]
  3. 2 derive total_distance = sum distance
  4. ────┬───
  5. ╰───── Unknown name distance
  6. ───╯

For a more formal definition, refer to this precedence table. Because function calls have the lowest precedence, nested function calls or arguments that start or end with an operator require parenthesis.

GroupOperatorsPrecedenceAssociativity
identifier dot.1
unary- + ! ==2
range..3
mul* / %4left-to-right
add+ -5left-to-right
compare== != <= >= < >6left-to-right
coalesce??7left-to-right
andand8left-to-right
oror9left-to-right
function call10

Inner Transforms

Parentheses are also used for transforms (such as group and window) that pass their result to an “inner transform”. The example below applies the aggregate pipeline to each group of unique title and country values:

PRQL

  1. from employees
  2. group [title, country] (
  3. aggregate [
  4. average salary,
  5. ct = count
  6. ]
  7. )

SQL

  1. SELECT
  2. title,
  3. country,
  4. AVG(salary),
  5. COUNT(*) AS ct
  6. FROM
  7. employees
  8. GROUP BY
  9. title,
  10. country

Comments

Comments are represented by #.

PRQL

  1. from employees # Comment 1
  2. # Comment 2
  3. aggregate [average salary]

SQL

  1. SELECT
  2. AVG(salary)
  3. FROM
  4. employees

There’s no distinct multiline comment syntax.

Quoted identifiers

To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.

PRQL

  1. prql target:sql.mysql
  2. from employees
  3. select `first name`

SQL

  1. SELECT
  2. `first name`
  3. FROM
  4. employees

PRQL

  1. prql target:sql.postgres
  2. from employees
  3. select `first name`

SQL

  1. SELECT
  2. "first name"
  3. FROM
  4. employees

PRQL

  1. from `dir/*.parquet`

SQL

  1. SELECT
  2. *
  3. FROM
  4. "dir/*.parquet"

BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in the SELECT statement:

PRQL

  1. prql target:sql.bigquery
  2. from `project-foo.dataset.table`
  3. join `project-bar.dataset.table` [==col_bax]

SQL

  1. SELECT
  2. `project-foo.dataset.table`.*,
  3. `project-bar.dataset.table`.*
  4. FROM
  5. `project-foo.dataset.table`
  6. JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax

Quoting schemas

Note

This is currently not great and we are working on improving it; see https://github.com/PRQL/prql/issues/1535 for progress.

If supplying a schema without a column — for example in a from or join transform, that also needs to be a quoted identifier:

PRQL

  1. from `music.albums`

SQL

  1. SELECT
  2. *
  3. FROM
  4. music.albums

Parameters

PRQL will retain parameters like $1 in SQL output, which can then be supplied to the SQL query as a prepared query:

PRQL

  1. from employees
  2. filter id == $1

SQL

  1. SELECT
  2. *
  3. FROM
  4. employees
  5. WHERE
  6. id = $1

Numbers

Numbers can contain underscores between numbers; which can make reading large numbers easier:

PRQL

  1. from numbers
  2. select [
  3. small = 1.000_000_1,
  4. big = 5_000_000,
  5. ]

SQL

  1. SELECT
  2. 1.0000001 AS small,
  3. 5000000 AS big
  4. FROM
  5. numbers

Keywords

At the moment, PRQL uses only four keywords:

  • prql
  • let
  • func
  • case

To use these names as columns or relations, use backticks: `case` .

It may seem that transforms are also keywords, but they are normal function within std namespace:

PRQL

  1. std.from my_table
  2. std.select [from = my_table.a, take = my_table.b]
  3. std.take 3

SQL

  1. SELECT
  2. a AS "from",
  3. b AS take
  4. FROM
  5. my_table
  6. LIMIT
  7. 3

1: or, technically, it’s on the right side of an assignment in a list…