Functions

Functions are a fundamental abstraction in PRQL: they let us write code once and run it in many places. This reduces the number of errors in our code, makes it more readable, and simplifies making changes.

Functions have two types of parameters:

  1. Positional parameters, which require an argument.
  2. Named parameters, which optionally take an argument and otherwise use their default value.

For example, this function is named fahrenheit_to_celsius and has one positional parameter, temp:

PRQL

    func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
    from cities
    derive temp_c = (fahrenheit_to_celsius temp_f)

SQL

    SELECT
      *,
      (temp_f - 32) / 1.8 AS temp_c
    FROM
      cities
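Because the function is defined once, it can be applied to several columns in the same query. The sketch below assumes a hypothetical table weather with Fahrenheit columns high_f and low_f; these names are illustrative only.

```prql
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

# weather, high_f and low_f are hypothetical names used for illustration.
from weather
derive [
  high_c = (fahrenheit_to_celsius high_f),
  low_c = (fahrenheit_to_celsius low_f),
]
```

Each call should expand to its own (column - 32) / 1.8 expression in the generated SQL, so changing the conversion means editing only the func line.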

This function is named interp. It has two positional parameters, high and x, and one named parameter, low, whose default value is 0. It calculates how far x lies between low and high, as a proportion of the distance from low to high.

PRQL

    func interp low:0 high x -> (x - low) / (high - low)
    from students
    derive [
      sat_proportion_1 = (interp 1600 sat_score),
      sat_proportion_2 = (interp low:0 1600 sat_score),
    ]

SQL

    SELECT
      *,
      (sat_score - 0) / 1600 AS sat_proportion_1,
      (sat_score - 0) / 1600 AS sat_proportion_2
    FROM
      students
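Both calls above leave low at its default of 0. As a sketch of overriding the default, the query below passes low:400, on the assumption that scores are measured from the SAT minimum of 400 rather than from 0; the column name sat_proportion is chosen for illustration.

```prql
func interp low:0 high x -> (x - low) / (high - low)

from students
# Override the default of 0 for the named parameter low.
derive sat_proportion = (interp low:400 1600 sat_score)
```

With these arguments the function body becomes (sat_score - 400) / (1600 - 400), so arithmetically a score of 1000 corresponds to 0.5. Whether the database returns a fractional value depends on the column type and on how the SQL dialect divides integers.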

Piping

Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes composing many functions more readable. When piping a value into a function, the value is passed as an argument to the final positional parameter of the function. Here’s the same result as the examples above with an alternative construction:

PRQL

    func interp low:0 high x -> (x - low) / (high - low)
    from students
    derive [
      sat_proportion_1 = (sat_score | interp 1600),
      sat_proportion_2 = (sat_score | interp low:0 1600),
    ]

SQL

    SELECT
      *,
      (sat_score - 0) / 1600 AS sat_proportion_1,
      (sat_score - 0) / 1600 AS sat_proportion_2
    FROM
      students

And the temperature conversion from the first example, written with a pipe:

PRQL

    func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
    from cities
    derive temp_c = (temp_f | fahrenheit_to_celsius)

SQL

    SELECT
      *,
      (temp_f - 32) / 1.8 AS temp_c
    FROM
      cities
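Since the piped value always fills the final positional parameter, parameter order decides which role that value plays. A minimal sketch, using a hypothetical function share_of and assuming a table costs with columns materials and cost_total:

```prql
# share_of is a hypothetical helper; part is its final positional parameter.
func share_of total part -> part / total

from costs
# Direct call: cost_total fills total, materials fills part.
derive materials_share_1 = (share_of cost_total materials)
# Piped call: materials is piped into the final parameter, part.
derive materials_share_2 = (materials | share_of cost_total)
```

Both columns should evaluate to materials / cost_total. Had the parameters been declared as part total, the piped form would instead compute cost_total / materials, so it helps to put the value most likely to be piped last.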

We can chain several functions together, which makes the logic more readable:

PRQL

    func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
    func interp low:0 high x -> (x - low) / (high - low)
    from kettles
    derive boiling_proportion = (temp_f | fahrenheit_to_celsius | interp 100)

SQL

    SELECT
      *,
      ((temp_f - 32) / 1.8 - 0) / 100 AS boiling_proportion
    FROM
      kettles
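For comparison, a chain of piped functions can also be written as nested calls, which have to be read from the inside out. The sketch below assumes kettles has a Fahrenheit column named temp_f.

```prql
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
func interp low:0 high x -> (x - low) / (high - low)

# temp_f is assumed to be a Fahrenheit column on kettles.
from kettles
derive boiling_proportion = (interp 100 (fahrenheit_to_celsius temp_f))
```

The nested form should produce the same expression as the piped one; the pipe simply lets the steps appear in the order they are applied.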

Scope

Late binding

Functions can bind to any variable that is in scope when the function is executed. For example, here cost_total refers to the column that’s introduced in the from.

PRQL

    func cost_share cost -> cost / cost_total
    from costs
    select [materials, labor, overhead, cost_total]
    derive [
      materials_share = (cost_share materials),
      labor_share = (cost_share labor),
      overhead_share = (cost_share overhead),
    ]

SQL

    SELECT
      materials,
      labor,
      overhead,
      cost_total,
      materials / cost_total AS materials_share,
      labor / cost_total AS labor_share,
      overhead / cost_total AS overhead_share
    FROM
      costs
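If late binding works as described, the bound variable need not be a table column: it could also be a column derived earlier in the pipeline. The following is a sketch under that assumption, and under the assumption that costs has no cost_total column of its own; it has not been checked against a particular compiler version.

```prql
func cost_share cost -> cost / cost_total

from costs
# Assumption: cost_total is not stored in the table, so it is derived here.
derive cost_total = materials + labor + overhead
# cost_share resolves cost_total when it is called, after the derive above.
derive materials_share = (cost_share materials)
```

The function is defined before cost_total exists, which is only valid because the name is looked up when the function is executed rather than when it is declared.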