Name resolution

Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.

Scopes

In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.

Each name in a scope consists of a namespace and a variable name, separated by a dot, similar to SQL. A namespace can itself contain dots, but a variable name cannot.

Example

The name my_table.some_column refers to the variable some_column in the namespace my_table.

The name foo.bar.baz refers to the variable baz in the namespace foo.bar.
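
The split described above can be sketched in a few lines of Python. This is only an illustration, not the compiler's actual code: the function name split_ident is hypothetical, and returning an empty string for a missing namespace is an assumption made for this sketch.

```python
def split_ident(ident: str) -> tuple[str, str]:
    """Split a dotted name into (namespace, variable).

    Everything before the last dot is the namespace, so the namespace
    may contain dots while the variable name never does.
    An ident without a dot gets an empty namespace (an assumption here).
    """
    namespace, _, variable = ident.rpartition(".")
    return namespace, variable


print(split_ident("my_table.some_column"))  # ('my_table', 'some_column')
print(split_ident("foo.bar.baz"))           # ('foo.bar', 'baz')
print(split_ident("first_name"))            # ('', 'first_name')
```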

When processing a query, a scope is maintained and updated for each point in the query.

It starts with only the namespace std, which is the standard library. It contains common functions such as sum or count, along with all transform functions, such as derive and group.

In pipelines (or rather, in transform functions), the scope is also extended with the namespaces of tables that may have been referenced with from or join transforms. Each of these namespaces contains all known columns of the table and possibly a wildcard variable, which matches any variable (see the algorithm below). Within transforms, there is also a special namespace that has no name. It is called a “frame” and it contains the columns of the current table the transform is operating on.

Resolving

For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:

  • The scope contains an exact match, i.e. a name that matches in both the namespace and the variable name.

  • The scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a * wildcard. If exactly one namespace matches this way, that namespace is also updated to contain the new variable name.

  • Otherwise, nothing is matched and an error is raised.
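
The three cases can be sketched in Python as follows. This is a simplified model under stated assumptions, not the compiler's implementation: the scope is represented as a dict from namespace name to a set of variable names, resolve and ResolveError are hypothetical names, and an ident without a namespace is assumed to match exactly in any namespace that contains the variable.

```python
WILDCARD = "*"


class ResolveError(Exception):
    """Raised when an ident matches nothing in the scope."""


def resolve(scope: dict[str, set[str]], ident: str) -> list[str]:
    """Return the namespaces that ident resolves to, following the three cases.

    scope maps a namespace name to the variable names it contains.
    Dict insertion order stands in for "searching the items in order".
    """
    namespace, _, variable = ident.rpartition(".")

    # Case 1: exact match. Assumption: an ident with no namespace may match
    # the variable in any namespace.
    exact = [
        ns for ns, variables in scope.items()
        if variable in variables and namespace in ("", ns)
    ]
    if exact:
        return exact

    # Case 2: no exact match and no namespace given, so fall back to
    # namespaces that contain a wildcard.
    if namespace == "":
        candidates = [ns for ns, variables in scope.items() if WILDCARD in variables]
        if candidates:
            if len(candidates) == 1:
                # A single candidate: record the newly inferred variable.
                scope[candidates[0]].add(variable)
            return candidates

    # Case 3: nothing matched.
    raise ResolveError(f"unknown name: {ident}")


scope = {"std": {"sum", "count"}, "employees": {WILDCARD}}
print(resolve(scope, "std.sum"))            # ['std']
print(resolve(scope, "first_name"))         # ['employees']
print("first_name" in scope["employees"])   # True
```

In this sketch, an ident that names a namespace explicitly never falls through to the wildcard case, which follows the wording of the second case literally.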

Translating to SQL

When translating into a SQL statement that references only one table, column names do not need a table prefix.

PRQL

  from employees
  select first_name

SQL

  SELECT
    first_name
  FROM
    employees

But when there are multiple tables and we do not have complete knowledge of all table columns, a column without a prefix (e.g. first_name) may actually exist in more than one table. Because of this, we have to use table prefixes for all column names.

PRQL

  from employees
  derive {first_name, dept_id}
  join d=departments (==dept_id)
  select {first_name, d.title}

SQL

  SELECT
    employees.first_name,
    d.title
  FROM
    employees
    JOIN departments AS d ON employees.dept_id = d.dept_id

As you can see, employees.first_name now needs a table prefix to prevent conflicts with a potential column of the same name in the departments table. Similarly, d.title needs the table prefix.
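
The prefixing rule can be illustrated with a small Python sketch. It simplifies the condition to "more than one table is referenced" and ignores whether the columns of each table are fully known; the function name render_columns and its input shape are assumptions made for this example.

```python
def render_columns(columns: list[tuple[str, str]], table_count: int) -> list[str]:
    """Render (table, column) pairs for a SELECT list.

    With a single table the prefix is dropped; with several tables every
    column is prefixed so that it cannot clash with a column of the same
    name in another table.
    """
    if table_count == 1:
        return [name for _, name in columns]
    return [f"{table}.{name}" for table, name in columns]


print(render_columns([("employees", "first_name")], table_count=1))
# ['first_name']
print(render_columns([("employees", "first_name"), ("d", "title")], table_count=2))
# ['employees.first_name', 'd.title']
```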