Lexical structure

A YQL query is valid UTF-8 text consisting of commands (statements) separated by semicolons (;).
The last semicolon can be omitted.
Each command is a sequence of tokens that are valid for this command.
Tokens can be keywords, identifiers, literals, and so on.
Tokens are separated by whitespace characters (space, tab, line feed) or comments. A comment is not part of the command and is syntactically equivalent to a space character.

Syntax compatibility modes

Two syntax compatibility modes are supported:

  • Advanced C++ (default)
  • ANSI SQL

ANSI SQL mode is enabled with a special comment, --!ansi_lexer, which must be at the beginning of the query.

The sections below describe how lexical elements are interpreted in each compatibility mode.

Comments

The following types of comments are supported:

  • Single-line comment: starts with -- (two minus characters following one another) and continues to the end of the line
  • Multiline comment: starts with /* and ends with */

    SELECT 1; -- A single-line comment
    /*
    Some multi-line comment
    */

In C++ syntax compatibility mode (default), a multiline comment ends with the nearest */.
The ANSI SQL syntax compatibility mode accounts for nesting of multiline comments:

    --!ansi_lexer
    SELECT * FROM T; /* this is a comment /* this is a nested comment, without ansi_lexer it raises an error */ */
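
The difference between the two modes can be modeled outside YQL. The following Python sketch is only an illustration, not YQL's actual lexer: the function name strip_block_comments and its nested flag are hypothetical, and string literals and single-line comments are ignored to keep it short.

```python
def strip_block_comments(text: str, nested: bool) -> str:
    """Replace each /* ... */ comment with a single space.

    nested=False models the default mode: a comment ends at the nearest */.
    nested=True models the ANSI SQL mode: /* and */ must balance.
    Simplification: string literals and -- comments are not recognized.
    """
    out = []
    depth = 0  # how many comments are currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*" and (depth == 0 or nested):
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # a comment is equivalent to a space
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)


query = "SELECT * FROM T; /* outer /* inner */ */"
ansi_result = strip_block_comments(query, nested=True)
default_result = strip_block_comments(query, nested=False)
```

With nested=True the whole comment collapses to a single space. With nested=False the comment ends at the first */, so a stray */ is left in default_result, which corresponds to the error raised without the ANSI lexer.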

Keywords and identifiers

Keywords are tokens that have a fixed value in the YQL language. Examples of keywords: SELECT, INSERT, FROM, ACTION, and so on. Keywords are case-insensitive, that is, SELECT and SeLEcT are equivalent to each other.
The list of keywords is not fixed and will grow as the language develops. A keyword can't contain digits and can't begin or end with an underscore.

Identifiers are tokens that identify the names of tables, columns, and other objects in YQL. Identifiers in YQL are always case-sensitive.
An identifier can be written in the body of the program without any special formatting, if the identifier:

  • Is not a keyword
  • Begins with a Latin letter or underscore
  • Continues with only Latin letters, underscores, or digits

    SELECT my_column FROM my_table; -- my_column and my_table are identifiers
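
The three conditions above can be expressed as a small check. This Python sketch is an illustration only: the KEYWORDS set is a tiny hypothetical sample (the real list is larger and changes over time), "Latin letter" is assumed to mean ASCII A-Z and a-z, and needs_backticks is an invented helper name.

```python
import re

# Hypothetical, incomplete sample of keywords taken from the text above.
KEYWORDS = {"SELECT", "INSERT", "FROM", "ACTION"}

# First character: letter or underscore; the rest: letters, digits, underscores.
BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_backticks(name: str) -> bool:
    """Return True if name cannot be written without special formatting."""
    if name.upper() in KEYWORDS:  # keywords are case-insensitive
        return True
    return BARE_IDENTIFIER.fullmatch(name) is None
```

For example, needs_backticks("my_column") is False, while needs_backticks("select") and needs_backticks("column with space") are both True.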

To use an arbitrary identifier in a query, enclose it in backticks:

    SELECT `column with space` FROM T;
    SELECT * FROM `my_dir/my_table`;

Identifiers in backticks are never interpreted as keywords:

    SELECT `select` FROM T; -- select is the name of a column in table T

Inside backticks, you can use standard C escape sequences:

    SELECT 1 AS `column with\n newline, \x0a newline and \` backtick `;

In ANSI SQL syntax compatibility mode, arbitrary identifiers can also be enclosed in double quotes. To include a double quote in a quoted identifier, use two double quotes:

    --!ansi_lexer
    SELECT 1 AS "column with "" double quote"; -- column name will be: column with " double quote

String literals

A string literal (constant) is expressed as a sequence of characters enclosed in single quotes. Inside a string literal, you can use the C-style escaping rules:

    SELECT 'string with\n newline, \x0a newline and \' single quote ';

In the C++ syntax compatibility mode (default), you can use double quotes instead of single quotes:

    SELECT "string with\n newline, \x0a newline and \" double quote ";

In ANSI SQL compatibility mode, double quotes are used for identifiers, and the only escaping that can be used for string literals is a pair of single quotes:

    --!ansi_lexer
    SELECT 'string with '' quote'; -- result: string with ' quote
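
The quote-doubling rule of the ANSI SQL mode applies in the same way to single-quoted string literals and to double-quoted identifiers. A minimal Python sketch of that rule follows; unquote_ansi is a hypothetical helper, not part of YQL, and it assumes the token has already been isolated by a lexer.

```python
def unquote_ansi(token: str) -> str:
    """Strip the outer quotes of an ANSI-mode token and collapse doubled quotes.

    Handles both 'string literals' and "quoted identifiers"; the quote
    character is taken from the first character of the token.
    """
    if len(token) < 2 or token[0] not in ("'", '"') or token[-1] != token[0]:
        raise ValueError("not a quoted token")
    quote = token[0]
    body = token[1:-1]
    # Once doubled quotes are removed, no lone quote may remain in the body.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("unescaped quote inside token")
    return body.replace(quote * 2, quote)
```

Applied to the two ANSI-mode examples in this section, the function returns string with ' quote and column with " double quote, respectively.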

String literals can be used to produce primitive type literals.

Multi-line string literals

A multiline string literal is expressed as an arbitrary set of characters enclosed in double at signs @@:

    $text = @@some
    multiline
    text@@;
    SELECT LENGTH($text);

If you need to use double at signs in your text, duplicate them:

    $text = @@some
    multiline with double at: @@@@
    text@@;
    SELECT $text;
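
As a rough model of the doubling rule, the Python sketch below recovers the text of a multiline literal. The helper name decode_multiline is hypothetical, and the sketch assumes the token boundaries have already been found; it does not try to resolve ambiguous cases such as text that itself ends with an at sign.

```python
def decode_multiline(token: str) -> str:
    """Return the text of an @@...@@ literal, mapping each @@@@ to @@."""
    if len(token) < 4 or not (token.startswith("@@") and token.endswith("@@")):
        raise ValueError("not a multiline literal")
    return token[2:-2].replace("@@@@", "@@")


literal = "@@some\nmultiline with double at: @@@@\ntext@@"
decoded = decode_multiline(literal)
```

Here decoded holds three lines of text, and the second line ends with a single pair of at signs.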

Typed string literals

  • String literals, including multiline literals, have the String type by default.
  • You can use the following suffixes to explicitly control the literal type:
    • u: Utf8.
    • y: Yson.
    • j: Json.

Example:

    SELECT "foo"u, '[1;2]'y, @@{"a":null}@@j;
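
The suffix rule amounts to a lookup on the last character of the literal. The Python sketch below illustrates it; string_literal_type is a hypothetical helper, and it assumes suffixes are written in lowercase, as in the example above.

```python
# Suffix -> type name, as listed above.
STRING_SUFFIXES = {"u": "Utf8", "y": "Yson", "j": "Json"}


def string_literal_type(token: str) -> str:
    """Return the type name implied by the suffix of a string literal token."""
    last = token[-1:]
    if last in ("'", '"', "@"):
        return "String"  # literal ends with its closing delimiter: no suffix
    if last in STRING_SUFFIXES:
        return STRING_SUFFIXES[last]
    raise ValueError(f"unknown string literal suffix: {last!r}")
```

For the three literals in the example, the function returns Utf8, Yson, and Json, and it returns String for a literal without a suffix.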

Numeric literals

  • Integer literals have the default type Int32, if they fit within the Int32 range. Otherwise, they automatically expand to Int64.
  • You can use the following suffixes to explicitly control the literal type:
    • l: Int64.
    • s: Int16.
    • t: Int8.
  • Add the suffix u to convert a type to its corresponding unsigned type:
    • ul: Uint64.
    • u: Uint32.
    • us: Uint16.
    • ut: Uint8.
  • You can also use hexadecimal, octal, and binary format for integer literals using the prefixes 0x, 0o and 0b, respectively. You can arbitrarily combine them with the above-mentioned suffixes.
  • Floating point literals have the Double type by default, but you can use the suffix f to narrow it down to Float.

    SELECT
        123l AS `Int64`,
        0b01u AS `Uint32`,
        0xfful AS `Uint64`,
        0o7ut AS `Uint8`,
        456s AS `Int16`,
        1.2345f AS `Float`;
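
The integer rules above (prefix, suffix, and the Int32 default) can be summarized in a short Python sketch. It is an illustration under stated assumptions, not YQL's parser: integer_literal_type is a hypothetical helper, prefixes and suffixes are assumed to be lowercase as in the example, floating point literals are not handled, and values beyond the Int64 range are not checked.

```python
import re

# Suffix -> type name, as listed above.
INT_SUFFIXES = {
    "l": "Int64", "s": "Int16", "t": "Int8",
    "ul": "Uint64", "u": "Uint32", "us": "Uint16", "ut": "Uint8",
}

# Digits with an optional base prefix, then an optional type suffix.
INT_LITERAL = re.compile(
    r"(?P<digits>0x[0-9a-f]+|0o[0-7]+|0b[01]+|[0-9]+)"
    r"(?P<suffix>ul|us|ut|l|s|t|u)?"
)


def integer_literal_type(token: str) -> str:
    """Return the type name of an integer literal token."""
    m = INT_LITERAL.fullmatch(token)
    if m is None:
        raise ValueError(f"not an integer literal: {token!r}")
    suffix = m.group("suffix")
    if suffix:
        return INT_SUFFIXES[suffix]
    digits = m.group("digits")
    base = {"0x": 16, "0o": 8, "0b": 2}.get(digits[:2], 10)
    value = int(digits if base == 10 else digits[2:], base)
    # Without a suffix: Int32 if the value fits, otherwise Int64.
    return "Int32" if value <= 2**31 - 1 else "Int64"
```

For the integer literals in the example, the function returns the same type names that are used there as column aliases. A plain 2147483647 gives Int32, and 2147483648 gives Int64.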
