Lexical Structure

An SQL input consists of a sequence of commands each of which is a sequence of tokens, terminated by a semicolon (;).

The syntax of a command defines its set of valid tokens. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.

Table of Contents

String Literal

String literals are defined as an arbitrary sequence of characters that are delimited with single quotes ' as defined in ANSI SQL, for example 'This is a string'.

Escape Strings

The escape character in CrateDB is the single-quote '. A character gets escaped when adding a single-quote before it. For example a single quote character within a string literal can be included by writing two adjacent single quotes, e.g. 'Jack''s car'.

Note

Two adjacent single quotes are not equivalent to the double-quote character ".

String Literals with C-Style Escapes

In addition to the escaped character ', CrateDB supports C-Style escaped string sequences. Such a sequence is constructed by prefixing the string literal with the letter E or e, for example, e'hello\nWorld'. The following escaped sequences are supported:

Escape SequenceInterpretation
\bbackspace
\fform feed
\nnewline
\rcarriage return
\ttab
\o, \oo, \ooo (o = [0-7])octal byte value
\xh, xhh (h = [0-9,A-F,a-f])hexadecimal byte value
\uxxxx, \Uxxxxxxxx (x = [0-9,A-F,a-f])16 or 32-bit hexadecimal Unicode character value

For instance, the escape string literal e'\u0061\x61\141' is equivalent to the 'aaa' string literal.

  1. cr> select e'\u0061\x61\141' as col1;
  2. +------+
  3. | col1 |
  4. +------+
  5. | aaa |
  6. +------+
  7. SELECT 1 row in set (... sec)

Any other character following a backslash is taken literally. Thus, to include a backslash character \, two adjacent backslashes need to be used (i.e. \\).

  1. cr> select e'aa\\nbb' as col1;
  2. +--------+
  3. | col1 |
  4. +--------+
  5. | aa\nbb |
  6. +--------+
  7. SELECT 1 row in set (... sec)

Finally, a single quote can be included in an escape string literal by also using the escape backslash character: \', in addition to the single-quote described in String Literals.

  1. cr> select e'aa\'bb' as col1;
  2. +-------+
  3. | col1 |
  4. +-------+
  5. | aa'bb |
  6. +-------+
  7. SELECT 1 row in set (... sec)

Key Words and Identifiers

The table bellow lists all reserved key words in CrateDB. These need to be quoted if used as identifiers.

ADDALLALTERAND
ANYARRAYASASC
BETWEENBOOLEANBYBYTE
CALLEDCASECASTCOLUMN
CONSTRAINTCREATECROSSCURRENT_DATE
CURRENT_SCHEMACURRENT_TIMECURRENT_TIMESTAMPCURRENT_USER
DEFAULTDELETEDENYDESC
DESCRIBEDIRECTORYDISTINCTDOUBLE
DROPELSEENDESCAPE
EXCEPTEXISTSEXTRACTFALSE
FIRSTFLOATFORFROM
FULLFUNCTIONGRANTGROUP
HAVINGIFININDEX
INNERINPUTINSERTINT
INTEGERINTERSECTINTOIP
ISJOINLASTLEFT
LICENSELIKELIMITLONG
MATCHNATURALNOTNULL
NULLSOBJECTOFFSETON
ORORDEROUTERPERSISTENT
PRIMARYRECURSIVEREPLACERESET
RETURNSREVOKERIGHTSELECT
SESSION_USERSETSHORTSOME
STRATIFYSTRINGSUBSTRINGTABLE
THENTRANSIENTTRUETRY_CAST
UNBOUNDEDUNIONUPDATEUSER
USINGWHENWHEREWITH

Tokens such as my_table, id, name, or data in the example above are identifiers, which identify names of tables, columns, and other database objects.

Example:

  1. CREATE TABLE my_table (
  2. id INTEGER,
  3. name STRING,
  4. data OBJECT
  5. ) WITH (number_of_replicas = 0);

Note

Key words and unquoted identifiers are case insensitive.

This means that:

  1. select foo from t;

is equivalent to:

  1. select Foo from t;

or:

  1. select FOO from t;

A widely used convention is to write key words in uppercase and identifiers in lowercase, such as

  1. ALTER TABLE foo ADD COLUMN new_column INTEGER;
  1. INSERT INTO foo (id, name) VALUES (1, 'bar');

Quoted identifiers can contain an arbitrary sequence of charactes enclosed by double quotes ("). Quoted identifiers are never keywords, so you can use "update" as a table or column name.

Special Characters

Some non-alphanumeric characters do have a special meaning. For their usage please refer to the sections where the respective syntax elements are described.

Semicolon:The semicolon (;) terminates an SQL statement. It cannot appear anywhere else within the command, except within a string or quoted identifier.
Comma:The comma (,) is used in various syntactical elements to separate elements of a list.
Brackets:Square brackets ([]) are used to select elements of arrays and objects, e.g. arr[1] or obj[‘key’].
Asterisk:The asterisk () is used in some contexts to denote all columns of a table. As an argument in global aggregate functions it has the meaning of any field, e.g. COUNT().
Period:The period (.) is used for numeric values and to separate schema and table names, e.g. blob.my_blob_table.

Comments

An SQL statement can contain comments. Single line comments start with a double dash (--) and end at the end of that line. Multi line comments start with /* and end with */.

Example:

  1. /*
  2. * Retrieve information about all tables in the 'doc' schema.
  3. */
  4. SELECT *
  5. FROM information_schema.tables
  6. WHERE table_schema = 'doc'; -- query information schema for doc tables