Lexical elements

This document is a living document and may not represent the current implementation of Flux. Any section that is not currently implemented is commented with a [IMPL#XXX] where XXX is an issue number tracking discussion and progress towards implementation.

Comments

Comments serve as documentation. Comments begin with the character sequence // and stop at the end of the line.

Comments cannot start inside string or regexp literals. Comments act like newlines.

Tokens

Flux is built up from tokens. There are four classes of tokens:

  • identifiers
  • keywords
  • operators
  • literals

White space formed from spaces, horizontal tabs, carriage returns, and newlines is ignored except as it separates tokens that would otherwise combine into a single token. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.
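The longest-match rule can be illustrated with a small sketch. It is not the Flux scanner: it handles only the operator tokens listed under Operators and white space, and the function names are hypothetical.

```python
# Operator spellings copied from the Operators table in this document.
OPERATORS = [
    "+", "-", "*", "/", "%", "==", "!=", "<", ">", "<=", ">=",
    "=~", "!~", "=", "=>", "<-", "(", ")", "[", "]", "{", "}",
    ",", ":", ".", "|>", "^",
]

def next_operator(src, pos=0):
    """Return the longest operator that starts at src[pos], or None."""
    best = None
    for op in OPERATORS:
        if src.startswith(op, pos) and (best is None or len(op) > len(best)):
            best = op
    return best

def split_operators(src):
    """Split a string made only of operators and white space into tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos] in " \t\r\n":
            pos += 1  # white space only separates tokens
            continue
        op = next_operator(src, pos)
        if op is None:
            raise ValueError(f"no operator at offset {pos}")
        tokens.append(op)
        pos += len(op)
    return tokens
```

With this sketch, `<=` is read as one token, while `< =` is read as two because the space separates characters that would otherwise combine.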

Identifiers

Identifiers name entities within a program. An identifier is a sequence of one or more letters and digits. An identifier must start with a letter.

  identifier = letter { letter | unicode_digit } .

Examples of identifiers

  a
  _x
  longIdentifierName
  αβ
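A minimal sketch of an identifier check follows. It assumes, as the `_x` example suggests, that `letter` covers Unicode letters plus the underscore, and that `unicode_digit` is the Unicode category Nd. It also rejects the reserved keywords listed in the Keywords section. The helper names are hypothetical.

```python
import unicodedata

# Reserved words copied from the Keywords section of this document.
KEYWORDS = {
    "and", "import", "not", "return", "option", "test",
    "empty", "in", "or", "package", "builtin",
}

def is_letter(ch):
    # Assumption: "letter" means a Unicode letter or an underscore.
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_unicode_digit(ch):
    # Assumption: "unicode_digit" means Unicode category Nd.
    return unicodedata.category(ch) == "Nd"

def is_identifier(text):
    """Check text against: identifier = letter { letter | unicode_digit } ."""
    if not text or not is_letter(text[0]):
        return False
    if text in KEYWORDS:
        return False
    return all(is_letter(c) or is_unicode_digit(c) for c in text[1:])
```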

Keywords

The following keywords are reserved and may not be used as identifiers:

  and     import  not  return   option   test
  empty   in      or   package  builtin

IMPL#256 Add in and empty operator support.

Operators

The following character sequences represent operators:

  +   ==   !=   (   )   =>
  -   <    !~   [   ]   ^
  *   >    =~   {   }
  /   <=   =    ,   :
  %   >=   <-   .   |>

Numeric literals

Numeric literals may be integers or floating point values. Literals have arbitrary precision and are coerced to a specific type when used.

The following coercion rules apply to numeric literals:

  • An integer literal can be coerced to an “int”, “uint”, or “float” type.
  • A float literal can be coerced to a “float” type.
  • An error will occur if the coerced type cannot represent the literal value.

IMPL#255 Allow numeric literal coercion.
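The coercion rules can be sketched as follows. This section does not state the widths of the types, so the sketch assumes a 64-bit signed “int”, a 64-bit unsigned “uint”, and a double-precision “float”. The function name and the choice of exceptions are hypothetical.

```python
import math

INT_MAX = 2**63 - 1   # assumption: "int" is a 64-bit signed integer
UINT_MAX = 2**64 - 1  # assumption: "uint" is a 64-bit unsigned integer

def coerce_literal(text, target):
    """Coerce an unsigned numeric literal to 'int', 'uint', or 'float'."""
    is_float_literal = "." in text
    if target in ("int", "uint"):
        if is_float_literal:
            raise TypeError(f"float literal {text} cannot be coerced to {target}")
        value = int(text)
        limit = INT_MAX if target == "int" else UINT_MAX
        if value > limit:
            raise OverflowError(f"{text} cannot be represented as {target}")
        return value
    if target == "float":
        value = float(text)
        if math.isinf(value):
            raise OverflowError(f"{text} cannot be represented as float")
        return value
    raise ValueError(f"unknown target type {target!r}")
```

A literal carries no sign (a leading minus is the unary operator), so only the upper bound is checked for the integer types.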

Integer literals

An integer literal is a sequence of digits representing an integer value. Only decimal integers are supported.

  int_lit     = "0" | decimal_lit .
  decimal_lit = ( "1" … "9" ) { decimal_digit } .

Examples of integer literals

  0
  42
  317316873

Floating-point literals

A floating-point literal is a decimal representation of a floating-point value. It has an integer part, a decimal point, and a fractional part. The integer and fractional part comprise decimal digits. One of the integer part or the fractional part may be elided.

  float_lit = decimals "." [ decimals ]
            | "." decimals .
  decimals  = decimal_digit { decimal_digit } .

Examples of floating-point literals

  0.
  72.40
  072.40 // == 72.40
  2.71828
  .26

IMPL#254 Parse float literals.
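As an illustration, the integer and floating-point grammars above translate fairly directly into regular expressions. This is a sketch for checking isolated literals, not a description of how the Flux scanner works; `classify_number` is a hypothetical helper.

```python
import re

# int_lit = "0" | decimal_lit, where decimal_lit starts with "1".."9".
INT_LIT = re.compile(r"0|[1-9][0-9]*")
# float_lit = decimals "." [ decimals ] | "." decimals
FLOAT_LIT = re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")

def classify_number(text):
    """Return 'float', 'int', or None when text is neither literal."""
    if FLOAT_LIT.fullmatch(text):
        return "float"
    if INT_LIT.fullmatch(text):
        return "int"
    return None
```

Note the asymmetry the grammars imply: a leading zero is allowed in the integer part of a float literal such as `072.40`, but `042` is not an integer literal.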

Duration literals

A duration literal is a representation of a length of time. It has an integer part and a duration unit part. Multiple durations may be specified together and the resulting duration is the sum of each smaller part. When several durations are specified together, larger units must appear before smaller ones, and there can be no repeated units.

  duration_lit  = { int_lit duration_unit } .
  duration_unit = "y" | "mo" | "w" | "d" | "h" | "m" | "s" | "ms" | "us" | "µs" | "ns" .

  Units      Meaning
  y          year (12 months)
  mo         month
  w          week (7 days)
  d          day
  h          hour (60 minutes)
  m          minute (60 seconds)
  s          second
  ms         milliseconds (1 thousandth of a second)
  us or µs   microseconds (1 millionth of a second)
  ns         nanoseconds (1 billionth of a second)

Durations represent a length of time. A length of time depends on the specific instant at which it is applied, so a duration does not represent a fixed amount of time. For example, no fixed number of days equals a month, because months vary in their number of days. A duration is a tuple of positive integers that represent its magnitudes, together with the sign of the duration (positive or negative). This representation makes it possible to determine whether a duration is positive or negative: because duration values depend on their context, the sign is only known when all magnitudes have the same sign. The canonical implementation uses a tuple of the months and nanoseconds and a boolean that indicates whether the duration is positive or negative. The spec does not prescribe a specific implementation, and other implementations may use a different internal representation.

Durations cannot be combined by addition and subtraction. All magnitudes in the tuple must be positive integers, which cannot be guaranteed when using addition and subtraction. Durations can be multiplied by any integer value. The unary negative operator is the equivalent of multiplying the duration by -1. These operations are performed on each time unit independently.

Examples of duration literals

  1s
  10d
  1h15m // 1 hour and 15 minutes
  5w
  1mo5d // 1 month and 5 days
  -1mo5d // negative 1 month and 5 days
  5w * 2 // 10 weeks
Durations can be added to date times to produce a new date time.

Addition and subtraction of durations to date times do not commute and are left associative. Addition and subtraction of durations to date times apply months, days and seconds in that order. When months are added to a date time and the resulting date is past the end of the month, the day is rolled back to the last day of the month.

Examples of adding durations to date times

  2018-01-01T00:00:00Z + 1d // 2018-01-02T00:00:00Z
  2018-01-01T00:00:00Z + 1mo // 2018-02-01T00:00:00Z
  2018-01-01T00:00:00Z + 2mo // 2018-03-01T00:00:00Z
  2018-01-31T00:00:00Z + 2mo // 2018-03-31T00:00:00Z
  2018-02-28T00:00:00Z + 2mo // 2018-04-28T00:00:00Z
  2018-01-31T00:00:00Z + 1mo // 2018-02-28T00:00:00Z, February 31st is rolled back to the last day of the month, February 28th in 2018.
  // Addition and subtraction of durations to date times do not commute
  2018-02-28T00:00:00Z + 1mo + 1d // 2018-03-29T00:00:00Z
  2018-02-28T00:00:00Z + 1d + 1mo // 2018-04-01T00:00:00Z
  2018-01-01T00:00:00Z + 2mo - 1d // 2018-02-28T00:00:00Z
  2018-01-01T00:00:00Z - 1d + 3mo // 2018-03-31T00:00:00Z
  2018-01-31T00:00:00Z + 1mo + 1mo // 2018-03-28T00:00:00Z
  2018-01-31T00:00:00Z + 2mo // 2018-03-31T00:00:00Z
  // Addition and subtraction of durations to date times apply months, days and seconds in that order.
  2018-01-28T00:00:00Z + 1mo + 2d // 2018-03-02T00:00:00Z
  2018-01-28T00:00:00Z + 1mo2d // 2018-03-02T00:00:00Z
  2018-01-28T00:00:00Z + 2d + 1mo // 2018-02-28T00:00:00Z, explicit left associative add of 2d first changes the result
  2018-02-01T00:00:00Z + 2mo2d // 2018-04-03T00:00:00Z
  2018-01-01T00:00:00Z + 1mo30d // 2018-03-03T00:00:00Z, Months are applied first to get February 1st, then days are added resulting in March 3 in 2018.
  2018-01-31T00:00:00Z + 1mo1d // 2018-03-01T00:00:00Z, Months are applied first to get February 28th, then days are added resulting in March 1 in 2018.
  // Multiplication works
  2018-01-01T00:00:00Z + 1mo * 1 // 2018-02-01T00:00:00Z
  2018-01-01T00:00:00Z + 1mo * 2 // 2018-03-01T00:00:00Z
  2018-01-01T00:00:00Z + 1mo * 3 // 2018-04-01T00:00:00Z
  2018-01-31T00:00:00Z + 1mo * 1 // 2018-02-28T00:00:00Z
  2018-01-31T00:00:00Z + 1mo * 2 // 2018-03-31T00:00:00Z
  2018-01-31T00:00:00Z + 1mo * 3 // 2018-04-30T00:00:00Z

IMPL#657 Implement Duration vectors.
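The ordering and rollback rules for adding a duration to a date time can be sketched with Python's standard `datetime` and `calendar` modules. This is an illustration of the rules described above, not the Flux implementation; `add_duration` and its keyword parameters are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone

def add_duration(t, months=0, days=0, seconds=0):
    """Apply months, then days, then seconds to t (each may be negative)."""
    # Months first: shift the calendar month, then roll the day back
    # to the last day of the target month if it would overflow.
    index = t.year * 12 + (t.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    t = t.replace(year=year, month=month, day=min(t.day, last_day))
    # Days and seconds are fixed-length offsets once the month is settled.
    return t + timedelta(days=days, seconds=seconds)

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)
```

Chaining two calls models left-associative addition: `add_duration(add_duration(utc(2018, 2, 28), months=1), days=1)` and `add_duration(add_duration(utc(2018, 2, 28), days=1), months=1)` give different dates, which is the non-commutativity described above.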

Date and time literals

A date and time literal represents a specific moment in time. It has a date part, a time part and a time offset part. The format follows the RFC 3339 specification. The time is optional. When it is omitted, the time is assumed to be midnight for the default location. The time_offset is optional. When it is omitted, the location option is used to determine the offset.

  date_time_lit     = date [ "T" time ] .
  date              = year "-" month "-" day .
  year              = decimal_digit decimal_digit decimal_digit decimal_digit .
  month             = decimal_digit decimal_digit .
  day               = decimal_digit decimal_digit .
  time              = hour ":" minute ":" second [ fractional_second ] [ time_offset ] .
  hour              = decimal_digit decimal_digit .
  minute            = decimal_digit decimal_digit .
  second            = decimal_digit decimal_digit .
  fractional_second = "." { decimal_digit } .
  time_offset       = "Z" | ("+" | "-" ) hour ":" minute .

Examples of date and time literals

  1952-01-25T12:35:51Z
  2018-08-15T13:36:23-07:00
  2009-10-15T09:00:00 // October 15th 2009 at 9 AM in the default location
  2018-01-01 // midnight on January 1st 2018 in the default location

IMPL#152 Implement shorthand time literals.
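The grammar above can be checked with a regular expression that mirrors it production by production. The sketch below only splits a literal into its parts; it does not validate calendar ranges or resolve the default location. `parse_date_time` and the shape of its result are hypothetical.

```python
import re

DATE_TIME_LIT = re.compile(
    r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
    r"(?:T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2})"
    r"(?P<fraction>\.[0-9]*)?"
    r"(?P<offset>Z|[+-][0-9]{2}:[0-9]{2})?)?"
)

def parse_date_time(text):
    """Split a date and time literal into date, time, fraction, and offset."""
    m = DATE_TIME_LIT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a date and time literal: {text!r}")
    has_time = m.group("hour") is not None
    time_keys = ("hour", "minute", "second")
    return {
        "date": tuple(int(m.group(k)) for k in ("year", "month", "day")),
        # An omitted time is taken to be midnight.
        "time": tuple(int(m.group(k)) for k in time_keys) if has_time else (0, 0, 0),
        "fraction": m.group("fraction"),
        # None means the offset comes from the location option.
        "offset": m.group("offset"),
    }
```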

String literals

A string literal represents a sequence of characters enclosed in double quotes. Within the quotes any character may appear except an unescaped double quote. String literals support several escape sequences.

  \n    U+000A          line feed or newline
  \r    U+000D          carriage return
  \t    U+0009          horizontal tab
  \"    U+0022          double quote
  \\    U+005C          backslash
  \${   U+0024 U+007B   dollar sign and opening curly bracket

Additionally, any byte value may be specified via a hex encoding using \x as the prefix.

  string_lit       = `"` { unicode_value | byte_value | StringExpression | newline } `"` .
  byte_value       = `\` "x" hex_digit hex_digit .
  hex_digit        = "0" … "9" | "A" … "F" | "a" … "f" .
  unicode_value    = unicode_char | escaped_char .
  escaped_char     = `\` ( "n" | "r" | "t" | `\` | `"` ) .
  StringExpression = "${" Expression "}" .

Examples of string literals

  "abc"
  "string with double \" quote"
  "string with backslash \\"
  "日本語"
  "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // the explicit UTF-8 encoding of the previous line
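To show how the escape sequences and `\x` byte values combine, here is a sketch that decodes the text between the quotes of a string literal. It ignores interpolation and assumes the resulting bytes form valid UTF-8. `decode_string_body` is a hypothetical helper, not part of Flux.

```python
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_string_body(body):
    """Decode the characters between the quotes of a string literal."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif body.startswith("${", i + 1):
            out += b"${"  # escaped interpolation opener
            i += 3
        elif body[i + 1:i + 2] == "x":
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(int(digits, 16))  # one raw byte
            i += 4
        elif body[i + 1:i + 2] in SIMPLE_ESCAPES and i + 1 < len(body):
            out += SIMPLE_ESCAPES[body[i + 1]].encode("utf-8")
            i += 2
        else:
            raise ValueError(f"invalid escape at offset {i}")
    # Assumption: the decoded bytes are valid UTF-8.
    return out.decode("utf-8")
```

Collecting bytes first and decoding at the end is what lets a run of `\x` escapes spell out a multi-byte character.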

String literals may also contain embedded expressions, which are evaluated and interpolated into the string. Embedded expressions are enclosed in a dollar sign and curly braces (${}). The expressions are evaluated in the scope containing the string literal. The result of an expression is formatted as a string and replaces the string content between the braces. All types are formatted as strings according to their literal representation. A function printf exists to allow more precise control over formatting of various types. To include the literal ${ within a string, it must be escaped.

IMPL#248 Add printf function.

Example: Interpolation

  n = 42
  "the answer is ${n}" // the answer is 42
  "the answer is not ${n+1}" // the answer is not 43
  "dollar sign opening curly bracket \${" // dollar sign opening curly bracket ${

IMPL#1775 Interpolate arbitrary expressions in string literals
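A rough sketch of the substitution step follows. It is not how Flux evaluates expressions: the `evaluate` callback stands in for a real expression evaluator, nested braces inside an expression are not supported, and other escape sequences are assumed to be handled separately. The names are hypothetical.

```python
import re

# Matches an escaped "\${" or an embedded "${...}" without nested braces.
INTERPOLATION = re.compile(r"\\\$\{|\$\{([^{}]*)\}")

def interpolate(body, evaluate):
    """Replace each ${expr} in body with str(evaluate(expr))."""
    def replace(match):
        if match.group(1) is None:
            return "${"  # escaped form: emit the literal characters
        return str(evaluate(match.group(1)))
    return INTERPOLATION.sub(replace, body)

# Stand-in evaluator for the demo: Python's eval over a fixed scope.
scope = {"n": 42}
def evaluate(expr):
    return eval(expr, {"__builtins__": {}}, dict(scope))
```

Passing the evaluator in as a parameter keeps the scanning logic separate from the question of how an expression is evaluated in the enclosing scope.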

Regular expression literals

A regular expression literal represents a regular expression pattern, enclosed in forward slashes. Within the forward slashes, any unicode character may appear except for an unescaped forward slash. The \x hex byte value representation from string literals may also be present.

Regular expression literals support only the following escape sequences:

  \/   U+002f   forward slash
  \\   U+005c   backslash

  regexp_lit         = "/" { unicode_char | byte_value | regexp_escape_char } "/" .
  regexp_escape_char = `\` (`/` | `\`) .

Examples of regular expression literals

  /.*/
  /http:\/\/localhost:8086/
  /^\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e(ZZ)?$/
  /^日本語(ZZ)?$/ // the above two lines are equivalent
  /\\xZZ/ // this becomes the literal pattern "\xZZ"

The regular expression syntax is defined by RE2.
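Finding where a regular expression literal ends comes down to locating the first unescaped forward slash. The sketch below does only that; it does not interpret the pattern or decide whether a `/` in source text starts a literal or is the division operator. `scan_regexp_literal` is a hypothetical helper.

```python
def scan_regexp_literal(src, start=0):
    """Return the index just past the closing slash of the literal at start."""
    if not src.startswith("/", start):
        raise ValueError("a regular expression literal starts with '/'")
    i = start + 1
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif src[i] == "/":
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated regular expression literal")
```

Skipping two characters on a backslash is what keeps `\/` from closing the literal while still letting `\\` be followed by a closing slash.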