Representation

Source code is encoded in UTF-8. The text need not be canonicalized.
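Because the text is not canonicalized, two spellings that render the same can still be different source text. The Python sketch below shows this for "é" written as one precomposed code point and as "e" plus a combining accent. It assumes a lexer that compares characters as written, with no Unicode normalization. It is an illustration only and is not taken from the Flux implementation.

```python
import unicodedata

# Two spellings of "é" that render identically.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8"))  # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'

# With no canonicalization, the two texts are simply different sequences.
print(precomposed == decomposed)    # False

# NFC normalization would merge them, but source is not required to be normalized.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```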

Characters

This document will use the term character to refer to a Unicode code point.
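A character in this sense is a single code point. It is neither a byte of the UTF-8 encoding nor a user-perceived glyph. Python strings iterate by code point, so the sketch below uses them to show the difference. The sample strings are arbitrary.

```python
# "día_日本" written with escapes so the code points are explicit.
source = "d\u00eda_\u65e5\u672c"

print(len(source))                   # 6 characters (code points)
print(len(source.encode("utf-8")))   # 11 bytes: í takes 2, each CJK ideograph takes 3

# "e" plus a combining accent renders as one glyph but is two characters.
combined = "e\u0301"
print(len(combined))                 # 2
```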

The following terms are used to denote specific Unicode character classes:

  newline        = /* the Unicode code point U+000A */ .
  unicode_char   = /* an arbitrary Unicode code point except newline */ .
  unicode_letter = /* a Unicode code point classified as "Letter" */ .
  unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of character categories. Flux treats all characters in any of the Letter categories (Lu, Ll, Lt, Lm, or Lo) as Unicode letters, and those in the "Number, decimal digit" category (Nd) as Unicode digits.
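The following Python sketch shows how the unicode_letter and unicode_digit classes can be tested from a character's General Category. The function names are illustrative. The sketch has one caveat: Python's `unicodedata` uses whatever Unicode version the interpreter ships (see `unicodedata.unidata_version`), not necessarily 8.0. Characters assigned after 8.0 may therefore classify differently. The sample characters below have long-standing categories.

```python
import unicodedata

# Letter categories: uppercase, lowercase, titlecase, modifier, other.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

def is_unicode_letter(ch: str) -> bool:
    """True if the single code point ch is in any Letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

def is_unicode_digit(ch: str) -> bool:
    """True if the single code point ch is in category Nd."""
    return unicodedata.category(ch) == "Nd"

print(is_unicode_letter("\u01c5"))  # True: "ǅ" is titlecase (Lt)
print(is_unicode_letter("\u65e5"))  # True: "日" is Lo
print(is_unicode_letter("_"))       # False: "_" is Pc, not a Letter category
print(is_unicode_digit("\u0663"))   # True: ARABIC-INDIC DIGIT THREE is Nd
print(is_unicode_digit("\u00bd"))   # False: "½" is No, not Nd
print(is_unicode_letter("\u216b"))  # False: "Ⅻ" is Nl (a Number category)
```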

Letters and digits

The underscore character _ (U+005F) is considered a letter.

  letter        = unicode_letter | "_" .
  decimal_digit = "0" … "9" .
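These productions can be read as character predicates. In the Python sketch below, the names `is_letter` and `is_decimal_digit` are illustrative and are not part of any Flux API. Note that decimal_digit covers only the ASCII range "0" through "9". A Unicode digit such as ARABIC-INDIC DIGIT THREE is therefore a unicode_digit but not a decimal_digit.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    # letter = unicode_letter | "_" .
    return ch == "_" or unicodedata.category(ch) in {"Lu", "Ll", "Lt", "Lm", "Lo"}

def is_decimal_digit(ch: str) -> bool:
    # decimal_digit = "0" … "9" .  ASCII digits only; the length check stops
    # multi-character strings such as "05" from passing the range comparison.
    return len(ch) == 1 and "0" <= ch <= "9"

print(is_letter("_"))             # True
print(is_letter("\u00e9"))        # True: "é" is Ll
print(is_letter("1"))             # False
print(is_decimal_digit("7"))      # True
print(is_decimal_digit("\u0663")) # False: Nd, but outside "0" … "9"
```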