2.1 Character Set

The syntax of Erlang tokens allows the use of the full ISO-8859-1 (Latin-1) character set. This is noticeable in the following ways:

  • All the Latin-1 printable characters can be used and are shown without the backslash escape convention.

  • Atoms and variables can use all Latin-1 letters.
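
As a minimal sketch of the two points above, the following module uses Latin-1 letters in a variable name and in an unquoted atom. The module name latin1_names and the function demo/0 are hypothetical, chosen only for illustration, and the sketch assumes the source file is saved in an encoding the compiler accepts for these characters.

```erlang
-module(latin1_names).
-export([demo/0]).

%% Hypothetical example: names are illustrative, not part of Erlang/OTP.
demo() ->
    %% A variable may start with an uppercase Latin-1 letter such as Å.
    Ålder = 42,
    %% An atom may start with a lowercase Latin-1 letter such as ö,
    %% so no single quotes are needed here.
    Atom = överst,
    {is_atom(Atom), Ålder}.
```

Calling latin1_names:demo() is expected to return {true, 42}.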

Octal      Decimal    Characters  Class
200 - 237  128 - 159              Control characters
240 - 277  160 - 191  - ¿         Punctuation characters
300 - 326  192 - 214  À - Ö       Uppercase letters
327        215        ×           Punctuation character
330 - 336  216 - 222  Ø - Þ       Uppercase letters
337 - 366  223 - 246  ß - ö       Lowercase letters
367        247        ÷           Punctuation character
370 - 377  248 - 255  ø - ÿ       Lowercase letters

Table 2.1: Character Classes
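
The ranges in Table 2.1 can be written down as a small classification function. This is only an illustrative sketch: the module name latin1_class and the function classify/1 are hypothetical and are not part of Erlang/OTP, and the function covers only the code points 128 - 255 listed in the table.

```erlang
-module(latin1_class).
-export([classify/1]).

%% Hypothetical helper mirroring Table 2.1 (decimal column).
%% Only code points 128..255 are handled; anything else
%% raises a function_clause error.
classify(C) when C >= 128, C =< 159 -> control;
classify(C) when C >= 160, C =< 191 -> punctuation;
classify(C) when C >= 192, C =< 214 -> uppercase;
classify(215)                       -> punctuation;  % ×
classify(C) when C >= 216, C =< 222 -> uppercase;
classify(C) when C >= 223, C =< 246 -> lowercase;
classify(247)                       -> punctuation;  % ÷
classify(C) when C >= 248, C =< 255 -> lowercase.
```

For example, latin1_class:classify(16#C0) returns uppercase (À is decimal 192), and latin1_class:classify(8#367) returns punctuation (÷ is octal 367).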

In Erlang/OTP R16B the syntax of Erlang tokens was extended to handle Unicode. The support was limited to string literals and comments. More about the usage of Unicode in Erlang source files can be found in STDLIB's User's Guide.

From Erlang/OTP 20, atoms and function names are also allowed to contain Unicode characters outside the ISO-Latin-1 range. Module names, application names, and node names are still restricted to the ISO-Latin-1 range.
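
The following sketch illustrates the Erlang/OTP 20 behaviour described above, assuming a UTF-8 encoded source file. The module name unicode_atoms and both function names are hypothetical. The atoms are written in single quotes, since they do not begin with a lowercase Latin-1 letter; the module name itself stays within the Latin-1 range, as required.

```erlang
-module(unicode_atoms).
-export([demo/0, 'δοκιμή'/0]).

%% Hypothetical example: a function whose name contains Greek letters.
%% Requires Erlang/OTP 20 or later.
'δοκιμή'() -> ok.

demo() ->
    %% An atom containing Cyrillic letters, outside the Latin-1 range.
    A = 'смысл',
    %% atom_to_list/1 returns one list element per Unicode code point.
    {is_atom(A), length(atom_to_list(A)), 'δοκιμή'()}.
```

On Erlang/OTP 20 or later, unicode_atoms:demo() is expected to return {true, 5, ok}.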