Unicode Identifiers

ECMAScript 6 offers better Unicode support than previous versions of JavaScript, and it also changes which characters may be used in identifiers. In ECMAScript 5, it was already possible to use Unicode escape sequences in identifiers. For example:

    // Valid in ECMAScript 5 and 6
    var \u0061 = "abc";
    console.log(\u0061); // "abc"
    // equivalent to:
    console.log(a); // "abc"

After the var statement in this example, you can use either \u0061 or a to access the variable. In ECMAScript 6, you can also use Unicode code point escape sequences in identifiers, like this:

    // Valid in ECMAScript 6 only
    var \u{61} = "abc";
    console.log(\u{61}); // "abc"
    // equivalent to:
    console.log(a); // "abc"

This example just replaces \u0061 with its code point equivalent. Otherwise, it does exactly the same thing as the previous example.
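As a small illustration of that equivalence, the sketch below mixes both escape forms with the literal spelling of the same name, and then uses a code point escape for a character outside the Basic Multilingual Plane. The variable names are chosen only for this example, and the code assumes an ECMAScript 6 (or later) engine.

```javascript
// All three spellings refer to the same binding, abc.
var \u0061bc = 1;   // ECMAScript 5 style escape for "a"
\u{61}bc += 1;      // ECMAScript 6 code point escape for "a"
console.log(abc);   // 2

// A code point escape can also name a supplementary-plane character
// (U+20BB7, a CJK ideograph) with a single escape sequence.
var \u{20BB7} = "supplementary";
console.log(𠮷);    // "supplementary"
```

An escape sequence in an identifier is simply another way to write the character, so it must still produce a character that is legal at that position in the name.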

Additionally, ECMAScript 6 formally specifies valid identifiers in terms of Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax, which gives the following rules:

  1. The first character must be $, _, or any Unicode symbol with a derived core property of ID_Start.
  2. Each subsequent character must be $, _, \u200c (a zero-width non-joiner), \u200d (a zero-width joiner), or any Unicode symbol with a derived core property of ID_Continue.

The ID_Start and ID_Continue derived core properties are defined in Unicode Identifier and Pattern Syntax as a way to identify symbols that are appropriate for use in identifiers such as variables and domain names. The annex is not specific to JavaScript.
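The two rules above can be approximated with a regular expression. This is a rough sketch rather than a complete validator: it relies on Unicode property escapes (\p{...} with the u flag), which were added to the language after ECMAScript 6, and it does not reject reserved words or interpret escape sequences written out in the input. The helper name isIdentifierName is hypothetical, not a built-in.

```javascript
// Sketch only: needs an engine with Unicode property escapes (ES2018+).
// First character:  $, _, or ID_Start.
// Later characters: $, _, U+200C, U+200D, or ID_Continue.
const IDENTIFIER_NAME =
  /^[$_\p{ID_Start}][$_\u200C\u200D\p{ID_Continue}]*$/u;

// Hypothetical helper: checks the character rules only,
// not reserved words such as "var" or "class".
function isIdentifierName(text) {
  return IDENTIFIER_NAME.test(text);
}

console.log(isIdentifierName("$price"));    // true
console.log(isIdentifierName("_private"));  // true
console.log(isIdentifierName("π"));         // true
console.log(isIdentifierName("1st"));       // false (a digit is not ID_Start)
console.log(isIdentifierName("a\u200Db"));  // true  (joiner allowed after the start)
console.log(isIdentifierName("\u200Da"));   // false (joiner not allowed first)
```

Digits illustrate the difference between the two properties: they have ID_Continue but not ID_Start, so they may appear anywhere except the first position.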