Other Regular Expression Changes

Regular expressions are an important part of working with strings in JavaScript, and like many parts of the language, they haven’t changed much in recent versions. ECMAScript 6, however, makes several improvements to regular expressions to go along with the updates to strings.

The Regular Expression y Flag

ECMAScript 6 standardized the y flag after it was implemented in Firefox as a proprietary extension to regular expressions. The y flag affects a regular expression search’s sticky property, and it tells the search to start matching characters in a string at the position specified by the regular expression’s lastIndex property. If there is no match at that location, then the regular expression stops matching. To see how this works, consider the following code:

    var text = "hello1 hello2 hello3",
        pattern = /hello\d\s?/,
        result = pattern.exec(text),
        globalPattern = /hello\d\s?/g,
        globalResult = globalPattern.exec(text),
        stickyPattern = /hello\d\s?/y,
        stickyResult = stickyPattern.exec(text);

    console.log(result[0]);         // "hello1 "
    console.log(globalResult[0]);   // "hello1 "
    console.log(stickyResult[0]);   // "hello1 "

    pattern.lastIndex = 1;
    globalPattern.lastIndex = 1;
    stickyPattern.lastIndex = 1;

    result = pattern.exec(text);
    globalResult = globalPattern.exec(text);
    stickyResult = stickyPattern.exec(text);

    console.log(result[0]);         // "hello1 "
    console.log(globalResult[0]);   // "hello2 "
    console.log(stickyResult[0]);   // Error! stickyResult is null

This example has three regular expressions. The expression in pattern has no flags, the one in globalPattern uses the g flag, and the one in stickyPattern uses the y flag. In the first trio of console.log() calls, all three regular expressions should return "hello1 " with a space at the end.

After that, the lastIndex property is changed to 1 on all three patterns, meaning that the regular expression should start matching from the second character on all of them. The regular expression with no flags completely ignores the change to lastIndex and still matches "hello1 " without incident. The regular expression with the g flag goes on to match "hello2 " because it is searching forward from the second character of the string ("e"). The sticky regular expression doesn’t match anything beginning at the second character, so stickyResult is null and the attempt to read stickyResult[0] throws an error.

The sticky flag saves the index of the next character after the last match in lastIndex whenever a match succeeds. If an operation results in no match, then lastIndex is set back to 0. The global flag behaves the same way, as demonstrated here:

    var text = "hello1 hello2 hello3",
        pattern = /hello\d\s?/,
        result = pattern.exec(text),
        globalPattern = /hello\d\s?/g,
        globalResult = globalPattern.exec(text),
        stickyPattern = /hello\d\s?/y,
        stickyResult = stickyPattern.exec(text);

    console.log(result[0]);         // "hello1 "
    console.log(globalResult[0]);   // "hello1 "
    console.log(stickyResult[0]);   // "hello1 "

    console.log(pattern.lastIndex);         // 0
    console.log(globalPattern.lastIndex);   // 7
    console.log(stickyPattern.lastIndex);   // 7

    result = pattern.exec(text);
    globalResult = globalPattern.exec(text);
    stickyResult = stickyPattern.exec(text);

    console.log(result[0]);         // "hello1 "
    console.log(globalResult[0]);   // "hello2 "
    console.log(stickyResult[0]);   // "hello2 "

    console.log(pattern.lastIndex);         // 0
    console.log(globalPattern.lastIndex);   // 14
    console.log(stickyPattern.lastIndex);   // 14

The value of lastIndex changes to 7 after the first call to exec() and to 14 after the second call, for both the stickyPattern and globalPattern variables.
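One place where this bookkeeping pays off is scanning a string piece by piece. What follows is a minimal sketch rather than part of the examples above: the tokenize() function, its rule list, and the token names are all hypothetical, and the code assumes an engine that supports the y flag. Because each sticky pattern must match exactly at lastIndex, the loop cannot silently skip over unexpected characters the way a pattern with the g flag would.

```javascript
// Hypothetical helper: splits input into number, word, and space tokens.
function tokenize(input) {
    // Every rule is sticky, so it only matches exactly at lastIndex.
    var rules = [
            { type: "number", pattern: /\d+/y },
            { type: "word", pattern: /[a-z]+/y },
            { type: "space", pattern: /\s+/y }
        ],
        tokens = [],
        position = 0,
        i, rule, match;

    while (position < input.length) {
        match = null;

        // Try each rule at the current position until one matches.
        for (i = 0; i < rules.length && !match; i++) {
            rule = rules[i];
            rule.pattern.lastIndex = position;
            match = rule.pattern.exec(input);
        }

        if (!match) {
            throw new Error("Unexpected character at index " + position);
        }

        tokens.push({ type: rule.type, value: match[0] });
        position += match[0].length;
    }

    return tokens;
}

var tokens = tokenize("hello 42");

console.log(tokens.length);     // 3
console.log(tokens[0].value);   // "hello"
console.log(tokens[2].type);    // "number"
```

Setting lastIndex by hand before each exec() call keeps the sketch independent of whatever value a previous failed match left behind.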

There are two more subtle details about the sticky flag to keep in mind:

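  1. The lastIndex property is always honored when calling methods that exist on the regular expression object, like the exec() and test() methods. String methods that accept a regular expression are less uniform: in engines that follow the final ECMAScript 6 specification, match() delegates to the regular expression and so respects lastIndex, while search() always starts from the beginning of the string.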
  2. When using the ^ character to match the start of a string, sticky regular expressions only match from the start of the string (or the start of the line in multiline mode). While lastIndex is 0, the ^ makes a sticky regular expression no different from a non-sticky one. If lastIndex doesn’t correspond to the beginning of the string in single-line mode or the beginning of a line in multiline mode, the sticky regular expression will never match.
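The interaction between ^ and lastIndex can be illustrated with a short sketch. The sample string and variable names here are made up for illustration, and the code assumes an engine that supports the y flag.

```javascript
var text = "one\ntwo",
    singleLine = /^two/y,
    multiLine = /^two/my,
    singleResult,
    multiResult;

// "two" begins at index 4, immediately after the line break
singleLine.lastIndex = 4;
multiLine.lastIndex = 4;

singleResult = singleLine.test(text);
multiResult = multiLine.test(text);

console.log(singleResult);          // false (index 4 isn't the start of the string)
console.log(multiResult);           // true (index 4 is the start of a line)
console.log(singleLine.lastIndex);  // 0 (reset after the failed match)
console.log(multiLine.lastIndex);   // 7
```

Only the pattern with the m flag can succeed here, because in single-line mode ^ can match nowhere but index 0.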

As with other regular expression flags, you can detect the presence of y by using a property. In this case, you’d check the sticky property, as follows:

    var pattern = /hello\d/y;

    console.log(pattern.sticky);    // true

The sticky property is set to true if the sticky flag is present, and the property is false if not. The sticky property is read-only: its value depends solely on the presence of the flag and cannot be changed in code.
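To see what read-only means in practice, consider this small sketch. The tryToClearSticky() helper is hypothetical and exists only for illustration. It runs in strict mode, where assigning to a property that cannot be written throws a TypeError; outside strict mode, the same assignment is silently ignored.

```javascript
// Hypothetical helper: reports whether assigning to sticky succeeded.
function tryToClearSticky(pattern) {
    "use strict";   // strict mode turns the failed assignment into an error

    try {
        pattern.sticky = false;
        return true;
    } catch (ex) {
        return false;
    }
}

var pattern = /hello\d/y,
    changed = tryToClearSticky(pattern);

console.log(changed);           // false
console.log(pattern.sticky);    // true
```

Either way, the only way to get a non-sticky version of a pattern is to create a new regular expression without the y flag.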

Similar to the u flag, the y flag is a syntax change, so it will cause a syntax error in older JavaScript engines. You can use the following approach to detect support:

    function hasRegExpY() {
        try {
            var pattern = new RegExp(".", "y");
            return true;
        } catch (ex) {
            return false;
        }
    }

Just like the u check, this returns false if it’s unable to create a regular expression with the y flag. In one final similarity to u, if you need to use y in code that runs in older JavaScript engines, be sure to use the RegExp constructor when defining those regular expressions to avoid a syntax error.
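As a rough sketch of how the detection function and the RegExp constructor might be combined, consider the following. The createStickyPattern() helper is hypothetical, and returning null when y is unavailable is just one possible convention; hasRegExpY() is repeated so the example runs on its own.

```javascript
function hasRegExpY() {
    try {
        var pattern = new RegExp(".", "y");
        return true;
    } catch (ex) {
        return false;
    }
}

// Hypothetical helper: returns a sticky pattern, or null if y is unsupported.
function createStickyPattern(source) {
    // The pattern is built from a string because a /.../y literal would be a
    // syntax error in older engines even if that line never executed.
    return hasRegExpY() ? new RegExp(source, "y") : null;
}

var pattern = createStickyPattern("hello\\d\\s?"),
    found = null;

if (pattern) {
    pattern.lastIndex = 7;
    found = pattern.exec("hello1 hello2 hello3")[0];
    console.log(found);     // "hello2 "
} else {
    console.log("The y flag isn't supported in this engine");
}
```

Note that the backslashes must be doubled in the string passed to the constructor, which is easy to overlook when converting a literal.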

Duplicating Regular Expressions

In ECMAScript 5, you can duplicate regular expressions by passing them into the RegExp constructor like this:

    var re1 = /ab/i,
        re2 = new RegExp(re1);

The re2 variable is just a copy of the re1 variable. But if you provide the second argument to the RegExp constructor, which specifies the flags for the regular expression, your code won’t work in ECMAScript 5, as in this example:

    var re1 = /ab/i,

        // throws an error in ES5, okay in ES6
        re2 = new RegExp(re1, "g");

If you execute this code in an ECMAScript 5 environment, you’ll get an error stating that the second argument cannot be used when the first argument is a regular expression. ECMAScript 6 changed this behavior such that the second argument is allowed and overrides any flags present on the first argument. For example:

    var re1 = /ab/i,

        // throws an error in ES5, okay in ES6
        re2 = new RegExp(re1, "g");

    console.log(re1.toString());    // "/ab/i"
    console.log(re2.toString());    // "/ab/g"

    console.log(re1.test("ab"));    // true
    console.log(re2.test("ab"));    // true

    console.log(re1.test("AB"));    // true
    console.log(re2.test("AB"));    // false

In this code, re1 has the case-insensitive i flag while re2 has only the global g flag. The RegExp constructor duplicated the pattern from re1 and substituted the g flag for the i flag. Without the second argument, re2 would have the same flags as re1.
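Because the second argument replaces the original flags rather than adding to them, any flag you want to keep has to be restated. The following sketch, which assumes an ECMAScript 6 environment and uses a made-up sample string, keeps i while adding g:

```javascript
var re1 = /ab/i,
    re2 = new RegExp(re1, "gi"),    // restate i to keep it, add g
    sample = "AB ab Ab";

console.log(re2.toString());            // "/ab/gi"
console.log(sample.match(re1).length);  // 1 (only the first match)
console.log(sample.match(re2).length);  // 3 (every match, in any case)
```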

The flags Property

Along with adding a new flag and changing how you can work with flags, ECMAScript 6 added a property associated with them. In ECMAScript 5, you could get the text of a regular expression by using the source property, but to get the flag string, you’d have to parse the output of the toString() method as shown below:

    function getFlags(re) {
        var text = re.toString();
        return text.substring(text.lastIndexOf("/") + 1, text.length);
    }

    // toString() is "/ab/g"
    var re = /ab/g;

    console.log(getFlags(re));      // "g"

This converts a regular expression into a string and then returns the characters found after the last /. Those characters are the flags.

ECMAScript 6 makes fetching flags easier by adding a flags property to go along with the source property. Both properties are prototype accessor properties with only a getter assigned, making them read-only. The flags property makes inspecting regular expressions easier for both debugging and inheritance purposes.

A late addition to ECMAScript 6, the flags property returns the string representation of any flags applied to a regular expression. For example:

    var re = /ab/g;

    console.log(re.source);     // "ab"
    console.log(re.flags);      // "g"

This fetches all flags on re and prints them to the console with far fewer lines of code than the toString() technique can. Using source and flags together allows you to extract the pieces of the regular expression that you need without parsing the regular expression string directly.
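As one illustration of using the two properties together, the sketch below rebuilds a regular expression with an extra flag. The addFlag() helper is hypothetical, not a built-in, and the code assumes an environment that supports the flags property.

```javascript
// Hypothetical helper: returns a copy of re that also has the given flag.
function addFlag(re, flag) {
    var flags = re.flags;

    if (flags.indexOf(flag) === -1) {
        flags += flag;
    }

    return new RegExp(re.source, flags);
}

var re = /ab/i,
    globalRe = addFlag(re, "g");

console.log(globalRe.source);   // "ab"
console.log(globalRe.flags);    // "gi"
console.log(re.flags);          // "i" (the original is untouched)
```

The flags property reports flags in a fixed order regardless of the order in which they were supplied, which is why the result is "gi" rather than "ig".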

The changes to strings and regular expressions that this chapter has covered so far are definitely powerful, but ECMAScript 6 improves your power over strings in a much bigger way. It brings a type of literal to the table that makes strings more flexible.