Chapter 3. Lexical Analysis (Tokenization)

The Esprima tokenizer takes a string as its input and produces an array of tokens, i.e. a list of objects representing categorized input characters. This process is known as lexical analysis.

The interface of the tokenize function is as follows:

  esprima.tokenize(input, config)

where

  • input is a string representing the program to be tokenized
  • config is an object used to customize the parsing behavior (optional)

The input argument is mandatory. Its type must be a string; otherwise, the tokenization behavior is undefined.

The description of various properties of config is summarized in the following table:

  Name     Type     Default  Description
  range    Boolean  false    Annotate each token with its zero-based start and end location
  loc      Boolean  false    Annotate each token with its line- and column-based location
  comment  Boolean  false    Include every line and block comment in the output

An example Node.js REPL session that demonstrates the use of the Esprima tokenizer is:

  $ node
  > var esprima = require('esprima')
  > esprima.tokenize('answer = 42')
  [ { type: 'Identifier', value: 'answer' },
    { type: 'Punctuator', value: '=' },
    { type: 'Numeric', value: '42' } ]

In the above example, the input string is tokenized into 3 tokens: an identifier, a punctuator, and a number. For each token, the type property is a string indicating the type of the token and the value property stores the corresponding lexeme, i.e. a string of characters that forms a syntactic unit.

Unlike the parse function, the tokenize function can work with an input string that does not represent a valid JavaScript program. This is because lexical analysis, as the name implies, does not involve the process of understanding the syntactic structure of the input.

  $ node
  > var esprima = require('esprima')
  > esprima.tokenize('42 = answer')
  [ { type: 'Numeric', value: '42' },
    { type: 'Punctuator', value: '=' },
    { type: 'Identifier', value: 'answer' } ]
  > esprima.tokenize('while (if {}')
  [ { type: 'Keyword', value: 'while' },
    { type: 'Punctuator', value: '(' },
    { type: 'Keyword', value: 'if' },
    { type: 'Punctuator', value: '{' },
    { type: 'Punctuator', value: '}' } ]
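The reason tokenization tolerates such input becomes clearer by looking at how a lexer works: it matches character patterns one token at a time and never tracks grammar rules. The following toy scanner (a simplified illustration, not Esprima's actual implementation) happily tokenizes the same syntactically invalid inputs:

```javascript
// Toy tokenizer: classifies characters without any notion of grammar.
// A simplified illustration only; not Esprima's implementation.
function toyTokenize(input) {
  const tokens = [];
  const keywords = new Set(['if', 'while']);
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) { i++; continue; }        // skip whitespace
    if (/[0-9]/.test(ch)) {                      // numeric literal
      let j = i;
      while (j < input.length && /[0-9]/.test(input[j])) j++;
      tokens.push({ type: 'Numeric', value: input.slice(i, j) });
      i = j;
    } else if (/[A-Za-z_$]/.test(ch)) {          // identifier or keyword
      let j = i;
      while (j < input.length && /[A-Za-z0-9_$]/.test(input[j])) j++;
      const value = input.slice(i, j);
      tokens.push({ type: keywords.has(value) ? 'Keyword' : 'Identifier', value });
      i = j;
    } else {                                     // everything else: punctuator
      tokens.push({ type: 'Punctuator', value: ch });
      i++;
    }
  }
  return tokens;
}

console.log(toyTokenize('while (if {}'));
```

Each iteration consumes exactly one token based on the current character class; since no state from previous tokens is consulted, an ill-formed sequence like 'while (if {}' poses no problem.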