Chapter 3. Lexical Analysis (Tokenization) - Token Location - 《Esprima 3.1 Document》

Token Location

By default, each token in the array returned by the tokenizer has only two properties: the type of the token and its lexeme. For some use cases, the location of each token needs to be known as well (e.g. to offer meaningful feedback to the user). The Esprima tokenizer can add that location information to each token in two forms: a zero-based range and a line-column location. This is done by customizing the tokenization process with the configuration object.

Setting range (in the configuration object) to true adds a new property, range, to each token. It is an array of two elements: the zero-based index at which the token starts and the zero-based index at which it ends (exclusive). A simple example follows:

$ node
> var esprima = require('esprima')
> esprima.tokenize('answer = 42', { range: true })
[ { type: 'Identifier', value: 'answer', range: [ 0, 6 ] },
  { type: 'Punctuator', value: '=', range: [ 7, 8 ] },
  { type: 'Numeric', value: '42', range: [ 9, 11 ] } ]

In the above example, the start and end locations of each token can be determined from its range property. For instance, the equal sign (=) starts at zero-based index 7, which makes it the eighth character in the input string, because its range is [7, 8].
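Because the end index is exclusive, a range can be passed directly to String.prototype.slice to recover the token's text from the input. The following is a minimal sketch rather than part of the Esprima API: the helper name lexemeAt is hypothetical, and the tokens are copied from the output above so that the snippet runs without Esprima installed.

```javascript
// Input string and tokens copied from the example above. In practice the
// tokens would come from esprima.tokenize(source, { range: true }).
var source = 'answer = 42';
var tokens = [
  { type: 'Identifier', value: 'answer', range: [0, 6] },
  { type: 'Punctuator', value: '=', range: [7, 8] },
  { type: 'Numeric', value: '42', range: [9, 11] }
];

// lexemeAt is a hypothetical helper, not an Esprima function.
// The range is [start, end) with an exclusive end, which matches
// the arguments expected by String.prototype.slice.
function lexemeAt(text, token) {
  return text.slice(token.range[0], token.range[1]);
}

var lexemes = tokens.map(function (token) {
  return lexemeAt(source, token);
});

console.log(lexemes); // [ 'answer', '=', '42' ]
```

Each sliced string equals the value property of the corresponding token, which is a quick way to check that a range is being interpreted correctly.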

Setting loc to true adds a new property, loc, to each token. It is an object that contains the line number and column number of the start and end (exclusive) locations of the token. This is illustrated in the following example:

$ node
> var esprima = require('esprima')
> tokens = esprima.tokenize('answer = 42', { loc: true });
> tokens[2]
{ type: 'Numeric',
  value: '42',
  loc: { start: { line: 1, column: 9 }, end: { line: 1, column: 11 } } }

Note that the line number is one-based while the column number is zero-based.
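The mixed numbering matters when a position is shown to a user, since many editors display both lines and columns starting from one. The sketch below assumes that display convention; describePosition is a hypothetical helper, and the token is copied from the output above so that the snippet runs without Esprima installed.

```javascript
// Token copied from the example above. In practice it would come from
// esprima.tokenize(source, { loc: true }).
var token = {
  type: 'Numeric',
  value: '42',
  loc: { start: { line: 1, column: 9 }, end: { line: 1, column: 11 } }
};

// describePosition is a hypothetical helper, not an Esprima function.
// The line is already one-based, so it is used as is; the column is
// zero-based, so 1 is added for a one-based display (an assumed convention).
function describePosition(tok) {
  var start = tok.loc.start;
  return 'line ' + start.line + ', column ' + (start.column + 1);
}

var message = "Unexpected '" + token.value + "' at " + describePosition(token);

console.log(message); // Unexpected '42' at line 1, column 10
```

If the target tool expects zero-based columns, the adjustment should simply be dropped.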

It is possible to set both range and loc to true, thereby giving each token the most complete location information.
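As one possible use of the combined information, the sketch below underlines a token beneath its source line: loc selects the line and the starting column, while range gives the token's length. The helper name underline is hypothetical, and the token merges the range and loc values shown in the earlier outputs. The sketch assumes the token sits on a single line and that lines are separated by '\n' only.

```javascript
// Input string from the examples above, with a token that carries both
// properties, as esprima.tokenize(source, { range: true, loc: true })
// would provide. The values are merged from the earlier outputs.
var source = 'answer = 42';
var token = {
  type: 'Numeric',
  value: '42',
  range: [9, 11],
  loc: { start: { line: 1, column: 9 }, end: { line: 1, column: 11 } }
};

// underline is a hypothetical helper, not an Esprima function.
// Assumptions: single-line token, '\n' as the only line separator.
function underline(text, tok) {
  var line = text.split('\n')[tok.loc.start.line - 1]; // line is one-based
  var width = tok.range[1] - tok.range[0];             // end is exclusive
  var marker = ' '.repeat(tok.loc.start.column) + '^'.repeat(width);
  return line + '\n' + marker;
}

console.log(underline(source, token));
```

The second printed line contains nine spaces followed by two carets, which places the markers directly under 42.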