4 Internals

4.1 Bytecode

The compiler generates bytecode directly, with no intermediate representation such as a parse tree, so it is very fast. Several optimization passes are then performed on the generated bytecode.

A stack-based bytecode was chosen because it is simple and generates compact code.

For each function, the maximum stack size is computed at compile time so that no runtime stack overflow tests are needed.
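
As an illustration only, the sketch below computes the maximum stack depth of a straight-line sequence of toy opcodes by tracking how many values each opcode pops and pushes. The opcode set (OP_PUSH, OP_ADD, ...) and compute_max_stack() are hypothetical; operands are omitted, and a real compiler also has to follow branches and merge depths at jump targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes (operands omitted for brevity). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_DUP, OP_DROP };

/* How many values each opcode pops and pushes. */
typedef struct { int n_pop, n_push; } OpInfo;

static const OpInfo op_info[] = {
    [OP_PUSH] = { 0, 1 },   /* push a constant */
    [OP_ADD]  = { 2, 1 },   /* pop two operands, push the sum */
    [OP_MUL]  = { 2, 1 },   /* pop two operands, push the product */
    [OP_DUP]  = { 1, 2 },   /* duplicate the top value */
    [OP_DROP] = { 1, 0 },   /* discard the top value */
};

/* Returns the maximum stack depth reached by the sequence, or -1 if an
   opcode would pop more values than are on the stack. */
static int compute_max_stack(const int *code, size_t len) {
    int depth = 0, max_depth = 0;
    for (size_t i = 0; i < len; i++) {
        const OpInfo *info = &op_info[code[i]];
        if (depth < info->n_pop)
            return -1;                        /* stack underflow */
        depth += info->n_push - info->n_pop;
        if (depth > max_depth)
            max_depth = depth;
    }
    return max_depth;
}
```

With the maximum known ahead of time, an interpreter can reserve exactly that many slots when it enters the function and skip bounds checks on every push.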

A separate compressed line number table is maintained for the debug information.
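
One common way to compress such a table is to store, for each entry, the difference from the previous bytecode offset and source line, encoded as variable-length integers. The sketch below uses unsigned LEB128 varints plus ZigZag encoding for line deltas that may be negative. It is a generic technique chosen for illustration, not QuickJS's actual encoding, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v as an unsigned LEB128 varint; returns the number of bytes. */
static size_t put_uleb(uint8_t *buf, uint32_t v) {
    size_t n = 0;
    do {
        uint8_t b = v & 0x7f;
        v >>= 7;
        if (v)
            b |= 0x80;                       /* more bytes follow */
        buf[n++] = b;
    } while (v);
    return n;
}

/* Reads an unsigned LEB128 varint; returns the number of bytes consumed. */
static size_t get_uleb(const uint8_t *buf, uint32_t *v) {
    size_t n = 0;
    int shift = 0;
    uint32_t r = 0;
    uint8_t b;
    do {
        b = buf[n++];
        r |= (uint32_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    *v = r;
    return n;
}

/* ZigZag keeps small negative line deltas small once encoded. */
static uint32_t zz_enc(int32_t v) { return ((uint32_t)v << 1) ^ (v < 0 ? UINT32_MAX : 0); }
static int32_t zz_dec(uint32_t v) { return (int32_t)(v >> 1) ^ -(int32_t)(v & 1); }

typedef struct { uint32_t pc; int32_t line; } PcLine;

/* Encodes entries sorted by pc as (pc delta, line delta) varint pairs. */
static size_t encode_pc2line(uint8_t *out, const PcLine *e, size_t n) {
    size_t len = 0;
    uint32_t pc = 0;
    int32_t line = 0;
    for (size_t i = 0; i < n; i++) {
        len += put_uleb(out + len, e[i].pc - pc);
        len += put_uleb(out + len, zz_enc(e[i].line - line));
        pc = e[i].pc;
        line = e[i].line;
    }
    return len;
}

/* Replays the deltas to find the line of the last entry at or before target_pc. */
static int32_t find_line(const uint8_t *buf, size_t len, uint32_t target_pc) {
    size_t pos = 0;
    uint32_t pc = 0, d;
    int32_t line = -1, cur = 0;
    while (pos < len) {
        pos += get_uleb(buf + pos, &d);
        pc += d;
        pos += get_uleb(buf + pos, &d);
        cur += zz_dec(d);
        if (pc > target_pc)
            break;
        line = cur;
    }
    return line;
}
```

Because the table is only consulted when building debug information such as a stack trace, a sequential decode like find_line() is usually an acceptable price for the compact encoding.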

Access to closure variables is optimized and is almost as fast as local variables.

Direct eval in strict mode is optimized.

4.2 Executable generation

4.2.1 qjsc compiler

The qjsc compiler generates C sources from Javascript files. By default the C sources are compiled with the system compiler (gcc or clang).

The generated C source contains the bytecode of the compiled functions or modules. If a complete executable is needed, it also contains a main() function with the necessary C code to initialize the Javascript engine and to load and execute the compiled functions and modules.

Javascript code can be mixed with C modules.

In order to produce smaller executables, specific Javascript features can be disabled, in particular eval or regular expressions. The code removal relies on the link time optimization (LTO) of the system compiler.

4.2.2 Binary JSON

qjsc works by compiling scripts or modules and then serializing them to a binary format. A subset of this format (without functions or modules) can be used as binary JSON. The example test_bjson.js shows how to use it.

Warning: the binary JSON format may change without notice, so it should not be used to store persistent data. The test_bjson.js example is only used to test the binary object format functions.

4.3 Runtime

4.3.1 Strings

Strings are stored as 8-bit or 16-bit arrays of characters. Hence random access to characters is always fast.
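
A minimal sketch of why random access stays O(1), assuming a hypothetical JSStr layout with a width flag: every character has a fixed size inside a given string, so character i is a plain array index. A variable-length encoding such as UTF-8 would instead require scanning from the start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical string layout: characters are stored either as an 8-bit
   array (when every character fits in a byte) or as a 16-bit array. */
typedef struct {
    uint32_t len;      /* length in characters */
    int is_wide;       /* 0: uint8_t array, 1: uint16_t array */
    union {
        const uint8_t *str8;
        const uint16_t *str16;
    } u;
} JSStr;

/* Constant-time access: no decoding of variable-length sequences. */
static uint16_t str_char_at(const JSStr *s, uint32_t i) {
    assert(i < s->len);
    return s->is_wide ? s->u.str16[i] : s->u.str8[i];
}
```

The narrow form halves memory for the common case of Latin-1 text, while the wide form keeps the same indexing cost for everything else.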

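The C API provides functions to convert Javascript strings to C UTF-8 encoded strings. In the most common case, where the Javascript string contains only ASCII characters, no copying is involved.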
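
The no-copy case works because ASCII bytes are already valid UTF-8. The sketch below illustrates the idea for 8-bit (Latin-1) strings only; Latin1Str and to_utf8() are hypothetical and not part of QuickJS, whose public entry points for this conversion are JS_ToCString() and JS_FreeCString(). The sketch also assumes the internal buffer is NUL-terminated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8-bit string: len Latin-1 characters followed by a NUL byte. */
typedef struct {
    uint32_t len;
    const uint8_t *str8;
} Latin1Str;

/* Returns a NUL-terminated UTF-8 string. If every character is ASCII,
   the internal buffer is returned as-is and *allocated is set to 0.
   Otherwise each character 0x80..0xFF is expanded to two UTF-8 bytes
   in a newly allocated buffer and *allocated is set to 1. */
static const char *to_utf8(const Latin1Str *s, int *allocated) {
    uint32_t extra = 0;
    for (uint32_t i = 0; i < s->len; i++)
        extra += s->str8[i] >= 0x80;
    if (extra == 0) {
        *allocated = 0;
        return (const char *)s->str8;       /* fast path: no copy */
    }
    char *out = malloc(s->len + extra + 1);
    if (!out)
        return NULL;
    char *p = out;
    for (uint32_t i = 0; i < s->len; i++) {
        uint8_t c = s->str8[i];
        if (c < 0x80) {
            *p++ = (char)c;
        } else {                            /* two-byte UTF-8 sequence */
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    *allocated = 1;
    return out;
}
```

Since the caller cannot tell from the pointer alone whether it owns the buffer, an API of this shape needs a matching release step; here the allocated flag plays that role.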

4.3.2 Objects

The object shapes (object prototype, property names and flags) are shared between objects to save memory.
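
Shape sharing can be pictured as a tree of transitions: adding a property to an object moves it to a child shape, and objects that add the same properties in the same order end up pointing at the same shape. The sketch below is a simplified illustration with hypothetical names; it ignores prototypes and property flags, and the engine's real data structure differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shape: records the last property added and links to the
   shape it extends. Transitions to child shapes are cached so that
   identical layouts are reused instead of duplicated per object. */
typedef struct Shape {
    const struct Shape *parent;
    const char *prop;           /* property added by this shape (NULL for root) */
    int count;                  /* number of properties */
    struct Shape *children[8];  /* cached transitions (fixed size for brevity) */
    int n_children;
} Shape;

typedef struct {
    Shape *shape;               /* shared layout */
    double slots[8];            /* per-object values, indexed by property position */
} Object;

static Shape root_shape;

/* Returns the shape obtained by adding prop to s, reusing a cached one. */
static Shape *shape_add(Shape *s, const char *prop) {
    for (int i = 0; i < s->n_children; i++)
        if (strcmp(s->children[i]->prop, prop) == 0)
            return s->children[i];          /* shared with other objects */
    assert(s->n_children < 8);
    Shape *c = calloc(1, sizeof *c);
    if (!c)
        abort();
    c->parent = s;
    c->prop = prop;
    c->count = s->count + 1;
    s->children[s->n_children++] = c;
    return c;
}

/* Finds the slot index of prop by walking up the shape chain. */
static int shape_find(const Shape *s, const char *prop) {
    for (; s && s->prop; s = s->parent)
        if (strcmp(s->prop, prop) == 0)
            return s->count - 1;
    return -1;
}

/* Adds a property that the object does not have yet. */
static void obj_add_prop(Object *o, const char *prop, double v) {
    o->shape = shape_add(o->shape, prop);
    assert(o->shape->count <= 8);
    o->slots[o->shape->count - 1] = v;
}
```

The memory saving comes from storing property names once per shape rather than once per object; each object only keeps its own values.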

Arrays with no holes (except at the end of the array) are optimized.
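
A minimal sketch of the idea behind the optimization, assuming a hypothetical Array type: while an array has no holes, element i is simply vals[i] in a dense C array. Overwrites and appends keep it dense; a write past the end would create a hole, so the array leaves the fast representation and further writes go to a generic path that is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical array representation with a dense fast mode. */
typedef struct {
    int is_fast;
    uint32_t len, cap;
    double *vals;
} Array;

/* Returns 1 if the write was handled on the fast path, 0 otherwise. */
static int array_set(Array *a, uint32_t idx, double v) {
    if (!a->is_fast)
        return 0;                           /* generic path (not shown) */
    if (idx < a->len) {                     /* overwrite an existing element */
        a->vals[idx] = v;
        return 1;
    }
    if (idx == a->len) {                    /* append: array stays dense */
        if (a->len == a->cap) {
            uint32_t cap = a->cap ? a->cap * 2 : 4;
            double *p = realloc(a->vals, cap * sizeof *p);
            if (!p)
                return 0;
            a->vals = p;
            a->cap = cap;
        }
        a->vals[a->len++] = v;
        return 1;
    }
    a->is_fast = 0;                         /* idx > len would create a hole */
    return 0;
}
```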

Typed array accesses are optimized.

4.3.3 Atoms

Object property names and some strings are stored as Atoms (unique strings) to save memory and allow fast comparison. Atoms are represented as 32-bit integers. Half of the atom range is reserved for immediate integer literals from 0 to 2^31 - 1.
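
The split of the 32-bit atom range can be sketched as follows, under assumed names: the top bit marks an immediate integer, so integers 0 to 2^31 - 1 are encoded directly and need no table entry, while other atoms are indices into an interning table. The linear-search table is a stand-in for a real hash table; interning guarantees that equal strings map to the same atom, so comparing two property names is a single integer comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Atom;

/* Hypothetical encoding: top bit set means "immediate integer". */
#define ATOM_TAG_INT  (UINT32_C(1) << 31)
#define ATOM_MAX_INT  (ATOM_TAG_INT - 1)

static int atom_is_int(Atom a) { return (a & ATOM_TAG_INT) != 0; }

static Atom atom_from_uint(uint32_t n) {
    assert(n <= ATOM_MAX_INT);
    return n | ATOM_TAG_INT;
}

static uint32_t atom_to_uint(Atom a) { return a & ~ATOM_TAG_INT; }

/* Hypothetical interning table (linear search for brevity). */
static const char *atom_table[64];
static uint32_t atom_count;

/* Returns the same atom for equal strings; the caller keeps s alive. */
static Atom atom_intern(const char *s) {
    for (uint32_t i = 0; i < atom_count; i++)
        if (strcmp(atom_table[i], s) == 0)
            return i;                       /* already interned */
    assert(atom_count < 64);
    atom_table[atom_count] = s;
    return atom_count++;
}
```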

4.3.4 Numbers

Numbers are represented either as 32-bit signed integers or 64-bit IEEE-754 floating point values. Most operations have fast paths for the 32-bit integer case.
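
A sketch of what such a fast path can look like for addition, using a hypothetical Num type: when both operands are 32-bit integers, the sum is computed in 64-bit arithmetic and kept as an integer if it still fits; on overflow, or when either operand is a double, the slower floating point path is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number: either an int32 or an IEEE-754 double. */
typedef struct {
    int is_int;
    union { int32_t i; double d; } u;
} Num;

static Num num_int(int32_t i) { Num n; n.is_int = 1; n.u.i = i; return n; }
static Num num_float(double d) { Num n; n.is_int = 0; n.u.d = d; return n; }
static double num_to_double(Num n) { return n.is_int ? (double)n.u.i : n.u.d; }

static Num num_add(Num a, Num b) {
    if (a.is_int && b.is_int) {                  /* fast path */
        int64_t r = (int64_t)a.u.i + b.u.i;      /* cannot overflow in 64 bits */
        if (r >= INT32_MIN && r <= INT32_MAX)
            return num_int((int32_t)r);
        return num_float((double)r);             /* result left the int32 range */
    }
    return num_float(num_to_double(a) + num_to_double(b));  /* slow path */
}
```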

4.3.5 Garbage collection

Reference counting is used to free objects automatically and deterministically. A separate cycle removal pass is performed when the allocated memory becomes too large. The cycle removal algorithm only uses the reference counts and the object contents, so no explicit garbage collection roots need to be manipulated in the C code.
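
The reason no roots are needed can be shown with a trial-deletion sketch. This is a generic illustration with a hypothetical Obj type and a fixed set of objects, assuming every referenced object belongs to that set. First, every reference held inside the set is subtracted; any object whose count is still positive must be referenced from outside (for example by C code) and is therefore alive, together with everything it reaches. Whatever remains is referenced only by other members of the set, that is, by garbage cycles.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical heap object: a reference count plus outgoing references. */
typedef struct Obj {
    int ref_count;
    int n_children;
    struct Obj *children[MAX_CHILDREN];
    int mark;     /* scratch flag: 1 = alive */
    int freed;    /* set instead of actually freeing, for the sketch */
} Obj;

/* Marks o alive and restores the counts it contributes to its children.
   Recursive for brevity; a production collector would use a work list. */
static void scan_restore(Obj *o) {
    o->mark = 1;
    for (int j = 0; j < o->n_children; j++) {
        Obj *c = o->children[j];
        c->ref_count++;
        if (!c->mark)
            scan_restore(c);
    }
}

static void remove_cycles(Obj **objs, size_t n) {
    for (size_t i = 0; i < n; i++)
        objs[i]->mark = 0;
    /* 1. Subtract all references held inside the set. */
    for (size_t i = 0; i < n; i++)
        for (int j = 0; j < objs[i]->n_children; j++)
            objs[i]->children[j]->ref_count--;
    /* 2. Objects still referenced from outside are alive, and so is
          everything they reach; their counts are restored. */
    for (size_t i = 0; i < n; i++)
        if (objs[i]->ref_count > 0 && !objs[i]->mark)
            scan_restore(objs[i]);
    /* 3. The rest is only referenced by garbage. */
    for (size_t i = 0; i < n; i++)
        if (!objs[i]->mark)
            objs[i]->freed = 1;
}
```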

4.3.6 JSValue

JSValue is a Javascript value, which can be either a primitive type (such as Number, String, …) or an Object. In the 32-bit version, NaN boxing is used to store 64-bit floating point numbers. The representation is optimized so that 32-bit integers and reference-counted values can be tested efficiently.
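
NaN boxing exploits the fact that IEEE-754 has many NaN bit patterns while a program only needs one. The generic sketch below stores every value in 64 bits: ordinary doubles keep their own bits (with any NaN canonicalized), and quiet NaNs with a nonzero tag in bits 48 to 50 carry a 32-bit payload such as an integer or a 32-bit pointer. The bit layout, tags, and names are assumptions for illustration and do not reproduce QuickJS's exact encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

#define QNAN       UINT64_C(0x7FF8000000000000)  /* exponent all ones + quiet bit */
#define TAG_SHIFT  48
enum { TAG_INT = 1, TAG_OBJECT = 2 };             /* hypothetical tags */

/* Stores a double; any NaN is canonicalized so it cannot look tagged. */
static Value box_double(double d) {
    Value v;
    if (d != d)
        return QNAN;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Stores a 32-bit payload (integer or 32-bit pointer) inside a NaN. */
static Value box_tagged(int tag, uint32_t payload) {
    return QNAN | ((uint64_t)tag << TAG_SHIFT) | payload;
}

/* Returns 0 for a double, otherwise the tag. */
static int value_tag(Value v) {
    if ((v & QNAN) != QNAN)
        return 0;                                 /* finite or infinite double */
    return (int)((v >> TAG_SHIFT) & 7);           /* 0 here means the canonical NaN */
}

static double unbox_double(Value v) { double d; memcpy(&d, &v, sizeof d); return d; }
static uint32_t unbox_payload(Value v) { return (uint32_t)v; }
```

On a 32-bit target, a pointer fits in the payload, so every value, including a double, occupies 64 bits instead of a larger tagged structure.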

In 64-bit code, JSValues are 128 bits wide and no NaN boxing is used. The rationale is that in 64-bit code memory usage is less critical.

In both cases (32 or 64 bits), JSValue exactly fits two CPU registers, so it can be efficiently returned by C functions.
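
For the 64-bit case, a union-plus-tag struct is enough; QuickJS's own header uses a layout along these lines, but the names and tag values below are illustrative. On x86-64 with the System V ABI, a 16-byte struct whose two eightbytes both classify as INTEGER is returned in the RAX:RDX register pair, which is what makes returning it by value cheap.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit payload: an integer, a double or a pointer. */
typedef union {
    int32_t int32;
    double float64;
    void *ptr;
} ValueUnion;

/* 128-bit value: payload plus a separate 64-bit tag, no NaN boxing. */
typedef struct {
    ValueUnion u;
    int64_t tag;
} Value128;

enum { TAG128_INT = 0, TAG128_FLOAT64 = 7 };  /* hypothetical tag values */

/* Returned by value; small enough to travel in two registers. */
static Value128 make_int(int32_t v) {
    Value128 r;
    r.u.int32 = v;
    r.tag = TAG128_INT;
    return r;
}

static Value128 make_float64(double d) {
    Value128 r;
    r.u.float64 = d;
    r.tag = TAG128_FLOAT64;
    return r;
}

/* Testing the type is a single integer comparison on the tag. */
static int is_int(Value128 v) { return v.tag == TAG128_INT; }
```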

4.3.7 Function calls

The engine is optimized so that function calls are fast. The system stack holds the Javascript parameters and local variables.

4.4 RegExp

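A specific regular expression engine was developed. It is both small and efficient and supports all the ES2019 features, including Unicode properties. Like the Javascript compiler, it directly generates bytecode without a parse tree.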

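Backtracking with an explicit stack is used so that there is no recursion on the system stack. Simple quantifiers are specifically optimized to avoid recursion.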

Infinite loops caused by quantifiers applied to terms that can match the empty string are avoided.
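
Both ideas can be illustrated with a toy backtracking VM. Each choice point is pushed onto an array instead of being a recursive C call, and a check instruction makes a loop iteration fail if it consumed no input, which stops /(a*)*b/ from looping forever. The opcode set, re_exec(), and the hand-compiled program are hypothetical and much simpler than the real engine's bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical regexp bytecode. */
enum { RE_CHAR, RE_ANY, RE_SPLIT, RE_JMP, RE_MARK, RE_CHECK_ADVANCE, RE_MATCH };

typedef struct { int op, x, y; } ReInsn;

#define NREGS 2
#define STACK_MAX 256

/* A saved alternative: where to resume and the state to restore. */
typedef struct { int pc; size_t sp; size_t regs[NREGS]; } Choice;

/* Returns 1 if prog matches a prefix of s, 0 if not, -1 on stack overflow.
   Failure pops the latest choice point, so C stack depth stays constant. */
static int re_exec(const ReInsn *prog, const char *s) {
    Choice stack[STACK_MAX];
    int top = 0, pc = 0;
    size_t sp = 0, regs[NREGS] = { 0 };

    for (;;) {
        const ReInsn *in = &prog[pc];
        int ok = 1;
        switch (in->op) {
        case RE_CHAR:
            if (s[sp] == (char)in->x) { sp++; pc++; } else ok = 0;
            break;
        case RE_ANY:
            if (s[sp] != '\0') { sp++; pc++; } else ok = 0;
            break;
        case RE_SPLIT:                    /* try x now, save y for later */
            if (top == STACK_MAX)
                return -1;
            stack[top].pc = in->y;
            stack[top].sp = sp;
            for (int r = 0; r < NREGS; r++)
                stack[top].regs[r] = regs[r];
            top++;
            pc = in->x;
            break;
        case RE_JMP:
            pc = in->x;
            break;
        case RE_MARK:                     /* record where an iteration starts */
            regs[in->x] = sp;
            pc++;
            break;
        case RE_CHECK_ADVANCE:            /* empty iteration: fail */
            if (sp == regs[in->x]) ok = 0; else pc++;
            break;
        case RE_MATCH:
            return 1;
        }
        if (!ok) {                        /* backtrack */
            if (top == 0)
                return 0;
            top--;
            pc = stack[top].pc;
            sp = stack[top].sp;
            for (int r = 0; r < NREGS; r++)
                regs[r] = stack[top].regs[r];
        }
    }
}

/* /(a*)*b/ compiled by hand, anchored at the start of the subject. */
static const ReInsn prog_a_star_star_b[] = {
    { RE_SPLIT, 1, 7 },          /* 0: outer loop: enter body or exit */
    { RE_MARK, 0, 0 },           /* 1: reg0 = position at iteration start */
    { RE_SPLIT, 3, 5 },          /* 2: inner a*: try another 'a' first */
    { RE_CHAR, 'a', 0 },         /* 3 */
    { RE_JMP, 2, 0 },            /* 4 */
    { RE_CHECK_ADVANCE, 0, 0 },  /* 5: no progress since reg0: fail */
    { RE_JMP, 0, 0 },            /* 6 */
    { RE_CHAR, 'b', 0 },         /* 7 */
    { RE_MATCH, 0, 0 },          /* 8 */
};
```

Without instruction 5, an input with no 'a' would make the outer loop re-enter its empty body forever and keep pushing choice points.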

The complete regular expression library weighs about 15 KiB (x86 code), excluding the Unicode library.

4.5 Unicode

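A specific Unicode library was developed so that there is no dependency on a large external Unicode library such as ICU. All the Unicode tables are compressed while keeping a reasonable access speed.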
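
One simple way to trade a little speed for a lot of space is to store a binary property as sorted, non-overlapping code point ranges and query it by binary search. The sketch below does this for the White_Space property; the range-table format and function names are assumptions for illustration, and the library's real table encoding is more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open code point range [start, end). */
typedef struct { uint32_t start, end; } CpRange;

/* White_Space: 25 code points stored as 10 ranges. */
static const CpRange white_space[] = {
    { 0x0009, 0x000E }, { 0x0020, 0x0021 }, { 0x0085, 0x0086 },
    { 0x00A0, 0x00A1 }, { 0x1680, 0x1681 }, { 0x2000, 0x200B },
    { 0x2028, 0x202A }, { 0x202F, 0x2030 }, { 0x205F, 0x2060 },
    { 0x3000, 0x3001 },
};

/* Binary search over sorted ranges: O(log n) per lookup. */
static int in_ranges(const CpRange *t, size_t n, uint32_t c) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < t[mid].start)
            hi = mid;
        else if (c >= t[mid].end)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

static int is_white_space(uint32_t c) {
    return in_ranges(white_space, sizeof white_space / sizeof white_space[0], c);
}
```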

The library supports case conversion, Unicode normalization, Unicode script queries, Unicode general category queries and all Unicode binary properties.

The complete Unicode library weighs about 45 KiB (x86 code).

4.6 BigInt and BigFloat

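BigInt and BigFloat are implemented with the libbf library[4]. It weighs about 60 KiB (x86 code) and provides arbitrary-precision IEEE 754 floating point operations and transcendental functions with exact rounding.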