PyPy Parser

Overview

The PyPy parser includes a tokenizer and a recursive descent parser.

Tokenizer

At the moment, the tokenizer is implemented as a single function(generate_tokens in pypy/interpreter/pyparser/pytokenizer.py) that buildsa list of tokens. The tokens are then fed to the parser.

Parser

The parser is a simple LL(1) parser that is similar to CPython’s.

Building the Python grammar

The python grammar is built at startup from the pristine CPython grammar file(see pypy/interpreter/pyparser/metaparser.py). The grammar builder firstrepresents the grammar as rules corresponding to a set of NondeterministicFinite Automatons (NFAs). It then converts them to a set of DeterministicFinite Automatons (DFAs). The difference between a NFA and a DFA is that a NFAmay have several possible next states for any given input while a DFA may onlyhave one. DFAs are therefore more limiting, but far more efficient to use inparsing. Finally, the assigns the grammar builder assigns each DFA state anumber and packs them into a list for the parser to use. The final product isan instance of the Grammar class in pypy/interpreter/pyparser/parser.py.

Parser implementation

The workhorse of the parser is the add_token method of the Parser class.It tries to find a transition from the current state to another state based onthe token it receives as a argument. If it can’t find a transition, it checksif the current state is accepting. If it’s not, a ParseError israised. When parsing is done without error, the parser has built a tree ofNode.

Parsing Python

The glue code between the tokenizer and the parser as well as extra Pythonspecific code is in pypy/interpreter/pyparser/pyparse.py. Theparsesource method takes a string of Python code and returns the parsetree. It also detects the coding cookie if there is one and decodes the source.Note that _future imports are handled before the parser is invoked bymanually parsing the source in pypy/interpreter/pyparser/future.py.

Compiler

The next step in generating Python bytecode is converting the parse tree into anAbstract Syntax Tree (AST).

Building AST

Python’s AST is described in pypy/interpreter/astcompiler/tools/Python.asdl.From this definition, pypy/interpreter/astcompiler/tools/asdl_py.py generatespypy/interpreter/astcompiler/ast.py, which RPython classes for the compileras well as bindings to application level code for the AST. Some customextensions to the AST classes are inpypy/interpreter/astcompiler/asthelpers.py.

pypy/interpreter/astcompiler/astbuilder.py is responsible for convertingparse trees into AST. It walks down the parse tree building nodes as it goes.The result is a toplevel mod node.

AST Optimization

pypy/interpreter/astcompiler/optimize.py contains the AST optimizer. It doesconstant folding of expressions, and other simple transformations like making aload of the name “None” into a constant.

Symbol analysis

Before writing bytecode, a symbol table is built inpypy/interpreter/astcompiler/symtable.py. It determines if every name in thesource is local, implicitly global (no global declaration), explicitly global(there’s a global declaration of the name in the scope), a cell (the name inused in nested scopes), or free (it’s used in a nested function).

Bytecode generation

Bytecode is emitted in pypy/interpreter/astcompiler/codegen.py. Eachbytecode is represented temporarily by the Instruction class inpypy/interpreter/astcompiler/assemble.py. After all bytecodes have beenemitted, it’s time to build the code object. Jump offsets and bytecodeinformation like the line number table and stack depth are computed. Finally,everything is passed to a brand new PyCode object.