Goals and Architecture Overview

This document gives an overview of the goals and architecture of PyPy. If you’re interested in using PyPy or hacking on it, have a look at our getting started section.

Mission statement

We aim to provide a compliant, flexible and fast implementation of the Python language which uses the RPython toolchain to enable new advanced high-level features without having to encode the low-level details. We call this PyPy.

High Level Goals

Our main motivation for developing the translation framework is to provide a full-featured, customizable, fast and very compliant Python implementation, working on and interacting with a large variety of platforms and allowing the quick introduction of new advanced language features.

This Python implementation is written in RPython as a relatively simple interpreter, in some respects easier to understand than CPython, the C reference implementation of Python. We are using its high-level nature and flexibility to quickly experiment with features or implementation techniques in ways that would, in a traditional approach, require pervasive changes to the source code. For example, PyPy’s Python interpreter can optionally provide lazily computed objects - a small extension that would require global changes in CPython. Another example is the garbage collection technique: changing CPython to use a garbage collector not based on reference counting would be a major undertaking, whereas in PyPy it is an issue localized in the translation framework, and fully orthogonal to the interpreter source code.

PyPy Python Interpreter

PyPy’s Python Interpreter is written in RPython and implements the full Python language. This interpreter very closely emulates the behavior of CPython. It contains the following key components:

  • a bytecode compiler responsible for producing Python code objects from the source code of a user application;
  • a bytecode evaluator responsible for interpreting Python code objects;
  • a standard object space, responsible for creating and manipulating the Python objects seen by the application.

The bytecode compiler is the preprocessing phase that produces a compact bytecode format via a chain of flexible passes (tokenizer, lexer, parser, abstract syntax tree builder, bytecode generator). The bytecode evaluator interprets this bytecode. It does most of its work by delegating all actual manipulations of user objects to the object space. The latter can be thought of as the library of built-in types. It defines the implementation of the user objects, like integers and lists, as well as the operations between them, like addition or truth-value-testing.
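To make the idea of a code object concrete, here is a minimal sketch that uses the host interpreter's built-in compile() and the standard dis module. It shows the analogous facility exposed by any Python implementation, not PyPy's internal compiler passes; the source string and the "<example>" filename are made up for illustration, and the exact opcodes printed vary between Python versions.

```python
import dis

# A tiny user program, given as source text.
source = "x = 1 + 2\ny = x * 3\n"

# compile() runs the host interpreter's bytecode compiler and
# returns a code object ("exec" mode compiles a whole module body).
code = compile(source, "<example>", "exec")

# Show the bytecode instructions held by the code object.
dis.dis(code)

# Hand the code object to the evaluator, using a fresh namespace
# so the results can be inspected afterwards.
namespace = {}
exec(code, namespace)
```

After the exec call, namespace holds the names bound by the program (x and y), which is the only part of the result that does not depend on the interpreter version.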

This division between bytecode evaluator and object space gives a lot of flexibility. One can plug in different object spaces to get different or enriched behaviours of the Python objects.
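The division can be illustrated with a toy model. This is a sketch under stated assumptions, not PyPy's actual code: the class names StdSpace and TracingSpace, the evaluate function and the three-opcode instruction set are all invented for this example. The point it tries to show is that the evaluator only moves values around, while every operation on them goes through the space it was given.

```python
class StdSpace(object):
    """Toy object space: decides how objects are built and combined."""

    def wrap(self, value):
        # Turn a constant from the program into an application-level object.
        return value

    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b


class TracingSpace(StdSpace):
    """Enriched space: same results, but records every operation."""

    def __init__(self):
        self.log = []

    def add(self, a, b):
        self.log.append(("add", a, b))
        return StdSpace.add(self, a, b)

    def mul(self, a, b):
        self.log.append(("mul", a, b))
        return StdSpace.mul(self, a, b)


def evaluate(program, space):
    """Toy stack-based evaluator; it never inspects the objects itself."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_CONST":
            stack.append(space.wrap(arg))
        elif opname == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(space.add(left, right))
        elif opname == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(space.mul(left, right))
        else:
            raise ValueError("unknown opcode: %r" % (opname,))
    return stack.pop()


# (2 + 3) * 4, written as hypothetical bytecode.
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("ADD", None),
    ("LOAD_CONST", 4),
    ("MUL", None),
]

result = evaluate(program, StdSpace())

# Same evaluator, different space: the behaviour is enriched
# without touching evaluate() at all.
tracing = TracingSpace()
traced_result = evaluate(program, tracing)
```

Swapping StdSpace for TracingSpace changes what happens on each operation while evaluate stays untouched, which is the kind of flexibility the paragraph above describes.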

Layers

RPython

RPython is the language in which we write interpreters. Not all of the PyPy project is written in RPython; only the parts that are compiled in the translation process are. The interesting point is that RPython has no parser: it is compiled from the live Python objects, which makes it possible to do all kinds of metaprogramming during import time. In short, Python is a metaprogramming language for RPython.
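As a rough illustration of import-time metaprogramming, the sketch below builds functions with ordinary Python while the module is being imported. The names make_scaler, double, tenfold and entry_point are hypothetical, and the snippet has not been run through the RPython toolchain; it only shows the general pattern of preparing live objects before anything starts from an entry point.

```python
def make_scaler(factor):
    """Ordinary Python, executed at import time, that builds a function."""
    def scale(x):
        return x * factor
    # Give each generated function a distinct, readable name.
    scale.__name__ = "scale_by_%d" % factor
    return scale


# These calls run while the module is imported. By the time a translator
# looks at the module, only the resulting function objects exist.
double = make_scaler(2)
tenfold = make_scaler(10)


def entry_point(x):
    # The code reachable from here only uses the already-built functions.
    return double(x) + tenfold(x)
```

Under this assumption, the factory itself can use any Python feature, because it has finished running before the live objects are examined.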

The RPython standard library is to be found in the rlib subdirectory.

Consult Getting Started with RPython for further reading or RPython By Example for another take on what can be done using RPython without writing an interpreter over it.

Translation

The translation toolchain is the part that takes care of translating RPython to flow graphs and then to C. More is written about it in the architecture document.

It lives in the rpython directory: flowspace, annotator and rtyper.

PyPy Interpreter

This is in the pypy directory. pypy/interpreter is a standard interpreter for Python written in RPython. The fact that it is RPython is not apparent at first. Built-in modules are written in pypy/module/*. Some modules that CPython implements in C are simply written in pure Python; they are in the top-level lib_pypy directory. The standard library of Python (with a few changes to accommodate PyPy) is in lib-python.

JIT Compiler

Just-in-Time Compiler (JIT): we have a tracing JIT that traces the interpreter written in RPython, rather than the user program that it interprets. As a result it applies to any interpreter, i.e. any language. But getting it to work correctly is not trivial: it requires a small number of precise “hints” and possibly some small refactorings of the interpreter. The JIT itself also has several almost-independent parts: the tracer itself in rpython/jit/metainterp, the optimizer in rpython/jit/metainterp/optimizer that optimizes a list of residual operations, and the backend in rpython/jit/backend/<machine-name> that turns it into machine code. Writing a new backend is a traditional way to get into the project.
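The sketch below suggests where such a hint might sit in an interpreter's main loop. So that it runs on any plain Python interpreter, it defines a no-op stand-in class named JitDriver; the assumption is that this mirrors the shape of the JitDriver class in rpython.rlib.jit, where "greens" name the variables that identify a position in the interpreted program and "reds" name the remaining live variables. The toy instruction set ("+", "-", "d") is invented for the example.

```python
class JitDriver(object):
    """No-op stand-in so this sketch runs without the RPython toolchain.

    Assumption: it imitates the interface of rpython.rlib.jit.JitDriver
    closely enough to show where the hint goes; it does no tracing.
    """

    def __init__(self, greens, reds):
        self.greens = greens
        self.reds = reds

    def jit_merge_point(self, **live_vars):
        pass


# Greens: identify "where we are" in the interpreted program.
# Reds: everything else that is live at that point.
jitdriver = JitDriver(greens=["pc", "program"], reds=["acc"])


def interpret(program, acc):
    """Toy interpreter: '+' increments, '-' decrements, 'd' doubles."""
    pc = 0
    while pc < len(program):
        # The hint marks the top of the dispatch loop.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == "+":
            acc += 1
        elif op == "-":
            acc -= 1
        elif op == "d":
            acc *= 2
        pc += 1
    return acc


final_value = interpret("++d-", 0)
```

The interpreter logic is unchanged by the hint; the only addition is the single call at the top of the loop, which is why the text speaks of a small number of precise hints.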

Garbage Collectors

Garbage Collectors (GC): as you may notice if you are used to CPython’s C code, there are no Py_INCREF/Py_DECREF equivalents in RPython code. Garbage Collection in RPython is inserted during translation. Moreover, this is not reference counting; it is a real GC written as more RPython code. The best one we have so far is in rpython/memory/gc/incminimark.py.