Bytecode Interpreter

Introduction and Overview

This document describes the implementation of PyPy’s Bytecode Interpreter and related Virtual Machine functionalities.

PyPy’s bytecode interpreter has a structure reminiscent of CPython’s Virtual Machine: it processes code objects parsed and compiled from Python source code. It is implemented in the pypy/interpreter/ directory. People familiar with the CPython implementation will easily recognize similar concepts there. The major differences are the overall usage of the object space indirection to perform operations on objects, and the organization of the built-in modules.

Code objects are a nicely preprocessed, structured representation of source code, and their main content is bytecode. We use the same compact bytecode format as CPython 2.7, with minor differences in the bytecode set. Our bytecode compiler is implemented as a chain of flexible passes (tokenizer, lexer, parser, abstract syntax tree builder and bytecode generator). The latter passes are based on the compiler package from the standard library of CPython, with various improvements and bug fixes. The bytecode compiler (living under pypy/interpreter/astcompiler/) is now integrated and is translated with the rest of PyPy.

Code objects contain condensed information about their respective functions, class and module body source codes. Interpreting such code objects means instantiating and initializing a Frame class and then calling its frame.eval() method. This main entry point initializes appropriate namespaces and then interprets each bytecode instruction. Python’s standard library contains the lib-python/2.7/dis.py module, which allows inspection of the virtual machine’s bytecode instructions:

    >>> import dis
    >>> def f(x):
    ...     return x + 1
    ...
    >>> dis.dis(f)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 RETURN_VALUE

CPython and PyPy are stack-based virtual machines, i.e. they don’t have registers but instead push objects to and pull objects from a stack. The bytecode interpreter is only responsible for implementing control flow and pushing and pulling black box objects to and from this value stack. The bytecode interpreter does not know how to perform operations on those black box (wrapped) objects; for these it delegates to the object space. In order to implement a conditional branch in a program’s execution, however, it needs to gain minimal knowledge about a wrapped object. Thus, each object space has to offer an is_true(w_obj) operation which returns an interpreter-level boolean value.
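The division of labour described above can be sketched with a toy interpreter. This is only an illustration under simplifying assumptions: ToySpace, W_Int and run are hypothetical names, instructions are (opname, arg) pairs rather than a real bytecode string, and jump targets are instruction indexes rather than byte offsets. The loop only moves wrapped objects on and off the stack; every operation on them goes through the space, and is_true is the single place where it obtains an interpreter-level boolean.

```python
# Toy illustration only: ToySpace, W_Int and run() are hypothetical,
# not PyPy's actual classes or API.

class W_Int(object):
    """A 'wrapped' integer: the interpreter treats it as a black box."""
    def __init__(self, value):
        self.value = value


class ToySpace(object):
    """Minimal stand-in for an object space."""
    def wrap(self, value):
        return W_Int(value)

    def add(self, w_a, w_b):
        return W_Int(w_a.value + w_b.value)

    def is_true(self, w_obj):
        # Returns an interpreter-level bool, the only knowledge the
        # interpreter needs in order to branch.
        return w_obj.value != 0


def run(space, code, consts_w, locals_w):
    """Interpret a list of (opname, arg) pairs using a value stack."""
    stack = []
    pc = 0                      # index into `code`, not a byte offset
    while True:
        opname, arg = code[pc]
        pc += 1
        if opname == "LOAD_FAST":
            stack.append(locals_w[arg])
        elif opname == "LOAD_CONST":
            stack.append(consts_w[arg])
        elif opname == "BINARY_ADD":
            w_right = stack.pop()
            w_left = stack.pop()
            stack.append(space.add(w_left, w_right))
        elif opname == "POP_JUMP_IF_FALSE":
            if not space.is_true(stack.pop()):
                pc = arg
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %s" % opname)


space = ToySpace()
# Rough equivalent of: def f(x): return x + 1
code = [("LOAD_FAST", 0), ("LOAD_CONST", 1),
        ("BINARY_ADD", None), ("RETURN_VALUE", None)]
consts_w = [None, space.wrap(1)]          # slot 0 is unused in this toy
w_result = run(space, code, consts_w, [space.wrap(41)])
```

Here w_result.value is 42. Swapping ToySpace for a different space would change what "add" means without touching the dispatch loop.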

To understand the interpreter’s inner workings, it is crucial to recognize the concepts of interpreter-level and application-level code. In short, interpreter-level code is executed directly on the machine, whereas invoking application-level functions leads to a bytecode interpretation indirection. Special care must be taken regarding exceptions, because application-level exceptions are wrapped into OperationErrors, which are thus distinguished from plain interpreter-level exceptions. See the documentation on application-level exceptions for more information on OperationErrors.
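The distinction between the two kinds of exceptions can be sketched as follows. ToyOperationError and both functions are hypothetical stand-ins written for this illustration; they only mirror the idea that an application-level error travels inside a dedicated interpreter-level carrier, while any other exception signals a problem in the interpreter itself.

```python
# Toy sketch: ToyOperationError and the helpers below are hypothetical,
# modelled loosely on the description above, not copied from PyPy.

class ToyOperationError(Exception):
    """Interpreter-level carrier for an application-level exception."""
    def __init__(self, w_type, w_value):
        Exception.__init__(self, w_type, w_value)
        self.w_type = w_type      # stands in for the wrapped exception type
        self.w_value = w_value    # stands in for the wrapped exception value


def toy_divide(a, b):
    # An application-level error is reported by raising the carrier.
    if b == 0:
        raise ToyOperationError("ZeroDivisionError", "division by zero")
    return a // b


def interp_level_caller(a, b):
    try:
        return toy_divide(a, b)
    except ToyOperationError as e:
        # Application-level exception: visible to the interpreted program.
        return "app-level %s: %s" % (e.w_type, e.w_value)
    # Anything else (for example a TypeError caused by a bug in the
    # interpreter code) is a plain interpreter-level exception and
    # simply propagates.
```

Calling interp_level_caller(1, 0) returns the formatted application-level error, whereas interp_level_caller(1, "x") lets a plain TypeError escape.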

The interpreter implementation offers mechanisms to allow a caller to be unaware of whether a particular function invocation leads to bytecode interpretation or is executed directly at interpreter-level. The two basic kinds of Gateway classes expose either an interpreter-level function to application-level execution (interp2app) or allow transparent invocation of application-level helpers (app2interp) at interpreter-level.

Another task of the bytecode interpreter is to expose its basic code, frame, module and function objects to application-level code. Such runtime introspection and modification abilities are implemented via interpreter descriptors (also see Raymond Hettinger’s how-to guide for descriptors in Python; PyPy uses this model extensively).

A significant complexity lies in function argument parsing. Python as a language offers flexible ways of providing and receiving arguments for a particular function invocation. Not only does it take special care to get this right, it also presents difficulties for the annotation pass, which performs a whole-program analysis on the bytecode interpreter, argument parsing and gatewaying code in order to infer the types of all values flowing across function calls.

It is for this reason that PyPy resorts to generating specialized frame classes and functions at initialization time in order to let the annotator only see rather static program flows with homogeneous name-value assignments on function invocations.

Bytecode Interpreter Implementation Classes

Frame classes

The concept of Frames is pervasive in executing programs and on virtual machines in particular. They are sometimes called execution frames because they hold crucial information regarding the execution of a Code object, which in turn is often directly related to a Python Function. Frame instances hold the following state:

  • the local scope holding name-value bindings, usually implemented via a “fast scope” which is an array of wrapped objects
  • a blockstack containing (nested) information regarding the control flow of a function (such as while and try constructs)
  • a value stack from which bytecode interpretation pulls objects and onto which it puts results (locals_stack_w is actually a single list containing both the local scope and the value stack)
  • a reference to the globals dictionary, containing module-level name-value bindings
  • debugging information from which a current line number and file location can be constructed for tracebacks
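The state listed above can be pictured with a small stand-in class. Only the attribute name locals_stack_w is taken from the text; ToyFrame, its other attributes and its methods are hypothetical and chosen for illustration. The sketch shows one way a single list can hold both the fast locals and the value stack.

```python
# Hypothetical sketch: only the name locals_stack_w comes from the text
# above, everything else is invented for illustration.

class ToyFrame(object):
    def __init__(self, nlocals, stacksize, w_globals):
        # One list: slots [0, nlocals) are the fast locals,
        # slots [nlocals, nlocals + stacksize) are the value stack.
        self.locals_stack_w = [None] * (nlocals + stacksize)
        self.nlocals = nlocals
        self.valuestackdepth = nlocals   # index of the next free stack slot
        self.blockstack = []             # entries for loops and try blocks
        self.w_globals = w_globals       # module-level name-value bindings
        self.last_instr = -1             # lets a line number be recovered

    def setfast(self, index, w_value):
        self.locals_stack_w[index] = w_value

    def getfast(self, index):
        return self.locals_stack_w[index]

    def pushvalue(self, w_obj):
        self.locals_stack_w[self.valuestackdepth] = w_obj
        self.valuestackdepth += 1

    def popvalue(self):
        depth = self.valuestackdepth - 1
        if depth < self.nlocals:
            raise IndexError("pop from empty value stack")
        w_obj = self.locals_stack_w[depth]
        self.locals_stack_w[depth] = None   # drop the reference
        self.valuestackdepth = depth
        return w_obj


frame = ToyFrame(nlocals=1, stacksize=2, w_globals={})
frame.setfast(0, "x")
frame.pushvalue("a")
frame.pushvalue("b")
top = frame.popvalue()
```

After these calls top is "b" and frame.locals_stack_w is ["x", "a", None]: the local occupies slot 0 and the stack grows from slot 1.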

Moreover, the Frame class itself has a number of methods which implement the actual bytecodes found in a code object. The methods of the PyFrame class are spread across several files.

Code Class

PyPy’s code objects contain the same information found in CPython’s code objects. They differ from Function objects in that they are only immutable representations of source code and don’t contain execution state or references to the execution environment found in Frames. Frames and Functions have references to a code object. Here is a list of Code attributes:

  • co_flags flags if this code object has nested scopes/generators/etc.
  • co_stacksize the maximum depth the stack can reach while executing the code
  • co_code the actual bytecode string
  • co_argcount number of arguments this code object expects
  • co_varnames a tuple of all argument names passed to this code object
  • co_nlocals number of local variables
  • co_names a tuple of all names used in the code object
  • co_consts a tuple of prebuilt constant objects (“literals”) used in the code object
  • co_cellvars a tuple of Cells containing values for access from nested scopes
  • co_freevars a tuple of Cell names from “above” scopes
  • co_filename source file this code object was compiled from
  • co_firstlineno the first linenumber of the code object in its source file
  • co_name name of the code object (often the function name)
  • co_lnotab a helper table to compute the line-numbers corresponding to bytecodes
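Since these attributes carry the same information as CPython’s, a few of them can be inspected on any ordinary function. The snippet below is a small illustration; it reaches the code object through the `__code__` attribute (available alongside func_code since Python 2.6), and the functions outer and inner are made up for the example.

```python
def outer(a, b=2, *args):
    c = a + b

    def inner():
        return c          # c is a cell variable of outer, free in inner

    return inner


code = outer.__code__
name = code.co_name               # name of the code object
argcount = code.co_argcount       # *args is not counted here
first_names = code.co_varnames[:3]
cellvars = code.co_cellvars       # names that nested scopes access

inner_code = outer(1).__code__
freevars = inner_code.co_freevars  # names taken from the enclosing scope
```

Here argcount is 2, first_names is ('a', 'b', 'args'), and both cellvars and freevars are ('c',), which shows how co_cellvars of the outer code object pairs with co_freevars of the nested one.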

Function and Method classes

The PyPy Function class (in pypy/interpreter/function.py) represents a Python function. A Function carries the following main attributes:

  • func_doc the docstring (or None)
  • func_name the name of the function
  • func_code the Code object representing the function source code
  • func_defaults default values for the function (built at function definition time)
  • func_dict dictionary for additional (user-defined) function attributes
  • func_globals reference to the globals dictionary
  • func_closure a tuple of Cell references

Function classes also provide a `__get__` descriptor which creates a Method object holding a binding to an instance or a class. Finally, Functions and Methods both offer a call_args() method which executes the function given an Arguments class instance.
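The binding step can be observed at application level with the standard descriptor protocol. The example below is a plain Python illustration (Greeter and greet are made-up names); it shows what the descriptor produces, not how PyPy implements it.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name


def greet(self, greeting):
    return "%s, %s" % (greeting, self.name)


g = Greeter("world")

# Calling __get__ explicitly does what attribute lookup on an instance
# does implicitly: it returns a method object bound to g.
bound = greet.__get__(g, Greeter)
message = bound("hello")
```

Here message is "hello, world", bound.__self__ is g, and bound.__func__ is the original greet function.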

Arguments Class

The Arguments class (in pypy/interpreter/argument.py) is responsible for parsing arguments passed to functions. Python has rather complex argument-passing concepts:

  • positional arguments
  • keyword arguments specified by name
  • default values for positional arguments, defined at function definition time
  • “star args” allowing a function to accept remaining positional arguments
  • “star keyword args” allowing a function to accept additional arbitrary name-value bindings

Moreover, a Function object can get bound to a class or instance, in which case the first argument to the underlying function becomes the bound object. The Arguments class provides the means to handle all of this argument parsing and also takes care of error reporting.
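To make the matching rules concrete, here is a simplified sketch of binding call arguments to a signature. The function match_arguments and its parameters are hypothetical and much cruder than a real implementation; the keys "*args" and "**kwargs" in the result are just labels chosen for this example. A bound object is modelled by passing it as the first positional argument.

```python
def match_arguments(argnames, defaults, has_varargs, has_kwargs,
                    args, kwargs):
    """Bind call arguments to parameter names (hypothetical helper)."""
    scope = {}
    n = len(argnames)

    # 1. Positional arguments fill the parameters from left to right.
    for name, value in zip(argnames, args):
        scope[name] = value
    extra_pos = tuple(args[n:])
    if extra_pos and not has_varargs:
        raise TypeError("takes at most %d positional arguments (%d given)"
                        % (n, len(args)))

    # 2. Keyword arguments fill parameters by name.
    extra_kw = {}
    for name, value in kwargs.items():
        if name in scope:
            raise TypeError("got multiple values for argument %r" % name)
        if name in argnames:
            scope[name] = value
        elif has_kwargs:
            extra_kw[name] = value
        else:
            raise TypeError("got an unexpected keyword argument %r" % name)

    # 3. Defaults apply to the trailing parameters that are still unset.
    first_default = n - len(defaults)
    for i, name in enumerate(argnames):
        if name not in scope:
            if i < first_default:
                raise TypeError("missing required argument %r" % name)
            scope[name] = defaults[i - first_default]

    if has_varargs:
        scope["*args"] = extra_pos
    if has_kwargs:
        scope["**kwargs"] = extra_kw
    return scope


# A bound call obj.method(1, z=3) for: def method(self, x, y=10, *args, **kw)
bound_scope = match_arguments(["self", "x", "y"], (10,), True, True,
                              ("obj", 1), {"z": 3})
```

In this call "obj" lands in self, x is 1, y falls back to its default 10, the star-args tuple is empty and z ends up in the extra keyword dictionary. Each failure mode raises a TypeError, which corresponds to the error reporting duty mentioned above.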

Module Class

A Module instance represents execution state usually constructed from executing the module’s source file. In addition to such a module’s global `__dict__` dictionary it has the following application-level attributes:

  • `__doc__` the docstring of the module
  • `__file__` the source filename from which this module was instantiated
  • `__path__` state used for relative imports

Apart from the basic Module used for importing application-level files there is a more refined MixedModule class (see pypy/interpreter/mixedmodule.py) which allows name-value bindings to be defined both at application level and at interpreter level. See the `__builtin__` module’s pypy/module/__builtin__/__init__.py file for an example and the higher level chapter on Modules in the coding guide.

Gateway classes

A unique PyPy property is the ability to easily cross the barrier between interpreted and machine-level code (often referred to as the difference between interpreter-level and application-level). Be aware that the corresponding code (in pypy/interpreter/gateway.py) for crossing the barrier in both directions is somewhat involved, mostly due to the fact that the type-inferring annotator needs to keep track of the types of objects flowing across those barriers.

Making interpreter-level functions available at application-level

In order to make an interpreter-level function available at application level, one invokes pypy.interpreter.gateway.interp2app(func). Such a function usually takes a space argument and any number of positional arguments. Additionally, such functions can define an unwrap_spec telling the interp2app logic how application-level provided arguments should be unwrapped before the actual interpreter-level function is invoked. For example, interpreter descriptors such as the Module.__new__ method for allocating and constructing a Module instance are defined with such code:

    Module.typedef = TypeDef("module",
        __new__ = interp2app(Module.descr_module__new__.im_func,
                             unwrap_spec=[ObjSpace, W_Root, Arguments]),
        __init__ = interp2app(Module.descr_module__init__),
        # module dictionaries are readonly attributes
        __dict__ = GetSetProperty(descr_get_dict, cls=Module),
        __doc__ = 'module(name[, doc])\n\nCreate a module object...'
        )

The actual Module.descr_module__new__ interpreter-level method referenced from the `__new__` keyword argument above is defined like this:

    def descr_module__new__(space, w_subtype, __args__):
        module = space.allocate_instance(Module, w_subtype)
        Module.__init__(module, space, None)
        return space.wrap(module)

Summarizing, the interp2app mechanism takes care to route an application level access or call to an internal interpreter-level object appropriately to the descriptor, providing enough precision and hints to keep the type-inferring annotator happy.
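The unwrapping idea can be illustrated with a toy gateway. Everything in the block below (W_Box, ToySpace, toy_interp2app and the spec format using plain int and str) is an assumption made for this sketch and does not reproduce the real interp2app; in particular, the space is always passed first here instead of being named in the spec.

```python
# Toy sketch only: none of these names are PyPy's real gateway API.

class W_Box(object):
    """Stand-in for a wrapped application-level object."""
    def __init__(self, value):
        self.value = value


class ToySpace(object):
    def wrap(self, value):
        return W_Box(value)

    def int_w(self, w_obj):
        if not isinstance(w_obj.value, int):
            raise TypeError("expected an integer")
        return w_obj.value

    def str_w(self, w_obj):
        if not isinstance(w_obj.value, str):
            raise TypeError("expected a string")
        return w_obj.value


def toy_interp2app(func, unwrap_spec):
    """Return a callable that unwraps its arguments as the spec says."""
    def app_callable(space, *args_w):
        if len(args_w) != len(unwrap_spec):
            raise TypeError("expected %d arguments" % len(unwrap_spec))
        args = []
        for kind, w_arg in zip(unwrap_spec, args_w):
            if kind is int:
                args.append(space.int_w(w_arg))
            elif kind is str:
                args.append(space.str_w(w_arg))
            else:
                args.append(w_arg)      # leave the object wrapped
        return func(space, *args)
    return app_callable


def repeat(space, text, count):
    # Interpreter-level function working on unwrapped values.
    return space.wrap(text * count)


space = ToySpace()
app_repeat = toy_interp2app(repeat, [str, int])
w_result = app_repeat(space, space.wrap("ab"), space.wrap(3))
```

Here w_result.value is "ababab". Because the spec states in advance which interpreter-level type each argument has, the body of repeat only ever sees a string and an integer, which is the kind of static information a type-inferring pass can rely on.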

Calling into application level code from interpreter-level

Application level code is often preferable. Therefore, we often like to invoke application level code from interpreter-level. This is done via the Gateway’s app2interp mechanism, which we usually invoke at definition time in a module. It generates a hook which looks like an interpreter-level function accepting a space and an arbitrary number of arguments. When calling a function at interpreter-level, the caller usually does not need to be aware of whether the invoked function is run through the PyPy interpreter or will execute directly on the machine (after translation).

Here is an example showing how we implement the metaclass finding algorithm of the Python language in PyPy:

    app = gateway.applevel(r'''
        def find_metaclass(bases, namespace, globals, builtin):
            if '__metaclass__' in namespace:
                return namespace['__metaclass__']
            elif len(bases) > 0:
                base = bases[0]
                if hasattr(base, '__class__'):
                    return base.__class__
                else:
                    return type(base)
            elif '__metaclass__' in globals:
                return globals['__metaclass__']
            else:
                try:
                    return builtin.__metaclass__
                except AttributeError:
                    return type
    ''', filename=__file__)

    find_metaclass = app.interphook('find_metaclass')

The find_metaclass interpreter-level hook is invoked with five arguments from the BUILD_CLASS opcode implementation in pypy/interpreter/pyopcode.py:

    def BUILD_CLASS(f):
        w_methodsdict = f.valuestack.pop()
        w_bases       = f.valuestack.pop()
        w_name        = f.valuestack.pop()
        w_metaclass = find_metaclass(f.space, w_bases,
                                     w_methodsdict, f.w_globals,
                                     f.space.wrap(f.builtin))
        w_newclass = f.space.call_function(w_metaclass, w_name,
                                           w_bases, w_methodsdict)
        f.valuestack.push(w_newclass)

Note that at a later point we can rewrite the find_metaclass implementation at interpreter-level and we would not have to modify the calling side at all.

Introspection and Descriptors

Python traditionally has a very far-reaching introspection model for bytecode interpreter related objects. In PyPy and in CPython read and write accesses to such objects are routed to descriptors. Of course, in CPython those are implemented in C while in PyPy they are implemented in interpreter-level Python code.

All instances of the Function, Code, Frame or Module classes are also W_Root instances, which means they can be represented at application level. These days, a PyPy object space needs to work with a basic descriptor lookup when it encounters accesses to an interpreter-level object: an object space asks a wrapped object for its type via a getclass method and then calls the type’s lookup(name) function in order to receive a descriptor function. Most of PyPy’s internal object descriptors are defined at the end of pypy/interpreter/typedef.py. You can use these definitions as a reference for the exact attributes of interpreter classes visible at application level.
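The two-step lookup described above (ask the object for its type, then ask the type for a descriptor) can be sketched as follows. The method names getclass and lookup follow the text; ToyGetSet, ToyTypeDef, ToyCode and ToySpace are hypothetical classes written for this illustration, and wrapping is reduced to the identity function.

```python
# Hypothetical sketch of the lookup path; not PyPy's actual classes.

class ToyGetSet(object):
    """Read-only descriptor backed by an interpreter-level getter."""
    def __init__(self, fget):
        self.fget = fget

    def get(self, space, w_obj):
        return self.fget(space, w_obj)


class ToyTypeDef(object):
    """Maps application-level attribute names to descriptors."""
    def __init__(self, name, **rawdict):
        self.name = name
        self.rawdict = rawdict

    def lookup(self, name):
        return self.rawdict.get(name)


class ToyCode(object):
    """Interpreter-level object that is visible at application level."""
    def __init__(self, co_name):
        self.co_name = co_name

    def getclass(self, space):
        return ToyCode.typedef


def descr_get_co_name(space, w_code):
    return space.wrap(w_code.co_name)


ToyCode.typedef = ToyTypeDef("code", co_name=ToyGetSet(descr_get_co_name))


class ToySpace(object):
    def wrap(self, value):
        return value            # identity, to keep the sketch short

    def getattr(self, w_obj, name):
        w_type = w_obj.getclass(self)      # step 1: find the type
        descr = w_type.lookup(name)        # step 2: find the descriptor
        if descr is None:
            raise AttributeError(name)
        return descr.get(self, w_obj)


space = ToySpace()
w_name = space.getattr(ToyCode("f"), "co_name")
```

Here w_name is "f", while asking for an attribute that the type definition does not list raises AttributeError. Only names registered in the type definition become visible, which is why such a table doubles as a reference for the exposed attributes.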