Porting Cython code to PyPy

Cython has basic support for cpyext, the layer in PyPy that emulates CPython’s C-API. This is achieved by making the generated C code adapt at C compile time, so the generated code will compile in both CPython and PyPy unchanged.

However, beyond what Cython can cover and adapt internally, the cpyext C-API emulation involves some differences to the real C-API in CPython that have a visible impact on user code. This page lists major differences and ways to deal with them in order to write Cython code that works in both CPython and PyPy.

Reference counts

A general design difference in PyPy is that the runtime does not use reference counting internally but always relies on a garbage collector. Reference counting is only emulated at the cpyext layer by counting references being held in C space. This implies that the reference count in PyPy is generally different from that in CPython because it does not count any references held in Python space.

Object lifetime

As a direct consequence of the different garbage collection characteristics, objects may see the end of their lifetime at other points than in CPython. Special care therefore has to be taken when objects are expected to have died in CPython but may not in PyPy. Specifically, a deallocator method of an extension type (__dealloc__()) may get called at a much later point than in CPython, triggered rather by memory getting tighter than by objects dying.
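The difference in lifetime can be observed from plain Python code. The following is a minimal sketch, assuming a hypothetical Resource class that stands in for an extension type. On CPython the object is normally gone as soon as the last reference is dropped, whereas on PyPy it usually survives until a collection has run.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an extension type that owns a resource."""


def lifetime_demo():
    obj = Resource()
    ref = weakref.ref(obj)   # lets us observe whether the object still exists
    del obj                  # drop the only strong reference

    # Normally True on CPython (reference counting); usually False on PyPy,
    # where the object stays around until the garbage collector runs.
    dead_immediately = ref() is None

    gc.collect()             # explicitly request a collection
    dead_after_collect = ref() is None
    return dead_immediately, dead_after_collect
```

Code that only behaves correctly when the first value is true depends on reference counting and is unlikely to be portable.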

If the point in the code is known when an object is supposed to die (e.g. when it is tied to another object or to the execution time of a function), it is worth considering if it can be invalidated and cleaned up manually at that point, rather than relying on a deallocator.

As a side effect, this can sometimes even lead to a better code design, e.g. when context managers can be used together with the with statement.
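Manual cleanup combined with a context manager could look like the following minimal sketch. The TempBuffer class and the use_buffer() function are hypothetical names chosen for illustration. In a cdef class, __dealloc__() would take the place of __del__() as the safety net.

```python
class TempBuffer:
    """Hypothetical resource wrapper with explicit, idempotent cleanup."""

    def __init__(self, size):
        self._data = bytearray(size)
        self.closed = False

    def close(self):
        # Release the resource at a known point; safe to call repeatedly.
        if not self.closed:
            self._data = None
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the with block

    def __del__(self):
        # Safety net only: in PyPy this may run much later than in CPython.
        self.close()


def use_buffer():
    with TempBuffer(16) as buf:
        was_open = not buf.closed
    # The buffer was cleaned up when the with block ended, in both runtimes.
    return was_open, buf.closed
```

With this design the cleanup happens when the with block is left, regardless of when the garbage collector eventually reclaims the object.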

Borrowed references and data pointers

The memory management in PyPy is allowed to move objects around in memory. The C-API layer is only an indirect view on PyPy objects and often replicates data or state into C space that is then tied to the lifetime of a C-API object rather than the underlying PyPy object. It is important to understand that these two objects are separate things in cpyext.

The effect can be that when data pointers or borrowed references are used, and the owning object is no longer directly referenced from C space, the reference or data pointer may become invalid at some point, even if the object itself is still alive. As opposed to CPython, it is not enough to keep the reference to the object alive in a list (or other Python container), because the contents of those are only managed in Python space and thus only reference the PyPy object. A reference in a Python container will not keep the C-API view on it alive. Entries in a Python class dict will obviously not work either.

One of the more visible places where this may happen is when accessing the char* buffer of a byte string. In PyPy, this will only work as long as the Cython code holds a direct reference to the byte string object itself.

Another point is when CPython C-API functions are used directly that return borrowed references, e.g. PyTuple_GET_ITEM() and similar functions, but also some functions that return borrowed references to built-in modules or low-level objects of the runtime environment. The GIL in PyPy only guarantees that the borrowed reference stays valid up to the next call into PyPy (or its C-API), but not necessarily longer.

When accessing the internals of Python objects or using borrowed references longer than up to the next call into PyPy, including reference counting or anything that frees the GIL, it is therefore required to additionally keep direct owned references to these objects alive in C space, e.g. in local variables in a function or in the attributes of an extension type.

When in doubt, avoid using C-API functions that return borrowed references, or surround the usage of a borrowed reference explicitly by a pair of calls to Py_INCREF() when getting the reference and Py_DECREF() when done with it to convert it into an owned reference.

Builtin types, slots and fields

The following builtin types are not currently available in cpyext in the form of their C level representation: PyComplexObject, PyFloatObject and PyBoolObject.

Many of the type slot functions of builtin types are not initialised in cpyext and can therefore not be used directly.

Similarly, almost none of the (implementation) specific struct fields of builtin types is exposed at the C level, such as the ob_digit field of PyLongObject or the allocated field of the PyListObject struct etc. Although the ob_size field of containers (used by the Py_SIZE() macro) is available, it is not guaranteed to be accurate.

It is best not to access any of these struct fields and slots and to use the normal Python types instead as well as the normal Python protocols for object operations. Cython will map them to an appropriate usage of the C-API in both CPython and cpyext.

GIL handling

Currently, the GIL handling function PyGILState_Ensure() is not re-entrant in PyPy and deadlocks when called twice. This means that code that tries to acquire the GIL “just in case”, because it might be called with or without the GIL, will not work as expected in PyPy. See the issue report “PyGILState_Ensure should not deadlock if GIL already held”.

Efficiency

Simple functions and especially macros that are used for speed in CPython may exhibit substantially different performance characteristics in cpyext.

Functions returning borrowed references were already mentioned as requiring special care, but they also induce substantially more runtime overhead because they often create weak references in PyPy where they only return a plain pointer in CPython. A visible example is PyTuple_GET_ITEM().

Some more high-level functions may also show entirely different performance characteristics, e.g. PyDict_Next() for dict iteration. While being the fastest way to iterate over a dict in CPython, having linear time complexity and a low overhead, it currently has quadratic runtime in PyPy because it maps to normal dict iteration, which cannot keep track of the current position between two calls and thus needs to restart the iteration on each call.
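The quadratic cost can be illustrated with a small cost model in plain Python. This is only a sketch of the restart behaviour described above, not PyPy's actual implementation; the function names next_item_restarting() and count_steps() are hypothetical.

```python
def next_item_restarting(d, pos):
    """Return the item at position pos by iterating from the start.

    Models a position-based lookup that cannot remember where the
    previous call stopped. Returns (item, steps), with item set to
    None once pos is past the end of the dict.
    """
    steps = 0
    for index, item in enumerate(d.items()):
        steps += 1
        if index == pos:
            return item, steps
    return None, steps


def count_steps(d):
    """Total iteration steps needed to visit every item of d once."""
    total = 0
    pos = 0
    while True:
        item, steps = next_item_restarting(d, pos)
        total += steps
        if item is None:
            return total
        pos += 1
```

In this model, a dict with n items costs n(n+1)/2 steps for the successful lookups plus n steps for the final call that detects the end, so the total grows quadratically with n, whereas a single pass over d.items() needs only n steps.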

The general advice applies here even more than in CPython, that it is always best to rely on Cython generating appropriately adapted C-API handling code for you rather than to use the C-API directly - unless you really know what you are doing. And if you find a better way of doing something in PyPy and cpyext than Cython currently does, it’s best to fix Cython for everyone’s benefit.

Known problems

  • As of PyPy 1.9, subtyping builtin types can result in infinite recursion on method calls in some rare cases.
  • Docstrings of special methods are not propagated to Python space.
  • The Python 3.x adaptations in pypy3 only slowly start to include the C-API, so more incompatibilities can be expected there.

Bugs and crashes

The cpyext implementation in PyPy is much younger and substantially less mature than the well tested C-API and its underlying native implementation in CPython. This should be remembered when running into crashes, as the problem may not always be in your code or in Cython. Also, PyPy and its cpyext implementation are less easy to debug at the C level than CPython and Cython, simply because they were not designed for it.