Frequently Asked Questions

What is PyPy?

PyPy is a reimplementation of Python in Python, using the RPython translation toolchain.

PyPy tries to find new answers about ease of creation, flexibility, maintainability and speed trade-offs for language implementations. For further details see our goal and architecture document.

Is PyPy a drop in replacement for CPython?

Almost!

The most likely stumbling block for any given project is support for extension modules. PyPy supports a continually growing number of extension modules, but so far mostly only those found in the standard library.

The language features (including builtin types and functions) are very refined and well tested, so if your project doesn’t use many extension modules there is a good chance that it will work with PyPy.

We list the known differences in cpython differences.

Module xyz does not work with PyPy: ImportError

A module installed for CPython is not automatically available for PyPy, just like a module installed for CPython 2.6 is not automatically available for CPython 2.7 if you installed both. In other words, you need to install the module xyz specifically for PyPy.

On Linux, this means that you cannot use apt-get or some similar package manager: these tools are only meant for the version of CPython provided by the same package manager. So forget about them for now and read on.

It is quite common nowadays that xyz is available on PyPI and installable with <pypy> -mpip install xyz. The simplest solution is to use virtualenv (as documented here). Then enter (activate) the virtualenv and type: pypy -mpip install xyz. If you don’t know or don’t want virtualenv, you can also use pip locally after pypy -m ensurepip. The ensurepip module is built into the PyPy downloads we provide. Best practice with pip is to always call it as <python> -mpip …, but if you wish to be able to call pip directly from the command line, you must call pypy -mensurepip --default-pip.
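The "<python> -mpip" advice can also be followed from inside a script. The following is only a sketch: the helper name pip_install_command is made up for illustration, and xyz is the same placeholder module name used above. It builds the command line for whichever interpreter you name, so the module ends up installed for that interpreter and not for some other Python on the PATH.

```python
import sys


def pip_install_command(package, interpreter=None):
    """Return the argument list that installs `package` with the pip
    belonging to `interpreter` (default: the interpreter running this code).

    This is a hypothetical helper, not part of PyPy or pip.
    """
    if interpreter is None:
        interpreter = sys.executable
    return [interpreter, "-m", "pip", "install", package]


# Only print the command here; to actually run it, pass the list to
# subprocess.check_call().
command = pip_install_command("xyz", "pypy")
print(" ".join(command))
```

Passing sys.executable (the default) from a script that is itself run by PyPy guarantees that pip installs into the PyPy you are actually using.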

If you get errors from the C compiler, the module is a CPython C Extension module using unsupported features. See below.

Alternatively, if either the module xyz is not available on PyPI or you don’t want to use virtualenv, then download the source code of xyz, decompress the zip/tarball, and run the standard command: pypy setup.py install. (Note: pypy here instead of python.) As usual you may need to run the command with sudo for a global installation. The other commands of setup.py are available too, like build.

Module xyz does not work in the sandboxed PyPy?

You cannot import any extension module in a sandboxed PyPy, sorry. Even the built-in modules available are very limited. Sandboxing in PyPy is a good proof of concept, and is without a doubt safe IMHO, but it is only a proof of concept. Making it more generally useful requires some work from a motivated developer. Until then, it can only be used for “pure Python” examples: programs that import mostly nothing (or only pure Python modules, recursively).

Do CPython Extension modules work with PyPy?

First note that some Linux distributions (e.g. Ubuntu, Debian) split PyPy into several packages. If you installed a package called “pypy”, then you may also need to install “pypy-dev” for the following to work.

We have experimental support for CPython extension modules, so they run with minor changes. This has been a part of PyPy since the 1.4 release, but support is still in beta phase. CPython extension modules in PyPy are often much slower than in CPython due to the need to emulate refcounting. It is often faster to take out your CPython extension and replace it with a pure Python version that the JIT can see. If trying to install module xyz, and the module has both a C and a Python version of the same code, try first to disable the C version; this is usually easily done by changing some line in setup.py.

We fully support ctypes-based extensions. But for best performance, we recommend that you use the cffi module to interface with C code.
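As a rough illustration of what a ctypes-based binding looks like, here is a minimal sketch that calls strlen from the C library. It assumes a POSIX system (Linux, Mac OS X, a BSD); the choice of strlen is ours and purely illustrative. The same code should run unchanged on CPython and PyPy, which is the point of using ctypes instead of the CPython C API.

```python
import ctypes
import ctypes.util

# Locate the C library. find_library() may return None; on POSIX,
# CDLL(None) then falls back to the symbols already loaded in this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"pypy")
print(length)
```

A cffi version would declare the same signature as a string of C code; it is not shown here because cffi is a separate package on CPython.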

For information on which third party extensions work (or do not work) with PyPy see the compatibility wiki.

For more information about how we manage refcounting semantics see rawrefcount.

On which platforms does PyPy run?

PyPy currently supports:

x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD),

newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,

big- and little-endian variants of PPC64 running Linux,

s390x running Linux

PyPy is regularly and extensively tested on Linux machines. It works on Mac and Windows: it is tested there, but most of us are running Linux so fixes may depend on 3rd-party contributions.

To bootstrap from sources, PyPy can use either CPython 2.7 or another (e.g. older) PyPy. Cross-translation is not really supported: e.g. to build a 32-bit PyPy, you need to have a 32-bit environment. Cross-translation is only explicitly supported between a 32-bit Intel Linux and ARM Linux (see here).

Which Python version (2.x?) does PyPy implement?

PyPy currently aims to be fully compatible with Python 2.7. That means that it contains the standard library of Python 2.7 and that it supports 2.7 features (such as set comprehensions).

Does PyPy have a GIL? Why?

Yes, PyPy has a GIL. Removing the GIL is very hard. On top of CPython, you have two problems: (1) GC, in this case reference counting; (2) the whole Python language.

For PyPy, the hard issue is (2): by that I mean issues like what occurs if a mutable object is changed from one thread and read from another concurrently. This is a problem for any mutable type: it needs careful review and fixes (fine-grained locks, mostly) through the whole Python interpreter. It is a major effort, although not completely impossible, as Jython/IronPython showed. This includes subtle decisions about whether some effects are ok or not for the user (i.e. the Python programmer).

CPython has additionally the problem (1) of reference counting. With PyPy, this sub-problem is simpler: we need to make our GC multithread-aware. This is easier to do efficiently in PyPy than in CPython. It doesn’t solve the issue (2), though.

Note that since 2012 there has been work going on on a still very experimental Software Transactional Memory (STM) version of PyPy. This should give an alternative PyPy which works without a GIL, while at the same time continuing to give the Python programmer the complete illusion of having one. This work is currently a bit stalled because of its own technical difficulties.

What about numpy, numpypy, micronumpy?

Way back in 2011, the PyPy team started to reimplement numpy in PyPy. It has two pieces:

the builtin module pypy/module/micronumpy: this is written in RPython and roughly covers the content of the numpy.core.multiarray module. Confusingly enough, this is available in PyPy under the name _numpypy. It is included by default in all the official releases of PyPy (but it might be dropped in the future).

a fork of the official numpy repository maintained by us and informally called numpypy: even more confusing, the name of the repo on bitbucket is numpy. The main difference with the upstream numpy is that it is based on the micronumpy module written in RPython, instead of numpy.core.multiarray which is written in C.

Should I install numpy or numpypy?

TL;DR version: you should use numpy. You can install it by doing pypy -m pip install numpy. You might also be interested in using the experimental PyPy binary wheels to save compilation time.

The upstream numpy is written in C, and runs under the cpyext compatibility layer. Nowadays, cpyext is mature enough that you can simply use the upstream numpy, since it passes 99.9% of the test suite. At the moment of writing (October 2017) the main drawback of numpy is that cpyext is infamously slow, and thus it has worse performance compared to numpypy. However, we are actively working on improving it, and we expect to reach the same speed eventually.

On the other hand, numpypy is more JIT-friendly and very fast to call, since it is written in RPython. But it is a reimplementation, and it’s hard to be completely compatible. Over the years the project slowly matured: eventually it was able to call out to the LAPACK and BLAS libraries to speed up matrix calculations, and it reached around 80% parity with the upstream numpy. However, 80% is far from 100%. Since cpyext/numpy compatibility is progressing fast, we have discontinued support for numpypy.

Is PyPy more clever than CPython about Tail Calls?

No. PyPy follows the Python language design, including the built-in debugger features. This prevents tail calls, as summarized by Guido van Rossum in two blog posts. Moreover, neither the JIT nor Stackless changes anything about that.

How do I write extension modules for PyPy?

See Writing extension modules for pypy.

How fast is PyPy?

This really depends on your code. For pure Python algorithmic code, it is very fast. For more typical Python programs we generally are 3 times the speed of CPython 2.7. You might be interested in our benchmarking site and our jit documentation.

Your tests are not a benchmark: tests tend to be slow under PyPy because they run exactly once; if they are good tests, they exercise various corner cases in your code. This is a bad case for JIT compilers. Note also that our JIT has a very high warm-up cost, meaning that any program is slow at the beginning. If you want to compare the timings with CPython, even relatively simple programs need to run at least one second, preferably at least a few seconds. Large, complicated programs need even more time to warm up the JIT.
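One way to see the warm-up effect for yourself is to time the same piece of work several times in a row within one process. The sketch below is only an illustration: the function work and its loop body are arbitrary stand-ins for your own code, and the number of rounds is an assumption. Under a JIT you would typically expect the first round or two to be slower than the later ones; under CPython the rounds should be roughly equal.

```python
import timeit


def work(n=200000):
    """An arbitrary CPU-bound loop standing in for real application code."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_rounds(func, rounds=5):
    """Call func() `rounds` times and return the duration of each call."""
    timings = []
    for _ in range(rounds):
        start = timeit.default_timer()
        func()
        timings.append(timeit.default_timer() - start)
    return timings


timings = time_rounds(work)
for index, seconds in enumerate(timings):
    print("round %d: %.4f s" % (index, seconds))
```

When comparing interpreters, it is usually fairer to discard the first rounds and to make the total run last at least a few seconds, as described above.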

Couldn’t the JIT dump and reload already-compiled machine code?

No, we found no way of doing that. The JIT generates machine code containing a large number of constant addresses (constant at the time the machine code is generated). The vast majority are probably not constants that you would find in the executable, with a nice link name. E.g. the addresses of Python classes are used all the time, but Python classes don’t come statically from the executable; they are created anew every time you restart your program. This makes saving and reloading machine code completely impossible without some very advanced way of mapping addresses in the old (now-dead) process to addresses in the new process, including checking that all the previous assumptions about the (now-dead) object are still true about the new object.

Would type annotations help PyPy’s performance?

Two examples of type annotations that are being proposed for improved performance are Cython types and PEP 484 - Type Hints.

Cython types are, by construction, similar to C declarations. For example, a local variable or an instance attribute can be declared "cdef int" to force a machine word to be used. This changes the usual Python semantics (e.g. no overflow checks, and errors when trying to write other types of objects there). It gives some extra performance, but the exact benefits are unclear: right now (January 2015) for example we are investigating a technique that would store machine-word integers directly on instances, giving part of the benefits without the user-supplied "cdef int".

PEP 484 - Type Hints, on the other hand, is almost entirely useless if you’re looking at performance. First, as the name implies, they are hints: they must still be checked at runtime, like PEP 484 says. Or maybe you’re fine with a mode in which you get very obscure crashes when the type annotations are wrong; but even in that case the speed benefits would be extremely minor.

There are several reasons why. One of them is that annotations are at the wrong level (e.g. a PEP 484 “int” corresponds to Python 3’s int type, which does not necessarily fit inside one machine word; even worse, an “int” annotation allows arbitrary int subclasses). Another is that a lot more information is needed to produce good code (e.g. “this f() called here really means this function there, and will never be monkey-patched”; same with len() or list(), btw). The third reason is that some “guards” in PyPy’s JIT traces don’t really have an obvious corresponding type (e.g. “this dict is so far using keys which don’t override __hash__ so a more efficient implementation was used”). Many guards don’t even have any correspondence with types at all (“this class attribute was not modified”; “the loop counter did not reach zero so we don’t need to release the GIL”; and so on).
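The first of these reasons can be demonstrated in a few lines. This is a small sketch in Python 3 syntax (which PEP 484 annotations require); the class Loud and the function add_one are invented for the example. Both calls below satisfy the “int” annotation, yet neither can be compiled as a plain machine-word addition.

```python
class Loud(int):
    """A made-up int subclass that changes what `+` means."""

    def __add__(self, other):
        return "surprise"


def add_one(x: int) -> int:
    # The annotation promises nothing about machine words or about
    # which __add__ implementation will run.
    return x + 1


big = add_one(2 ** 100)   # valid int, but far too large for a machine word
odd = add_one(Loud(1))    # valid int subclass, but `+` returns a string

print(big)
print(odd)
```

A compiler that trusted the annotation would still need runtime checks for exactly these cases, which is the kind of guard the JIT already inserts on its own.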

As PyPy works right now, it is able to derive far more useful information than can ever be given by PEP 484, and it works automatically. As far as we know, this is true even if we would add other techniques to PyPy, like a fast first-pass JIT.

Can I use PyPy’s translation toolchain for other languages besides Python?

Yes. The toolsuite that translates the PyPy interpreter is quite general and can be used to create optimized versions of interpreters for any language, not just Python. Of course, these interpreters can make use of the same features that PyPy brings to Python: translation to various languages, stackless features, garbage collection, implementation of various things like arbitrarily long integers, etc.

Currently, we have Topaz, a Ruby interpreter; Hippy, a PHP interpreter; preliminary versions of a JavaScript interpreter (Leonardo Santagada as his Summer of PyPy project); a Prolog interpreter (Carl Friedrich Bolz as his Bachelor thesis); and a Smalltalk interpreter (produced during a sprint). On the PyPy bitbucket page there is also a Scheme and an Io implementation; both of these are unfinished at the moment.

How do I get into PyPy development? Can I come to sprints?

Certainly you can come to sprints! We always welcome newcomers and try to help them as much as possible to get started with the project. We provide tutorials and pair them with experienced PyPy developers. Newcomers should have some Python experience and read some of the PyPy documentation before coming to a sprint.

Coming to a sprint is usually the best way to get into PyPy development. If you get stuck or need advice, contact us. IRC is the most immediate way to get feedback (at least during some parts of the day; most PyPy developers are in Europe) and the mailing list is better for long discussions.

OSError: … cannot restore segment prot after reloc… Help?

On Linux, if SELinux is enabled, you may get errors along the lines of “OSError: externmod.so: cannot restore segment prot after reloc: Permission denied.” This is caused by a slight abuse of the C compiler during configuration, and can be worked around by running the following command with root privileges:

# setenforce 0

This will disable SELinux’s protection and allow PyPy to configure correctly. Be sure to enable it again if you need it!

How should I report a bug?

Our bug tracker is here: https://foss.heptapod.net/pypy/pypy/issues/

Missing features or incompatibilities with CPython are considered bugs, and reports of them are welcome. (See also our list of known incompatibilities.)

For bugs of the kind “I’m getting a PyPy crash or a strange exception”, please note that: We can’t do anything without reproducing the bug ourselves. We cannot do anything with tracebacks from gdb, or core dumps. This is not only because the standard PyPy is compiled without debug symbols. The real reason is that a C-level traceback is usually of no help at all in PyPy. Debugging PyPy can be annoying.

This is a clear and useful bug report. (Admittedly, sometimes the problem is really hard to reproduce, but please try to.)

In more detail:

First, please give the exact PyPy version, and the OS.
It might help focus our search if we know if the bug can be reproduced on a “pypy --jit off” or not. If “pypy --jit off” always works, then the problem might be in the JIT. Otherwise, we know we can ignore that part.
If you got the bug using only Open Source components, please give a step-by-step guide that we can follow to reproduce the problem ourselves. Don’t assume we know anything about any program other than PyPy. We would like a guide that we can follow point by point (without guessing or having to figure things out) on a machine similar to yours, starting from a bare PyPy, until we see the same problem. (If you can, you can try to reduce the number of steps and the time it needs to run, but that is not mandatory.)
If the bug involves Closed Source components, or just too many Open Source components to install them all ourselves, then maybe you can give us some temporary ssh access to a machine where the bug can be reproduced. Or, maybe we can download a VirtualBox or VMWare virtual machine where the problem occurs.
If giving us access would require us to use tools other than ssh, make appointments, or sign an NDA, then we can consider a commercial support contract for a small sum of money.
If even that is not possible for you, then sorry, we can’t help.
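
For the first point in the list above (exact PyPy version and OS), a short script can collect the details to paste into a report. This is just a convenience sketch using the standard sys and platform modules; the function name describe_environment is made up, and nothing here is required by the bug tracker.

```python
import platform
import sys


def describe_environment():
    """Return a three-line summary of the interpreter and the OS."""
    lines = [
        "implementation: " + platform.python_implementation(),
        # sys.version spans two lines on PyPy; flatten it to one.
        "version: " + sys.version.replace("\n", " "),
        "platform: " + platform.platform(),
    ]
    return "\n".join(lines)


print(describe_environment())
```

Run it with the same PyPy binary that shows the bug, so that the reported version is the one actually affected.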

Of course, you can try to debug the problem yourself, and we can help you get started if you ask on the #pypy IRC channel, but be prepared: debugging an annoying PyPy problem usually involves quite a lot of gdb in auto-generated C code, and at least some knowledge about the various components involved, from PyPy’s own RPython source code to the GC and possibly the JIT.

Why doesn’t PyPy use Git and move to GitHub?

We discussed it during the switch away from bitbucket. We concluded that (1) the Git workflow is not as well suited as the Mercurial workflow for our style, and (2) moving to github “just because everybody else does” is an argument on thin grounds.

For (1), there are a few issues, but maybe the most important one is that the PyPy repository has got thousands of named branches. Git has no equivalent concept. Git has branches, of course, which in Mercurial are called bookmarks. We’re not talking about bookmarks.

The difference between git branches and named branches is not that important in a repo with 10 branches (no matter how big). But in the case of PyPy, we have at the moment 1840 branches. Most are closed by now, of course. But we would really like to retain (both now and in the future) the ability to look at a commit from the past, and know in which branch it was made. Please make sure you understand the difference between the Git and the Mercurial branches to realize that this is not always possible with Git: we looked hard, and there is no built-in way to get this workflow.

Still not convinced? Consider this git repo with three commits: commit #2 with parent #1 and head of git branch “A”; commit #3 with also parent #1 but head of git branch “B”. When commit #1 was made, was it in the branch “A” or “B”? (It could also be yet another branch whose head was also moved forward, or even completely deleted.)
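
The three-commit example can be modelled in a few lines to make the ambiguity concrete. This is a toy model, not how either tool stores its data: the dictionaries and the function name are ours, and the branch name recorded for commit #1 on the Mercurial side ("default") is an assumption chosen for illustration.

```python
# Commit id -> parent id (None for the root commit).
parents = {1: None, 2: 1, 3: 1}

# Git: a branch is only a movable pointer to a head commit.
git_heads = {"A": 2, "B": 3}

# Mercurial: every commit permanently records its named branch.
# The value for commit 1 is a made-up example.
hg_branch_of = {1: "default", 2: "A", 3: "B"}


def git_branches_containing(commit):
    """List the git branches from whose head `commit` is reachable."""
    found = []
    for name, head in sorted(git_heads.items()):
        current = head
        while current is not None:
            if current == commit:
                found.append(name)
                break
            current = parents[current]
    return found


print(git_branches_containing(1))  # both branches reach commit 1
print(hg_branch_of[1])             # a single recorded answer
```

In the git model the best available answer for commit #1 is a list of candidate branches; in the named-branch model the answer is stored with the commit itself.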

What is needed for Windows 64 support of PyPy?

First, please note that the Windows 32 PyPy binary works just fine on Windows 64. The only problem is that it only supports up to 4GB of heap per process.

As to real Windows 64 support: Currently we don’t have an active PyPy developer whose main development platform is Windows. So if you are interested in getting Windows 64 support, we encourage you to volunteer to make it happen! Another option would be to pay some PyPy developers to implement Windows 64 support, but so far there doesn’t seem to be an overwhelming commercial interest in it.

How long will PyPy support Python2?

Since RPython is built on top of Python2 and that is extremely unlikely to change, the Python2 version of PyPy will be around “forever”, i.e. as long as PyPy itself is around.