Differences between PyPy and CPython

This page documents the few differences and incompatibilities between the PyPy Python interpreter and CPython. Some of these differences are “by design”, since we think that there are cases in which the behaviour of CPython is buggy, and we do not want to copy bugs.

Differences that are not listed here should be considered bugs of PyPy.

Differences related to garbage collection strategies

The garbage collectors used or implemented by PyPy are not based on reference counting, so the objects are not freed instantly when they are no longer reachable. The most obvious effect of this is that files (and sockets, etc) are not promptly closed when they go out of scope. For files that are opened for writing, data can be left sitting in their output buffers for a while, making the on-disk file appear empty or truncated. Moreover, you might reach your OS’s limit on the number of concurrently opened files.

If you are debugging a case where a file in your program is not closed properly, you can use the -X track-resources command line option. If it is given, a ResourceWarning is produced for every file and socket that the garbage collector closes. The warning will contain the stack trace of the position where the file or socket was created, to make it easier to see which parts of the program don’t close files explicitly.
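
The usual fix is to close resources deterministically instead of relying on the garbage collector. A minimal sketch (the file name and the helper functions are just placeholders):

    # Relying on refcounting to flush and close the file: works on CPython,
    # but on PyPy the data may sit in the output buffer until the GC runs.
    def write_log_leaky(path, lines):
        f = open(path, "w")
        for line in lines:
            f.write(line + "\n")
        # no f.close() here

    # Portable version: the 'with' block closes the file deterministically.
    def write_log(path, lines):
        with open(path, "w") as f:
            for line in lines:
                f.write(line + "\n")

Running the program with the -X track-resources option described above will point out the places that still rely on the garbage collector to close files.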

Fixing this difference to CPython is essentially impossible without forcing a reference-counting approach to garbage collection. The effect that you get in CPython has clearly been described as a side-effect of the implementation and not a language design decision: programs relying on this are basically bogus. It would be a too strong restriction to try to enforce CPython’s behavior in a language spec, given that it has no chance to be adopted by Jython or IronPython (or any other port of Python to Java or .NET).

Even the naive idea of forcing a full GC when we’re getting dangerously close to the OS’s limit can be very bad in some cases. If your program leaks open files heavily, then it would work, but force a complete GC cycle every n’th leaked file. The value of n is a constant, but the program can take an arbitrary amount of memory, which makes a complete GC cycle arbitrarily long. The end result is that PyPy would spend an arbitrarily large fraction of its run time in the GC — slowing down the actual execution, not by 10% nor 100% nor 1000% but by essentially any factor.

To the best of our knowledge this problem has no better solution than fixing the programs. If it occurs in 3rd-party code, this means going to the authors and explaining the problem to them: they need to close their open files in order to run on any non-CPython-based implementation of Python.


Here are some more technical details. This issue affects the precise time at which __del__ methods are called, which is not reliable or timely in PyPy (nor Jython nor IronPython). It also means that weak references may stay alive for a bit longer than expected. This makes “weak proxies” (as returned by weakref.proxy()) somewhat less useful: they will appear to stay alive for a bit longer in PyPy, and suddenly they will really be dead, raising a ReferenceError on the next access. Any code that uses weak proxies must carefully catch such ReferenceError at any place that uses them. (Or, better yet, don’t use weakref.proxy() at all; use weakref.ref().)
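
A minimal sketch of the weakref.ref() style recommended above (the Resource class is just a placeholder):

    import weakref

    class Resource(object):
        pass

    res = Resource()
    proxy = weakref.proxy(res)   # may appear alive a bit longer on PyPy, then
                                 # suddenly raise ReferenceError on access
    ref = weakref.ref(res)       # preferred: dereference explicitly

    obj = ref()                  # returns the object, or None once it is dead
    if obj is not None:
        print obj
    else:
        print "object already collected"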

Note a detail in the documentation for weakref callbacks:

If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized.

There are cases where, due to CPython’s refcount semantics, a weakref dies immediately before or after the objects it points to (typically with some circular reference). If it happens to die just after, then the callback will be invoked. In a similar case in PyPy, both the object and the weakref will be considered as dead at the same time, and the callback will not be invoked. (Issue #2030)


There are a few extra implications from the difference in the GC. Most notably, if an object has a __del__, the __del__ is never called more than once in PyPy; but CPython will call the same __del__ several times if the object is resurrected and dies again (at least it is reliably so in older CPythons; newer CPythons try to call destructors not more than once, but there are counter-examples). The __del__ methods are called in “the right” order if they are on objects pointing to each other, as in CPython, but unlike CPython, if there is a dead cycle of objects referencing each other, their __del__ methods are called anyway; CPython would instead put them into the list garbage of the gc module. More information is available on the blog [1] [2].
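
A small sketch of the dead-cycle case described above (Python 2 syntax, matching the other examples on this page):

    import gc

    class Node(object):
        def __del__(self):
            print "finalizing", self

    a = Node()
    b = Node()
    a.other = b
    b.other = a          # dead cycle of objects with __del__ methods
    del a, b
    gc.collect()
    # PyPy calls both __del__ methods anyway.
    # CPython 2 refuses to collect the cycle and puts the objects into gc.garbage.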

Note that this difference might show up indirectly in some cases. For example, a generator left pending in the middle is — again — garbage-collected later in PyPy than in CPython. You can see the difference if the yield keyword it is suspended at is itself enclosed in a try: or a with: block. This shows up for example as issue 736.
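
If the cleanup matters, close such a generator explicitly instead of waiting for the GC; a minimal sketch (the file name is a placeholder):

    def read_lines(path):
        with open(path) as f:    # resource held while the generator is suspended
            for line in f:
                yield line

    g = read_lines("data.txt")
    first = next(g)              # suspends inside the 'with' block
    g.close()                    # runs the 'with' cleanup now, on any implementation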

Using the default GC (called minimark), the built-in function id() works like it does in CPython. With other GCs it returns numbers that are not real addresses (because an object can move around several times) and calling it a lot can lead to performance problems.

Note that if you have a long chain of objects, each with a reference to the next one, and each with a __del__, PyPy’s GC will perform badly. On the bright side, in most other cases, benchmarks have shown that PyPy’s GCs perform much better than CPython’s.

Another difference is that if you add a __del__ to an existing class it will not be called:

    >>>> class A(object):
    ....     pass
    ....
    >>>> A.__del__ = lambda self: None
    __main__:1: RuntimeWarning: a __del__ method added to an existing type will not be called

Even more obscure: the same is true, for old-style classes, if you attach the __del__ to an instance (even in CPython this does not work with new-style classes). You get a RuntimeWarning in PyPy. To fix these cases just make sure there is a __del__ method in the class to start with (even containing only pass; replacing or overriding it later works fine).
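
A minimal sketch of that workaround (do_cleanup is a hypothetical helper):

    class FileLike(object):
        def __del__(self):               # present from the start, even if trivial
            pass

    # Replacing it later is fine on both CPython and PyPy:
    FileLike.__del__ = lambda self: do_cleanup(self)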

Last note: CPython tries to do a gc.collect() automatically when the program finishes; PyPy does not. (It is possible in both CPython and PyPy to design a case where several gc.collect() are needed before all objects die. This makes CPython’s approach only work “most of the time” anyway.)
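
If your program depends on finalizers running before it exits, one hedged workaround is to request a collection explicitly at shutdown, for example:

    import atexit
    import gc

    # Ask for a (single) collection when the interpreter exits; as noted above,
    # one collection is not always enough to free every object.
    atexit.register(gc.collect)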

Subclasses of built-in types

Officially, CPython has no rule at all for when exactly overridden methods of subclasses of built-in types get implicitly called or not. As an approximation, these methods are never called by other built-in methods of the same object. For example, an overridden __getitem__() in a subclass of dict will not be called by e.g. the built-in get() method.
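
A minimal illustration of that rule:

    class D(dict):
        def __getitem__(self, key):
            return "%r from D" % (key,)

    d = D(a=1)
    print d["b"]       # calls the overridden __getitem__
    print d.get("b")   # the built-in get() does not call __getitem__,
                       # so this prints None on both CPython and PyPy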

The above is true both in CPython and in PyPy. Differences can occur about whether a built-in function or method will call an overridden method of an object other than self. In PyPy, they are often called in cases where CPython would not. Two examples:

    class D(dict):
        def __getitem__(self, key):
            return "%r from D" % (key,)

    class A(object):
        pass

    a = A()
    a.__dict__ = D()
    a.foo = "a's own foo"
    print a.foo
    # CPython => a's own foo
    # PyPy => 'foo' from D

    glob = D(foo="base item")
    loc = {}
    exec "print foo" in glob, loc
    # CPython => base item
    # PyPy => 'foo' from D

Mutating classes of objects which are already used as dictionary keys

Consider the following snippet of code:

    class X(object):
        pass

    def __evil_eq__(self, other):
        print 'hello world'
        return False

    def evil(y):
        d = {X(): 1}
        X.__eq__ = __evil_eq__
        d[y] # might trigger a call to __eq__?

In CPython, __evil_eq__ might be called, although there is no way to write a test which reliably calls it. It happens if y is not x and hash(y) == hash(x), where hash(x) is computed when x is inserted into the dictionary. If by chance the condition is satisfied, then __evil_eq__ is called.

PyPy uses a special strategy to optimize dictionaries whose keys are instances of user-defined classes which do not override the default __hash__, __eq__ and __cmp__: when using this strategy, __eq__ and __cmp__ are never called, but instead the lookup is done by identity, so in the case above it is guaranteed that __eq__ won’t be called.

Note that in all other cases (e.g., if you have a custom __hash__ and __eq__ in y) the behavior is exactly the same as CPython.
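
In other words, defining the special methods up front opts out of the identity-based strategy and restores the CPython-style lookup; a minimal sketch:

    class Key(object):
        def __init__(self, ident):
            self.ident = ident
        def __hash__(self):                 # custom __hash__/__eq__ means the
            return hash(self.ident)         # dictionary uses normal equality lookup
        def __eq__(self, other):
            return isinstance(other, Key) and self.ident == other.ident

    d = {Key(1): "one"}
    print d[Key(1)]                         # "one" on both CPython and PyPy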

Ignored exceptions

In many corner cases, CPython can silently swallow exceptions. The precise list of when this occurs is rather long, even though most cases are very uncommon. The most well-known places are custom rich comparison methods (like __eq__); dictionary lookup; calls to some built-in functions like isinstance().

Unless this behavior is clearly present by design and documented as such (as e.g. for hasattr()), in most cases PyPy lets the exception propagate instead.

Object Identity of Primitive Values, is and id

Object identity of primitive values works by value equality, not by identity of the wrapper. This means that x + 1 is x + 1 is always true, for arbitrary integers x. The rule applies for the following types:

  • int
  • float
  • long
  • complex
  • str (empty or single-character strings only)
  • unicode (empty or single-character strings only)
  • tuple (empty tuples only)
  • frozenset (empty frozenset only)
  • unbound method objects (for Python 2 only)

This change requires some changes to id as well. id fulfills the following condition: x is y <=> id(x) == id(y). Therefore id of the above types will return a value that is computed from the argument, and can thus be larger than sys.maxint (i.e. it can be an arbitrary long).
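
A short illustration of the rule for integers (any integer value behaves the same way):

    x = 1000000
    print x + 1 is x + 1          # always True on PyPy, for arbitrary integers;
                                  # an implementation detail on CPython
    print id(x + 1) == id(x + 1)  # True on PyPy: id() is computed from the value,
                                  # and may be larger than sys.maxint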

Note that strings of length 2 or greater can be equal without being identical. Similarly, x is (2,) is not necessarily true even if x contains a tuple and x == (2,). The uniqueness rules apply only to the particular cases described above. The str, unicode, tuple and frozenset rules were added in PyPy 5.4; before that, a test like if x is "?" or if x is () could fail even if x was equal to "?" or (). The new behavior added in PyPy 5.4 is closer to CPython’s, which caches precisely the empty tuple/frozenset, and (generally but not always) the strings and unicodes of length <= 1.

Note that for floats there “is” only one object per “bit pattern” of the float. So float('nan') is float('nan') is true on PyPy, but not on CPython because they are two objects; but 0.0 is -0.0 is always False, as the bit patterns are different. As usual, float('nan') == float('nan') is always False. When used in containers (as list items or in sets for example), the exact rule of equality used is “if x is y or x == y” (on both CPython and PyPy); as a consequence, because all nans are identical in PyPy, you cannot have several of them in a set, unlike in CPython. (Issue #1974). Another consequence is that cmp(float('nan'), float('nan')) == 0, because cmp checks with is first whether the arguments are identical (there is no good value to return from this call to cmp, because cmp pretends that there is a total order on floats, but that is wrong for NaNs).
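
A short illustration of the NaN behaviour described above:

    nan1 = float('nan')
    nan2 = float('nan')
    print nan1 == nan2        # False everywhere, as usual for NaN
    print nan1 is nan2        # True on PyPy (same bit pattern), False on CPython
    print len({nan1, nan2})   # 1 on PyPy, 2 on CPython (the container rule is
                              # "x is y or x == y" on both)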

C-API Differences

The external C-API has been reimplemented in PyPy as an internal cpyext module. We support most of the documented C-API, but sometimes internal C abstractions leak out on CPython and are abused, perhaps even unknowingly. For instance, assignment to a PyTupleObject is not supported after the tuple is used internally, even by another C-API function call. On CPython this will succeed as long as the refcount is 1. On PyPy this will always raise a SystemError('PyTuple_SetItem called on tuple after use of tuple') exception (explicitly listed here for search engines).

Another similar problem is assignment of a new function pointer to any of the tp_as_* structures after calling PyType_Ready. For instance, overriding tp_as_number.nb_int with a different function after calling PyType_Ready on CPython will result in the old function being called for x.__int__() (via class __dict__ lookup) and the new function being called for int(x) (via slot lookup). On PyPy we will always call the new function, not the old; this quirky behaviour is unfortunately necessary to fully support NumPy.

Performance Differences

CPython has an optimization that can make repeated string concatenation not quadratic. For example, this kind of code runs in O(n) time:

    s = ''
    for string in mylist:
        s += string

In PyPy, this code will always have quadratic complexity. Note also that the CPython optimization is brittle and can break with slight variations in your code anyway. So you should replace the code with:

    parts = []
    for string in mylist:
        parts.append(string)
    s = "".join(parts)

Miscellaneous

  • Hash randomization (-R) is ignored in PyPy. In CPython before 3.4 it has little point. Both CPython >= 3.4 and PyPy3 implement the randomized SipHash algorithm and ignore -R.

  • You can’t store non-string keys in type objects. For example:

    class A(object):
        locals()[42] = 3

won’t work.

  • sys.setrecursionlimit(n) sets the limit only approximately, by setting the usable stack space to n * 768 bytes. On Linux, depending on the compiler settings, the default of 768KB is enough for about 1400 calls.
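
    A hedged sketch of what that mapping means in practice (the numbers simply follow the rule above, they are not measured):

    import sys

    # On PyPy the argument is translated into a stack budget of roughly
    # n * 768 bytes; the default 768KB budget corresponds to n = 1000.
    sys.setrecursionlimit(2000)   # asks for roughly 2000 * 768 = 1536000 bytes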

  • since the implementation of dictionary is different, the exact number of times that __hash__ and __eq__ are called is different. Since CPython does not give any specific guarantees either, don’t rely on it.

  • assignment to __class__ is limited to the cases where it works on CPython 2.5. On CPython 2.6 and 2.7 it works in a bit more cases, which are not supported by PyPy so far. (If needed, it could be supported, but then it will likely work in many more cases on PyPy than on CPython 2.6/2.7.)

  • the __builtins__ name is always referencing the __builtin__ module, never a dictionary as it sometimes is in CPython. Assigning to __builtins__ has no effect. (For usages of tools like RestrictedPython, see issue #2653.)

  • directly calling the internal magic methods of a few built-in types with invalid arguments may have a slightly different result. For example, [].__add__(None) and (2).__add__(None) both return NotImplemented on PyPy; on CPython, only the latter does, and the former raises TypeError. (Of course, [] + None and 2 + None both raise TypeError everywhere.) This difference is an implementation detail that shows up because of internal C-level slots that PyPy does not have.
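
    A small sketch of that difference:

    print (2).__add__(None)       # NotImplemented on both CPython and PyPy
    try:
        print [].__add__(None)    # NotImplemented on PyPy
    except TypeError:
        print "TypeError"         # raised on CPython instead
    # [] + None and 2 + None raise TypeError on both implementations.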

  • on CPython, [].__add__ is a method-wrapper, and list.__add__ is a slot wrapper. On PyPy these are normal bound or unbound method objects. This can occasionally confuse some tools that inspect built-in types. For example, the standard library inspect module has a function ismethod() that returns True on unbound method objects but False on method-wrappers or slot wrappers. On PyPy we can’t tell the difference, so ismethod([].__add__) == ismethod(list.__add__) == True.
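
    For example:

    import inspect
    print inspect.ismethod([].__add__)     # False on CPython, True on PyPy
    print inspect.ismethod(list.__add__)   # False on CPython, True on PyPy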

  • in CPython, the built-in types have attributes that can be implemented in various ways. Depending on the way, if you try to write to (or delete) a read-only (or undeletable) attribute, you get either a TypeError or an AttributeError. PyPy tries to strike some middle ground between full consistency and full compatibility here. This means that a few corner cases don’t raise the same exception, like del (lambda:None).__closure__.

  • in pure Python, if you write class A(object): def f(self): pass and have a subclass B which doesn’t override f(), then B.f(x) still checks that x is an instance of B. In CPython, types written in C use a different rule. If A is written in C, any instance of A will be accepted by B.f(x) (and actually, B.f is A.f in this case). Some code that could work on CPython but not on PyPy includes: datetime.datetime.strftime(datetime.date.today(), …) (here, datetime.date is the superclass of datetime.datetime). Anyway, the proper fix is arguably to use a regular method call in the first place: datetime.date.today().strftime(…)
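
    A sketch of the portable spelling suggested above:

    import datetime

    d = datetime.date.today()
    # datetime.datetime.strftime(d, "%Y-%m-%d") works on CPython (strftime is
    # defined in C on the superclass datetime.date) but fails on PyPy, because
    # d is not an instance of datetime.datetime.

    print d.strftime("%Y-%m-%d")   # regular method call: works everywhere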

  • some functions and attributes of the gc module behave in a slightly different way: for example, gc.enable and gc.disable are supported, but “enabling and disabling the GC” has a different meaning in PyPy than in CPython. These functions actually enable and disable the major collections and the execution of finalizers.

  • PyPy prints a random line from past #pypy IRC topics at startup in interactive mode. In a released version, this behaviour is suppressed, but setting the environment variable PYPY_IRC_TOPIC will bring it back. Note that downstream package providers have been known to totally disable this feature.

  • PyPy’s readline module was rewritten from scratch: it is not GNU’s readline. It should be mostly compatible, and it adds multiline support (see multiline_input()). On the other hand, parse_and_bind() calls are ignored (issue #2072).

  • sys.getsizeof() always raises TypeError. This is because a memory profiler using this function is most likely to give results inconsistent with reality on PyPy. It would be possible to have sys.getsizeof() return a number (with enough work), but that may or may not represent how much memory the object uses. It doesn’t even really make sense to ask how much one object uses, in isolation from the rest of the system. For example, instances have maps, which are often shared across many instances; in this case the maps would probably be ignored by an implementation of sys.getsizeof(), but their overhead is important in some cases if there are many instances with unique maps. Conversely, equal strings may share their internal string data even if they are different objects — or empty containers may share parts of their internals as long as they are empty. Even stranger, some lists create objects as you read them; if you try to estimate the size in memory of range(10**6) as the sum of all items’ size, that operation will by itself create one million integer objects that never existed in the first place. Note that some of these concerns also exist on CPython, just less so. For this reason we explicitly don’t implement sys.getsizeof().

  • The timeit module behaves differently under PyPy: it prints the average time and the standard deviation, instead of the minimum, since the minimum is often misleading.

  • The get_config_vars methods of sysconfig and distutils.sysconfig are not complete. On POSIX platforms, CPython fishes configuration variables from the Makefile used to build the interpreter. PyPy should bake the values in during compilation, but does not do that yet.

  • "%d" % x and "%x" % x and similar constructs, where x isan instance of a subclass of long that overrides the specialmethods str or hex or oct: PyPy doesn’t callthe special methods; CPython does—but only if it is a subclass oflong, not int. CPython’s behavior is really messy: e.g. for%x it calls hex(), which is supposed to return a stringlike -0x123L; then the 0x and the final L are removed, andthe rest is kept. If you return an unexpected string fromhex() you get an exception (or a crash before CPython 2.7.13).

  • In PyPy, dictionaries passed as **kwargs can contain only string keys, even for dict() and dict.update(). CPython 2.7 allows non-string keys in these two cases (and only there, as far as we know). E.g. this code produces a TypeError, on CPython 3.x as well as on any PyPy: dict(**{1: 2}). (Note that dict(**d1) is equivalent to dict(d1).)
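
    For example:

    d1 = {1: 2}
    print dict(d1)     # fine everywhere: a plain copy
    # dict(**d1) raises TypeError on PyPy (and on CPython 3.x), because the
    # keyword names must be strings; CPython 2.7 happens to accept it.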

  • PyPy3: __class__ attribute assignment between heaptypes and non heaptypes. CPython allows that for module subtypes, but not for e.g. int or float subtypes. Currently PyPy does not support the __class__ attribute assignment for any non heaptype subtype.

  • In PyPy, module and class dictionaries are optimized under the assumption that deleting attributes from them is rare. Because of this, e.g. del foo.bar, where foo is a module (or class) that contains the function bar, is significantly slower than on CPython.

  • Various built-in functions in CPython accept only positional arguments and not keyword arguments. That can be considered a long-running historical detail: newer functions tend to accept keyword arguments and older functions are occasionally fixed to do so as well. In PyPy, most built-in functions accept keyword arguments (help() shows the argument names). But don’t rely on it too much, because future versions of PyPy may have to rename the arguments if CPython starts accepting them too.

  • PyPy3: distutils has been enhanced to allow finding VsDevCmd.bat in the directory pointed to by the VS%0.f0COMNTOOLS (typically VS140COMNTOOLS) environment variable. CPython searches for vcvarsall.bat somewhere above that value.

  • SyntaxErrors try harder to give details about the cause of the failure, so the error messages are not the same as in CPython.

  • Dictionaries and sets are ordered on PyPy. On CPython < 3.6 they are not; on CPython >= 3.6 dictionaries (but not sets) are ordered.

  • PyPy2 refuses to load lone .pyc files, i.e. .pyc files that are still there after you deleted the .py file. PyPy3 instead behaves like CPython. We could be amenable to fix this difference in PyPy2: the current version reflects our annoyance with this detail of CPython, which bit us too often while developing PyPy. (It is as easy as passing the --lonepycfile flag when translating PyPy, if you really need it.)

Extension modules

List of extension modules that we support:

  • Supported as built-in modules (in pypy/module/):

    __builtin__ __pypy__ _ast _codecs _collections _continuation _ffi _hashlib _io _locale _lsprof _md5 _minimal_curses _multiprocessing _random _rawffi _sha _socket _sre _ssl _warnings _weakref _winreg array binascii bz2 cStringIO cmath cpyext crypt errno exceptions fcntl gc imp itertools marshal math mmap operator parser posix pyexpat select signal struct symbol sys termios thread time token unicodedata zipimport zlib

When translated on Windows, a few Unix-only modules are skipped, and the following module is built instead:

_winreg

  • Supported by being rewritten in pure Python (possibly using cffi): see the lib_pypy/ directory. Examples of modules that we support this way: ctypes, cPickle, cmath, dbm, datetime… Note that some modules are both in there and in the list above; by default, the built-in module is used (but can be disabled at translation time).

The extension modules (i.e. modules written in C, in the standard CPython) that are neither mentioned above nor in lib_pypy/ are not available in PyPy. (You may have a chance to use them anyway with cpyext.)