Potential Project List

Getting involved

We are happy to discuss ideas around the PyPy ecosystem. If you are interested in playing with RPython or PyPy, or have a new idea not mentioned here, please join us on irc, channel #pypy (freenode). If you are unsure, but still think that you can make a valuable contribution to PyPy, don't hesitate to contact us on #pypy or on our mailing list. Here are some ideas to get you thinking:

  • Optimize PyPy Memory Usage: Sometimes PyPy consumes more memory than CPython. Two examples: 1) PyPy seems to allocate and keep alive more strings when importing big Python modules. 2) The base interpreter size (cold VM started from a console) of PyPy is bigger than that of CPython. The general procedure of this project is: run both CPython and PyPy of the same Python version and compare the memory usage (using Massif or other tools). If PyPy consumes a lot more memory, find and resolve the issue.
  • VMProf + memory profiler: vmprof is a statistical memory profiler. We want to extend it with new features and resolve some current limitations.
  • VMProf visualisations: vmprof shows a flame graph of the statistical profile and some more information about specific call sites. It would be very interesting to experiment with different information (such as memory, or even information generated by our jit compiler).
  • Explicit typing in RPython: PyPy wants to have better ways to specify the signature and class attribute types in RPython. See more information about this topic below on this page.
  • Virtual Reality (VR) visualisations for vmprof: This is a very open topic with lots of freedom to explore data visualisation for profiles. No VR hardware would be needed for this project. Either universities provide such hardware, or otherwise we can potentially lend the VR hardware setup.
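
The comparison step of the memory usage project above can be approximated without a full profiler. The following is a minimal sketch, assuming a Unix system; the helper name peak_rss and the sample workload are illustrative only, and a tool such as Massif gives far more detail than this single number.

```python
import os
import sys


def peak_rss(interpreter, code):
    """Run `interpreter -c code` and return the child's peak resident set size.

    Unix only. The unit of ru_maxrss is platform dependent
    (kilobytes on Linux, bytes on macOS).
    """
    pid = os.spawnvp(os.P_NOWAIT, interpreter, [interpreter, "-c", code])
    _, status, usage = os.wait4(pid, 0)
    if status != 0:
        raise RuntimeError("child ended with wait status %d" % status)
    return usage.ru_maxrss


if __name__ == "__main__":
    # Pass the interpreters to compare as arguments (for example a CPython
    # and a PyPy binary of the same Python version). With no arguments,
    # the interpreter running this script is measured.
    workload = "import json, decimal, email"  # hypothetical workload
    for interpreter in sys.argv[1:] or [sys.executable]:
        print(interpreter, peak_rss(interpreter, workload))
```

A large gap between the two numbers for the same workload is a starting point for a closer look with a real memory profiler.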

Simple tasks for newcomers

Mid-to-large tasks

Below is a list of projects that are interesting for potential contributors who are seriously interested in the PyPy project. They mostly share common patterns: they’re mid-to-large in size, they’re usually well defined as standalone projects, and they’re not being actively worked on. For small projects that you might want to work on, look above, look at the issue tracker, pop up on #pypy on irc.freenode.net, or write to the mailing list. Small projects are not listed here because they tend to change very rapidly.

This list is mostly for having an overview of potential projects. It is by definition not exhaustive, and we’re pleased if people come up with their own improvement ideas. In any case, if you feel like working on some of those projects, or anything else in PyPy, pop up on IRC or write to us on the mailing list.

Explicit typing in RPython

RPython is mostly based around type inference, but there are many cases where specifying types explicitly is useful. We would like to be able to optionally specify the exact types of the arguments to any function. We already have solutions in that space, @rpython.rlib.objectmodel.enforceargs and @rpython.rlib.signature.signature, but they are inconvenient and limited. For instance, they do not make it easy to express the type “dict with ints as keys and lists of instances of Foo as values”.
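
To make the goal concrete, here is a minimal sketch in plain Python 3 of what a composable type specification could look like. It is not RPython and not an existing API: the names typed, dict_of, list_of and instance_of are hypothetical, and the checks run at call time, whereas RPython would have to verify them during annotation.

```python
import functools
import inspect


class Foo:
    pass


def instance_of(cls):
    return lambda value: isinstance(value, cls)


def list_of(item_check):
    return lambda value: (isinstance(value, list)
                          and all(item_check(item) for item in value))


def dict_of(key_check, value_check):
    return lambda value: (isinstance(value, dict)
                          and all(key_check(k) and value_check(v)
                                  for k, v in value.items()))


def typed(**checks):
    """Hypothetical decorator: verify the named arguments on every call."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            for name, check in checks.items():
                if name in bound.arguments and not check(bound.arguments[name]):
                    raise TypeError("argument %r has the wrong type" % name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# "dict with ints as keys and lists of instances of Foo as values"
@typed(table=dict_of(instance_of(int), list_of(instance_of(Foo))))
def count_foos(table):
    return sum(len(foos) for foos in table.values())
```

The point of the sketch is the nesting: container types are built from smaller specifications, which is the part the text describes as hard to express today.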

Additionally, we would like to be able to specify the types of instance attributes. Unlike the function case, this is likely to require some refactoring of the annotator.

Make bytearray type fast

PyPy’s bytearray type is very inefficient. It would be an interesting task to look into possible optimizations on this. (XXX current status unknown; ask on #pypy for updates on this.)

Implement copy-on-write list slicing

The idea is to have a special implementation of list objects which is used when doing myslice = mylist[a:b]: the new list is not constructed immediately, but only when (and if) myslice or mylist is mutated.
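
The behaviour can be illustrated at application level. The following is a toy sketch, not the interpreter-level design: CowList and its helper methods are hypothetical names, only step-1 slices are handled, and a real implementation would live in the list strategies of the interpreter.

```python
class CowList:
    """Toy list whose slices share storage until one side is mutated."""

    def __init__(self, items):
        self._items = list(items)
        self._start = 0
        self._stop = len(self._items)
        self._shared = False  # True while another CowList may use _items

    @classmethod
    def _view(cls, items, start, stop):
        view = cls.__new__(cls)
        view._items, view._start, view._stop = items, start, stop
        view._shared = True
        return view

    def __len__(self):
        return self._stop - self._start

    def _normalize(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("list index out of range")
        return index

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise NotImplementedError("this sketch only handles step 1")
            self._shared = True  # no copy yet: both objects point at _items
            return self._view(self._items, self._start + start,
                              self._start + max(start, stop))
        return self._items[self._start + self._normalize(index)]

    def __setitem__(self, index, value):
        position = self._normalize(index)
        self._unshare()  # the copy happens here, on the first write
        self._items[self._start + position] = value

    def _unshare(self):
        if self._shared:
            self._items = self._items[self._start:self._stop]
            self._start, self._stop = 0, len(self._items)
            self._shared = False

    def to_list(self):
        return self._items[self._start:self._stop]

    def shares_storage_with(self, other):
        return self._items is other._items


mylist = CowList([1, 2, 3, 4, 5])
myslice = mylist[1:4]   # nothing is copied here
mylist[2] = 99          # mylist takes a private copy before writing
```

In this sketch whichever side writes first pays for the copy, and the other side keeps the original storage unchanged.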

NumPy rebooted

Our cpyext C-API compatibility layer can now run upstream NumPy unmodified. The PyPy2.7-v6.0 release still fails about 10 of the ~6000 tests in the NumPy test suite. We need to improve our ctypes structure -> memoryview conversions, and to refactor the way NumPy adds docstrings.

We are also looking for help in how to hijack NumPy dtype conversion and ufunc calls to allow the JIT to make them fast, using our internal _numpypy module.

Improving the jitviewer

Analyzing performance of applications is always tricky. We have various tools, for example a jitviewer, that help us analyze performance.

The old tool was partly rewritten and combined with vmprof. The service is hosted at vmprof.com.

The following shows an old image of the jitviewer. It displays the code generated by the PyPy JIT in a hierarchical way:

  • at the bottom level, it shows the Python source code of the compiled loops
  • for each source code line, it shows the corresponding Python bytecode
  • for each opcode, it shows the corresponding jit operations, which are the ones actually sent to the backend for compiling (such as i15 = i10 < 2000 in the example)

[Image: _images/jitviewer.png]

The jitviewer is a web application based on django and angularjs: if you have great web development skills and want to help PyPy, this is an ideal task to get started, because it does not require any deep knowledge of the internals. Head over to vmprof-python, vmprof-server and vmprof-integration to find open issues and documentation.

Optimized Unicode Representation

CPython 3.3 and later use an optimized unicode representation (see PEP 0393) which switches between different ways to represent a unicode string, depending on whether the string fits into ASCII, has only two-byte characters or needs four-byte characters.

The actual details would be rather different in PyPy, but we would like to have the same optimization implemented.

Or maybe not. We can also play around with the idea of using a single representation: as a byte string in utf-8. (This idea needs some extra logic for efficient indexing, like a cache.) Work has begun on the unicode-utf and unicode-utf8-py3 branches. More is needed, for instance there are SIMD optimizations that are not yet used.
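
The "extra logic for efficient indexing" can be pictured as a table of byte offsets recorded every few code points. The sketch below is one possible shape of such a cache under stated assumptions, not what the branches implement: the class name Utf8String and the STRIDE value are hypothetical, and the table is built lazily on the first index operation.

```python
class Utf8String:
    """Toy string stored as UTF-8 bytes with a sparse index cache."""

    STRIDE = 4  # hypothetical: one cached byte offset every 4 code points

    def __init__(self, text):
        self._data = text.encode("utf-8")
        self._length = len(text)
        self._checkpoints = None  # built on first indexing

    def __len__(self):
        return self._length

    @staticmethod
    def _char_width(lead_byte):
        # The leading byte of a UTF-8 sequence encodes its total length.
        if lead_byte < 0x80:
            return 1
        if lead_byte < 0xE0:
            return 2
        if lead_byte < 0xF0:
            return 3
        return 4

    def _build_checkpoints(self):
        offsets = []
        count = 0
        for position, byte in enumerate(self._data):
            if byte & 0xC0 != 0x80:  # not a continuation byte
                if count % self.STRIDE == 0:
                    offsets.append(position)
                count += 1
        self._checkpoints = offsets

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        if self._checkpoints is None:
            self._build_checkpoints()
        # Jump to the nearest checkpoint, then walk at most STRIDE - 1 chars.
        position = self._checkpoints[index // self.STRIDE]
        for _ in range(index % self.STRIDE):
            position += self._char_width(self._data[position])
        width = self._char_width(self._data[position])
        return self._data[position:position + width].decode("utf-8")
```

With such a table an index lookup costs one table access plus a short forward scan, instead of a scan from the start of the string. The stride trades memory for scan length.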

Convert RPython to Python3

The world is moving on, we should too.

Improve performance

  • Make uninlined Python-level calls faster
  • Switch to a sea-of-nodes IR, or a Lua-Jit-like IR which iterates on the sea-of-nodes approach
  • Use real register-allocation
  • Improve instruction selection / scheduling
  • Create a hybrid tracing/method JIT

Improve warmup

  • Interpreter speed-ups
  • Optimize while tracing
  • Cache information between runs

Translation Toolchain

(XXX this is unlikely to be feasible.)

  • Incremental or distributed translation.
  • Allow separate compilation of extension modules.

Various GCs

PyPy has a pluggable garbage collection policy. This means that various garbage collectors can be written for specialized purposes, or even various experiments can be done for the general purpose. Examples:

  • A garbage collector that compacts memory better for mobile devices
  • A concurrent garbage collector (a lot of work)
  • A collector that keeps object flags in separate memory pages, to avoid un-sharing all pages between several fork()ed processes

STM (Software Transactional Memory)

This is work in progress. Besides the main development path, whose goal is to make a (relatively fast) version of pypy which includes STM, there are independent topics that can already be experimented with on the existing, JIT-less pypy-stm version:

  • What kind of conflicts do we get in real use cases? And, sometimes, which data structures would be more appropriate? For example, a dict implemented as a hash table will suffer “stm collisions” in all threads whenever one thread writes anything to it; but there could be other implementations. Maybe alternate strategies can be implemented at the level of the Python interpreter (see list/dict strategies, pypy/objspace/std/{list,dict}object.py).
  • More generally, there is the idea that we would need some kind of “debugger”-like tool to “debug” things that are not bugs, but stm conflicts. What would this tool look like to the end Python programmers? Like a profiler? Or like a debugger with breakpoints on aborted transactions? It would probably be all app-level, with a few hooks e.g. for transaction conflicts.
  • Find good ways to have libraries use threads and atomics internally, without exposing threads to the user. Right now there is a rough draft in lib_pypy/transaction.py, but much better is possible. For example we could probably have an iterator-like concept that allows each loop iteration to run in parallel.
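
As an illustration of the last item, the sketch below shows the shape such a library interface could take: the caller writes what looks like a loop body, and the threads stay an internal detail. It uses ordinary threads from the standard library rather than STM, and parallel_map is a hypothetical name that is not part of lib_pypy/transaction.py.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, workers=4):
    """Apply func to every item using worker threads hidden from the caller.

    Results are returned in input order, as a plain loop would produce them.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# The caller never creates or joins a thread.
squares = parallel_map(lambda n: n * n, range(10))
```

Under STM the interesting question is what happens when loop iterations touch shared data, which this sketch deliberately leaves out.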

Introduce new benchmarks

Our benchmark runner is showing its age. We should merge with the CPython site.

Additionally, we’re usually happy to introduce new benchmarks. Please consult us beforehand, but in general something that’s real-world python code and is not already represented is welcome. We need at least a standalone script that can run without parameters. Example ideas (benchmarks need to be extracted from them!):

  • hg
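
A "standalone script that can run without parameters" could have roughly the following shape. This is only a sketch: workload is a hypothetical placeholder for the real-world code being measured, and the repeat count is arbitrary. Running the workload several times matters on PyPy because the first iterations include JIT warmup.

```python
import time


def workload():
    """Hypothetical placeholder for the real-world code under test."""
    words = ("pypy cpython jit gc " * 2000).split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def run_benchmark(repeats=5):
    """Return the wall-clock time of each run, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    for index, seconds in enumerate(run_benchmark()):
        print("run %d: %.6f s" % (index, seconds))
```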

Interfacing with C

While we could make cpyext faster, we would also like to explore other ideas. It seems cffi is only appropriate for small to medium-sized extensions, and it is hard to imagine NumPy abandoning the C-API. Here are a few ideas:

  • Extend Cython to have a backend that can be understood by the JIT
  • Collaborate with C-extension authors to ensure full PyPy support (see below)
  • Put PyPy compatible packages on PyPI and in conda

Support more platforms

We have a plan for a Windows 64 port.

Make more python modules pypy-friendly

A lot of work has gone into PyPy’s implementation of CPython’s C-API, cpyext, over the last years to let it reach a practical level of compatibility, so that C extensions for CPython work on PyPy without major rewrites. However, there are still many edges and corner cases where it misbehaves.

For any popular extension that does not already advertise full PyPy compatibility, it would thus be useful to take a close look at it in order to make it fully compatible with PyPy. The general process is something like:

  • Run the extension’s tests on PyPy and look at the test failures.
  • Some of the failures may be solved by identifying cases where the extension relies on undocumented or internal details of CPython, and rewriting the relevant code to follow documented best practices. Open issues and send pull requests as appropriate given the extension’s development process.
  • Other failures may highlight incompatibilities between cpyext and CPython. Please report them to us and try to fix them.
  • Run benchmarks, either provided by the extension developers or created by you. Any case where PyPy is significantly slower than CPython is to be considered a bug and solved as above.

Alternatively, an approach we used to recommend was to rewrite C extensions using more pypy-friendly technologies, e.g. cffi. Here is a partial list of good work that needs to be finished:

wxPython https://bitbucket.org/amauryfa/wxpython-cffi

Status: A project by a PyPy developer to adapt the Phoenix sip build system to cffi

The project is a continuation of a 2013 GSOC https://bitbucket.org/waedt/wxpython_cffi

TODO: Merge the latest version of the wrappers and finish the sip conversion

pygame https://github.com/CTPUG/pygame_cffi

Status: see blog post <http://morepypy.blogspot.com/2014/03/pygamecffi-pygame-on-pypy.html>

TODO: see the end of the blog post

pyopengl https://bitbucket.org/duangle/pyopengl-cffi

Status: unknown