PyPy’s sandboxing features

Warning

This describes the old, unmaintained version. A new version is in progress and should be merged back to trunk at some point soon. Please see its description here: https://mail.python.org/pipermail/pypy-dev/2019-August/015797.html

Introduction

PyPy offers sandboxing at a level similar to OS-level sandboxing (e.g. SECCOMP on Linux), but implemented in a fully portable way. To use it, a (regular, trusted) program launches a subprocess that is a special sandboxed version of PyPy. This subprocess can run arbitrary untrusted Python code, but all its input/output is serialized to a stdin/stdout pipe instead of being directly performed. The outer process reads the pipe and decides which commands are allowed or not (sandboxing), or even reinterprets them differently (virtualization). A potential attacker can have arbitrary code run in the subprocess, but cannot actually do any input/output not controlled by the outer process. Additional barriers are put in place to limit the amount of RAM and CPU time used.

Note that this is very different from sandboxing at the Python language level, i.e. placing restrictions on what kind of Python code the attacker is allowed to run (why? read about pysandbox).

Another point of comparison: if we were instead to try to plug CPython into a special virtualizing C library, we would get a result that is not only OS-specific, but unsafe, because CPython can be segfaulted (in many ways, all of them really, really obscure). Given enough effort, an attacker can turn almost any segfault into a vulnerability. The C code generated by PyPy is not segfaultable, as long as our code generators are correct - that’s fewer lines of code to trust. For the paranoid, PyPy translated with sandboxing also contains systematic run-time checks (against buffer overflows, for example) that are normally only present in debugging versions.

Warning

The hard work from the PyPy side is done: you get a fully secure version. What remains experimental and unpolished is the library used to drive this sandboxed PyPy from a regular Python interpreter (CPython, or an unsandboxed PyPy). Contributions welcome.

Warning

Tested with PyPy2. May not work out of the box with PyPy3.

Overview

One of PyPy’s translation aspects is a sandboxing feature. It’s “sandboxing” as in “full virtualization”, but done in normal C with no OS support at all. It’s a two-process model: we can translate PyPy to a special “pypy-c-sandbox” executable, which is safe in the sense that it doesn’t do any library or system calls - instead, whenever it would like to perform such an operation, it marshals the operation name and the arguments to its stdout and it waits for the marshalled result on its stdin. This pypy-c-sandbox process is meant to be run by an outer “controller” program that answers these operation requests.

The pypy-c-sandbox program is obtained by adding a transformation during translation, which turns all RPython-level external function calls into stubs that do the marshalling/waiting/unmarshalling. An attacker that tries to escape the sandbox is stuck within a C program that contains no external function calls at all except for writing to stdout and reading from stdin. (It’s still attackable in theory, e.g. by exploiting segfault-like situations, but as explained in the introduction we think that PyPy is rather safe against such attacks.)
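To make the request/reply idea above concrete, here is a toy sketch in plain Python. It is not the real wire protocol: the framing (a marshalled `(name, args)` tuple answered by a marshalled `(status, value)` tuple), the `transport` callable standing in for the stdin/stdout pipe, and the operation name `os_getpid` are all assumptions made for illustration.

```python
import marshal


def _os_getpid():
    # Hypothetical handler: returns a made-up, virtualized value
    # rather than asking the real operating system.
    return 1


# Hypothetical table of the only operations this toy controller answers.
HANDLERS = {"os_getpid": _os_getpid}


def controller_answer(request):
    """Controller side: decode one marshalled request, return the marshalled reply."""
    name, args = marshal.loads(request)
    handler = HANDLERS.get(name)
    if handler is None:
        # Anything not explicitly allowed is refused.
        return marshal.dumps(("error", "operation not allowed: %s" % name))
    return marshal.dumps(("ok", handler(*args)))


def stub_call(transport, name, *args):
    """Sandbox side: what a generated stub conceptually does instead of a real call.

    `transport` stands in for "write to stdout, then wait on stdin".
    """
    reply = transport(marshal.dumps((name, args)))
    status, value = marshal.loads(reply)
    if status != "ok":
        raise OSError(value)
    return value
```

Calling `stub_call(controller_answer, "os_getpid")` returns the virtualized value, while any operation missing from `HANDLERS` raises `OSError` on the sandbox side. The point of the design is that the decision is made entirely in the controller.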

The outer controller is a plain Python program that can run in CPython or a regular PyPy. It can perform any virtualization it likes, by giving the subprocess any custom view on its world. For example, while the subprocess thinks it’s using file handles, in reality the numbers are created by the controller process and so they need not be (and probably should not be) real OS-level file handles at all. In the demo controller I’ve implemented there is simply a mapping from numbers to file-like objects. The controller answers the “os_open” operation by translating the requested path to some file or file-like object in some virtual and completely custom directory hierarchy. The file-like object is put in the mapping with any unused number >= 3 as a key, and the latter is returned to the subprocess. The “os_read” operation works by mapping the pseudo file handle given by the subprocess back to a file-like object in the controller, and reading from the file-like object.
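The handle mapping described above can be sketched in a few lines. This is a simplified illustration, not the demo controller’s actual code: the class name `VirtualFiles`, the in-memory `tree` dictionary, the reduced method signatures (no flags or mode), and the `os_close` method are assumptions introduced here.

```python
import io


class VirtualFiles(object):
    """Toy controller state: pseudo file handles mapped to file-like objects."""

    def __init__(self, tree):
        self.tree = tree        # virtual path -> file content (bytes)
        self.open_files = {}    # pseudo file handle -> file-like object

    def os_open(self, path):
        # Resolve the path only inside the virtual hierarchy.
        if path not in self.tree:
            raise OSError("no such virtual file: %s" % path)
        # Pick the lowest unused number >= 3 as the pseudo file handle.
        fd = 3
        while fd in self.open_files:
            fd += 1
        self.open_files[fd] = io.BytesIO(self.tree[path])
        return fd

    def os_read(self, fd, count):
        # Map the pseudo handle back to the file-like object and read from it.
        return self.open_files[fd].read(count)

    def os_close(self, fd):
        # Hypothetical helper: frees the number so it can be reused.
        del self.open_files[fd]
```

For instance, with `vf = VirtualFiles({"/tmp/script.py": b"print(1)\n"})`, the first `vf.os_open("/tmp/script.py")` hands out 3, and `vf.os_read(3, 5)` returns the first five bytes of the virtual file. No real OS-level file handle is ever involved.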

Translating an RPython program with sandboxing enabled also uses a special flag that enables all sorts of C-level assertions against index-out-of-bounds accesses.

By the way, as you should have realized, sandboxing is really independent of the fact that it’s PyPy that we are translating. Any RPython program should do. I’ve successfully tried it on the JS interpreter. The controller is only called “pypy_interact” because it emulates a file hierarchy that makes pypy-c-sandbox happy - it contains (read-only) virtual directories like /bin/lib/pypy1.2/lib-python and /bin/lib/pypy1.2/lib_pypy, and it pretends that the executable is /bin/pypy-c.

Howto

Grab a copy of the pypy repository. In the directory pypy/goal, run:

  ../../rpython/bin/rpython -O2 --sandbox targetpypystandalone.py

If you don’t have a regular PyPy installed, you should, because it’s faster to translate; but you can also run the same line with python in front.

To run it, use the tools in the pypy/sandbox directory:

  ./pypy_interact.py /some/path/pypy-c-sandbox [args...]

Just like with pypy-c, if you pass no argument you get the interactive prompt. In theory it’s impossible to do anything bad or read a random file on the machine from this prompt. To pass a script as an argument you need to put it in a directory along with all its dependencies, and ask pypy_interact to export this directory (read-only) to the subprocess’ virtual /tmp directory with the --tmp=DIR option. Example:

  mkdir myexported
  cp script.py myexported/
  ./pypy_interact.py --tmp=myexported /some/path/pypy-c-sandbox /tmp/script.py

This is safe to do even if script.py comes from some random untrusted source, e.g. if it is done by an HTTP server.

To limit the heap size, use the --heapsize=N option to pypy_interact.py. You can also limit the CPU time (measured as real time) with the --timeout=N option.
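If the controller is itself started from another Python program, the command line can be assembled as in the sketch below. The helper names `build_command` and `run_sandboxed` are hypothetical, and only the three options mentioned on this page are used. The accepted formats and units for N are not stated here, so the values are passed through untouched.

```python
import subprocess


def build_command(interact, sandbox_exe, script=None,
                  tmp=None, heapsize=None, timeout=None):
    """Assemble the pypy_interact.py command line as a list of arguments."""
    cmd = [interact]
    if tmp is not None:
        cmd.append("--tmp=%s" % tmp)            # directory exported as virtual /tmp
    if heapsize is not None:
        cmd.append("--heapsize=%s" % heapsize)  # value passed through as given
    if timeout is not None:
        cmd.append("--timeout=%s" % timeout)    # value passed through as given
    cmd.append(sandbox_exe)
    if script is not None:
        cmd.append(script)                      # path as seen inside the sandbox
    return cmd


def run_sandboxed(*args, **kwargs):
    """Run the assembled command and return the controller's exit status."""
    return subprocess.call(build_command(*args, **kwargs))
```

For example, `build_command("./pypy_interact.py", "/some/path/pypy-c-sandbox", script="/tmp/script.py", tmp="myexported")` reproduces the command shown in the example above. Passing a list rather than a shell string avoids quoting problems with unusual paths.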

Not all operations are supported; e.g. if you type os.readlink('…'), the controller crashes with an exception and the subprocess is killed. Other operations make the subprocess die directly with a “Fatal RPython error”. None of this is a security hole. More importantly, most other built-in modules are not enabled. Please read all the warnings on this page before complaining about this. Contributions welcome.