Faster code via static typing

Cython is a Python compiler. This means that it can compile normal Python code without changes (with a few obvious exceptions of some as-yet unsupported language features, see Cython limitations). However, for performance critical code, it is often helpful to add static type declarations, as they will allow Cython to step out of the dynamic nature of the Python code and generate simpler and faster C code, sometimes faster by orders of magnitude.

It must be noted, however, that type declarations can make the source code more verbose and thus less readable. It is therefore discouraged to use them without good reason, such as where benchmarks prove that they really make the code substantially faster in a performance critical section. Typically a few types in the right spots go a long way.

All C types are available for type declarations: integer and floating point types, complex numbers, structs, unions and pointer types. Cython can automatically and correctly convert between the types on assignment. This also includes Python’s arbitrary size integer types, where value overflows on conversion to a C type will raise a Python OverflowError at runtime. (It does not, however, check for overflow when doing arithmetic.) The generated C code will handle the platform dependent sizes of C types correctly and safely in this case.
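The difference between checked conversion and unchecked arithmetic can be modelled in plain Python. The sketch below is only an illustration, not Cython's generated code: the helper names `to_c_int32` and `add_int32` are hypothetical, a 32-bit `int` is assumed, and the wrap-around shown is what typical two's-complement platforms do (the C standard itself leaves signed overflow undefined).

```python
# Assumption: a C int is 32 bits wide, as on most current platforms.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_c_int32(value):
    """Model a checked Python-int to C-int conversion (hypothetical helper)."""
    if not INT32_MIN <= value <= INT32_MAX:
        # Mirrors the documented behaviour: conversion overflow is an error.
        raise OverflowError("value too large to convert to int")
    return value


def add_int32(a, b):
    """Model unchecked 32-bit addition with two's-complement wrap-around."""
    total = (a + b) & 0xFFFFFFFF          # keep only the low 32 bits
    return total - 2**32 if total > INT32_MAX else total


if __name__ == "__main__":
    print(add_int32(INT32_MAX, 1))        # silently wraps to a negative value
    try:
        to_c_int32(2**31)
    except OverflowError as exc:
        print("conversion rejected:", exc)
```

In other words, a value that is too large is caught when it crosses from Python into a C variable, but a result that becomes too large inside typed arithmetic is not.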

Types are declared via the cdef keyword.

Typing Variables

Consider the following pure Python code:

    def f(x):
        return x ** 2 - x


    def integrate_f(a, b, N):
        s = 0
        dx = (b - a) / N
        for i in range(N):
            s += f(a + i * dx)
        return s * dx

Simply compiling this in Cython merely gives a 35% speedup. This is better than nothing, but adding some static types can make a much larger difference.

With additional type declarations, this might look like:

    def f(double x):
        return x ** 2 - x


    def integrate_f(double a, double b, int N):
        cdef int i
        cdef double s, dx
        s = 0
        dx = (b - a) / N
        for i in range(N):
            s += f(a + i * dx)
        return s * dx

Since the iterator variable i is typed with C semantics, the for-loop will be compiled to pure C code. Typing a, s and dx is important as they are involved in arithmetic within the for-loop; typing b and N makes less of a difference, but in this case it is not much extra work to be consistent and type the entire function.

This results in a 4 times speedup over the pure Python version.
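Speedup figures like the ones quoted in this section depend on the machine, compiler and Cython version, so it is worth measuring them locally. The following is a minimal sketch of how the pure Python baseline could be timed with the standard `timeit` module; the helper `best_time` and the chosen argument values are assumptions made for illustration. The same helper could then be pointed at a compiled version of `integrate_f` for comparison.

```python
import timeit


def f(x):
    return x ** 2 - x


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def best_time(func, *args, repeat=3, number=5):
    """Return the best per-call time in seconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(*args))
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    baseline = best_time(integrate_f, 0.0, 2.0, 10_000)
    print(f"pure Python: {baseline * 1e3:.3f} ms per call")
```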

Typing Functions

Python function calls can be expensive – in Cython doubly so because one might need to convert to and from Python objects to do the call. In our example above, the argument is assumed to be a C double both inside f() and in the call to it, yet a Python float object must be constructed around the argument in order to pass it.

Therefore Cython provides a syntax for declaring a C-style function, the cdef keyword:

    cdef double f(double x) except? -2:
        return x ** 2 - x

Some form of except-modifier should usually be added, otherwise Cython will not be able to propagate exceptions raised in the function (or a function it calls). The except? -2 means that an error will be checked for if -2 is returned (though the ? indicates that -2 may also be used as a valid return value). Alternatively, the slower except * is always safe. An except clause can be left out if the function returns a Python object or if it is guaranteed that an exception will not be raised within the function call.

A side-effect of cdef is that the function is no longer available from Python-space, as Python wouldn’t know how to call it. It is also no longer possible to change f() at runtime.

Using the cpdef keyword instead of cdef, a Python wrapper is also created, so that the function is available both from Cython (fast, passing typed values directly) and from Python (wrapping values in Python objects). In fact, cpdef does not just provide a Python wrapper, it also installs logic to allow the method to be overridden by Python methods, even when called from within Cython. This does add a tiny overhead compared to cdef methods.

Speedup: 150 times over pure Python.

Determining where to add types

Because static typing is often the key to large speed gains, beginners often have a tendency to type everything in sight. This cuts down on both readability and flexibility, and can even slow things down (e.g. by adding unnecessary type checks, conversions, or slow buffer unpacking). On the other hand, it is easy to kill performance by forgetting to type a critical loop variable. Two essential tools to help with this task are profiling and annotation. Profiling should be the first step of any optimization effort, and can tell you where you are spending your time. Cython’s annotation can then tell you why your code is taking time.
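As a rough illustration of the profiling step, the sketch below runs the pure Python `integrate_f` from earlier under the standard `cProfile` module and returns the top entries sorted by cumulative time. The helper name `profile_call` is hypothetical. Note that this profiles ordinary Python functions; compiled Cython functions are generally only visible to the profiler when the module is built with Cython's `profile` directive enabled.

```python
import cProfile
import io
import pstats


def f(x):
    return x ** 2 - x


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return a text report (sketch)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_call(integrate_f, 0.0, 2.0, 100_000))
```

A report of this kind would be expected to show the many calls to `f()` made from within `integrate_f()`, which is the kind of hot spot the typing techniques in this section are aimed at.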

Using the -a switch to the cython command line program (or following a link from the Sage notebook) results in an HTML report of Cython code interleaved with the generated C code. Lines are colored according to the level of “typedness” – white lines translate to pure C, while lines that require the Python C-API are yellow (darker as they translate to more C-API interaction). Lines that translate to C code have a plus (+) in front and can be clicked to show the generated code.

This report is invaluable when optimizing a function for speed, and for determining when to release the GIL: in general, a nogil block may contain only “white” code.

[Figure: htmlreport.png, screenshot of the annotated HTML report]

Note that Cython deduces the type of local variables based on their assignments (including as loop variable targets), which can also cut down on the need to explicitly specify types everywhere. For example, declaring dx to be of type double above is unnecessary, as is declaring the type of s in the last version (where the return type of f is known to be a C double). A notable exception, however, is integer types used in arithmetic expressions, as Cython is unable to ensure that an overflow would not occur (and so falls back to object in case Python’s bignums are needed). To allow inference of C integer types, set the infer_types directive to True. This directive works similarly to the auto keyword in C++, for readers who are familiar with that language feature. It can be of great help in cutting down on the need to type everything, but it can also lead to surprises, especially if one isn’t familiar with arithmetic expressions with C types. A quick overview of those can be found here.