Working with NumPy

Note

Cython 0.16 introduced typed memoryviews as a successor to the NumPy integration described here. They are easier to use than the buffer syntax below, have less overhead, and can be passed around without requiring the GIL. They should be preferred to the syntax presented in this page. See Cython for NumPy users.
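
For comparison, with typed memoryviews the buffer-typed signature developed below would look roughly like this (a sketch, assuming the cimport numpy as np and the DTYPE used later on this page; see the linked page for details):

# Typed-memoryview version of the signature developed below:
def naive_convolve(np.int_t[:, :] f, np.int_t[:, :] g):
    ...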

You can use NumPy from Cython exactly the same as in regular Python, but by doing so you are losing potentially high speedups because Cython has support for fast access to NumPy arrays. Let’s see how this works with a simple example.

The code below does 2D discrete convolution of an image with a filter (and I’m sure you can do better; let it serve for demonstration purposes!). It is both valid Python and valid Cython code. I’ll refer to it as both convolve_py.py for the Python version and convolve1.pyx for the Cython version – Cython uses “.pyx” as its file suffix.

import numpy as np


def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2 * smid
    ymax = wmax + 2 * tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

This should be compiled to produce yourmod.so (for Linux systems; on Windows systems it will be yourmod.pyd). We run a Python session to test both the Python version (imported from the .py file) and the compiled Cython module.
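
In case you have no build set up yet, a minimal script along these lines should work (a sketch, not part of the original session; the file name follows the example above):

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("convolve1.pyx"))

Build with python setup.py build_ext --inplace. The later examples on this page that cimport numpy additionally need the NumPy headers, e.g. by passing include_dirs=[numpy.get_include()] to setup().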

In [1]: import numpy as np
In [2]: import convolve_py
In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int_),
   ...:     np.array([[1], [2], [1]], dtype=np.int_))
Out[3]:
array([[1, 1, 1],
       [2, 2, 2],
       [1, 1, 1]])
In [4]: import convolve1
In [5]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int_),
   ...:     np.array([[1], [2], [1]], dtype=np.int_))
Out[5]:
array([[1, 1, 1],
       [2, 2, 2],
       [1, 1, 1]])
In [11]: N = 100
In [12]: f = np.arange(N*N, dtype=np.int_).reshape((N, N))
In [13]: g = np.arange(81, dtype=np.int_).reshape((9, 9))
In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
2 loops, best of 3: 1.86 s per loop
In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g)
2 loops, best of 3: 1.41 s per loop

There’s not such a huge difference yet, because the C code still does exactly what the Python interpreter does (meaning, for instance, that a new object is allocated for each number used). Look at the generated HTML file and see what is needed for even the simplest statements; you will get the point quickly. We need to give Cython more information; we need to add types.
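
If you want to see this for yourself, Cython’s annotation mode produces exactly that HTML file (the -a/--annotate flag; the file name follows the example above):

cython -a convolve1.pyx
# writes convolve1.html next to the source; open it in a browser.
# Lines shaded yellow interact with the Python interpreter, so those
# are the ones worth typing.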

Adding types

To add types we use custom Cython syntax, so we are now breaking Python source compatibility. Consider this code (read the comments!):

# tag: numpy
# You can ignore the previous line.
# It's for internal testing of the cython documentation.

import numpy as np

# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np

# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int_

# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t

# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE

    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2 * smid
    cdef int ymax = wmax + 2 * tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w

    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to

    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

After building this and continuing my (very informal) benchmarks, I get:

In [21]: import convolve2
In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
2 loops, best of 3: 828 ms per loop

Efficient indexing

There’s still a bottleneck killing performance, and that is the array lookups and assignments. The []-operator still uses full Python operations – what we would like to do instead is to access the data buffer directly at C speed.

What we need to do then is to type the contents of the ndarray objects. We do this with a special “buffer” syntax which must be told the datatype (first argument) and number of dimensions (“ndim” keyword-only argument; if not provided then one-dimensional is assumed).

These are the needed changes:

...
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
    cdef np.ndarray[DTYPE_t, ndim=2] h = ...

Usage:

In [18]: import convolve3
In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
3 loops, best of 100: 11.6 ms per loop

Note the importance of this change.

Gotcha: This efficient indexing only affects certain index operations, namely those with exactly ndim number of typed integer indices. So if v for instance isn’t typed, then the lookup f[v, w] isn’t optimized. On the other hand, this means that you can continue using Python objects for sophisticated dynamic slicing etc. just as when the array is not typed.
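
To make the rule concrete, here is a small sketch (the function name is made up; it assumes the DTYPE/DTYPE_t setup from the code above):

def indexing_demo(np.ndarray[DTYPE_t, ndim=2] a):
    cdef int i = 1, j = 2
    k = 1        # untyped, i.e. a Python object

    a[i, j]      # optimized: exactly ndim (= 2) typed integer indices
    a[k, j]      # not optimized: k is not a typed integer
    a[i]         # not optimized: fewer indices than ndim
    a[:, j]      # slicing still works, as a normal Python operation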

Tuning indexing further

The array lookups are still slowed down by two factors:

  • Bounds checking is performed.

  • Negative indices are checked for and handled correctly. The code above is explicitly coded so that it doesn’t use negative indices, and it (hopefully) always accesses within bounds. We can add a decorator to disable bounds checking:

...
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...

Now bounds checking is not performed (and, as a side-effect, if you do happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data). It is possible to switch bounds-checking mode in many ways; see Compiler directives for more information.

Also, we’ve disabled the check to wrap negative indices (e.g. g[-1] giving the last value). As with disabling bounds checking, bad things will happen if we try to actually use negative indices with this disabled.
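
One of the other ways mentioned above is to scope a directive to a block rather than a whole function, using it as a context manager (a sketch; the function name is made up):

cimport cython

def partial_sum(np.ndarray[DTYPE_t, ndim=2] a):
    cdef int i, j
    cdef DTYPE_t total = 0
    # Bounds checking is still enabled out here ...
    with cython.boundscheck(False):
        # ... and switched off only inside this block.
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                total += a[i, j]
    return total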

The function call overhead now starts to play a role, so we compare the latter two examples with larger N:

In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
3 loops, best of 100: 5.97 ms per loop
In [12]: N = 1000
In [13]: f = np.arange(N*N, dtype=np.int_).reshape((N, N))
In [14]: g = np.arange(81, dtype=np.int_).reshape((9, 9))
In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
1 loops, best of 10: 1.16 s per loop
In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
1 loops, best of 10: 597 ms per loop

(Also this is a mixed benchmark as the result array is allocated within the function call.)

Warning

Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).

The actual rules are a bit more complicated, but the main message is clear: Do not use typed objects without knowing that they are not set to None.
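
If a function must never receive None, Cython lets you say so in the signature with a “not None” clause, or you can check manually before any attribute access or indexing (a sketch; the function names are made up):

# Reject None at call time:
def process(np.ndarray[DTYPE_t, ndim=2] a not None):
    return a[0, 0]

# Or keep None legal but guard explicitly:
def process_or_none(np.ndarray[DTYPE_t, ndim=2] a):
    if a is None:
        raise ValueError("a must not be None")
    return a[0, 0]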

More generic code

It would be possible to do:

def naive_convolve(object[DTYPE_t, ndim=2] f, ...):

i.e. use object rather than np.ndarray. Under Python 3.0 this can allow your algorithm to work with any libraries supporting the buffer interface; and support for e.g. the Python Imaging Library may easily be added under Python 2.x as well, if someone is interested.

There is some speed penalty to this though (as one makes more assumptions compile-time if the type is set to np.ndarray; specifically, it is assumed that the data is stored in pure strided mode and not in indirect mode).
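
Conversely, if you know even more about the layout, the buffer declaration’s mode option lets you promise it at compile time (a sketch; naive_convolve_c is a made-up name):

# Declares f as C-contiguous: indexing can assume contiguous layout, but
# passing a non-contiguous array (e.g. a transpose or a strided slice)
# will raise an error at call time:
def naive_convolve_c(np.ndarray[DTYPE_t, ndim=2, mode="c"] f, ...):
    ...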