PyCUDA/CUDAMat/Gnumpy compatibility

PyCUDA

Currently, PyCUDA and Theano have different objects to store GPU data. The two implementations do not support the same set of features. Theano’s implementation is called CudaNdarray and supports strides. It also only supports the float32 dtype. PyCUDA’s implementation is called GPUArray and doesn’t support strides. However, it can deal with all NumPy and CUDA dtypes.

We are currently working on having the same base object for both that will also mimic NumPy. Until this is ready, here is some information on how to use both objects in the same script.

Transfer

You can use the theano.misc.pycuda_utils module to convert GPUArray to and from CudaNdarray. The functions to_gpuarray(x, copyif=False) and to_cudandarray(x) return a new object that occupies the same memory space as the original. If the memory cannot be shared, they raise a ValueError. Because GPUArrays don’t support strides, a strided CudaNdarray can only be converted by first making a non-strided copy. The resulting GPUArray won’t share the same memory region. If you want this behavior, set copyif=True in to_gpuarray.
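
The sharing rule described above can be illustrated without a GPU. The following is a minimal NumPy sketch, not Theano's implementation: to_dense_view is a hypothetical helper that only mimics the documented contract (share memory when the input is contiguous, raise a ValueError when it is strided, and copy only when copyif=True).

```python
import numpy as np


def to_dense_view(x, copyif=False):
    """Hypothetical stand-in for the documented conversion contract."""
    if x.flags["C_CONTIGUOUS"]:
        # Contiguous input: the result can alias the same buffer.
        return x.view()
    if not copyif:
        # Strided input and copying not allowed: refuse to convert.
        raise ValueError("strided input; pass copyif=True to allow a copy")
    # Strided input and copying allowed: the result owns new memory.
    return np.ascontiguousarray(x)


dense = np.arange(6, dtype=np.float32)
strided = dense[::2]  # every other element, so not contiguous

shared = to_dense_view(dense)                 # aliases dense
copied = to_dense_view(strided, copyif=True)  # independent copy
```

Here np.shares_memory(shared, dense) holds, while copied lives in its own buffer, so writes to it do not reach dense. This is the trade-off that copyif=True accepts.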

Compiling with PyCUDA

You can use PyCUDA to compile CUDA functions that work directly on CudaNdarrays. Here is an example adapted from the file theano/misc/tests/test_pycuda_theano_simple.py:

    import sys
    import numpy
    import theano
    import theano.sandbox.cuda as cuda_ndarray
    import theano.misc.pycuda_init
    import pycuda
    import pycuda.driver as drv
    import pycuda.gpuarray


    def test_pycuda_theano():
        """Simple example with pycuda function and Theano CudaNdarray object."""
        from pycuda.compiler import SourceModule
        # Compile a kernel in which each thread multiplies one pair of elements.
        mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """)

        multiply_them = mod.get_function("multiply_them")

        a = numpy.random.randn(100).astype(numpy.float32)
        b = numpy.random.randn(100).astype(numpy.float32)

        # Test with Theano object
        ga = cuda_ndarray.CudaNdarray(a)
        gb = cuda_ndarray.CudaNdarray(b)
        dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
        # Launch one thread per element: the kernel has no bounds check,
        # so extra threads would read and write past the end of the arrays.
        multiply_them(dest, ga, gb,
                      block=(100, 1, 1), grid=(1, 1))
        assert (numpy.asarray(dest) == a * b).all()

Theano Op using a PyCUDA function

You can use a GPU function compiled with PyCUDA in a Theano op:

    import numpy
    import theano
    import theano.misc.pycuda_init
    from pycuda.compiler import SourceModule
    import theano.sandbox.cuda as cuda


    class PyCUDADoubleOp(theano.Op):
        __props__ = ()

        def make_node(self, inp):
            # Move the input to the GPU and make it contiguous,
            # since the kernel indexes it as a flat float array.
            inp = cuda.basic_ops.gpu_contiguous(
                cuda.basic_ops.as_cuda_ndarray_variable(inp))
            assert inp.dtype == "float32"
            return theano.Apply(self, [inp], [inp.type()])

        def make_thunk(self, node, storage_map, _, _2, impl=None):
            # Compile the kernel once, when the thunk is created.
            mod = SourceModule("""
        __global__ void my_fct(float * i0, float * o0, int size) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if(i<size){
            o0[i] = i0[i] * 2;
        }
      }""")
            pycuda_fct = mod.get_function("my_fct")
            inputs = [storage_map[v] for v in node.inputs]
            outputs = [storage_map[v] for v in node.outputs]

            def thunk():
                z = outputs[0]
                # Allocate the output only if it is missing or has the wrong shape.
                if z[0] is None or z[0].shape != inputs[0][0].shape:
                    z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
                # Enough blocks of 512 threads to cover every element.
                grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
                pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                           block=(512, 1, 1), grid=grid)
            thunk.lazy = False
            return thunk
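
The launch arithmetic used in thunk can be checked on the CPU. This is a pure-Python sketch, not PyCUDA code: grid_size and simulate_double are hypothetical names, and the nested loops stand in for the threads that the GPU would run in parallel.

```python
import math


def grid_size(size, block=512):
    """Number of blocks needed so that grid * block >= size."""
    return int(math.ceil(size / float(block)))


def simulate_double(values, block=512):
    """Emulate my_fct: one simulated thread per (block, thread) pair."""
    size = len(values)
    out = [0.0] * size
    for block_idx in range(grid_size(size, block)):
        for thread_idx in range(block):
            # Same global index computation as in the kernel.
            i = block_idx * block + thread_idx
            if i < size:  # same guard as in the kernel
                out[i] = values[i] * 2
    return out


result = simulate_double([float(v) for v in range(1000)])
```

For 1000 elements and 512 threads per block, two blocks are launched. The last 24 simulated threads fall past the end of the data and are skipped by the i < size guard, which is why the kernel needs that check.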

CUDAMat

There are functions for conversion between CUDAMat objects and Theano’s CudaNdarray objects. They obey the same principles as Theano’s PyCUDA functions and can be found in theano.misc.cudamat_utils.py.

WARNING: These converters have a peculiar problem with strides and shapes. In order to work, the test needs a transpose and a reshape.

Gnumpy

There are conversion functions between Gnumpy garray objects and Theano CudaNdarray objects. They are also similar to Theano’s PyCUDA functions and can be found in theano.misc.gnumpy_utils.py.