List of gpuarray Ops implemented

Normally you should not call these Ops directly! Theano should automatically transform CPU ops to their GPU equivalents, so this list is just here to let people know what is implemented on the GPU.
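As a point of reference, the usual workflow is to build an ordinary CPU graph and let Theano's optimizer substitute the GPU ops listed below. A minimal sketch, assuming a working libgpuarray backend and the flag device=cuda (e.g. THEANO_FLAGS="device=cuda,floatX=float32"):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    f = theano.function([x], T.exp(x).sum())  # Elemwise + CAReduce graph

    # With device=cuda, the compiled graph contains GpuElemwise,
    # GpuCAReduce* and HostFromGpu nodes inserted automatically.
    theano.printing.debugprint(f)
    print(f(np.ones((3, 3), dtype='float32')))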

Basic Op

  • class theano.gpuarray.basic_ops.CGpuKernelBase(func_files, func_name=None)[source]
  • Class to combine GpuKernelBase and COp.

It adds a new section type ‘kernels’ where you can define kernels with the ‘#kernel’ tag.

  • class theano.gpuarray.basic_ops.GpuAlloc(context_name, memset_0=False)[source]
  • Allocate initialized memory on the GPU.

Parameters:

  • context_name (str) – The name of the context in which to allocate memory.
  • memset_0 (bool) – This is only an optimization. If True, the value is always 0, so the C code calls memset(), which is faster.
  • class theano.gpuarray.basic_ops.GpuAllocEmpty(dtype, context_name)[source]
  • Allocate uninitialized memory on the GPU.
  • class theano.gpuarray.basic_ops.GpuContiguous[source]
  • Return a C contiguous version of the input.

This may either pass the object as-is (if already C contiguous) or make a copy.

  • class theano.gpuarray.basic_ops.GpuEye(dtype=None, context_name=None)[source]
  • Eye for GPU.
  • class theano.gpuarray.basic_ops.GpuFromHost(context_name)[source]
  • Transfer data to GPU.
  • class theano.gpuarray.basic_ops.GpuJoin(view=-1)[source]
  • Join for GPU.
  • class theano.gpuarray.basic_ops.GpuKernelBase[source]
  • Base class for operations that need to compile kernels.

It is not mandatory to use this class, but it helps with a lot of the small things that you have to pay attention to.

  • gpu_kernels(node, name)[source]
  • This is the method to override. This should return an iterable of Kernel objects that describe the kernels this op will need.

  • kernel_version(node)[source]

  • If you override c_code_cache_version_apply(), call this method to have the version of the kernel support code.

Parameters: node (apply node) – The node that we need the cache version for.
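A sketch of the intended use; the leading version number is an arbitrary per-Op value chosen for illustration, not something Theano mandates:

    # Hedged sketch: fold the kernel support code version into the
    # Op's cache key when overriding c_code_cache_version_apply().
    def c_code_cache_version_apply(self, node):
        return (3, self.kernel_version(node))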

  • class theano.gpuarray.basic_ops.GpuReshape(ndim, name=None)[source]
  • Reshape for GPU variables.
  • class theano.gpuarray.basic_ops.GpuSplit(len_splits)[source]
  • Split for GPU.
  • class theano.gpuarray.basic_ops.GpuToGpu(context_name)[source]
  • Transfer data between GPUs.
  • class theano.gpuarray.basic_ops.HostFromGpu[source]
  • Transfer data to CPU.
  • class theano.gpuarray.basic_ops.Kernel(code, params, name, flags, codevar=None, objvar=None, fname=None, sname=None)[source]
  • This class groups together all the attributes of a gpu kernel.

params should contain the data type for each argument. Buffer arguments should use the GpuArray class as the data type and scalars should use their equivalent numpy dtype. For ga_size and ga_ssize, use gpuarray.SIZE and gpuarray.SSIZE.

If the ctypes flag is set to True, then each entry of params should instead be a C string representing the typecode to use.

flags can contain the following keys whose values are booleans:

  • have_double – the kernel uses double-typed variables somewhere
  • have_small – the kernel uses variables whose type takes less than 4 bytes somewhere
  • have_complex – the kernel uses complex values somewhere
  • have_half – the kernel uses half-floats somewhere
  • ctypes – the params list consists of C typecodes

It can also have the key cflags, which is a string of C flag values like “GA_USE_DOUBLE|GA_USE_SMALL”. (A construction sketch follows the parameter list below.)

Parameters:

  • code (str) – The source code of the kernel.
  • params (list) – list of parameter types.
  • name (str) – the name of the kernel function in the source.
  • flags (dict) – dictionary of flags
  • codevar (str) – the name of the variable for the code object (defaults to kcode_ + name).
  • objvar (str) – the name of the variable for the kernel object (defaults to k_ + name).
  • fname (str) – the name of the function wrapper (defaults to name + _call).
  • sname (str) – the name of the scheduled call function (defaults to name + _scall).
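A minimal sketch of building a Kernel description, e.g. from inside a gpu_kernels() override. The kernel body and all names here are illustrative only; the parameter-type conventions follow the notes above:

    import numpy as np
    from pygpu import gpuarray
    from theano.gpuarray.basic_ops import Kernel

    # An illustrative element-scaling kernel in libgpuarray's CLUDA dialect.
    code = """
    KERNEL void scale(GLOBAL_MEM ga_float *out, ga_size n, ga_float alpha) {
        for (ga_size i = LID_0; i < n; i += LDIM_0) {
            out[i] *= alpha;
        }
    }
    """
    k = Kernel(code=code,
               # one entry per kernel argument: buffers use the GpuArray
               # class, ga_size uses gpuarray.SIZE, scalars their numpy dtype
               params=[gpuarray.GpuArray, gpuarray.SIZE, np.dtype('float32')],
               name="scale",
               flags=Kernel.get_flags('float32'))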
  • theano.gpuarray.basic_ops.as_gpuarray_variable(x, context_name)[source]
  • This will attempt to convert x into a variable on the GPU.

It can take either a value or another variable. If x is already suitable, it will be returned as-is. (A usage sketch follows at the end of this list.)

Parameters:

  • x – Object to convert
  • context_name (str or None) – target context name for the result
  • theano.gpuarray.basic_ops.infer_context_name(*vars_)[source]
  • Infer the context name to use from the inputs given.
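The sketch below shows the typical pattern for the transfer helpers above, assuming the default context; host_from_gpu is the module-level HostFromGpu instance exported by the same module:

    import theano.tensor as T
    from theano.gpuarray.basic_ops import (as_gpuarray_variable,
                                           infer_context_name, host_from_gpu)

    x = T.matrix('x')
    gx = as_gpuarray_variable(x, context_name=None)  # None names the default context
    ctx = infer_context_name(gx)   # recover the context from a GPU variable
    y = host_from_gpu(gx)          # HostFromGpu: transfer back to the CPU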

Blas Op

  • class theano.gpuarray.blas.BaseGpuCorr3dMM(border_mode='valid', subsample=(1, 1, 1), filter_dilation=(1, 1, 1), num_groups=1)[source]
  • Base class for GpuCorr3dMM, GpuCorr3dMM_gradWeights and GpuCorr3dMM_gradInputs. Cannot be used directly.

Parameters:

  • border_mode ({'valid', 'full', 'half'}) – Additionally, the padding size could be directly specified by an integer or a pair of integers.
  • subsample – Perform subsampling of the output (default: (1, 1, 1)).
  • filter_dilation – Perform subsampling of the input, also known as dilation (default: (1, 1, 1)).
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out convolutions separately (default: 1).
  • c_code_helper(bottom, weights, top, direction, sub, height=None, width=None, depth=None)[source]
  • This generates the C code for GpuCorr3dMM (direction="forward"), GpuCorr3dMM_gradWeights (direction="backprop weights"), and GpuCorr3dMM_gradInputs (direction="backprop inputs"). Depending on the direction, one of bottom, weights, top will receive the output, while the other two serve as inputs.

Parameters:

  • bottom – Variable name of the input images in the forward pass, or the gradient of the input images in backprop wrt. inputs.
  • weights – Variable name of the filters in the forward pass, or the gradient of the filters in backprop wrt. weights.
  • top – Variable name of the output images / feature maps in the forward pass, or the gradient of the outputs in the backprop passes.
  • direction ({'forward', 'backprop weights', 'backprop inputs'}) – "forward" to correlate bottom with weights and store results in top, "backprop weights" to do a valid convolution of bottom with top (swapping the first two dimensions) and store results in weights, and "backprop inputs" to do a full convolution of top with weights (swapping the first two dimensions) and store results in bottom.
  • sub – Dictionary of substitutions usable to help generate the C code.
  • height – Required if self.subsample[0] != 1, a variable giving the height of the filters for direction="backprop weights" or the height of the input images for direction="backprop inputs". Required if self.border_mode == 'half', a variable giving the height of the filters for direction="backprop weights". Not required otherwise, but if a value is given, it will be checked.
  • width – Required if self.subsample[1] != 1, a variable giving the width of the filters for direction="backprop weights" or the width of the input images for direction="backprop inputs". Required if self.border_mode == 'half', a variable giving the width of the filters for direction="backprop weights". Not required otherwise, but if a value is given, it will be checked.
  • depth – Required if self.subsample[2] != 1, a variable giving the depth of the filters for direction="backprop weights" or the depth of the input images for direction="backprop inputs". Required if self.border_mode == 'half', a variable giving the depth of the filters for direction="backprop weights". Not required otherwise, but if a value is given, it will be checked.
  • flops(inp, outp)[source]
  • Useful with the hack in profilemode to print the MFlops.
  • class theano.gpuarray.blas.BaseGpuCorrMM(border_mode='valid', subsample=(1, 1), filter_dilation=(1, 1), num_groups=1, unshared=False)[source]
  • Base class for GpuCorrMM, GpuCorrMM_gradWeights and GpuCorrMM_gradInputs. Cannot be used directly.

Parameters:

  • border_mode ({'valid', 'full', 'half'}) – Additionally, the padding size could be directly specified by an integer, a pair of integers, or two pairs of integers.
  • subsample – Perform subsampling of the output (default: (1, 1)).
  • filter_dilation – Perform subsampling of the input, also known as dilation (default: (1, 1)).
  • num_groups – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out convolutions separately (default: 1).
  • unshared – Perform unshared correlation (default: False)
  • c_code_helper(bottom, weights, top, direction, sub, height=None, width=None)[source]
  • This generates the C code for GpuCorrMM (direction="forward"), GpuCorrMM_gradWeights (direction="backprop weights"), and GpuCorrMM_gradInputs (direction="backprop inputs"). Depending on the direction, one of bottom, weights, top will receive the output, while the other two serve as inputs.

Parameters:

  • bottom – Variable name of the input images in the forward pass, or the gradient of the input images in backprop wrt. inputs.
  • weights – Variable name of the filters in the forward pass, or the gradient of the filters in backprop wrt. weights.
  • top – Variable name of the output images / feature maps in the forward pass, or the gradient of the outputs in the backprop passes.
  • direction ({'forward', 'backprop weights', 'backprop inputs'}) – "forward" to correlate bottom with weights and store results in top, "backprop weights" to do a valid convolution of bottom with top (swapping the first two dimensions) and store results in weights, and "backprop inputs" to do a full convolution of top with weights (swapping the first two dimensions) and store results in bottom.
  • sub – Dictionary of substitutions usable to help generate the C code.
  • height – Required if self.subsample[0] != 1, a variable giving the height of the filters for direction="backprop weights" or the height of the input images for direction="backprop inputs". Required if self.border_mode == 'half', a variable giving the height of the filters for direction="backprop weights". Not required otherwise, but if a value is given, it will be checked.
  • width – Required if self.subsample[1] != 1, a variable giving the width of the filters for direction="backprop weights" or the width of the input images for direction="backprop inputs". Required if self.border_mode == 'half', a variable giving the width of the filters for direction="backprop weights". Not required otherwise, but if a value is given, it will be checked.
  • flops(inp, outp)[source]
  • Useful with the hack in profilemode to print the MFlops.
  • class theano.gpuarray.blas.GpuCorr3dMM(border_mode='valid', subsample=(1, 1, 1), filter_dilation=(1, 1, 1), num_groups=1)[source]
  • GPU correlation implementation using Matrix Multiplication.

Parameters:

  • border_mode – The width of a border of implicit zeros to pad the input with. Must be a tuple with 3 elements giving the width of the padding on each side, or a single integer to pad the same on all sides, or a string shortcut setting the padding at runtime: 'valid' for (0, 0, 0) (valid convolution, no padding), 'full' for (kernel_rows - 1, kernel_columns - 1, kernel_depth - 1) (full convolution), 'half' for (kernel_rows // 2, kernel_columns // 2, kernel_depth // 2) (same convolution for odd-sized kernels). Note that the three widths are each applied twice, once per side (left and right, top and bottom, front and back).
  • subsample – The subsample operation applied to each output image. Should be a tuple with 3 elements. (sv, sh, sl) is equivalent to GpuCorr3dMM(…)(…)[:,:,::sv,::sh,::sl], but faster. Set to (1, 1, 1) to disable subsampling.
  • filter_dilation – The filter dilation operation applied to each input image. Should be a tuple with 3 elements. Set to (1, 1, 1) to disable filter dilation.
  • num_groups – The number of distinct groups the image and kernel must be divided into. Should be an int; set to 1 to disable grouped convolution.

Notes

Currently, the Op requires the inputs, filters and outputs to be C-contiguous. Use gpu_contiguous on these arguments if needed.

You can either enable the Theano flag optimizer_including=conv_gemm to automatically replace all convolution operations with GpuCorr3dMM or one of its gradients, or you can use it as a replacement for conv2d, called as GpuCorr3dMM(subsample=…)(image, filters). The latter is currently faster, but note that it computes a correlation – if you need to compute a convolution, flip the filters as filters[:,:,::-1,::-1,::-1].
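A short sketch of the direct-call route just described; variable names are illustrative, and gpu_contiguous comes from theano.gpuarray.basic_ops per the contiguity note above:

    import theano.tensor as T
    from theano.gpuarray.blas import GpuCorr3dMM
    from theano.gpuarray.basic_ops import gpu_contiguous

    # 5D tensors: (batch, channels, rows, columns, depth)
    vids = T.TensorType('float32', (False,) * 5)('vids')
    filt = T.TensorType('float32', (False,) * 5)('filt')

    corr = GpuCorr3dMM(border_mode='valid', subsample=(1, 1, 1))
    out_corr = corr(gpu_contiguous(vids), gpu_contiguous(filt))
    # For a true convolution, flip the filters as noted above:
    out_conv = corr(gpu_contiguous(vids),
                    gpu_contiguous(filt[:, :, ::-1, ::-1, ::-1]))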

  • class theano.gpuarray.blas.GpuCorr3dMM_gradInputs(border_mode='valid', subsample=(1, 1, 1), filter_dilation=(1, 1, 1), num_groups=1)[source]
  • Gradient wrt. inputs for GpuCorr3dMM.

Notes

You will not want to use this directly, but rely on Theano’s automatic differentiation or graph optimization to use it as needed.

  • class theano.gpuarray.blas.GpuCorr3dMM_gradWeights(border_mode='valid', subsample=(1, 1, 1), filter_dilation=(1, 1, 1), num_groups=1)[source]
  • Gradient wrt. filters for GpuCorr3dMM.

Notes

You will not want to use this directly, but rely on Theano’s automatic differentiation or graph optimization to use it as needed.

  • class theano.gpuarray.blas.GpuCorrMM(border_mode='valid', subsample=(1, 1), filter_dilation=(1, 1), num_groups=1, unshared=False)[source]
  • GPU correlation implementation using Matrix Multiplication.

Parameters:

  • border_mode – The width of a border of implicit zeros to pad the input with. Must be a tuple with 2 elements giving the numbers of rows and columns to pad on each side, or a single integer to pad the same on all sides, or a string shortcut setting the padding at runtime: 'valid' for (0, 0) (valid convolution, no padding), 'full' for (kernel_rows - 1, kernel_columns - 1) (full convolution), 'half' for (kernel_rows // 2, kernel_columns // 2) (same convolution for odd-sized kernels). If it is a tuple containing 2 pairs of integers, then these specify the padding to be applied on each side ((left, right), (top, bottom)). Otherwise, each width is applied twice, once per side (left and right, top and bottom).
  • subsample – The subsample operation applied to each output image. Should be a tuple with 2 elements. (sv, sh) is equivalent to GpuCorrMM(…)(…)[:,:,::sv,::sh], but faster. Set to (1, 1) to disable subsampling.
  • filter_dilation – The filter dilation operation applied to each input image. Should be a tuple with 2 elements. Set to (1, 1) to disable filter dilation.
  • num_groups – The number of distinct groups the image and kernel must be divided into. Should be an int; set to 1 to disable grouped convolution.
  • unshared – Perform unshared correlation (default: False)

Notes

Currently, the Op requires the inputs, filters and outputs to be C-contiguous. Use gpu_contiguous on these arguments if needed.

You can either enable the Theano flag optimizer_including=conv_gemm to automatically replace all convolution operations with GpuCorrMM or one of its gradients, or you can use it as a replacement for conv2d, called as GpuCorrMM(subsample=…)(image, filters). The latter is currently faster, but note that it computes a correlation – if you need to compute a convolution, flip the filters as filters[:,:,::-1,::-1].
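Both routes sketched, assuming a CUDA device; names are illustrative:

    import theano
    import theano.tensor as T
    from theano.gpuarray.blas import GpuCorrMM
    from theano.gpuarray.basic_ops import gpu_contiguous

    images = T.ftensor4('images')    # (batch, channels, rows, columns)
    filters = T.ftensor4('filters')

    # Route 1: call the Op directly (a correlation; flip filters for convolution).
    out = GpuCorrMM(border_mode='half', subsample=(1, 1))(
        gpu_contiguous(images), gpu_contiguous(filters[:, :, ::-1, ::-1]))

    # Route 2: write conv2d as usual and let the optimizer substitute GpuCorrMM,
    # e.g. with THEANO_FLAGS="device=cuda,optimizer_including=conv_gemm".
    out2 = theano.tensor.nnet.conv2d(images, filters)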

  • class theano.gpuarray.blas.GpuCorrMM_gradInputs(border_mode='valid', subsample=(1, 1), filter_dilation=(1, 1), num_groups=1, unshared=False)[source]
  • Gradient wrt. inputs for GpuCorrMM.

Notes

You will not want to use this directly, but rely on Theano’s automatic differentiation or graph optimization to use it as needed.

  • class theano.gpuarray.blas.GpuCorrMM_gradWeights(border_mode='valid', subsample=(1, 1), filter_dilation=(1, 1), num_groups=1, unshared=False)[source]
  • Gradient wrt. filters for GpuCorrMM.

Notes

You will not want to use this directly, but rely on Theano’s automatic differentiation or graph optimization to use it as needed.

  • class theano.gpuarray.blas.GpuDot22[source]
  • Dot22 on the GPU.
  • class theano.gpuarray.blas.GpuGemm(inplace=False)[source]
  • Gemm on the GPU.
  • class theano.gpuarray.blas.GpuGemv(inplace=False)[source]
  • Gemv on the GPU.
  • class theano.gpuarray.blas.GpuGer(inplace=False)[source]
  • Ger on the GPU.

Elemwise Op

  • class theano.gpuarray.elemwise.GpuCAReduceCPY(scalar_op, axis=None, dtype=None, acc_dtype=None)[source]
  • CAReduce that reuses the Python code from gpuarray.
  • class theano.gpuarray.elemwise.GpuCAReduceCuda(scalar_op, axis=None, reduce_mask=None, dtype=None, acc_dtype=None, pre_scalar_op=None)[source]
  • GpuCAReduceCuda is a Reduction along some dimensions by a scalar op.

Parameters:

  • reduce_mask – The dimensions along which to reduce. The reduce_mask is a tuple of booleans (actually integers 0 or 1) that specify, for each input dimension, whether to reduce it (1) or not (0).
  • pre_scalar_op – If present, must be a scalar op with only 1 input. We will execute it on the input value before reduction.

Examples

When scalar_op is a theano.scalar.basic.Add instance:

  • reduce_mask == (1,) sums a vector to a scalar
  • reduce_mask == (1,0) computes the sum of each column in a matrix
  • reduce_mask == (0,1) computes the sum of each row in a matrix
  • reduce_mask == (1,1,1) computes the sum of all elements in a 3-tensor.
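For intuition, the patterns above line up with NumPy reductions like this (a sketch only; the Op itself is normally inserted by the optimizer):

    import numpy as np

    v = np.arange(4, dtype='float32')
    m = np.arange(6, dtype='float32').reshape(2, 3)

    v.sum()         # reduce_mask == (1,): vector -> scalar
    m.sum(axis=0)   # reduce_mask == (1, 0): one sum per column
    m.sum(axis=1)   # reduce_mask == (0, 1): one sum per row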

Notes

Any reduce_mask of all zeros is a sort of ‘copy’, and may be removed during graph optimization.

This Op is a work in progress.

This op was recently upgraded from just GpuSum to a general CAReduce. Not many code cases are supported for scalar_op being anything other than scalar.Add instances yet.

Important note: if you implement new cases for this op, be sure to benchmark them and make sure that they actually result in a speedup. GPUs are not especially well-suited to reduction operations, so it is quite possible that the GPU might be slower for some cases.

  • c_code_reduce_01X(sio, node, name, x, z, fail, N)[source]

Parameters: N – The number of 1s in the pattern: N=1 -> 01, N=2 -> 011, N=3 -> 0111. Works for N=1, 2, 3.

  • supports_c_code(inputs)[source]
  • Returns True if the current op and reduce pattern has functioning C code.
  • class theano.gpuarray.elemwise.GpuDimShuffle(input_broadcastable, new_order, inplace=True)[source]
  • DimShuffle on the GPU.
  • class theano.gpuarray.elemwise.GpuElemwise(scalar_op, inplace_pattern=None, name=None, nfunc_spec=None, openmp=None)[source]
  • Elemwise on the GPU.

    • perform(node, inputs, output_storage, params=None)[source]
    • Required: Calculate the function on the inputs and put the variables in the output storage. Return None.

Parameters:

  • node (Apply instance) – Contains the symbolic inputs and outputs.
  • inputs (list) – Sequence of inputs (immutable).
  • output_storage (list) – List of mutable 1-element lists (do not change the length of these lists).

Notes

The output_storage list might contain data. If an element of output_storage is not None, it has to be of the right type; for instance, for a TensorVariable, it has to be a NumPy ndarray with the right number of dimensions and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that it was produced by a previous call to impl. It could be allocated by another Op; impl is free to reuse it as it sees fit, or to discard it and allocate new memory.

Raises: MethodNotDefined – The subclass does not override this method.
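A minimal sketch of the contract described above, for a hypothetical elementwise doubling Op (not GpuElemwise itself):

    # Hypothetical Op.perform: results go into output_storage; return None.
    def perform(self, node, inputs, output_storage, params=None):
        (x,) = inputs
        out = output_storage[0]
        result = 2 * x
        if out[0] is not None and out[0].shape == result.shape:
            out[0][...] = result   # reuse storage left from a previous call
        else:
            out[0] = result        # or discard it and allocate fresh memory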

  • class theano.gpuarray.elemwise.GpuErfcinv(output_types_preference=None, name=None)[source]
  • Inverse complementary error function for GPU.
  • class theano.gpuarray.elemwise.GpuErfinv(output_types_preference=None, name=None)[source]
  • Inverse error function for GPU.
  • exception theano.gpuarray.elemwise.SupportCodeError[source]
  • We do not support certain things (such as the C++ complex struct).
  • theano.gpuarray.elemwise.max_inputs_to_GpuElemwise(node_or_outputs)[source]
  • Compute the maximum number of inputs that fit in a kernel call.

Subtensor Op

  • class theano.gpuarray.subtensor.GpuAdvancedBooleanIncSubtensor(inplace=False, set_instead_of_inc=False)[source]
  • Implement AdvancedBooleanIncSubtensor on the gpu.
  • class theano.gpuarray.subtensor.GpuAdvancedBooleanSubtensor[source]
  • AdvancedBooleanSubtensor on the GPU.
  • class theano.gpuarray.subtensor.GpuAdvancedIncSubtensor(inplace=False, set_instead_of_inc=False)[source]
  • Implement AdvancedIncSubtensor on the gpu.
  • class theano.gpuarray.subtensor.GpuAdvancedIncSubtensor1(inplace=False, set_instead_of_inc=False)[source]
  • Implement AdvancedIncSubtensor1 on the gpu.
  • class theano.gpuarray.subtensor.GpuAdvancedIncSubtensor1_dev20(inplace=False, set_instead_of_inc=False)[source]
  • Implement AdvancedIncSubtensor1 on the gpu with atomics.

    • make_node(x, y, ilist)[source]
    • It differs from GpuAdvancedIncSubtensor1 in that it makes sure the indexes are of type long.
  • class theano.gpuarray.subtensor.GpuAdvancedSubtensor[source]
  • AdvancedSubtensor on the GPU.
  • class theano.gpuarray.subtensor.GpuAdvancedSubtensor1(sparse_grad=False)[source]
  • AdvancedSubtensor1 on the GPU.
  • class theano.gpuarray.subtensor.GpuIncSubtensor(idx_list, inplace=False, set_instead_of_inc=False, destroyhandler_tolerate_aliased=None)[source]
  • Implement IncSubtensor on the gpu.

Notes

The optimization to make this inplace is in tensor/opt. The same optimization handles IncSubtensor and GpuIncSubtensor. This Op has c_code too; it inherits tensor.IncSubtensor’s c_code. The helper methods like do_type_checking(), copy_of_x(), etc. specialize the c_code for this Op.

  • copy_into(view, source)[source]

Parameters:

  • view (string) – C code expression for an array.
  • source (string) – C code expression for an array.

Returns: C code expression to copy source into view, and 0 on success. Return type: str

  • copy_of_x(x)[source]

Parameters: x – A string giving the name of a C variable pointing to an array. Returns: C code expression to make a copy of x. Return type: str

Notes

Base class uses PyArrayObject *; subclasses may override for different types of arrays.

  • do_type_checking(node)[source]
  • Should raise NotImplementedError if c_code does not support the types involved in this node.

  • get_helper_c_code_args()[source]

  • Return a dictionary of arguments to use with helper_c_code.

  • make_view_array(x, view_ndim)[source]

  • //TODO

Parameters:

  • x – A string identifying an array to be viewed.
  • view_ndim – A string specifying the number of dimensions to have in the view. This doesn’t need to actually set up the view with the right indexing; we’ll do that manually later.
  • class theano.gpuarray.subtensor.GpuSubtensor(idx_list)[source]
  • Subtensor on the GPU.
  • theano.gpuarray.subtensor.check_and_convert_boolean_masks(input, idx_list)[source]
  • This function checks if the boolean mask arrays in the index have the right shape and converts them to index arrays by calling nonzero. For each boolean mask, we check if the mask has the same shape as the input. This is enforced in NumPy 1.13.0 and newer, but not by earlier versions. If the size is not the same, this method raises an IndexError.
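What that check-and-convert step amounts to, sketched with NumPy (the helper name here is hypothetical, not part of Theano's API):

    import numpy as np

    def convert_boolean_mask(input_shape, mask, axis):
        # Hypothetical helper: the mask must match the shape of the
        # dimensions it indexes, then it is replaced by index arrays.
        expected = input_shape[axis:axis + mask.ndim]
        if mask.shape != expected:
            raise IndexError("boolean mask shape %s does not match %s"
                             % (mask.shape, expected))
        return np.nonzero(mask)

    x = np.arange(6).reshape(2, 3)
    idx = convert_boolean_mask(x.shape, np.array([True, False]), axis=0)
    assert (x[idx] == x[np.array([True, False])]).all()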

Nnet Op

  • class theano.gpuarray.nnet.GpuCrossentropySoftmax1HotWithBiasDx[source]
  • Implement CrossentropySoftmax1HotWithBiasDx on the gpu.

Gradient wrt x of the CrossentropySoftmax1Hot Op.

  • class theano.gpuarray.nnet.GpuCrossentropySoftmaxArgmax1HotWithBias[source]
  • Implement CrossentropySoftmaxArgmax1HotWithBias on the gpu.
  • class theano.gpuarray.nnet.GpuSoftmax[source]
  • Implement Softmax on the gpu.
  • class theano.gpuarray.nnet.GpuSoftmaxWithBias[source]
  • Implement SoftmaxWithBias on the gpu.
  • class theano.gpuarray.neighbours.GpuImages2Neibs(mode='valid')[source]
  • Images2Neibs for the GPU.