Implementing the arithmetic Ops in C

Now that we have set up our double type properly to allow C implementations for operations that work on it, all we have to do now is to actually define these operations in C.

How does it work?

Before a C Op is executed, the variables related to each of its inputs will be declared and filled appropriately, either from an input provided by the end user (using c_extract) or from a value calculated by another operation. For each of the outputs, the variables associated with them will be declared and initialized.

The operation then has to compute what it needs to using the input variables and place the results in the output variables.
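The ordering described above can be pictured with a toy model in plain Python. This is only a sketch under assumptions: the helper names (declare, extract, init, assemble) and the C fragments they return are hypothetical stand-ins, not Theano's actual code generator or the real output of the double type's methods.

```python
# Toy model of the order in which the pieces of C code are laid out.
# All helper names and C fragments here are illustrative assumptions.

def declare(name):
    return "double %s;" % name

def extract(name):
    # Stand-in for filling an input from a Python object.
    return "%s = PyFloat_AsDouble(py_%s);" % (name, name)

def init(name):
    return "%s = 0.0;" % name

def assemble(input_names, output_names, op_code):
    lines = []
    for name in input_names:      # inputs: declared, then filled
        lines += [declare(name), extract(name)]
    for name in output_names:     # outputs: declared, then initialized
        lines += [declare(name), init(name)]
    lines.append(op_code)         # finally, the Op's own computation
    return "\n".join(lines)

print(assemble(["x", "y"], ["z"], "z = x * y;"))
```

The point of the sketch is only the ordering: by the time the Op's own code runs, every input and output variable already exists.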

What needs to be defined

There are fewer methods to define for an Op than for a Type:

  • class Op
    • c_code(node, name, input_names, output_names, sub)
    • This must return C code that carries out the computation we want to do.

sub is a dictionary of extra parameters to the c_code method. It contains the following values:

sub['fail']

A string of code that you should execute (after ensuring that a Python exception is set) if your C code needs to raise an exception.

sub['params']

(optional) The name of the variable which holds the context for the node. This will only appear if the op has requested a context by having a get_params() method that returns something other than None.
  • c_code_cleanup(node, name, input_names, output_names, sub)
  • This must return C code that cleans up whatever c_code allocated and that we must free.

Default: The default behavior is to do nothing.

  • c_headers([c_compiler])
  • Returns a list of headers to include in the file. ‘Python.h’ is included by default so you don’t need to specify it. All of the headers required by the Types involved (inputs and outputs) will also be included.

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_header_dirs([c_compiler])
  • Returns a list of directories to search for headers (arguments to -I).

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_libraries([c_compiler])
  • Returns a list of library names that your op needs to link to (arguments to -l). All ops are automatically linked with ‘python’ and the libraries their types require.

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_lib_dirs([c_compiler])
  • Returns a list of directories to search for libraries (arguments to -L).

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_compile_args([c_compiler])
  • Allows you to specify additional arbitrary arguments to the C compiler. This is not usually required.

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_no_compile_args([c_compiler])
  • Returns a list of C compiler arguments that are forbidden when compiling this Op.

The c_compiler [1] parameter is the C compiler that will be used to compile the code for the node. You may get multiple calls with different C compilers.

  • c_init_code()
  • Allows you to specify code that will be executed once when the module is initialized, before anything else is executed. This is for code that will be executed once per Op.

  • c_init_code_apply(node, name)

  • Allows you to specify code that will be executed once when the module is initialized, before anything else is executed, and is specialized for a particular apply of an Op.

  • c_init_code_struct(node, name, sub)

  • Allows you to specify code that will be inserted in the struct constructor of the Op. This is for code which should be executed once per thunk (Apply node, more or less).

sub is a dictionary of extra parameters to the c_init_code_struct method. It contains the following values:

sub['fail']

A string of code that you should execute (after ensuring that a Python exception is set) if your C code needs to raise an exception.

sub['params']

(optional) The name of the variable which holds the context for the node. This will only appear if the op has requested a context by having a get_params() method that returns something other than None.
  • c_support_code()
  • Allows you to specify helper functions/structs that the Op needs. That code will be reused for each apply of this op. It will be inserted at global scope.

  • c_support_code_apply(node, name)

  • Allows you to specify helper functions/structs specialized for a particular apply of an Op. Use c_support_code() if the code is the same for each apply of an op. It will be inserted at global scope.

  • c_support_code_struct(node, name)

  • Allows you to specify helper functions or variables that will be specific to one particular thunk. These are inserted at struct scope.

Note: You cannot specify CUDA kernels in the code returned by this method since that isn’t supported by CUDA. You should place your kernels in c_support_code() or c_support_code_apply() and call them from this code.

  • c_cleanup_code_struct(node, name)
  • Allows you to specify code that will be inserted in the struct destructor of the Op. This is for cleaning up allocations and the like when the thunk is released (when you “free” a compiled function using this op).

  • infer_shape(node, (i0_shapes, i1_shapes, …))

  • Allows optimizations to lift the Shape op over this op. An example of why this is good is when we only need the shape of a variable: we will be able to obtain it without computing the variable itself.

Must return a list where each element is a tuple representing the shape of one output.

For example, for the matrix-matrix product infer_shape will have as inputs (node, ((x0,x1), (y0,y1))) and should return [(x0, y1)]. Both the inputs and the return value may be Theano variables.

  • c_code_cache_version()
  • Must return a tuple of hashable objects like integers. This specifies the version of the code. It is used to cache the compiled code. You MUST change the returned tuple for each change in the code. If you don’t want to cache the compiled code, return an empty tuple or don’t implement it.

  • c_code_cache_version_apply(node)

  • Overrides c_code_cache_version() if defined, but otherwise has the same contract.

  • python_constant_folding(node)

  • Optional. If present, this method will be called before doing constant folding of a node, with that node as a parameter. If it returns True, we will not generate C code when doing constant folding of this node. This is useful when compiling the C code would take longer than the computation in Python (e.g. Elemwise of scalars).

In addition, this lowers the number of compiled modules and disk accesses. This is particularly useful when the file system load is high or when the Theano compilation directory is shared by many processes (like on a network file server on a cluster).

  • get_params(node)
  • (optional) If defined, should return the runtime params the op needs. These parameters will be passed to the C code through the variable named in sub['params']. The variable is also available for use in the code returned by c_init_code_struct(). If it returns None this is considered the same as if the method was not defined.

If this method is defined and does not return None, then the Op must have a params_type property with the Type to use for the params variable.

  • _f16_ok
  • (optional) If this attribute is absent or evaluates to False, C code will be disabled for the op if any of its inputs or outputs contains float16 data. This is added as a check to make sure we don’t compute wrong results: there is no hardware float16 type, so special care must be taken to make sure operations are done correctly.

If you don’t intend to deal with float16 data you can leave this undefined.

This attribute is internal and may go away at any point during development if a better solution is found.
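Two of the smaller methods from the list above can be sketched in plain Python so that their contracts are easy to see. The class MatMulSketch is hypothetical and deliberately does not inherit from Theano's Op, and plain integers stand in for the shapes, which in real use may be Theano variables.

```python
class MatMulSketch(object):
    """Hypothetical stand-in for a matrix-matrix product Op."""

    def infer_shape(self, node, input_shapes):
        # One shape tuple per input: (x0, x1) for x and (y0, y1) for y.
        (x0, x1), (y0, y1) = input_shapes
        # One shape tuple per output: the product has shape (x0, y1).
        return [(x0, y1)]

    def c_code_cache_version(self):
        # Change this tuple every time the generated C code changes.
        return (1,)


op = MatMulSketch()
print(op.infer_shape(None, ((2, 3), (3, 4))))  # [(2, 4)]
```

Note that infer_shape never touches the values of the inputs, only their shapes, which is what lets the shape be obtained without running the computation.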

The name argument is currently given an invalid value, so steer away from it. As was the case with Type, sub['fail'] provides failure code that you must use if you want to raise an exception, after setting the exception message.
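As a sketch of how sub['fail'] is meant to be used, here is a hypothetical c_code for a division on doubles that refuses to divide by zero. The function name div_c_code and the stand-in value given to sub['fail'] in the usage line are assumptions for illustration; the real failure code is supplied by Theano.

```python
def div_c_code(node, name, input_names, output_names, sub):
    x_name, y_name = input_names
    output_name, = output_names
    fail = sub['fail']
    # Set a Python exception first, then run the failure code.
    return """
if (%(y_name)s == 0.0) {
    PyErr_SetString(PyExc_ZeroDivisionError, "division by zero");
    %(fail)s
}
%(output_name)s = %(x_name)s / %(y_name)s;
""" % locals()


# Stand-in failure code, only so the template can be rendered here.
print(div_c_code(None, None, ["x", "y"], ["z"], {'fail': "goto fail_label;"}))
```

The order matters: the exception is set before the failure code runs, as the description of sub['fail'] requires.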

The node argument is an Apply node representing an application of the current Op on a list of inputs, producing a list of outputs. The input_names and output_names arguments contain as many strings as there are inputs and outputs to the application of the Op, and they correspond to the name that is passed to the type of each Variable in these lists. For example, if node.inputs[0].type == double, then input_names[0] is the name argument passed to double.c_declare etc. when the first input is processed by Theano.

In a nutshell, input_names and output_names parameterize the names of the inputs your operation needs to use and the outputs it needs to put values into. This will become clear with the examples.

Footnotes

[1] There are actually two versions of this method: one with a c_compiler parameter and one without. The calling code will try the version with c_compiler and try the version without if it does not work. Defining both versions is pointless since the one without c_compiler will never get called. Note that these methods are not specific to a single apply node so they may get called more than once on the same object with different values for c_compiler.
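The calling convention described in this footnote might be pictured as follows. This is a guess at the shape of the logic, not a copy of Theano's calling code: the function collect_headers, the use of TypeError to detect the missing parameter, the two example classes, and the string "gcc" standing in for the compiler object are all assumptions.

```python
def collect_headers(op, c_compiler):
    # Try the version that accepts c_compiler first ...
    try:
        return op.c_headers(c_compiler)
    except TypeError:
        # ... and fall back to the version without it.
        return op.c_headers()


class WithCompiler(object):
    def c_headers(self, c_compiler):
        return ["<math.h>"]


class WithoutCompiler(object):
    def c_headers(self):
        return ["<stdlib.h>"]


print(collect_headers(WithCompiler(), "gcc"))     # ['<math.h>']
print(collect_headers(WithoutCompiler(), "gcc"))  # ['<stdlib.h>']
```

Under this reading, an Op only ever needs to define one of the two signatures.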

Defining the methods

We will be defining C code for the multiplication Op on doubles.

c_code

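    def c_code(node, name, input_names, output_names, sub):
        # Names of the C variables holding the two inputs and the output.
        x_name, y_name = input_names[0], input_names[1]
        output_name = output_names[0]
        # The returned string is the C code that performs the computation.
        return """
        %(output_name)s = %(x_name)s * %(y_name)s;
        """ % locals()

    # Attach the method to mul, the multiplication Op on doubles.
    mul.c_code = c_code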

And that’s it. As we enter the scope of the C code we are defining in the method above, many variables are defined for us. Namely, the C variables named by x_name, y_name and output_name are all of the primitive C double type, and they were declared using the C code returned by double.c_declare.

Implementing multiplication is as simple as multiplying the two input doubles and setting the output double to the result. If you had more than one output, you would just set the variable(s) for each output to what they should be.
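For instance, a hypothetical Op with two outputs (the sum and the product of its inputs) could set both output variables in the same piece of C code. The name sum_and_product_c_code and the variable names in the usage line are assumptions for illustration.

```python
def sum_and_product_c_code(node, name, input_names, output_names, sub):
    x_name, y_name = input_names
    sum_name, prod_name = output_names
    # Assign every output variable; the C code itself returns nothing.
    return """
%(sum_name)s = %(x_name)s + %(y_name)s;
%(prod_name)s = %(x_name)s * %(y_name)s;
""" % locals()


print(sum_and_product_c_code(None, None, ["x", "y"], ["s", "p"], {}))
```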

Warning

Do NOT use C’s return statement to return the variable(s) of the computations. Set the output variables directly as shown above. Theano will pick them up for you.

c_code_cleanup

There is nothing to clean up after multiplying two doubles. Typically, you won’t need to define this method unless you malloc() some temporary storage (which you would free() here) or create temporary Python objects (which you would Py_XDECREF() here).

Final version

As before, I tried to organize the code in order to minimize repetition. You can check that mul produces the same C code in this version that it produces in the code I gave above.

    from theano import gof

    # The double type used below is the one set up in the previous sections.


    class BinaryDoubleOp(gof.Op):

        __props__ = ("name", "fn", "ccode")

        def __init__(self, name, fn, ccode):
            self.name = name
            self.fn = fn
            self.ccode = ccode

        def make_node(self, x, y):
            # Wrap plain Python numbers as double constants.
            if isinstance(x, (int, float)):
                x = gof.Constant(double, x)
            if isinstance(y, (int, float)):
                y = gof.Constant(double, y)
            if x.type != double or y.type != double:
                raise TypeError('%s only works on doubles' % self.name)
            return gof.Apply(self, [x, y], [double()])

        def perform(self, node, inp, out):
            # Python implementation.
            x, y = inp
            z, = out
            z[0] = self.fn(x, y)

        def __str__(self):
            return self.name

        def c_code(self, node, name, inp, out, sub):
            # C implementation: fill the template with the variable names.
            x, y = inp
            z, = out
            return self.ccode % locals()


    add = BinaryDoubleOp(name='add',
                         fn=lambda x, y: x + y,
                         ccode="%(z)s = %(x)s + %(y)s;")

    sub = BinaryDoubleOp(name='sub',
                         fn=lambda x, y: x - y,
                         ccode="%(z)s = %(x)s - %(y)s;")

    mul = BinaryDoubleOp(name='mul',
                         fn=lambda x, y: x * y,
                         ccode="%(z)s = %(x)s * %(y)s;")

    div = BinaryDoubleOp(name='div',
                         fn=lambda x, y: x / y,
                         ccode="%(z)s = %(x)s / %(y)s;")
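The claim that mul produces the same C code in both versions can be checked without Theano by rendering the two templates side by side. The helper functions below are stand-ins that copy only the string formatting step from each version, and the variable names passed in are arbitrary.

```python
def c_code_first(node, name, input_names, output_names, sub):
    # Template from the first version of mul.
    x_name, y_name = input_names[0], input_names[1]
    output_name = output_names[0]
    return """
%(output_name)s = %(x_name)s * %(y_name)s;
""" % locals()


def c_code_final(ccode, inp, out):
    # Same formatting step as BinaryDoubleOp.c_code.
    x, y = inp
    z, = out
    return ccode % locals()


first = c_code_first(None, None, ["a", "b"], ["c"], {})
final = c_code_final("%(z)s = %(x)s * %(y)s;", ["a", "b"], ["c"])
# The two results differ only in surrounding whitespace.
assert first.strip() == final.strip()
print(final)  # c = a * b;
```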