Implementing double in C

The previous two sections described how to define a double Typeand arithmetic operations on that Type, but all of them wereimplemented in pure Python. In this section we will see how to definethe double type in such a way that it can be used by operationsimplemented in C (which we will define in the section after that).

How does it work?

In order to be C-compatible, a Type must provide a C interface to thePython data that satisfy the constraints it puts forward. In otherwords, it must define C code that can convert a Python reference intosome type suitable for manipulation in C and it must define C codethat can convert some C structure in which the C implementation of anoperation stores its variables into a reference to an object that can beused from Python and is a valid value for the Type.

For example, in the current example, we have a Type which represents aPython float. First, we will choose a corresponding C type. Thenatural choice would be the primitive double type. Then, we needto write code that will take a PyObject, check that it is aPython float and extract its value as a double. Finally, weneed to write code that will take a C double and will build aPyObject of Python type float that we can work with fromPython. We will be using CPython and thus special care must be givento making sure reference counts are updated properly!

The C code we will write makes use of CPython’s C API which you canfind here.

What needs to be defined

In order to be C-compatible, a Type must define several additionalmethods, which all start with the c_ prefix. The complete list canbe found in the documentation for gof.type.Type. Here, we’ll focus onthe most important ones:

  • class CLinkerType
    • cdeclare(_name, sub, check_input=True)
    • This must return C code which declares variables. These variableswill be available to operations defined in C. You may also writetypedefs.

    • cinit(_name, sub)

    • This must return C code which initializes the variables declared inc_declare. Either this or c_extract will be called.

    • cextract(_name, sub, check_input=True)

    • This must return C code which takes a reference to a Python objectand initializes the variables declared in c_declare to match thePython object’s data. Either this or c_init will be called.

    • csync(_name, sub)

    • When the computations are done, transfer the variables from the Cstructure we put them in to the destination Python object. This willonly be called for the outputs.

    • ccleanup(_name, sub)

    • When we are done using the data, clean up whatever we allocated anddecrease the appropriate reference counts.

    • cheaders([_c_compiler])

    • clibraries([_c_compiler])
    • cheader_dirs([_c_compiler])
    • clib_dirs([_c_compiler])
    • Allows you to specify headers, libraries and associated directories.

These methods have two versions, one with a _c_compiler_argument and one without. The version with c_compiler is triedfirst and if it doesn’t work, the one without is.

The c_compiler argument is the C compiler that will be usedto compile the C code for the node that uses this type.

  • ccompile_args([_c_compiler])
  • cno_compile_args([_c_compiler])
  • Allows to specify special compiler arguments to add/exclude.

These methods have two versions, one with a _c_compiler_argument and one without. The version with c_compiler is triedfirst and if it doesn’t work, the one without is.

The c_compiler argument is the C compiler that will be usedto compile the C code for the node that uses this type.

  • c_init_code()
  • Allows you to specify code that will be executed once when themodule is initialized, before anything else is executed.For instance, if a type depends on NumPy’s C API, then'import_array();' has to be among the snippets returnedby c_init_code().

  • c_support_code()

  • Allows to add helper functions/structs that the Type needs.

  • c_compiler()

  • Allows to specify a special compiler. This will force this compilerfor the current compilation block (a particular op or the full graph).This is used for the GPU code.

  • c_code_cache_version()

  • Should return a tuple of hashable objects like integers. Thisspecifies the version of the code. It is used to cache thecompiled code. You MUST change the returned tuple for eachchange in the code. If you don’t want to cache the compiled codereturn an empty tuple or don’t implement it.

Each of these functions take two arguments, name and sub whichmust be used to parameterize the C code they return. name is astring which is chosen by the compiler to represent a Variable ofthe Type in such a way that there are no name conflicts betweendifferent pieces of data. Therefore, all variables declared incdeclare should have a name which includes name. Furthermore,the name of the variable containing a pointer to the Python objectassociated to the Variable is py<name>.

sub, on the other hand, is a dictionary containing bits of C codesuitable for use in certain situations. For instance, sub['fail']contains code that should be inserted wherever an error is identified.

c_declare and c_extract also accept a third check_inputoptional argument. If you want your type to validate its inputs, it mustonly do it when check_input is True.

The example code below should help you understand how everything playsout:

Warning

If some error condition occurs and you want to fail and/or raise anException, you must use the fail code contained insub['fail'] (there is an example in the definition of cextractbelow). You must _NOT use the return statement anywhere, ever,nor break outside of your own loops or goto to strangeplaces or anything like that. Failure to comply with thisrestriction could lead to erratic behavior, segfaults and/or memoryleaks because Theano defines its own cleanup system and assumesthat you are not meddling with it. Furthermore, advanced operationsor types might do code transformations on your code such asinserting it in a loop – in that case they can call yourcode-generating methods with custom failure code that takes into accountwhat they are doing!

Defining the methods

c_declare

  1. def c_declare(name, sub):
  2. return """
  3. double %(name)s;
  4. """ % dict(name = name)
  5. double.c_declare = c_declare

Very straightforward. All we need to do is write C code to declare adouble. That double will be named whatever is passed to our functionin the name argument. That will usually be some mangled name like“V0”, “V2” or “V92” depending on how many nodes there are in thecomputation graph and what rank the current node has. This functionwill be called for all Variables whose type is double.

You can declare as many variables as you want there and you can alsodo typedefs. Make sure that the name of each variable contains thename argument in order to avoid name collisions (collisions will_happen if you don’t parameterize the variable names as indicatedhere). Also note that you cannot declare a variable calledpy<name> or storage_<name> because Theano already definesthem.

What you declare there is basically the C interface you are giving toyour Type. If you wish people to develop operations that make use ofit, it’s best to publish it somewhere.

c_init

  1. def c_init(name, sub):
  2. return """
  3. %(name)s = 0.0;
  4. """ % dict(name = name)
  5. double.c_init = c_init

This function has to initialize thedouble we declared previously to a suitable value. This is useful ifwe want to avoid dealing with garbage values, especially if our datatype is a pointer. This is not going to be called for all Variables withthe double type. Indeed, if a Variable is an input that we passfrom Python, we will want to extract that input from a Python object,therefore it is the c_extract method that will be called instead ofc_init. You can therefore not assume, when writing c_extract, that theinitialization has been done (in fact you can assume that it _hasn’t_been done).

c_init will typically be called on output Variables, but in generalyou should only assume that either c_init or c_extract has beencalled, without knowing for sure which of the two.

c_extract

  1. def c_extract(name, sub):
  2. return """
  3. if (!PyFloat_Check(py_%(name)s)) {
  4. PyErr_SetString(PyExc_TypeError, "expected a float");
  5. %(fail)s
  6. }
  7. %(name)s = PyFloat_AsDouble(py_%(name)s);
  8. """ % dict(name = name, fail = sub['fail'])
  9. double.c_extract = c_extract

This method is slightly more sophisticated. What happens here is thatwe have a reference to a Python object which Theano has placed inpy%(name)s where %(name)s must be substituted for the namegiven in the inputs. This special variable is declared by Theano asPyObject* py%(name)s where PyObject* is a pointer to a Pythonobject as defined by CPython’s C API. This is the reference thatcorresponds, on the Python side of things, to a Variable with thedouble type. It is what the end user will give and what he or sheexpects to get back.

In this example, the user will give a Python float. The firstthing we should do is verify that what we got is indeed a Pythonfloat. The PyFloat_Check function is provided by CPython’s CAPI and does this for us. If the check fails, we set an exception andthen we insert code for failure. The code for failure is insub["fail"] and it basically does a goto to cleanup code.

If the check passes then we convert the Python float into a doubleusing the PyFloat_AsDouble function (yet again provided by CPython’s CAPI) and we put it in our double variable that we declared previously.

c_sync

  1. def c_sync(name, sub):
  2. return """
  3. Py_XDECREF(py_%(name)s);
  4. py_%(name)s = PyFloat_FromDouble(%(name)s);
  5. if (!py_%(name)s) {
  6. printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
  7. Py_XINCREF(Py_None);
  8. py_%(name)s = Py_None;
  9. }
  10. """ % dict(name = name)
  11. double.c_sync = c_sync

This function is probably the trickiest. What happens here is that wehave computed some operation on doubles and we have put the variableinto the double variable %(name)s. Now, we need to put this datainto a Python object that we can manipulate on the Python side ofthings. This Python object must be put into the py_%(name)svariable which Theano recognizes (this is the same pointer we get inc_extract).

Now, that pointer is already a pointer to a valid Python object(unless you or a careless implementer did terribly wrong things withit). If we want to point to another object, we need to tell Pythonthat we don’t need the old one anymore, meaning that we need todecrease the previous object’s reference count. The first line,PyXDECREF(py%(name)s) does exactly this. If it is forgotten,Python will not be able to reclaim the data even if it is not usedanymore and there will be memory leaks! This is especially importantif the data you work on is large.

Now that we have decreased the reference count, we callPyFloatFromDouble on our double variable in order to convert itto a Python float. This returns a new reference which we assign topy%(name)s. From there Theano will do the rest and the end userwill happily see a Python float come out of his computations.

The rest of the code is not absolutely necessary and it is basically“good practice”. PyFloatFromDouble can return NULL on failure.NULL is a pretty bad reference to have and neither Python nor Theanolike it. If this happens, we change the NULL pointer (which willcause us problems) to a pointer to None (which is _not a NULLpointer). Since None is an object like the others, we need toincrease its reference count before we can set a new pointer to it. Thissituation is unlikely to ever happen, but if it ever does, better safethan sorry.

Warning

I said this already but it really needs to be emphasized that ifyou are going to change the py%(name)s pointer to point to anew reference, you _must decrease the reference count of whateverit was pointing to before you do the change. This is only valid ifyou change the pointer, if you are not going to change the pointer,do NOT decrease its reference count!

c_cleanup

  1. def c_cleanup(name, sub):
  2. return ""
  3. double.c_cleanup = c_cleanup

We actually have nothing to do here. We declared a double on the stackso the C language will reclaim it for us when its scope ends. Wedidn’t malloc() anything so there’s nothing to free(). Furthermore,the py_%(name)s pointer hasn’t changed so we don’t need to doanything with it. Therefore, we have nothing to cleanup. Sweet!

There are however two important things to keep in mind:

First, note that c_sync and c_cleanup might be called insequence, so they need to play nice together. In particular, let’ssay that you allocate memory in c_init or c_extract for somereason. You might want to either embed what you allocated to some Pythonobject in c_sync or to free it in c_cleanup. If you do theformer, you don’t want to free the allocated storage so you should setthe pointer to it to NULL to avoid that c_cleanup mistakenlyfrees it. Another option is to declare a variable in c_declare thatyou set to true in c_sync to notify c_cleanup that c_syncwas called.

Second, whenever you use %(fail)s in c_extract or in the code of anoperation, you can count on c_cleanup being called rightafter that. Therefore, it’s important to make sure that c_cleanupdoesn’t depend on any code placed after a reference to%(fail)s. Furthermore, because of the way Theano blocks code together,only the variables declared in c_declare will be visible in c_cleanup!

What the generated C will look like

c_init and c_extract will only be called if there is a Pythonobject on which we want to apply computations using Ccode. Conversely, c_sync will only be called if we want tocommunicate the values we have computed to Python, and c_cleanupwill only be called when we don’t need to process the data with Canymore. In other words, the use of these functions for a given Variabledepends on the the relationship between Python and C with respect tothat Variable. For instance, imagine you define the following functionand call it:

  1. x, y, z = double('x'), double('y'), double('z')
  2. a = add(x, y)
  3. b = mul(a, z)
  4. f = function([x, y, z], b)
  5. f(1.0, 2.0, 3.0)

Using the CLinker, the code that will be produced will look roughlylike this:

  1. // BEGIN defined by Theano
  2. PyObject* py_x = ...;
  3. PyObject* py_y = ...;
  4. PyObject* py_z = ...;
  5. PyObject* py_a = ...; // note: this reference won't actually be used for anything
  6. PyObject* py_b = ...;
  7. // END defined by Theano
  8.  
  9. {
  10. double x; //c_declare for x
  11. x = ...; //c_extract for x
  12. {
  13. double y; //c_declare for y
  14. y = ...; //c_extract for y
  15. {
  16. double z; //c_declare for z
  17. z = ...; //c_extract for z
  18. {
  19. double a; //c_declare for a
  20. a = 0; //c_init for a
  21. {
  22. double b; //c_declare for b
  23. b = 0; //c_init for b
  24. {
  25. a = x + y; //c_code for add
  26. {
  27. b = a * z; //c_code for mul
  28. labelmul:
  29. //c_cleanup for mul
  30. }
  31. labeladd:
  32. //c_cleanup for add
  33. }
  34. labelb:
  35. py_b = ...; //c_sync for b
  36. //c_cleanup for b
  37. }
  38. labela:
  39. //c_cleanup for a
  40. }
  41. labelz:
  42. //c_cleanup for z
  43. }
  44. labely:
  45. //c_cleanup for y
  46. }
  47. labelx:
  48. //c_cleanup for x
  49. }

It’s not pretty, but it gives you an idea of how things work (note thatthe variable names won’t be x, y, z, etc. - they willget a unique mangled name). The fail code runs a goto to theappropriate label in order to run all cleanup that needs to bedone. Note which variables get extracted (the three inputs x, y andz), which ones only get initialized (the temporary variable a and theoutput b) and which one is synced (the final output b).

The C code above is a single C block for the whole graph. Depending onwhich linker is used to process the computation graph, it ispossible that one such block is generated for each operation and thatwe transit through Python after each operation. In that situation,a would be synced by the addition block and extracted by themultiplication block.

Final version

  1. from theano import gof
  2.  
  3. class Double(gof.Type):
  4.  
  5. def filter(self, x, strict=False, allow_downcast=None):
  6. if strict and not isinstance(x, float):
  7. raise TypeError('Expected a float!')
  8. return float(x)
  9.  
  10. def values_eq_approx(self, x, y, tolerance=1e-4):
  11. return abs(x - y) / (x + y) < tolerance
  12.  
  13. def __str__(self):
  14. return "double"
  15.  
  16. def c_declare(self, name, sub):
  17. return """
  18. double %(name)s;
  19. """ % dict(name = name)
  20.  
  21. def c_init(self, name, sub):
  22. return """
  23. %(name)s = 0.0;
  24. """ % dict(name = name)
  25.  
  26. def c_extract(self, name, sub):
  27. return """
  28. if (!PyFloat_Check(py_%(name)s)) {
  29. PyErr_SetString(PyExc_TypeError, "expected a float");
  30. %(fail)s
  31. }
  32. %(name)s = PyFloat_AsDouble(py_%(name)s);
  33. """ % dict(sub, name = name)
  34.  
  35. def c_sync(self, name, sub):
  36. return """
  37. Py_XDECREF(py_%(name)s);
  38. py_%(name)s = PyFloat_FromDouble(%(name)s);
  39. if (!py_%(name)s) {
  40. printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
  41. Py_XINCREF(Py_None);
  42. py_%(name)s = Py_None;
  43. }
  44. """ % dict(name = name)
  45.  
  46. def c_cleanup(self, name, sub):
  47. return ""
  48.  
  49. double = Double()

DeepCopyOp

We have an internal Op called DeepCopyOp. It is used to make sure werespect the user vs Theano memory region as described in the tutorial. Theano has a Python implementation that calls the object’scopy() or deepcopy() method for Theano types for which it does notknow how to generate C code.

You can implement c_code for this op. You register it like this:

  1. theano.compile.ops.register_deep_copy_op_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())

In your C code, you should use %(iname)s and %(oname)s to representthe C variable names of the DeepCopyOp input and outputrespectively. See an example for the type CudaNdarrayType (GPUarray) in the file theano/sandbox/cuda/type.py. The versionparameter is what is returned by DeepCopyOp.c_code_cache_version(). Bydefault, it will recompile the c code for each process.

ViewOp

We have an internal Op called ViewOp. It is used for someverification of inplace/view Ops. Its C implementation increments anddecrements Python reference counts, and thus only works with Pythonobjects. If your new type represents Python objects, you should tellViewOp to generate C code when working with this type, asotherwise it will use Python code instead. This is achieved bycalling:

  1. theano.compile.ops.register_view_op_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())

In your C code, you should use %(iname)s and %(oname)s to representthe C variable names of the ViewOp input and outputrespectively. See an example for the type CudaNdarrayType (GPUarray) in the file theano/sandbox/cuda/type.py. The versionparameter is what is returned by ViewOp.c_code_cache_version(). Bydefault, it will recompile the c code for each process.

Shape and Shape_i

We have 2 generic Ops, Shape and Shape_i, that return the shape of anyTheano Variable that has a shape attribute (Shape_i returns only one ofthe elements of the shape).

  1. theano.compile.ops.register_shape_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())
  2. theano.compile.ops.register_shape_i_c_code(YOUR_TYPE_CLASS, THE_C_CODE, CHECK_INPUT, version=())

The C code works as the ViewOp. Shape_i has the additional i parameterthat you can use with %(i)s.

In your CHECK_INPUT, you must check that the input has enough dimensions tobe able to access the i-th one.