Loading and Saving

Python’s standard way of saving class instances and reloading them is the pickle mechanism. Many Theano objects can be serialized (and deserialized) by pickle. However, a limitation of pickle is that it does not save the code or data of a class along with the instance of the class being serialized. As a result, reloading objects created by a previous version of a class can be problematic: if the class has since been renamed or its attributes changed, the old pickle may fail to load or may produce an inconsistent object.

Thus, you will want to consider different mechanisms depending on the amount of time you anticipate between saving and reloading. For short-term storage (such as temp files and network transfers), pickling the Theano objects or classes is possible. For longer-term storage (such as saving models from an experiment), you should not rely on pickled Theano objects; we recommend loading and saving the underlying shared objects as you would in the course of any other Python program.

The Basics of Pickling

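The two modules pickle and cPickle have the same functionality, but cPickle, coded in C, is much faster. (cPickle exists only in Python 2; in Python 3, pickle uses its C implementation automatically, and six.moves.cPickle simply refers to pickle.)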

  >>> from six.moves import cPickle
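If you would rather not depend on six, a common fallback is to try the Python 2 module first and fall back to the standard pickle module. This is a minimal sketch; binding the result to the name cPickle is just a convention so the rest of this page reads the same:

```python
try:
    import cPickle  # Python 2: the C implementation of pickle
except ImportError:
    import pickle as cPickle  # Python 3: pickle already uses its C accelerator
```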

You can serialize (or save, or pickle) objects to a file with cPickle.dump:

  >>> f = open('obj.save', 'wb')
  >>> cPickle.dump(my_obj, f, protocol=cPickle.HIGHEST_PROTOCOL)
  >>> f.close()

Note

If you want your saved object to be stored efficiently, don’t forget to use cPickle.HIGHEST_PROTOCOL. The resulting file can be dozens of times smaller than with the default protocol (in Python 2, protocol 0, a text-based format).
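To see the difference on your own data, you can compare the size of the pickled bytes under protocol 0 and under HIGHEST_PROTOCOL. In this sketch, a list of 10,000 floats is a hypothetical stand-in for real model parameters; the actual ratio depends heavily on what you pickle:

```python
import pickle

# Hypothetical stand-in for model parameters: 10,000 floats.
params = [i / 7.0 for i in range(10000)]

# Protocol 0 writes each float as text; binary protocols use 8 bytes per float.
size_text = len(pickle.dumps(params, protocol=0))
size_best = len(pickle.dumps(params, protocol=pickle.HIGHEST_PROTOCOL))

print(size_text, size_best)
```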

Note

Opening your file in binary mode ('b') is required for portability (especially between Unix and Windows).
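Under Python 3 this is more than a portability concern: pickle always produces bytes, so writing to a file opened in text mode fails outright. A quick check, using a file in a temporary directory:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'obj.save')

# Text mode ('w'): the file object accepts only str, but pickle writes bytes.
try:
    with open(path, 'w') as f:
        pickle.dump({'a': 1}, f)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary mode ('wb') works as expected.
with open(path, 'wb') as f:
    pickle.dump({'a': 1}, f)
```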

To deserialize (or load, or unpickle) a pickled file, use cPickle.load:

  >>> f = open('obj.save', 'rb')
  >>> loaded_obj = cPickle.load(f)
  >>> f.close()
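The same save/load round trip can also be written with with statements, which close the file even if an exception is raised. This sketch calls Python 3's pickle module directly; my_obj is a hypothetical dictionary standing in for your own object, and the file goes into a temporary directory:

```python
import os
import pickle
import tempfile

# Hypothetical object to save.
my_obj = {'learning_rate': 0.01, 'layer_sizes': [784, 500, 10]}
path = os.path.join(tempfile.mkdtemp(), 'obj.save')

# Save: binary mode, most compact protocol.
with open(path, 'wb') as f:
    pickle.dump(my_obj, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load: the protocol is detected automatically.
with open(path, 'rb') as f:
    loaded_obj = pickle.load(f)
```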

You can pickle several objects into the same file, and load them all (in the same order):

  >>> f = open('objects.save', 'wb')
  >>> for obj in [obj1, obj2, obj3]:
  ...     cPickle.dump(obj, f, protocol=cPickle.HIGHEST_PROTOCOL)
  >>> f.close()

Then:

  >>> f = open('objects.save', 'rb')
  >>> loaded_objects = []
  >>> for i in range(3):
  ...     loaded_objects.append(cPickle.load(f))
  >>> f.close()
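If you do not know in advance how many objects a file holds, you can keep calling load until it raises EOFError at the end of the file. A minimal sketch; load_all is a hypothetical helper, and the three sample objects are arbitrary:

```python
import os
import pickle
import tempfile


def load_all(path):
    """Return every object pickled into path, in the order it was written."""
    objects = []
    with open(path, 'rb') as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # raised once the end of the file is reached
                break
    return objects


path = os.path.join(tempfile.mkdtemp(), 'objects.save')
with open(path, 'wb') as f:
    for obj in ['first', [2, 3], {'third': 4}]:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

loaded_objects = load_all(path)
```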

For more details about pickle’s usage, see the Python documentation.

Short-Term Serialization

If you are confident that the class instance you are serializing will be deserialized by a compatible version of the code, pickling the whole model is an adequate solution. This would be the case, for instance, if you are saving models and reloading them during the same execution of your program, or if the class you’re saving has been stable for a while.
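Within a single run, or when sending a model over a network connection, you do not even need a file: pickle.dumps returns the serialized bytes and pickle.loads rebuilds an equivalent object from them. The model dictionary below is a hypothetical stand-in for a real model:

```python
import pickle

# Hypothetical model parameters.
model = {'W': [[0.1, 0.2], [0.3, 0.4]], 'b': [0.0, 0.0]}

# Serialize to an in-memory bytes object (e.g. to send over a socket).
payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)

# Rebuild an equal but independent copy on the receiving side.
copy_of_model = pickle.loads(payload)
```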

You can control what pickle will save from your object by defining a __getstate__ method, and similarly a __setstate__ method.

This will be especially useful if, for instance, your model class contains a link to the data set currently in use, which you probably don’t want to pickle along with every instance of your model.

For instance, you can define functions along the lines of:

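  def __getstate__(self):
      # Copy the instance dictionary so the live object is left untouched.
      state = dict(self.__dict__)
      # Drop the data set; only the attributes that remain are pickled.
      del state['training_set']
      return state

  def __setstate__(self, d):
      self.__dict__.update(d)
      # Reload the data set from the file whose name was pickled.
      with open(self.training_set_file, 'rb') as f:
          self.training_set = cPickle.load(f)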
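Put together in a complete class, the two methods could look like the sketch below. Model, its W attribute and the two-sample data file are hypothetical; the example uses Python 3's pickle module and assumes it runs as a script, since pickle records classes by module and name. The data set is not stored in the pickle, yet it is available again after unpickling:

```python
import os
import pickle
import tempfile


class Model(object):
    def __init__(self, training_set_file):
        self.training_set_file = training_set_file
        with open(training_set_file, 'rb') as f:
            self.training_set = pickle.load(f)
        self.W = [0.5, -0.5]  # hypothetical learned parameters

    def __getstate__(self):
        state = dict(self.__dict__)
        del state['training_set']  # do not store the (possibly large) data set
        return state

    def __setstate__(self, d):
        self.__dict__.update(d)
        # Reload the data set from the file recorded at save time.
        with open(self.training_set_file, 'rb') as f:
            self.training_set = pickle.load(f)


# Write a tiny hypothetical data set, build a model, then round-trip it.
data_path = os.path.join(tempfile.mkdtemp(), 'train.pkl')
with open(data_path, 'wb') as f:
    pickle.dump([(1.0, 0), (2.0, 1)], f)

model = Model(data_path)
payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
```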

Robust Serialization

This type of serialization uses some helper functions particular to Theano. It serializes the object using Python’s pickling protocol, but any ndarray or CudaNdarray objects contained within the object are saved separately as NPY files. These NPY files and the pickled file are all saved together in a single ZIP file.

The main advantage of this approach is that you don’t even need Theano installed in order to look at the values of shared variables that you pickled. You can just load the parameters manually with numpy.

  import numpy
  # numpy.load recognizes the ZIP archive and returns a dict-like NpzFile.
  params = numpy.load('model.zip')

This approach could be beneficial if you are sharing your model with people who might not have Theano installed, who are using a different Python version, or if you are planning to save your model for a long time (in which case version mismatches might make it difficult to unpickle objects).

See theano.misc.pkl_utils.dump() and theano.misc.pkl_utils.load().
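The idea behind this format can be sketched with the standard library and NumPy alone. This is not the pkl_utils implementation: the class and function names (ArrayPickler, ArrayUnpickler, dump_zip, load_zip) and the member names (array_N.npy, pkl) are hypothetical. The sketch uses pickle's persistent_id and persistent_load hooks so that each ndarray is written as its own NPY member of the ZIP file, while the rest of the object goes into an ordinary pickle that only refers to the arrays by name:

```python
import io
import os
import pickle
import tempfile
import zipfile

import numpy as np


class ArrayPickler(pickle.Pickler):
    """Pickler that writes each ndarray to its own .npy member of a ZIP file."""

    def __init__(self, file, zf):
        super().__init__(file, protocol=pickle.HIGHEST_PROTOCOL)
        self.zf = zf
        self.count = 0

    def persistent_id(self, obj):
        if isinstance(obj, np.ndarray):
            name = 'array_{}.npy'.format(self.count)
            self.count += 1
            buf = io.BytesIO()
            np.save(buf, obj)
            self.zf.writestr(name, buf.getvalue())
            return name  # only this reference goes into the pickle stream
        return None  # everything else is pickled normally


class ArrayUnpickler(pickle.Unpickler):
    """Unpickler that resolves array references from the same ZIP file."""

    def __init__(self, file, zf):
        super().__init__(file)
        self.zf = zf

    def persistent_load(self, pid):
        return np.load(io.BytesIO(self.zf.read(pid)))


def dump_zip(obj, path):
    with zipfile.ZipFile(path, 'w') as zf:
        buf = io.BytesIO()
        ArrayPickler(buf, zf).dump(obj)
        zf.writestr('pkl', buf.getvalue())


def load_zip(path):
    with zipfile.ZipFile(path, 'r') as zf:
        return ArrayUnpickler(io.BytesIO(zf.read('pkl')), zf).load()


# Usage: save a hypothetical model, then read it back two ways.
path = os.path.join(tempfile.mkdtemp(), 'model.zip')
model = {'W': np.arange(6.0).reshape(2, 3), 'b': np.zeros(3), 'name': 'demo'}
dump_zip(model, path)

restored = load_zip(path)       # whole object, via the custom unpickler
with np.load(path) as npz:      # arrays only, with plain NumPy
    W_from_numpy = npz['array_0']
```

Because the arrays are ordinary NPY members, numpy.load can open the archive without the custom unpickler, which is the property that makes this format attractive for sharing and long-term storage.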

Long-Term Serialization

If the implementation of the class you want to save is quite unstable (for instance, if functions are created or removed, or class members are renamed), you should save and load only the immutable (and necessary) part of your class.

You can do that by defining __getstate__ and __setstate__ methods as above, perhaps listing the attributes you want to save rather than the ones you don’t.

For instance, if the only parameters you want to save are a weight matrix W and a bias b, you can define:

  def __getstate__(self):
      return (self.W, self.b)

  def __setstate__(self, state):
      W, b = state
      self.W = W
      self.b = b
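Here is that pattern in a self-contained class. Layer and its cache attribute are hypothetical, plain lists stand in for the shared variables, and the script is assumed to run as a module pickle can look up. Only W and b survive the round trip; since unpickling does not call __init__, the cache attribute is simply absent afterwards:

```python
import pickle


class Layer(object):
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.cache = {}  # hypothetical scratch data we do not want to save

    def __getstate__(self):
        return (self.W, self.b)

    def __setstate__(self, state):
        W, b = state
        self.W = W
        self.b = b


layer = Layer([[1.0, 2.0], [3.0, 4.0]], [0.0, 1.0])
layer.cache['last_output'] = [5.0, 6.0]

restored = pickle.loads(pickle.dumps(layer, protocol=pickle.HIGHEST_PROTOCOL))
```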

If at some point W is renamed to weights and b to bias, the older pickled files will still be usable if you update these functions to reflect the change in name:

  def __getstate__(self):
      return (self.weights, self.bias)

  def __setstate__(self, state):
      W, b = state
      self.weights = W
      self.bias = b
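To check that the new version can read files written by the old one, the sketch below pickles an instance of the old Layer class, then rebinds the name Layer to the renamed definition before unpickling, which mimics importing a later version of your module. The class and values are hypothetical, and the script is assumed to run as a module, because pickle records only the class's module and name:

```python
import pickle


class Layer(object):  # old version: attributes W and b
    def __init__(self, W, b):
        self.W = W
        self.b = b

    def __getstate__(self):
        return (self.W, self.b)

    def __setstate__(self, state):
        W, b = state
        self.W = W
        self.b = b


saved = pickle.dumps(Layer([[1.0]], 0.5), protocol=pickle.HIGHEST_PROTOCOL)


class Layer(object):  # new version: attributes renamed to weights and bias
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __getstate__(self):
        return (self.weights, self.bias)

    def __setstate__(self, state):
        W, b = state
        self.weights = W
        self.bias = b


restored = pickle.loads(saved)  # resolved against the new Layer definition
```

Because the pickled state is a plain tuple rather than the instance dictionary, the attribute names never appear in the file, so renaming them only requires updating the two methods.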

For more information on advanced use of pickle and its internals, see Python’s pickle documentation.