More Examples

At this point it would be wise to begin familiarizing yourself more systematically with Theano’s fundamental objects and operations by browsing this section of the library: Basic Tensor Functionality.

As the tutorial unfolds, you should also gradually acquaint yourself with the other relevant areas of the library and with the relevant subjects of the documentation entrance page.

Logistic Function

Here’s another straightforward example, though a bit more elaborate than adding two numbers together. Let’s say that you want to compute the logistic curve, which is given by:

s(x) = \frac{1}{1 + e^{-x}}

Figure: A plot of the logistic function, with x on the x-axis and s(x) on the y-axis.

You want to compute the function elementwise on matrices of doubles, which means that you want to apply this function to each individual element of the matrix.

Well, what you do is this:

    >>> import theano
    >>> import theano.tensor as T
    >>> x = T.dmatrix('x')
    >>> s = 1 / (1 + T.exp(-x))
    >>> logistic = theano.function([x], s)
    >>> logistic([[0, 1], [-1, -2]])
    array([[ 0.5       ,  0.73105858],
           [ 0.26894142,  0.11920292]])

The logistic function is applied elementwise because all of its operations (negation, exponentiation, addition, and division) are themselves elementwise operations.

It is also the case that:

s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}

We can verify that this alternate form produces the same values:

    >>> s2 = (1 + T.tanh(x / 2)) / 2
    >>> logistic2 = theano.function([x], s2)
    >>> logistic2([[0, 1], [-1, -2]])
    array([[ 0.5       ,  0.73105858],
           [ 0.26894142,  0.11920292]])
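
The identity can also be checked numerically outside Theano. The following is a minimal sketch in plain Python; the helper names logistic_exp and logistic_tanh are illustrative and not part of Theano. It evaluates both forms on the same four inputs used above and compares them.

```python
import math

def logistic_exp(x):
    """Logistic function in its exponential form: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_tanh(x):
    """Equivalent form based on tanh: (1 + tanh(x / 2)) / 2."""
    return (1.0 + math.tanh(x / 2.0)) / 2.0

# Same inputs as the Theano example above.
inputs = [0.0, 1.0, -1.0, -2.0]

# Largest absolute disagreement between the two forms.
max_gap = max(abs(logistic_exp(x) - logistic_tanh(x)) for x in inputs)
print(max_gap < 1e-12)  # the two forms agree up to rounding error
```

The agreement follows from tanh(x/2) = (1 - e^{-x}) / (1 + e^{-x}), so adding 1 and halving gives 1 / (1 + e^{-x}).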

Computing More than one Thing at the Same Time

Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:

    >>> a, b = T.dmatrices('a', 'b')
    >>> diff = a - b
    >>> abs_diff = abs(diff)
    >>> diff_squared = diff**2
    >>> f = theano.function([a, b], [diff, abs_diff, diff_squared])

Note

dmatrices produces as many outputs as names that you provide. It is a shortcut for allocating symbolic variables that we will often use in the tutorials.

When we use the function f, it returns the three variables (the printing was reformatted for readability):

    >>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
    [array([[ 1.,  0.],
           [-1., -2.]]), array([[ 1.,  0.],
           [ 1.,  2.]]), array([[ 1.,  0.],
           [ 1.,  4.]])]

Setting a Default Value for an Argument

Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:

    >>> from theano import In
    >>> from theano import function
    >>> x, y = T.dscalars('x', 'y')
    >>> z = x + y
    >>> f = function([x, In(y, value=1)], z)
    >>> f(33)
    array(34.0)
    >>> f(33, 2)
    array(35.0)

This makes use of the In class, which allows you to specify properties of your function’s parameters in greater detail. Here we give a default value of 1 for y by creating an In instance with its value field set to 1.

Inputs with default values must follow inputs without default values (as in Python’s functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python:

    >>> x, y, w = T.dscalars('x', 'y', 'w')
    >>> z = (x + y) * w
    >>> f = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)
    >>> f(33)
    array(68.0)
    >>> f(33, 2)
    array(70.0)
    >>> f(33, 0, 1)
    array(33.0)
    >>> f(33, w_by_name=1)
    array(34.0)
    >>> f(33, w_by_name=1, y=0)
    array(33.0)
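
For comparison, the call patterns above mirror ordinary Python default and keyword arguments. The following plain-Python sketch involves no Theano, and the function name f_py is illustrative. It reproduces the same five calls and their values.

```python
def f_py(x, y=1, w_by_name=2):
    """Plain-Python analogue of the compiled function: (x + y) * w."""
    return (x + y) * w_by_name

results = [
    f_py(33),                    # both defaults used
    f_py(33, 2),                 # y set positionally
    f_py(33, 0, 1),              # y and w set positionally
    f_py(33, w_by_name=1),       # w set by keyword
    f_py(33, w_by_name=1, y=0),  # both set by keyword
]
print(results)  # [68, 70, 33, 34, 33]
```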

Note

In does not know the names of the local variables y and w that are passed as arguments. The symbolic variable objects have name attributes (set by dscalars in the example above), and these are the names of the keyword parameters in the functions that we build. This is the mechanism at work in In(y, value=1). In the case of In(w, value=2, name='w_by_name'), we override the symbolic variable’s name attribute with a name to be used for this function.

You may like to see Function in the library for more detail.

Using Shared Variables

It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.

First let’s define the accumulator function. It adds its argument to the internal state, and returns the old state value.

    >>> from theano import shared
    >>> state = shared(0)
    >>> inc = T.iscalar('inc')
    >>> accumulator = function([inc], state, updates=[(state, state+inc)])

This code introduces a few new concepts. The shared function constructs so-called shared variables. These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions just like the objects returned by dmatrices(…), but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. It is called a shared variable because its value is shared between many functions. The value can be accessed and modified by the .get_value() and .set_value() methods. We will come back to this soon.

The other new thing in this code is the updates parameter of function. updates must be supplied with a list of pairs of the form (shared-variable, new expression). It can also be a dictionary whose keys are shared variables and whose values are the new expressions. Either way, it means “whenever this function runs, it will replace the .value of each shared variable with the result of the corresponding expression”. Above, our accumulator replaces the state’s value with the sum of the state and the increment amount.

Let’s try it out!

    >>> print(state.get_value())
    0
    >>> accumulator(1)
    array(0)
    >>> print(state.get_value())
    1
    >>> accumulator(300)
    array(1)
    >>> print(state.get_value())
    301
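
The behaviour seen here (the old value is returned, then the state is replaced) can be mimicked in plain Python. The sketch below is only a conceptual model, not Theano’s implementation: the Shared class and the make_function helper are hypothetical names, and expressions are modelled as ordinary Python callables.

```python
class Shared:
    """Hypothetical stand-in for a shared variable: a mutable box."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_function(output, updates):
    """Return a callable that evaluates `output`, then applies `updates`.

    `output` and every update expression are callables of the argument.
    All of them are evaluated against the old state before any shared
    value is overwritten.
    """
    def run(arg):
        result = output(arg)
        new_values = [(s, expr(arg)) for s, expr in updates]
        for s, value in new_values:
            s.set_value(value)
        return result
    return run


state = Shared(0)
accumulator = make_function(
    output=lambda inc: state.get_value(),
    updates=[(state, lambda inc: state.get_value() + inc)],
)

first = accumulator(1)     # returns the old state, then state becomes 1
second = accumulator(300)  # returns the old state, then state becomes 301
print(first, second, state.get_value())  # 0 1 301
```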

It is possible to reset the state. Just use the .set_value() method:

    >>> state.set_value(-1)
    >>> accumulator(3)
    array(-1)
    >>> print(state.get_value())
    2

As we mentioned above, you can define more than one function to use the same shared variable. These functions can all update the value.

    >>> decrementor = function([inc], state, updates=[(state, state-inc)])
    >>> decrementor(2)
    array(2)
    >>> print(state.get_value())
    0

You might be wondering why the updates mechanism exists. You can always achieve a similar result by returning the new expressions, and working with them in NumPy as usual. The updates mechanism can be a syntactic convenience, but it is mainly there for efficiency. Updates to shared variables can sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix updates). Also, Theano has more control over where and how shared variables are allocated, which is one of the important elements of getting good performance on the GPU.

It may happen that you expressed some formula using a shared variable, but you do not want to use its value. In this case, you can use the givens parameter of function, which replaces a particular node in a graph for the purpose of one particular function.

    >>> fn_of_state = state * 2 + inc
    >>> # The type of foo must match the shared variable we are replacing
    >>> # with the ``givens``
    >>> foo = T.scalar(dtype=state.dtype)
    >>> skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])
    >>> skip_shared(1, 3)  # we're using 3 for the state, not state.value
    array(7)
    >>> print(state.get_value())  # old state still there, but we didn't use it
    0

The givens parameter can be used to replace any symbolic variable, not just a shared variable. You can replace constants, and expressions in general. Be careful, though, not to allow the expressions introduced by a givens substitution to be co-dependent: the order of substitution is not defined, so the substitutions have to work in any order.

In practice, a good way of thinking about givens is as a mechanism that allows you to replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.

Note

By default, the broadcast pattern of a Theano shared variable is False for each dimension. The size of a shared variable can change over time, so its shape cannot be used to infer the broadcastable pattern. If you want a different pattern, pass it as a parameter: theano.shared(…, broadcastable=(True, False)).

Copying functions

Theano functions can be copied, which can be useful for creating similar functions but with different shared variables or updates. This is done using the copy() method of function objects. The optimized graph of the original function is copied, so compilation only needs to be performed once.

Let’s start from the accumulator defined above. Let’s add the on_unused_input='ignore' parameter in case we don’t want to use both of our current arguments in a future copy of the function (this isn’t necessary on versions > 0.8.2):

    >>> import theano
    >>> import theano.tensor as T
    >>> state = theano.shared(0)
    >>> inc = T.iscalar('inc')
    >>> accumulator = theano.function([inc], state, updates=[(state, state+inc)], on_unused_input='ignore')

We can use it to increment the state as usual:

    >>> accumulator(10)
    array(0)
    >>> print(state.get_value())
    10

We can use copy() to create a similar accumulator but with its own internal state using the swap parameter, which is a dictionary of shared variables to exchange:

    >>> new_state = theano.shared(0)
    >>> new_accumulator = accumulator.copy(swap={state:new_state})
    >>> new_accumulator(100)
    [array(0)]
    >>> print(new_state.get_value())
    100

The state of the first function is left untouched:

    >>> print(state.get_value())
    10

We now create a copy with updates removed using the delete_updates parameter, which is set to False by default. Notice that our new copy doesn’t actually use the inc argument after removing the updates parameter:

    >>> null_accumulator = accumulator.copy(delete_updates=True)

As expected, the shared state is no longer updated:

    >>> null_accumulator(9000)
    [array(10)]
    >>> print(state.get_value())
    10

Using Random Numbers

Because in Theano you first express everything symbolically and afterwards compile this expression to get functions, using pseudo-random numbers is not as straightforward as it is in NumPy, though also not too complicated.

The way to think about putting randomness into Theano’s computations is to put random variables in your graph. Theano will allocate a NumPy RandomState object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. Theano’s random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.

Brief Example

Here’s a brief example. The setup code is:

    from theano.tensor.shared_randomstreams import RandomStreams
    from theano import function
    srng = RandomStreams(seed=234)
    rv_u = srng.uniform((2,2))
    rv_n = srng.normal((2,2))
    f = function([], rv_u)
    g = function([], rv_n, no_default_updates=True)    # Not updating rv_n.rng
    nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

Here, rv_u represents a random stream of 2x2 matrices of draws from a uniform distribution. Likewise, rv_n represents a random stream of 2x2 matrices of draws from a normal distribution. The distributions that are implemented are defined in RandomStreams and, at a lower level, in raw_random. They only work on the CPU. See Other Implementations for a GPU version.

Now let’s use these objects. If we call f(), we get random uniform numbers. The internal state of the random number generator is automatically updated, so we get different random numbers every time.

    >>> f_val0 = f()
    >>> f_val1 = f()  # different numbers from f_val0

When we add the extra argument no_default_updates=True to function (as in g), the random number generator state is not affected by calling the returned function. So, for example, calling g multiple times will return the same numbers.

    >>> g_val0 = g()  # different numbers from f_val0 and f_val1
    >>> g_val1 = g()  # same numbers as g_val0!

An important remark is that a random variable is drawn at most once during any single function execution. So the nearly_zeros function is guaranteed to return approximately 0 (except for rounding error) even though the rv_u random variable appears three times in the output expression.

    >>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
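
The same effect can be illustrated without Theano. In this minimal sketch, Python’s standard random module stands in for the random stream (an assumption made only for illustration). Reusing a single draw gives exactly zero, whereas three separate draws generally do not.

```python
import random

rng = random.Random(234)  # arbitrary seed, chosen only for repeatability

# One draw, reused three times: the "drawn at most once" behaviour.
u = rng.random()
reused = u + u - 2 * u

# Three independent draws: what would happen if each occurrence
# of the variable triggered a fresh draw.
independent = rng.random() + rng.random() - 2 * rng.random()

print(reused)       # 0.0
print(independent)  # generally not zero
```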

Seeding Streams

Random variables can be seeded individually or collectively.

You can seed just one random variable by seeding or assigning to the .rng attribute, using .rng.set_value().

    >>> rng_val = rv_u.rng.get_value(borrow=True)   # Get the rng for rv_u
    >>> rng_val.seed(89234)                         # seeds the generator
    >>> rv_u.rng.set_value(rng_val, borrow=True)    # Assign back seeded rng

You can also seed all of the random variables allocated by a RandomStreams object using that object’s seed method. This seed will be used to seed a temporary random number generator, which will in turn generate seeds for each of the random variables.

    >>> srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
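
The idea of a temporary generator that hands out per-variable seeds can be sketched in plain Python. This is a conceptual illustration only: the helper seed_streams is hypothetical, the standard random module stands in for the real generators, and the seed range 2**30 is an arbitrary choice for this sketch.

```python
import random

def seed_streams(master_seed, n_streams):
    """Derive one independent generator per stream from a master seed."""
    seeder = random.Random(master_seed)  # temporary generator
    seeds = [seeder.randrange(2 ** 30) for _ in range(n_streams)]
    return [random.Random(s) for s in seeds]

# Two streams, playing the roles of rv_u and rv_n.
u_rng, n_rng = seed_streams(902340, 2)

# Reseeding with the same master seed reproduces the same streams.
u_again, n_again = seed_streams(902340, 2)
same_u = u_rng.random() == u_again.random()
same_n = n_rng.random() == n_again.random()
print(same_u, same_n)  # True True
```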

Sharing Streams Between Functions

As usual for shared variables, the random number generators used for random variables are common between functions. So our nearly_zeros function will update the state of the generators used in function f above.

For example:

    >>> state_after_v0 = rv_u.rng.get_value().get_state()
    >>> nearly_zeros()       # this affects rv_u's generator
    array([[ 0.,  0.],
           [ 0.,  0.]])
    >>> v1 = f()
    >>> rng = rv_u.rng.get_value(borrow=True)
    >>> rng.set_state(state_after_v0)
    >>> rv_u.rng.set_value(rng, borrow=True)
    >>> v2 = f()             # v2 != v1
    >>> v3 = f()             # v3 == v1
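
The reason v2 differs from v1 while v3 equals v1 is that nearly_zeros consumed one draw between saving the state and computing v1. A plain-Python sketch of the same sequence, with the standard random module standing in for the shared generator (an assumption for illustration only), makes the bookkeeping explicit.

```python
import random

rng = random.Random(234)       # stands in for rv_u's generator
saved_state = rng.getstate()   # like state_after_v0

hidden = rng.random()          # the draw consumed by nearly_zeros()
v1 = rng.random()              # like v1 = f()

rng.setstate(saved_state)      # rewind the generator
v2 = rng.random()              # replays the hidden draw, so it is not v1
v3 = rng.random()              # replays the draw that produced v1

print(v2 == hidden, v3 == v1)  # True True
```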

Copying Random State Between Theano Graphs

In some use cases, a user might want to transfer the “state” of all random number generators associated with a given Theano graph (e.g. g1, with compiled function f1 below) to a second graph (e.g. g2, with function f2). This might arise, for example, if you are trying to initialize the state of a model from the parameters of a pickled version of a previous model. For theano.tensor.shared_randomstreams.RandomStreams and theano.sandbox.rng_mrg.MRG_RandomStreams, this can be achieved by copying elements of the state_updates parameter.

Each time a random variable is drawn from a RandomStreams object, a tuple is added to the state_updates list. The first element is a shared variable, which represents the state of the random number generator associated with this particular variable, while the second represents the Theano graph corresponding to the random number generation process (e.g. RandomFunction{uniform}.0).

An example of how “random states” can be transferred from one Theano function to another is shown below.

    >>> from __future__ import print_function
    >>> import theano
    >>> import numpy
    >>> import theano.tensor as T
    >>> from theano.sandbox.rng_mrg import MRG_RandomStreams
    >>> from theano.tensor.shared_randomstreams import RandomStreams

    >>> class Graph():
    ...     def __init__(self, seed=123):
    ...         self.rng = RandomStreams(seed)
    ...         self.y = self.rng.uniform(size=(1,))

    >>> g1 = Graph(seed=123)
    >>> f1 = theano.function([], g1.y)

    >>> g2 = Graph(seed=987)
    >>> f2 = theano.function([], g2.y)

    >>> # By default, the two functions are out of sync.
    >>> f1()
    array([ 0.72803009])
    >>> f2()
    array([ 0.55056769])

    >>> def copy_random_state(g1, g2):
    ...     if isinstance(g1.rng, MRG_RandomStreams):
    ...         g2.rng.rstate = g1.rng.rstate
    ...     for (su1, su2) in zip(g1.rng.state_updates, g2.rng.state_updates):
    ...         su2[0].set_value(su1[0].get_value())

    >>> # We now copy the state of the theano random number generators.
    >>> copy_random_state(g1, g2)
    >>> f1()
    array([ 0.59044123])
    >>> f2()
    array([ 0.59044123])

Other Random Distributions

There are other distributions implemented.

Other Implementations

There are two other implementations, based on MRG31k3p and CURAND. RandomStreams only works on the CPU, MRG31k3p works on both the CPU and the GPU, and CURAND only works on the GPU.

Note

To use the MRG version easily, you can just change the import to:

    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

A Real Example: Logistic Regression

The preceding elements are featured in this more realistic example. It will be used repeatedly.

    import numpy
    import theano
    import theano.tensor as T
    rng = numpy.random

    N = 400                                   # training sample size
    feats = 784                               # number of input variables

    # generate a dataset: D = (input_values, target_class)
    D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = T.dmatrix("x")
    y = T.dvector("y")

    # initialize the weight vector w randomly
    #
    # this and the following bias variable b
    # are shared so they keep their values
    # between training iterations (updates)
    w = theano.shared(rng.randn(feats), name="w")

    # initialize the bias term
    b = theano.shared(0., name="b")

    print("Initial model:")
    print(w.get_value())
    print(b.get_value())

    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
    prediction = p_1 > 0.5                    # The prediction thresholded
    xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
    cost = xent.mean() + 0.01 * (w ** 2).sum()    # The cost to minimize
    # Compute the gradient of the cost w.r.t. weight vector w and bias term b
    # (we shall return to this in a following section of this tutorial)
    gw, gb = T.grad(cost, [w, b])

    # Compile
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
    predict = theano.function(inputs=[x], outputs=prediction)

    # Train
    for i in range(training_steps):
        pred, err = train(D[0], D[1])

    print("Final model:")
    print(w.get_value())
    print(b.get_value())
    print("target values for D:")
    print(D[1])
    print("prediction on D:")
    print(predict(D[0]))
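
For reference, the gradients that T.grad derives symbolically for this cost can also be written out by hand. The following is a sketch of the result, where p_i denotes the model’s probability for example i, s is the logistic function defined earlier, and N is the number of training examples. It assumes the cost exactly as written in the code above (mean cross-entropy plus 0.01 times the squared norm of w).

```latex
p_i = s(x_i \cdot w + b), \qquad
C(w, b) = \frac{1}{N} \sum_{i=1}^{N}
    \bigl[ -y_i \log p_i - (1 - y_i) \log(1 - p_i) \bigr]
    + 0.01 \sum_j w_j^2

\frac{\partial C}{\partial w}
    = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)\, x_i + 0.02\, w,
\qquad
\frac{\partial C}{\partial b}
    = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)
```

Each call to train then moves w and b by 0.1 times these gradients in the negative direction, which is what the updates argument expresses.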