Loop

Scan

  • A general form of recurrence, which can be used for looping.
  • Reduction and map (loop over the leading dimensions) are special cases of scan.
  • You scan a function along some input sequence, producing an output at each time-step.
  • The function can see the previous K time-steps of your function.
  • sum() could be computed by scanning the z + x(i) function over a list, given an initial state of z=0.
  • Often a for loop can be expressed as a scan() operation, and scan is the closest that Theano comes to looping.
  • Advantages of using scan over for loops:
    • Number of iterations to be part of the symbolic graph.
    • Minimizes GPU transfers (if GPU is involved).
    • Computes gradients through sequential steps.
    • Slightly faster than using a for loop in Python with a compiled Theano function.
    • Can lower the overall memory usage by detecting the actual amount of memory needed.

The full documentation can be found in the library: Scan.

A good ipython notebook with explanation and more examples.

Scan Example: Computing tanh(x(t).dot(W) + b) elementwise

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # defining the tensor variables
  6. X = T.matrix("X")
  7. W = T.matrix("W")
  8. b_sym = T.vector("b_sym")
  9.  
  10. results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym), sequences=X)
  11. compute_elementwise = theano.function(inputs=[X, W, b_sym], outputs=results)
  12.  
  13. # test values
  14. x = np.eye(2, dtype=theano.config.floatX)
  15. w = np.ones((2, 2), dtype=theano.config.floatX)
  16. b = np.ones((2), dtype=theano.config.floatX)
  17. b[1] = 2
  18.  
  19. print(compute_elementwise(x, w, b))
  20.  
  21. # comparison with numpy
  22. print(np.tanh(x.dot(w) + b))
  1. [[ 0.96402758 0.99505475]
  2. [ 0.96402758 0.99505475]]
  3. [[ 0.96402758 0.99505475]
  4. [ 0.96402758 0.99505475]]

Scan Example: Computing the sequence x(t) = tanh(x(t - 1).dot(W) + y(t).dot(U) + p(T - t).dot(V))

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variables
  6. X = T.vector("X")
  7. W = T.matrix("W")
  8. b_sym = T.vector("b_sym")
  9. U = T.matrix("U")
  10. Y = T.matrix("Y")
  11. V = T.matrix("V")
  12. P = T.matrix("P")
  13.  
  14. results, updates = theano.scan(lambda y, p, x_tm1: T.tanh(T.dot(x_tm1, W) + T.dot(y, U) + T.dot(p, V)),
  15. sequences=[Y, P[::-1]], outputs_info=[X])
  16. compute_seq = theano.function(inputs=[X, W, Y, U, P, V], outputs=results)
  17.  
  18. # test values
  19. x = np.zeros((2), dtype=theano.config.floatX)
  20. x[1] = 1
  21. w = np.ones((2, 2), dtype=theano.config.floatX)
  22. y = np.ones((5, 2), dtype=theano.config.floatX)
  23. y[0, :] = -3
  24. u = np.ones((2, 2), dtype=theano.config.floatX)
  25. p = np.ones((5, 2), dtype=theano.config.floatX)
  26. p[0, :] = 3
  27. v = np.ones((2, 2), dtype=theano.config.floatX)
  28.  
  29. print(compute_seq(x, w, y, u, p, v))
  30.  
  31. # comparison with numpy
  32. x_res = np.zeros((5, 2), dtype=theano.config.floatX)
  33. x_res[0] = np.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
  34. for i in range(1, 5):
  35. x_res[i] = np.tanh(x_res[i - 1].dot(w) + y[i].dot(u) + p[4-i].dot(v))
  36. print(x_res)
  1. [[-0.99505475 -0.99505475]
  2. [ 0.96471973 0.96471973]
  3. [ 0.99998585 0.99998585]
  4. [ 0.99998771 0.99998771]
  5. [ 1. 1. ]]
  6. [[-0.99505475 -0.99505475]
  7. [ 0.96471973 0.96471973]
  8. [ 0.99998585 0.99998585]
  9. [ 0.99998771 0.99998771]
  10. [ 1. 1. ]]

Scan Example: Computing norms of lines of X

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variable
  6. X = T.matrix("X")
  7. results, updates = theano.scan(lambda x_i: T.sqrt((x_i ** 2).sum()), sequences=[X])
  8. compute_norm_lines = theano.function(inputs=[X], outputs=results)
  9.  
  10. # test value
  11. x = np.diag(np.arange(1, 6, dtype=theano.config.floatX), 1)
  12. print(compute_norm_lines(x))
  13.  
  14. # comparison with numpy
  15. print(np.sqrt((x ** 2).sum(1)))
  1. [ 1. 2. 3. 4. 5. 0.]
  2. [ 1. 2. 3. 4. 5. 0.]

Scan Example: Computing norms of columns of X

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variable
  6. X = T.matrix("X")
  7. results, updates = theano.scan(lambda x_i: T.sqrt((x_i ** 2).sum()), sequences=[X.T])
  8. compute_norm_cols = theano.function(inputs=[X], outputs=results)
  9.  
  10. # test value
  11. x = np.diag(np.arange(1, 6, dtype=theano.config.floatX), 1)
  12. print(compute_norm_cols(x))
  13.  
  14. # comparison with numpy
  15. print(np.sqrt((x ** 2).sum(0)))
  1. [ 0. 1. 2. 3. 4. 5.]
  2. [ 0. 1. 2. 3. 4. 5.]

Scan Example: Computing trace of X

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4. floatX = "float32"
  5.  
  6. # define tensor variable
  7. X = T.matrix("X")
  8. results, updates = theano.scan(lambda i, j, t_f: T.cast(X[i, j] + t_f, floatX),
  9. sequences=[T.arange(X.shape[0]), T.arange(X.shape[1])],
  10. outputs_info=np.asarray(0., dtype=floatX))
  11. result = results[-1]
  12. compute_trace = theano.function(inputs=[X], outputs=result)
  13.  
  14. # test value
  15. x = np.eye(5, dtype=theano.config.floatX)
  16. x[0] = np.arange(5, dtype=theano.config.floatX)
  17. print(compute_trace(x))
  18.  
  19. # comparison with numpy
  20. print(np.diagonal(x).sum())
  1. 4.0
  2. 4.0

Scan Example: Computing the sequence x(t) = x(t - 2).dot(U) + x(t - 1).dot(V) + tanh(x(t - 1).dot(W) + b)

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variables
  6. X = T.matrix("X")
  7. W = T.matrix("W")
  8. b_sym = T.vector("b_sym")
  9. U = T.matrix("U")
  10. V = T.matrix("V")
  11. n_sym = T.iscalar("n_sym")
  12.  
  13. results, updates = theano.scan(lambda x_tm2, x_tm1: T.dot(x_tm2, U) + T.dot(x_tm1, V) + T.tanh(T.dot(x_tm1, W) + b_sym),
  14. n_steps=n_sym, outputs_info=[dict(initial=X, taps=[-2, -1])])
  15. compute_seq2 = theano.function(inputs=[X, U, V, W, b_sym, n_sym], outputs=results)
  16.  
  17. # test values
  18. x = np.zeros((2, 2), dtype=theano.config.floatX) # the initial value must be able to return x[-2]
  19. x[1, 1] = 1
  20. w = 0.5 * np.ones((2, 2), dtype=theano.config.floatX)
  21. u = 0.5 * (np.ones((2, 2), dtype=theano.config.floatX) - np.eye(2, dtype=theano.config.floatX))
  22. v = 0.5 * np.ones((2, 2), dtype=theano.config.floatX)
  23. n = 10
  24. b = np.ones((2), dtype=theano.config.floatX)
  25.  
  26. print(compute_seq2(x, u, v, w, b, n))
  27.  
  28. # comparison with numpy
  29. x_res = np.zeros((10, 2))
  30. x_res[0] = x[0].dot(u) + x[1].dot(v) + np.tanh(x[1].dot(w) + b)
  31. x_res[1] = x[1].dot(u) + x_res[0].dot(v) + np.tanh(x_res[0].dot(w) + b)
  32. x_res[2] = x_res[0].dot(u) + x_res[1].dot(v) + np.tanh(x_res[1].dot(w) + b)
  33. for i in range(2, 10):
  34. x_res[i] = (x_res[i - 2].dot(u) + x_res[i - 1].dot(v) +
  35. np.tanh(x_res[i - 1].dot(w) + b))
  36. print(x_res)
  1. [[ 1.40514825 1.40514825]
  2. [ 2.88898899 2.38898899]
  3. [ 4.34018291 4.34018291]
  4. [ 6.53463142 6.78463142]
  5. [ 9.82972243 9.82972243]
  6. [ 14.22203814 14.09703814]
  7. [ 20.07439936 20.07439936]
  8. [ 28.12291843 28.18541843]
  9. [ 39.1913681 39.1913681 ]
  10. [ 54.28407732 54.25282732]]
  11. [[ 1.40514825 1.40514825]
  12. [ 2.88898899 2.38898899]
  13. [ 4.34018291 4.34018291]
  14. [ 6.53463142 6.78463142]
  15. [ 9.82972243 9.82972243]
  16. [ 14.22203814 14.09703814]
  17. [ 20.07439936 20.07439936]
  18. [ 28.12291843 28.18541843]
  19. [ 39.1913681 39.1913681 ]
  20. [ 54.28407732 54.25282732]]

Scan Example: Computing the Jacobian of y = tanh(v.dot(A)) wrt x

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variables
  6. v = T.vector()
  7. A = T.matrix()
  8. y = T.tanh(T.dot(v, A))
  9. results, updates = theano.scan(lambda i: T.grad(y[i], v), sequences=[T.arange(y.shape[0])])
  10. compute_jac_t = theano.function([A, v], results, allow_input_downcast=True) # shape (d_out, d_in)
  11.  
  12. # test values
  13. x = np.eye(5, dtype=theano.config.floatX)[0]
  14. w = np.eye(5, 3, dtype=theano.config.floatX)
  15. w[2] = np.ones((3), dtype=theano.config.floatX)
  16. print(compute_jac_t(w, x))
  17.  
  18. # compare with numpy
  19. print(((1 - np.tanh(x.dot(w)) ** 2) * w).T)
  1. [[ 0.41997434 0. 0.41997434 0. 0. ]
  2. [ 0. 1. 1. 0. 0. ]
  3. [ 0. 0. 1. 0. 0. ]]
  4. [[ 0.41997434 0. 0.41997434 0. 0. ]
  5. [ 0. 1. 1. 0. 0. ]
  6. [ 0. 0. 1. 0. 0. ]]

Note that we need to iterate over the indices of y and not over the elements of y. The reason is that scan create a placeholder variable for its internal function and this placeholder variable does not have the same dependencies than the variables that will replace it.

Scan Example: Accumulate number of loop during a scan

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define shared variables
  6. k = theano.shared(0)
  7. n_sym = T.iscalar("n_sym")
  8.  
  9. results, updates = theano.scan(lambda:{k:(k + 1)}, n_steps=n_sym)
  10. accumulator = theano.function([n_sym], [], updates=updates, allow_input_downcast=True)
  11.  
  12. k.get_value()
  13. accumulator(5)
  14. k.get_value()

Scan Example: Computing tanh(v.dot(W) + b) * d where d is binomial

  1. import theano
  2. import theano.tensor as T
  3. import numpy as np
  4.  
  5. # define tensor variables
  6. X = T.matrix("X")
  7. W = T.matrix("W")
  8. b_sym = T.vector("b_sym")
  9.  
  10. # define shared random stream
  11. trng = T.shared_randomstreams.RandomStreams(1234)
  12. d=trng.binomial(size=W[1].shape)
  13.  
  14. results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym) * d, sequences=X)
  15. compute_with_bnoise = theano.function(inputs=[X, W, b_sym], outputs=results,
  16. updates=updates, allow_input_downcast=True)
  17. x = np.eye(10, 2, dtype=theano.config.floatX)
  18. w = np.ones((2, 2), dtype=theano.config.floatX)
  19. b = np.ones((2), dtype=theano.config.floatX)
  20.  
  21. print(compute_with_bnoise(x, w, b))
  1. [[ 0.96402758 0. ]
  2. [ 0. 0.96402758]
  3. [ 0. 0. ]
  4. [ 0.76159416 0.76159416]
  5. [ 0.76159416 0. ]
  6. [ 0. 0.76159416]
  7. [ 0. 0.76159416]
  8. [ 0. 0.76159416]
  9. [ 0. 0. ]
  10. [ 0.76159416 0.76159416]]

Note that if you want to use a random variable d that will not be updated through scan loops, you should pass this variable as a non_sequences arguments.

Scan Example: Computing pow(A, k)

  1. import theano
  2. import theano.tensor as T
  3. theano.config.warn.subtensor_merge_bug = False
  4.  
  5. k = T.iscalar("k")
  6. A = T.vector("A")
  7.  
  8. def inner_fct(prior_result, B):
  9. return prior_result * B
  10.  
  11. # Symbolic description of the result
  12. result, updates = theano.scan(fn=inner_fct,
  13. outputs_info=T.ones_like(A),
  14. non_sequences=A, n_steps=k)
  15.  
  16. # Scan has provided us with A ** 1 through A ** k. Keep only the last
  17. # value. Scan notices this and does not waste memory saving them.
  18. final_result = result[-1]
  19.  
  20. power = theano.function(inputs=[A, k], outputs=final_result,
  21. updates=updates)
  22.  
  23. print(power(range(10), 2))
  1. [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81.]

Scan Example: Calculating a Polynomial

  1. import numpy
  2. import theano
  3. import theano.tensor as T
  4. theano.config.warn.subtensor_merge_bug = False
  5.  
  6. coefficients = theano.tensor.vector("coefficients")
  7. x = T.scalar("x")
  8. max_coefficients_supported = 10000
  9.  
  10. # Generate the components of the polynomial
  11. full_range=theano.tensor.arange(max_coefficients_supported)
  12. components, updates = theano.scan(fn=lambda coeff, power, free_var:
  13. coeff * (free_var ** power),
  14. outputs_info=None,
  15. sequences=[coefficients, full_range],
  16. non_sequences=x)
  17.  
  18. polynomial = components.sum()
  19. calculate_polynomial = theano.function(inputs=[coefficients, x],
  20. outputs=polynomial)
  21.  
  22. test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
  23. print(calculate_polynomial(test_coeff, 3))
  1. 19.0

Exercise

Run both examples.

Modify and execute the polynomial example to have the reduction done by scan.

Solution