tensor.elemwise – Tensor Elemwise

  • class theano.tensor.elemwise.All(axis=None)[source]
  • Applies logical and to all the values of a tensor along the specified axis(es).
  • class theano.tensor.elemwise.Any(axis=None)[source]
  • Applies bitwise or to all the values of a tensor along the specified axis(es).
  • class theano.tensor.elemwise.CAReduce(scalar_op, axis=None)[source]
  • CAReduce = Commutative Associative Reduce. Reduces a scalar operation along the specified axis(es). (The scalar op should be both commutative and associative.)

The output will have the same shape as the input minus the reduced dimensions. It will contain the result of accumulating all values over the reduced dimensions using the specified scalar op.

Parameters:

  • scalar_op – A binary scalar op with only one output. It must be commutative and associative.
  • axis
    • The dimension along which we want to reduce
    • List of dimensions that we want to reduce
    • If None, all dimensions are reduced

Note

    CAReduce(add)      # sum (i.e., acts like the numpy sum operation)
    CAReduce(mul)      # product
    CAReduce(maximum)  # max
    CAReduce(minimum)  # min
    CAReduce(or_)      # any # not lazy
    CAReduce(and_)     # all # not lazy
    CAReduce(xor)      # a bit is 1 if there was an odd number of 1 bits at
                       # that position, and 0 if there was an even number

In order to (eventually) optimize memory usage patterns, CAReduce makes zero guarantees on the order in which it iterates over the dimensions and the elements of the array(s). Therefore, to ensure consistent results, the scalar operation represented by the reduction must be both commutative and associative (e.g. add, multiply, maximum, binary or/and/xor, but not subtract, divide or power).
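For illustration, here is a minimal sketch of constructing and applying a CAReduce op directly (in practice one would normally go through tensor-level helpers such as theano.tensor.sum or theano.tensor.max, which build these ops internally):

    import numpy as np
    import theano
    import theano.scalar as scal
    import theano.tensor as tt
    from theano.tensor.elemwise import CAReduce

    x = tt.matrix('x')
    sum_all = CAReduce(scal.add)(x)               # reduce over all axes, like x.sum()
    max_rows = CAReduce(scal.maximum, axis=1)(x)  # reduce along axis 1, like x.max(axis=1)

    f = theano.function([x], [sum_all, max_rows])
    a = np.array([[1., 2.], [3., 4.]], dtype=theano.config.floatX)
    print(f(a))                                   # -> [array(10.0), array([2., 4.])]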

  • class theano.tensor.elemwise.CAReduceDtype(scalar_op, axis=None, dtype=None, acc_dtype=None)[source]
  • Reduces a scalar operation along the specified axis(es).

This subclass of CAReduce accepts an additional “dtype” parameter that specifies which dtype the output should be.

It also accepts an optional “acc_dtype”, which specifies the dtype that will be used for the accumulation.

So, the accumulation will be done into a tensor of dtype “acc_dtype”, then cast into “dtype” and returned.

If no dtype is provided, one will be inferred so as not to lose too much precision.
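As a small sketch of these defaults (Sum, described below, is a CAReduceDtype subclass and is built here through theano.tensor.sum):

    import theano.tensor as tt

    x = tt.vector(dtype='int8')
    s = tt.sum(x)
    print(s.dtype)    # 'int64': small signed integer inputs are upcast by default

    # Accumulate in int64 but cast the returned tensor to int32:
    s32 = tt.sum(x, dtype='int32', acc_dtype='int64')
    print(s32.dtype)  # 'int32'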

Parameters:

  • scalar_op – A binary scalar op with only one output. It must be commutative and associative.
  • axis
    • the dimension along which we want to reduce
    • list of dimensions that we want to reduce
    • if None, all dimensions are reduced
  • dtype – The dtype of the returned tensor. If None, then we use the default dtype, which is the same as the input tensor’s dtype except when:

    • the input dtype is a signed integer of precision < 64 bit, in which case we use int64
    • the input dtype is an unsigned integer of precision < 64 bit, in which case we use uint64

    This default dtype does not depend on the value of “acc_dtype”. This behavior is similar in spirit to that of numpy (except numpy uses the default machine integer while we always use 64 bit integers to avoid platform-dependent behavior).
  • acc_dtype – The dtype of the internal accumulator. If None (default), we use the dtype in the list below, or the input dtype if its precision is higher:

    • for int dtypes, we use at least int64;
    • for uint dtypes, we use at least uint64;
    • for float dtypes, we use at least float64;
    • for complex dtypes, we use at least complex128.
  • class theano.tensor.elemwise.DimShuffle(input_broadcastable, new_order, inplace=True)[source]
  • Allows the dimensions of a tensor to be reordered, and broadcastable dimensions to be inserted or removed.

In the following examples, ‘x’ means that we insert a broadcastable dimension and a numerical index represents the dimension of the same rank in the tensor passed to perform.

Parameters:

  • input_broadcastable – The expected broadcastable pattern of the input
  • new_order – A list representing the relationship between the input’s dimensions and the output’s dimensions. Each element of the list can either be an index or ‘x’. Indices must be encoded as python integers, not theano symbolic integers.
  • inplace (bool, optional) – If True (default), the output will be a view of the input.

Note

If j = new_order[i] is an index, the output’s ith dimension will be the input’s jth dimension. If new_order[i] is ‘x’, the output’s ith dimension will be 1 and Broadcast operations will be allowed to do broadcasting over that dimension.

If input.broadcastable[i] == False then i must be found in new_order. Broadcastable dimensions, on the other hand, can be discarded.

Note

    DimShuffle((False, False, False), ['x', 2, 'x', 0, 1])

This op will only work on 3d tensors with no broadcastable dimensions. The first dimension will be broadcastable, then we will have the third dimension of the input tensor as the second of the resulting tensor, etc. If the tensor has shape (20, 30, 40), the resulting tensor will have dimensions (1, 40, 1, 20, 30). (An AxBxC tensor is mapped to a 1xCx1xAxB tensor.)

    DimShuffle((True, False), [1])

This op will only work on 2d tensors with the first dimension broadcastable. The second dimension of the input tensor will be the first dimension of the resulting tensor. If the tensor has shape (1, 20), the resulting tensor will have shape (20,).

Example

    DimShuffle((), ['x'])                         # make a 0d (scalar) into a 1d vector
    DimShuffle((False, False), [0, 1])            # identity
    DimShuffle((False, False), [1, 0])            # swap the first and second dimensions
    DimShuffle((False,), ['x', 0])                # make a row out of a 1d vector (N to 1xN)
    DimShuffle((False,), [0, 'x'])                # make a column out of a 1d vector (N to Nx1)
    DimShuffle((False, False, False), [2, 0, 1])  # AxBxC to CxAxB
    DimShuffle((False, False), [0, 'x', 1])       # AxB to Ax1xB
    DimShuffle((False, False), [1, 'x', 0])       # AxB to Bx1xA

The reordering of the dimensions can be done with the numpy.transpose function. Adding or removing dimensions can be done with reshape.
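In user code, DimShuffle ops are normally created through the dimshuffle method of tensor variables rather than instantiated directly; a short sketch:

    import theano.tensor as tt

    v = tt.vector('v')          # shape (N,)
    row = v.dimshuffle('x', 0)  # DimShuffle((False,), ['x', 0]): shape (1, N)
    col = v.dimshuffle(0, 'x')  # DimShuffle((False,), [0, 'x']): shape (N, 1)

    t = tt.tensor3('t')         # shape (A, B, C)
    u = t.dimshuffle(2, 0, 1)   # DimShuffle((False, False, False), [2, 0, 1]): shape (C, A, B)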

  • class theano.tensor.elemwise.Elemwise(scalar_op, inplace_pattern=None, name=None, nfunc_spec=None, openmp=None)[source]
  • Generalizes a scalar op to tensors.

All the inputs must have the same number of dimensions. When the Op is performed, for each dimension, each input’s size for that dimension must be the same. As a special case, it can also be 1, but only if the input’s broadcastable flag is True for that dimension. In that case, the tensor is (virtually) replicated along that dimension to match the size of the others.

The dtypes of the outputs mirror those of the scalar Op that is being generalized to tensors. In particular, if the calculations for an output are done inplace on an input, the output type must be the same as the corresponding input type (see the doc of scalar.ScalarOp for help about controlling the output type).

Parameters:

  • scalar_op – An instance of a subclass of scalar.ScalarOp which works uniquely on scalars.
  • inplace_pattern – A dictionary that maps the index of an output to the index of an input so that the output is calculated inplace using the input’s storage. (Just like destroy_map, but without the lists.)
  • nfunc_spec – Either None or a tuple of three elements, (nfunc_name, nin, nout), such that getattr(numpy, nfunc_name) implements this operation, takes nin inputs and nout outputs. Note that nin cannot always be inferred from the scalar op’s own nin field, because that value is sometimes 0 (meaning a variable number of inputs), whereas the numpy function may not have varargs.

Note

    Elemwise(add)          # represents + on tensors (x + y)
    Elemwise(add, {0: 0})  # represents the += operation (x += y)
    Elemwise(add, {0: 1})  # represents += on the second argument (y += x)
    Elemwise(mul)(rand(10, 5), rand(1, 5))        # the second input is completed along
                                                  # the first dimension to match the first
    Elemwise(true_div)(rand(10, 5), rand(10, 1))  # same but along the second dimension
    Elemwise(int_div)(rand(1, 5), rand(10, 1))    # the output has size (10, 5)
    Elemwise(log)(rand(3, 4, 5))
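As a sketch, an Elemwise op can also be constructed and applied directly, although user code normally relies on operators such as + that build the same graph:

    import numpy as np
    import theano
    import theano.scalar as scal
    import theano.tensor as tt
    from theano.tensor.elemwise import Elemwise

    x = tt.matrix('x')
    y = tt.row('y')               # broadcastable pattern (True, False)
    z = Elemwise(scal.add)(x, y)  # what `x + y` builds internally

    f = theano.function([x, y], z)
    a = np.ones((3, 2), dtype=theano.config.floatX)
    b = np.array([[10., 20.]], dtype=theano.config.floatX)
    print(f(a, b))                # the row is broadcast along the first dimension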

  • get_output_info(dim_shuffle, *inputs)[source]
  • Return the output dtype and broadcastable pattern, and the dimshuffled inputs.

  • make_node(*inputs)[source]

  • If the inputs have different numbers of dimensions, their shape is left-completed to the greatest number of dimensions with 1s using DimShuffle.

  • python_constant_folding(node)[source]

  • Return True if we do not want to compile c code when doing constant folding of this node.
  • class theano.tensor.elemwise.Prod(axis=None, dtype=None, acc_dtype=None, no_zeros_in_input=False)[source]
  • Multiplies all the values of a tensor along the specified axis(es).

Equivalent to CAReduce(scalar.prod, axis=axis), with the difference that this defines the gradient of prod wrt its tensor input.

  • L_op(inp, out, grads)[source]
  • The grad of this Op would be very easy, if it were not for the case where zeros are present in a given “group” (i.e. elements reduced together to form the product).

If no zeros are found in the elements of the product, then the partial derivative of the product relative to one of the elements (one of the inputs) is simply the product of the other elements. That’s easy to see from the chain rule.

Now the trick (with no zeros) is to take the overall product, then for every original element, the partial derivative is given by this product divided by the element itself (which equals the product of the other terms). This is easy to do by broadcasting the original product.

(Note that we also need to broadcast-multiply by the “incoming gradient”, i.e. the gradient of the cost relative to the output/product.)
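A small numpy sketch of this division trick, for a single group with no zeros:

    import numpy as np

    x = np.array([2., 3., 4.])
    g_out = 1.0           # incoming gradient of the cost wrt the product
    p = x.prod()          # overall product: 24.0
    grad = g_out * p / x  # each entry is the product of the *other* elements
    print(grad)           # [12.  8.  6.], i.e. d(prod)/dx_i = prod of x_j for j != i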

With zeros, things get more complicated. For a given group, we have 3 cases:

  1. No zeros in the group. Use the previous trick.
  2. If only one zero is present, then the gradient for that element is non-zero, but it is zero for all the others.
  3. If more than one zero is present, then all the derivatives are zero.

For the last two cases (with 1 or more zeros), we can’t use the division trick, as this gives divisions by 0.

Implementing that case-by-case logic is not as trivial, so a bunch of hacks are piled down here to do it. Notably, for the “only one zero” case, there’s a special Op that computes the product of the elements in the group, minus the zero (see ProdWithoutZeros). The trick is then to use the division trick for groups with no zero, to use the ProdWithoutZeros op where there’s only one zero, and to output a derivative of zero for any element part of a group with more than one zero.

I do this by first counting the number of zeros in each group (see the “T.eq()” bits), then taking this or that behavior (see T.switch) based on the result of this count.
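A numpy sketch of that case-by-case logic for a single group (counting zeros plays the role of the T.eq() bits, the branching plays the role of T.switch, and the masked product stands in for ProdWithoutZeros):

    import numpy as np

    def prod_grad_1d(x, g_out):
        n_zeros = np.sum(x == 0)
        if n_zeros == 0:
            return g_out * x.prod() / x       # the division trick
        if n_zeros == 1:
            prod_wo_zeros = x[x != 0].prod()  # what ProdWithoutZeros computes
            return np.where(x == 0, g_out * prod_wo_zeros, 0.0)
        return np.zeros_like(x)               # two or more zeros: all grads are 0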

  • class theano.tensor.elemwise.Sum(axis=None, dtype=None, acc_dtype=None)[source]
  • Sums all the values of a tensor along the specified axis(es).

Equivalent to CAReduceDtype(scalar.add, axis=axis, dtype=dtype), with the difference that this defines the gradient of sum wrt its tensor input. (A short usage sketch follows the parameter list below.)

Parameters:

  • axis – Axis(es) along which the tensor should be summed (use None to sum over all axes, and a list or tuple to sum along more than one axis).
  • dtype – The dtype of the internal accumulator and returned tensor. If None, then we use the default dtype, which is the same as the input tensor’s dtype except when:

    • the input dtype is a signed integer of precision < 64 bit, in which case we use int64
    • the input dtype is an unsigned integer of precision < 64 bit, in which case we use uint64

    This value does not depend on the value of “acc_dtype”.
  • acc_dtype – The dtype of the internal accumulator. If None (default), we use the dtype in the list below, or the input dtype if its precision is higher:

    • for int dtypes, we use at least int64;
    • for uint dtypes, we use at least uint64;
    • for float dtypes, we use at least float64;
    • for complex dtypes, we use at least complex128.
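A short usage sketch (theano.tensor.sum builds a Sum op internally):

    import numpy as np
    import theano
    import theano.tensor as tt

    x = tt.matrix('x')
    s = tt.sum(x, axis=0)   # sum along the first axis
    f = theano.function([x], s)
    a = np.arange(6, dtype=theano.config.floatX).reshape(2, 3)
    print(f(a))             # [3. 5. 7.]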