Sparse

In general, sparse matrices provide the same functionality as regularmatrices. The difference lies in the way the elements of sparse matrices arerepresented and stored in memory. Only the non-zero elements of the latter are stored.This has some potential advantages: first, thismay obviously lead to reduced memory usage and, second, cleverstorage methods may lead to reduced computation time through the use ofsparse specific algorithms. We usually refer to the generically stored matricesas dense matrices.

Theano’s sparse package provides efficient algorithms, but its use is not recommendedin all cases or for all matrices. As an obvious example, consider the case wherethe sparsity proportion is very low. The sparsity proportion refers to theratio of the number of zero elements to the number of all elements in a matrix.A low sparsity proportion may result in the use of more space in memorysince not only the actual data is stored, but also the position of nearly everyelement of the matrix. This would also require more computationtime whereas a dense matrix representation along with regular optimized algorithms might do abetter job. Other examples may be found at the nexus of the specific purpose and structureof the matrices. More documentation may be found in theSciPy Sparse Reference.

Since sparse matrices are not stored in contiguous arrays, there are severalways to represent them in memory. This is usually designated by the so-called formatof the matrix. Since Theano’s sparse matrix package is based on the SciPysparse package, complete information about sparse matrices can be foundin the SciPy documentation. Like SciPy, Theano does not implement sparse formats forarrays with a number of dimensions different from two.

So far, Theano implements two formats of sparse matrix: csc and csr.Those are almost identical except that csc is based on the columns of thematrix and csr is based on its rows. They both have the same purpose:to provide for the use of efficient algorithms performing linear algebra operations.A disadvantage is that they fail to give an efficient way to modify the sparsity structureof the underlying matrix, i.e. adding new elements. This means that if you areplanning to add new elements in a sparse matrix very often in your computational graph,perhaps a tensor variable could be a better choice.

More documentation may be found in the Sparse Library Reference.

Before going further, here are the import statements that are assumed for the rest of thetutorial:

  1. >>> import theano
  2. >>> import numpy as np
  3. >>> import scipy.sparse as sp
  4. >>> from theano import sparse

Compressed Sparse Format

Theano supports two compressed sparse formats: csc and csr, respectively based on columnsand rows. They have both the same attributes: data, indices, indptr and shape.

  • The data attribute is a one-dimensional ndarray which contains all the non-zero elements of the sparse matrix.
  • The indices and indptr attributes are used to store the position of the data in the sparse matrix.
  • The shape attribute is exactly the same as the shape attribute of a dense (i.e. generic) matrix. It can be explicitly specified at the creation of a sparse matrix if it cannot be infered from the first three attributes.

Which format should I use?

At the end, the format does not affect the length of the data and indices attributes. They are bothcompletly fixed by the number of elements you want to store. The only thing that changes with the formatis indptr. In csc format, the matrix is compressed along columns so a lower number of columns willresult in less memory use. On the other hand, with the csr format, the matrix is compressed alongthe rows and with a matrix that have a lower number of rows, csr format is a better choice. So here is the rule:

Note

If shape[0] > shape[1], use csc format. Otherwise, use csr.

Sometimes, since the sparse module is young, ops does not exist for both format. So here iswhat may be the most relevent rule:

Note

Use the format compatible with the ops in your computation graph.

The documentation about the ops and their supported format may be found inthe Sparse Library Reference.

Handling Sparse in Theano

Most of the ops in Theano depend on the format of the sparse matrix.That is why there are two kinds of constructors of sparse variables:csc_matrix and csr_matrix. These can be called with the usualname and dtype parameters, but no broadcastable flags areallowed. This is forbidden since the sparse package, as the SciPy sparse module,does not provide any way to handle a number of dimensions different from two.The set of all accepted dtype for the sparse matrices can be found insparse.all_dtypes.

  1. >>> sparse.all_dtypes
  2. set(['int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64',
  3. 'float32', 'float64', 'complex64', 'complex128'])

To and Fro

To move back and forth from a dense matrix to a sparse matrix representation, Theanoprovides the dense_from_sparse, csr_from_dense andcsc_from_dense functions. No additional detail must be provided. Here isan example that performs a full cycle from sparse to sparse:

  1. >>> x = sparse.csc_matrix(name='x', dtype='float32')
  2. >>> y = sparse.dense_from_sparse(x)
  3. >>> z = sparse.csc_from_dense(y)

Properties and Construction

Although sparse variables do not allow direct access to their properties,this can be accomplished using the csm_properties function. This will returna tuple of one-dimensional tensor variables that represents the internal characteristicsof the sparse matrix.

In order to reconstruct a sparse matrix from some properties, the functions CSCand CSR can be used. This will create the sparse matrix in the desiredformat. As an example, the following code reconstructs a csc matrix intoa csr one.

  1. >>> x = sparse.csc_matrix(name='x', dtype='int64')
  2. >>> data, indices, indptr, shape = sparse.csm_properties(x)
  3. >>> y = sparse.CSR(data, indices, indptr, shape)
  4. >>> f = theano.function([x], y)
  5. >>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
  6. >>> print(a.toarray())
  7. [[0 1 1]
  8. [0 0 0]
  9. [1 0 0]]
  10. >>> print(f(a).toarray())
  11. [[0 0 1]
  12. [1 0 0]
  13. [1 0 0]]

The last example shows that one format can be obtained from transposition ofthe other. Indeed, when calling the transpose function,the sparse characteristics of the resulting matrix cannot be the same as the oneprovided as input.

Structured Operation

Several ops are set to make use of the very peculiar structure of the sparsematrices. These ops are said to be structured and simply do not perform anycomputations on the zero elements of the sparse matrix. They can be thought as beingapplied only to the data attribute of the latter. Note that these structured opsprovide a structured gradient. More explication below.

  1. >>> x = sparse.csc_matrix(name='x', dtype='float32')
  2. >>> y = sparse.structured_add(x, 2)
  3. >>> f = theano.function([x], y)
  4. >>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
  5. >>> print(a.toarray())
  6. [[ 0. 0. -1.]
  7. [ 0. -2. 1.]
  8. [ 3. 0. 0.]]
  9. >>> print(f(a).toarray())
  10. [[ 0. 0. 1.]
  11. [ 0. 0. 3.]
  12. [ 5. 0. 0.]]

Gradient

The gradients of the ops in the sparse module can also be structured. Some ops providea flag to indicate if the gradient is to be structured or not. The documentation canbe used to determine if the gradient of an op is regular or structured or if itsimplementation can be modified. Similarly to structured ops, when a structured gradient is calculated, thecomputation is done only for the non-zero elements of the sparse matrix.

More documentation regarding the gradients of specific ops can be found in theSparse Library Reference.