Multi-core support in Theano

BLAS operation

BLAS is an interface for some mathematical operations between two vectors, a vector and a matrix, or two matrices (e.g. the dot product between vector/matrix and matrix/matrix). Many different implementations of that interface exist, and some of them are parallelized.

Theano tries to use that interface as frequently as possible for performance reasons. So if Theano links to a parallel implementation, those operations will run in parallel in Theano.

The most frequent way to control the number of threads used is via the OMP_NUM_THREADS environment variable. Set it to the number of threads you want to use before starting the Python process. Some BLAS implementations support other environment variables.

To test whether your BLAS supports OpenMP/multiple cores, you can run the theano/misc/check_blas.py script from the command line like this:

  OMP_NUM_THREADS=1 python theano/misc/check_blas.py -q
  OMP_NUM_THREADS=2 python theano/misc/check_blas.py -q
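
The same comparison can be driven from Python. The following is only a sketch under stated assumptions: run_with_threads is a hypothetical helper (not part of Theano) that copies the current environment, sets OMP_NUM_THREADS, and starts a fresh child process, so the variable is in place before that Python process starts. The script path is taken from the commands above and assumes you run from the root of a Theano checkout. The measured time includes interpreter start-up, so treat it as a rough indication only.

```python
import os
import subprocess
import sys
import time


def run_with_threads(command, num_threads):
    """Run `command` in a child process with OMP_NUM_THREADS set.

    Hypothetical helper, not part of Theano.
    Returns (elapsed_seconds, captured_stdout).
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["OMP_NUM_THREADS"] = str(num_threads)
    start = time.perf_counter()
    result = subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__":
    # Assumed location of the script, relative to the Theano checkout.
    script = os.path.join("theano", "misc", "check_blas.py")
    if os.path.exists(script):
        for n in (1, 2):
            elapsed, _ = run_with_threads([sys.executable, script, "-q"], n)
            print("OMP_NUM_THREADS=%d: %.2fs wall time" % (n, elapsed))
```

If the run with two threads is not noticeably faster than the run with one, the BLAS that Theano links to is probably not using multiple cores.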

Parallel element wise ops with OpenMP

Because element wise ops work on every tensor entry independently, they can be easily parallelized using OpenMP.

To use OpenMP you must set the openmp flag to True.

You can use the flag openmp_elemwise_minsize to set the minimum tensor size for which the operation is parallelized, because for short tensors using OpenMP can slow down the operation. The default value is 200000.
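
One way to set both flags is through the THEANO_FLAGS environment variable, which holds comma-separated name=value pairs. The sketch below assumes the variable is set before Theano is imported. set_theano_flags is a hypothetical helper written for this illustration, not a Theano function. It keeps entries that are already present in THEANO_FLAGS and overrides only the names you pass.

```python
import os


def set_theano_flags(**flags):
    """Merge the given settings into THEANO_FLAGS and return the new value.

    Hypothetical helper, not part of Theano. Existing entries are kept
    unless a new value for the same name is given.
    """
    merged = {}
    for item in os.environ.get("THEANO_FLAGS", "").split(","):
        if "=" in item:
            name, value = item.split("=", 1)
            merged[name.strip()] = value.strip()
    for name, value in flags.items():
        merged[name] = str(value)
    joined = ",".join("%s=%s" % pair for pair in merged.items())
    os.environ["THEANO_FLAGS"] = joined
    return joined


# Enable OpenMP and keep the default threshold from the text above.
print(set_theano_flags(openmp=True, openmp_elemwise_minsize=200000))

# import theano  # assumption: import only after THEANO_FLAGS is set
```

The same settings can also be given on the command line by prefixing the Python command with THEANO_FLAGS=openmp=True,openmp_elemwise_minsize=200000.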

For simple (fast) operations you can obtain a speed-up with very large tensors, while for more complex operations you can obtain a good speed-up also for smaller tensors.

There is a script elemwise_openmp_speedup.py in theano/misc/ which you can use to tune the value of openmp_elemwise_minsize for your machine. The script runs two elemwise operations (a fast one and a slow one) for a vector of size openmp_elemwise_minsize with and without OpenMP and shows the time difference between the cases.

The only way to control the number of threads used is via the OMP_NUM_THREADS environment variable. Set it to the number of threads you want to use before starting the Python process. You can test this with this command:

  OMP_NUM_THREADS=2 python theano/misc/elemwise_openmp_speedup.py
  #The output

  Fast op time without openmp 0.000533s with openmp 0.000474s speedup 1.12
  Slow op time without openmp 0.002987s with openmp 0.001553s speedup 1.92
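
The speedup column is the time without OpenMP divided by the time with OpenMP. As a small worked check, the sketch below parses the two sample lines above and recomputes that ratio. The regular expression and the parse_speedups helper are assumptions made for this illustration and are tied to the exact wording of the sample output.

```python
import re

# Sample output lines copied from the run above.
OUTPUT = """\
Fast op time without openmp 0.000533s with openmp 0.000474s speedup 1.12
Slow op time without openmp 0.002987s with openmp 0.001553s speedup 1.92
"""

LINE = re.compile(
    r"(?P<op>\w+) op time without openmp (?P<serial>[\d.]+)s "
    r"with openmp (?P<parallel>[\d.]+)s speedup (?P<speedup>[\d.]+)"
)


def parse_speedups(text):
    """Return {op_name: (serial_s, parallel_s, recomputed_speedup)}."""
    results = {}
    for match in LINE.finditer(text):
        serial = float(match.group("serial"))
        parallel = float(match.group("parallel"))
        results[match.group("op")] = (serial, parallel, serial / parallel)
    return results


for op, (serial, parallel, speedup) in parse_speedups(OUTPUT).items():
    print("%s: %.6fs / %.6fs = %.2f" % (op, serial, parallel, speedup))
```

In this sample the slow op gains much more than the fast op at the same vector size, which matches the remark above that more complex operations benefit from OpenMP at smaller sizes.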