Multi-Threading

Multi-Threading

Threads.@threads [schedule] for ... end

A macro to execute a for loop in parallel. The iteration space is distributed to coarse-grained tasks. This policy can be specified by the schedule argument. The execution of the loop waits for the evaluation of all iterations.

See also: @spawn and pmap in Distributed.

Extended help

Semantics

Unless stronger guarantees are specified by the scheduling option, the loop executed by @threads macro have the following semantics.

The @threads macro executes the loop body in an unspecified order and potentially concurrently. It does not specify the exact assignments of the tasks and the worker threads. The assignments can be different for each execution. The loop body code (including any code transitively called from it) must not make any assumptions about the distribution of iterations to tasks or the worker thread in which they are executed. The loop body for each iteration must be able to make forward progress independent of other iterations and be free from data races. As such, invalid synchronizations across iterations may deadlock while unsynchronized memory accesses may result in undefined behavior.

For example, the above conditions imply that:

The lock taken in an iteration must be released within the same iteration.
Communicating between iterations using blocking primitives like Channels is incorrect.
Write only to locations not shared across iterations (unless a lock or atomic operation is used).
The value of threadid() may change even within a single iteration.

Schedulers

Without the scheduler argument, the exact scheduling is unspecified and varies across Julia releases. Currently, :dynamic is used when the scheduler is not specified.

Julia 1.5

The schedule argument is available as of Julia 1.5.

:dynamic (default)

:dynamic scheduler executes iterations dynamically to available worker threads. Current implementation assumes that the workload for each iteration is uniform. However, this assumption may be removed in the future.

This scheduling option is merely a hint to the underlying execution mechanism. However, a few properties can be expected. The number of Tasks used by :dynamic scheduler is bounded by a small constant multiple of the number of available worker threads (nthreads()). Each task processes contiguous regions of the iteration space. Thus, @threads :dynamic for x in xs; f(x); end is typically more efficient than @sync for x in xs; @spawn f(x); end if length(xs) is significantly larger than the number of the worker threads and the run-time of f(x) is relatively smaller than the cost of spawning and synchronizaing a task (typically less than 10 microseconds).

Julia 1.8

The :dynamic option for the schedule argument is available and the default as of Julia 1.8.

:static

:static scheduler creates one task per thread and divides the iterations equally among them, assigning each task specifically to each thread. In particular, the value of threadid() is guranteed to be constant within one iteration. Specifying :static is an error if used from inside another @threads loop or from a thread other than 1.

Note

:static scheduling exists for supporting transition of code written before Julia 1.3. In newly written library functions, :static scheduling is discouraged because the functions using this option cannot be called from arbitrary worker threads.

Example

To illustrate of the different scheduling strategies, consider the following function busywait containing a non-yielding timed loop that runs for a given number of seconds.

julia> function busywait(seconds)
            tstart = time_ns()
            while (time_ns() - tstart) / 1e9 < seconds
            end
        end
julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :static for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
6.003001 seconds (16.33 k allocations: 899.255 KiB, 0.25% compilation time)
julia> @time begin
            Threads.@spawn busywait(5)
            Threads.@threads :dynamic for i in 1:Threads.nthreads()
                busywait(1)
            end
        end
2.012056 seconds (16.05 k allocations: 883.919 KiB, 0.66% compilation time)

The :dynamic example takes 2 seconds since one of the non-occupied threads is able to run two of the 1-second iterations to complete the for loop.

Multi-Threading

Multi-Threading

Atomic operations

ccall using a threadpool (Experimental))

Low-level synchronization primitives