
Commit

Clarify the behavior of @threads for (#44168)
* Clarify the behavior of `@threads for`

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
tkf and IanButterworth authored Mar 3, 2022
1 parent 82dc130 commit 2f67b51
Showing 1 changed file with 62 additions and 30 deletions.
92 changes: 62 additions & 30 deletions base/threadingconstructs.jl
@@ -99,46 +99,82 @@ end
"""
Threads.@threads [schedule] for ... end
A macro to parallelize a `for` loop to run with multiple threads. Splits the iteration
space among multiple tasks and runs those tasks on threads according to a scheduling
policy.
A barrier is placed at the end of the loop which waits for all tasks to finish
execution.
The `schedule` argument can be used to request a particular scheduling policy.
Except for `:static` scheduling, how the iterations are assigned to tasks, and how the tasks
are assigned to the worker threads is undefined. The exact assignments can be different
for each execution. The scheduling option is a hint. The loop body code (including any code
transitively called from it) must not make assumptions about the distribution of iterations
to tasks or the worker thread in which they are executed. The loop body for each iteration
must be able to make forward progress independent of other iterations and be free from data
races. As such, synchronizations across iterations may deadlock.
A macro to execute a `for` loop in parallel. The iteration space is distributed to
coarse-grained tasks. This policy can be specified by the `schedule` argument. The
execution of the loop waits for the evaluation of all iterations.
See also: [`@spawn`](@ref Threads.@spawn) and
`pmap` in [`Distributed`](@ref man-distributed).
# Extended help
## Semantics
Unless stronger guarantees are specified by the scheduling option, a loop executed by the
`@threads` macro has the following semantics.
The `@threads` macro executes the loop body in an unspecified order and potentially
concurrently. It does not specify the exact assignments of the tasks and the worker threads.
The assignments can be different for each execution. The loop body code (including any code
transitively called from it) must not make any assumptions about the distribution of
iterations to tasks or the worker thread in which they are executed. The loop body for each
iteration must be able to make forward progress independent of other iterations and be free
from data races. As such, invalid synchronizations across iterations may deadlock while
unsynchronized memory accesses may result in undefined behavior.
For example, the above conditions imply that:
- The lock taken in an iteration *must* be released within the same iteration.
- Communicating between iterations using blocking primitives like `Channel`s is incorrect.
- Write only to locations not shared across iterations (unless a lock or atomic operation is
used).
- The value of [`threadid()`](@ref Threads.threadid) may change even within a single
iteration.
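As a minimal sketch of the rules above (the names `xs`, `squares`, `total`, `evens`, and
`lk` are illustrative assumptions, not part of the original text), each loop below either
writes only to a location owned by its iteration or guards the shared location with an
atomic operation or a lock that is released within the same iteration:

```julia
xs = 1:1000

# Each iteration writes only to its own slot, so no synchronization is needed.
squares = Vector{Int}(undef, length(xs))
Threads.@threads for i in eachindex(xs)
    squares[i] = xs[i]^2
end

# A shared accumulator is updated with an atomic operation.
total = Threads.Atomic{Int}(0)
Threads.@threads for x in xs
    Threads.atomic_add!(total, x)
end

# A shared vector is guarded by a lock that is taken and released
# within the same iteration.
lk = ReentrantLock()
evens = Int[]
Threads.@threads for x in xs
    if iseven(x)
        lock(lk) do
            push!(evens, x)
        end
    end
end
```

Because the execution order is unspecified, the order of the elements in `evens` can
differ between runs; only its contents are determined.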
Schedule options are:
- `:dynamic` (default) will schedule iterations dynamically to available worker threads,
assuming that the workload for each iteration is uniform.
- `:static` creates one task per thread and divides the iterations equally among
them, assigning each task specifically to each thread.
Specifying `:static` is an error if used from inside another `@threads` loop
or from a thread other than 1.
## Schedulers
Without the scheduler argument, the exact scheduling is unspecified and varies across Julia
releases. Currently, `:dynamic` is used when the scheduler is not specified.
!!! compat "Julia 1.5"
The `schedule` argument is available as of Julia 1.5.
### `:dynamic` (default)
The `:dynamic` scheduler executes iterations dynamically on the available worker threads.
The current implementation assumes that the workload for each iteration is uniform. However,
this assumption may be removed in the future.
This scheduling option is merely a hint to the underlying execution mechanism. However, a
few properties can be expected. The number of `Task`s used by the `:dynamic` scheduler is
bounded by a small constant multiple of the number of available worker threads
([`nthreads()`](@ref Threads.nthreads)). Each task processes contiguous regions of the
iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more
efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly
larger than the number of worker threads and the run-time of `f(x)` is small relative to
the cost of spawning and synchronizing a task (typically less than 10 microseconds).
!!! compat "Julia 1.8"
The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.
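The two forms compared above can be sketched side by side. This is an illustrative
example only: `f` is a hypothetical cheap function, the array names are assumptions, and
the explicit `:dynamic` argument assumes Julia 1.8 or later. Both loops compute the same
result; they differ in how many tasks are created.

```julia
f(x) = x^2  # hypothetical, cheap per-element work

xs = collect(1:10_000)
out_threads = similar(xs)
out_spawn = similar(xs)

# A bounded number of tasks, each processing a contiguous chunk of indices.
Threads.@threads :dynamic for i in eachindex(xs)
    out_threads[i] = f(xs[i])
end

# One task per element: same result, but task overhead is paid per iteration.
@sync for i in eachindex(xs)
    Threads.@spawn out_spawn[i] = f(xs[i])
end
```

No timings are shown here because they depend on the machine, the number of threads, and
the cost of `f`.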
For example, an illustration of the different scheduling strategies where `busywait`
is a non-yielding timed loop that runs for a number of seconds.
### `:static`
The `:static` scheduler creates one task per thread and divides the iterations equally among
them, assigning each task specifically to each thread. In particular, the value of
[`threadid()`](@ref Threads.threadid) is guaranteed to be constant within one iteration.
Specifying `:static` is an error if used from inside another `@threads` loop or from a
thread other than 1.
!!! note
`:static` scheduling exists to support the transition of code written before Julia 1.3.
In newly written library functions, `:static` scheduling is discouraged because the
functions using this option cannot be called from arbitrary worker threads.
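The `threadid()` guarantee of `:static` can be observed with a small sketch. It assumes the
loop is run at top level from thread 1, as `:static` requires; the names `n`, `first_id`,
and `last_id` are illustrative.

```julia
n = 2 * Threads.nthreads()
first_id = zeros(Int, n)
last_id = zeros(Int, n)

Threads.@threads :static for i in 1:n
    first_id[i] = Threads.threadid()
    yield()  # a yield point; with `:static` the task stays on its thread
    last_id[i] = Threads.threadid()
end
```

With `:static`, `first_id` and `last_id` are equal. Under the default scheduling no such
guarantee is given, so code should not rely on this equality there.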
## Example
To illustrate the different scheduling strategies, consider the following function
`busywait` containing a non-yielding timed loop that runs for a given number of seconds.
```julia-repl
julia> function busywait(seconds)
@@ -166,10 +202,6 @@ julia> @time begin
The `:dynamic` example takes 2 seconds since one of the non-occupied threads is able
to run two of the 1-second iterations to complete the for loop.
See also: [`@spawn`](@ref Threads.@spawn), [`nthreads()`](@ref Threads.nthreads),
[`threadid()`](@ref Threads.threadid), `pmap` in [`Distributed`](@ref man-distributed),
`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
"""
macro threads(args...)
na = length(args)
