
Clarify the behavior of @threads for #44168

Merged 6 commits on Mar 3, 2022
92 changes: 62 additions & 30 deletions base/threadingconstructs.jl
@@ -99,46 +99,82 @@ end
"""
Threads.@threads [schedule] for ... end
A macro to parallelize a `for` loop to run with multiple threads. Splits the iteration
space among multiple tasks and runs those tasks on threads according to a scheduling
policy.
A barrier is placed at the end of the loop which waits for all tasks to finish
execution.
The `schedule` argument can be used to request a particular scheduling policy.
Except for `:static` scheduling, how the iterations are assigned to tasks, and how the tasks
are assigned to the worker threads is undefined. The exact assignments can be different
for each execution. The scheduling option is a hint. The loop body code (including any code
transitively called from it) must not make assumptions about the distribution of iterations
to tasks or the worker thread in which they are executed. The loop body for each iteration
must be able to make forward progress independent of other iterations and be free from data
races. As such, synchronizations across iterations may deadlock.
A macro to execute a `for` loop in parallel. The iteration space is distributed to
coarse-grained tasks. This policy can be specified by the `schedule` argument. The
execution of the loop waits for the evaluation of all iterations.
See also: [`@spawn`](@ref Threads.@spawn) and
`pmap` in [`Distributed`](@ref man-distributed).
# Extended help
## Semantics
Unless stronger guarantees are specified by the scheduling option, the loop executed by the
`@threads` macro has the following semantics.
The `@threads` macro executes the loop body in an unspecified order and potentially
concurrently. It does not specify the exact assignments of the tasks and the worker threads.
The assignments can be different for each execution. The loop body code (including any code
transitively called from it) must not make any assumptions about the distribution of
iterations to tasks or the worker thread in which they are executed. The loop body for each
iteration must be able to make forward progress independent of other iterations and be free
from data races. As such, invalid synchronizations across iterations may deadlock while
unsynchronized memory accesses may result in undefined behavior.
For example, the above conditions imply that:
- The lock taken in an iteration *must* be released within the same iteration.
- Communicating between iterations using blocking primitives like `Channel`s is incorrect.
- Write only to locations not shared across iterations (unless a lock or atomic operation is
  used).
- The value of [`threadid()`](@ref Threads.threadid) may change even within a single
iteration.
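As a sketch of these rules (a hypothetical helper, not part of the patch), each iteration should write only to its own slot of a preallocated array:

```julia
using Base.Threads

# Hypothetical helper: each iteration writes only to its own slot of
# `results`, so no location is shared across iterations and no lock
# is needed.
function squares_threaded(n)
    results = Vector{Int}(undef, n)
    @threads for i in 1:n
        results[i] = i^2
    end
    return results
end

# By contrast, indexing a shared buffer by `threadid()` is incorrect,
# because the thread id may change even within a single iteration:
#
#     buf = zeros(Int, nthreads())
#     @threads for i in 1:n
#         buf[threadid()] += i   # WRONG: racy, not iteration-local
#     end

squares_threaded(4)  # [1, 4, 9, 16]
```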
Schedule options are:
- `:dynamic` (default) will schedule iterations dynamically to available worker threads,
assuming that the workload for each iteration is uniform.
- `:static` creates one task per thread and divides the iterations equally among
them, assigning each task specifically to each thread.
Specifying `:static` is an error if used from inside another `@threads` loop
or from a thread other than 1.
## Schedulers
Without the scheduler argument, the exact scheduling is unspecified and varies across Julia
releases. Currently, `:dynamic` is used when the scheduler is not specified.
!!! compat "Julia 1.5"
The `schedule` argument is available as of Julia 1.5.
### `:dynamic` (default)
The `:dynamic` scheduler executes iterations dynamically on available worker threads. The
current implementation assumes that the workload for each iteration is uniform, but this
assumption may be removed in the future.
This scheduling option is merely a hint to the underlying execution mechanism. However, a
few properties can be expected. The number of `Task`s used by `:dynamic` scheduler is
bounded by a small constant multiple of the number of available worker threads
([`nthreads()`](@ref Threads.nthreads)). Each task processes contiguous regions of the
iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more
efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly
larger than the number of the worker threads and the run-time of `f(x)` is relatively
smaller than the cost of spawning and synchronizing a task (typically less than 10
microseconds).
Comment on lines +152 to +156

@tkf (Member, Author) commented on Feb 23, 2022:
"typically less than 10 microseconds" is from

```julia-repl
julia> Threads.nthreads()
2

julia> @benchmark wait(Threads.@spawn nothing)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
 Range (min … max):  1.455 μs … 53.389 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.047 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.004 μs ±  1.242 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                        ██▁   ▁▄▄▂
  ▂▁▁▂▁▁▁▁▂▁▂▁▁▁▂▁▂▂▁▁▁▂▂▂▂▂▂▂▂▂▃▃▄▃▆▆▄▆███▆▅▆█████▆▅▄▄▅▅▅▄▃ ▃
  1.46 μs        Histogram: frequency by time        11.5 μs <

 Memory estimate: 480 bytes, allocs estimate: 5.
```

I'm looking at the median (and the mean, which is not very different from the median) as the "typical" case. It's more useful than the minimum, since the multi-queue is a randomized, concurrent algorithm.

This is on Linux. Can someone check it on other OSes? Note that it's important to use `nthreads() > 1`, since `nthreads() == 1` is special-cased.

A contributor commented:
Shouldn't the second part of the statement be reversed?

> and the run-time of `f(x)` is relatively ~~smaller~~ larger than the cost of spawning and synchronizing a task (typically less than 10 microseconds).

@Moelf (Contributor) replied on Mar 3, 2022:

No, it's correct as it is. If the cost of `f(x)` were larger, then the cost of `@spawn` would be negligible, so you wouldn't see `@threads :dynamic` being more efficient.

!!! compat "Julia 1.8"
The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.
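The chunking property described above can be sketched as follows (a hypothetical comparison, not part of the patch; requires Julia 1.8 for the explicit `:dynamic` option):

```julia
using Base.Threads

xs = collect(1:10_000)

# Chunked: @threads :dynamic uses a small number of tasks, each
# processing a contiguous region of the iteration space.
r1 = zeros(Int, length(xs))
@threads :dynamic for i in eachindex(xs)
    r1[i] = xs[i] + 1
end

# Task-per-element: @sync/@spawn creates length(xs) tasks, so the
# spawn/synchronization overhead dominates when the body is cheap.
r2 = zeros(Int, length(xs))
@sync for i in eachindex(xs)
    @spawn r2[i] = xs[i] + 1
end

r1 == r2  # same result, very different scheduling cost
```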
For example, an illustration of the different scheduling strategies where `busywait`
is a non-yielding timed loop that runs for a number of seconds.
### `:static`
The `:static` scheduler creates one task per thread and divides the iterations equally among
them, assigning each task specifically to each thread. In particular, the value of
[`threadid()`](@ref Threads.threadid) is guaranteed to be constant within one iteration.
Specifying `:static` is an error if used from inside another `@threads` loop or from a
thread other than 1.
!!! note
`:static` scheduling exists to support the transition of code written before Julia 1.3.
In newly written library functions, `:static` scheduling is discouraged because
functions using this option cannot be called from arbitrary worker threads.
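As a sketch of the `:static` guarantee (assuming it runs from thread 1, outside any other `@threads` loop):

```julia
using Base.Threads

# With :static, iterations are divided into equal contiguous blocks,
# one task pinned per thread, so threadid() is constant within each
# iteration (and within each block).
ids = zeros(Int, 8)
@threads :static for i in 1:8
    ids[i] = threadid()
end

# With nthreads() == 2 this yields one block per thread, e.g.
# ids == [1, 1, 1, 1, 2, 2, 2, 2]; with nthreads() == 1, all ones.
```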
## Example
To illustrate the different scheduling strategies, consider the following function
`busywait` containing a non-yielding timed loop that runs for a given number of seconds.
```julia-repl
julia> function busywait(seconds)
```

@@ -166,10 +202,6 @@ julia> @time begin
The `:dynamic` example takes 2 seconds since one of the non-occupied threads is able
to run two of the 1-second iterations to complete the for loop.
See also: [`@spawn`](@ref Threads.@spawn), [`nthreads()`](@ref Threads.nthreads),
[`threadid()`](@ref Threads.threadid), `pmap` in [`Distributed`](@ref man-distributed),
`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
"""
macro threads(args...)
na = length(args)