From 8ea93b54dd759c834edd607f2e1ea42de1385b3e Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Mon, 14 Feb 2022 03:01:43 -0500
Subject: [PATCH 1/6] Clarify the behavior of `@threads for`

---
 base/threadingconstructs.jl | 88 ++++++++++++++++++++++++-------------
 1 file changed, 58 insertions(+), 30 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 9ed416caec2a6..051d4fc14d406 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -99,46 +99,78 @@ end
 """
     Threads.@threads [schedule] for ... end
 
-A macro to parallelize a `for` loop to run with multiple threads. Splits the iteration
-space among multiple tasks and runs those tasks on threads according to a scheduling
-policy.
-A barrier is placed at the end of the loop which waits for all tasks to finish
-execution.
-
-The `schedule` argument can be used to request a particular scheduling policy.
-
-Except for `:static` scheduling, how the iterations are assigned to tasks, and how the tasks
-are assigned to the worker threads is undefined. The exact assignments can be different
-for each execution. The scheduling option is a hint. The loop body code (including any code
-transitively called from it) must not make assumptions about the distribution of iterations
-to tasks or the worker thread in which they are executed. The loop body for each iteration
-must be able to make forward progress independent of other iterations and be free from data
-races. As such, synchronizations across iterations may deadlock.
+A macro to execute a `for` loop in parallel. The iteration space is distributed to
+coarse-grained tasks. This policy can be specified by the `schedule` argument. The
+execution of the loop waits for the evaluation of all iterations.
+
+See also: [`@spawn`](@ref Threads.@spawn),
+`pmap` in [`Distributed`](@ref man-distributed), and
+`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
+
+# Extended help
+
+## Semantics
+
+Unless stronger guarantees are specified by the scheduling option, the loop executed by the
+`@threads` macro has the following semantics.
+
+The `@threads` macro executes the loop body in an unspecified order and potentially
+concurrently. It does not specify the exact assignments of the tasks and the worker threads.
+The assignments can be different for each execution. The loop body code (including any code
+transitively called from it) must not make any assumptions about the distribution of
+iterations to tasks or the worker thread in which they are executed. The loop body for each
+iteration must be able to make forward progress independent of other iterations and be free
+from data races. As such, invalid synchronizations across iterations may deadlock while
+unsynchronized memory accesses may result in undefined behavior.
 
 For example, the above conditions imply that:
 
 - The lock taken in an iteration *must* be released within the same iteration.
 - Communicating between iterations using blocking primitives like `Channel`s is incorrect.
-- Write only to locations not shared across iterations (unless a lock or atomic operation is used).
+- Write only to locations not shared across iterations (unless a lock or atomic operation is
+  used).
+- The value of [`threadid()`](@ref Threads.threadid) may be changed even within a single
+  iteration.
 
-Schedule options are:
-- `:dynamic` (default) will schedule iterations dynamically to available worker threads,
-  assuming that the workload for each iteration is uniform.
-- `:static` creates one task per thread and divides the iterations equally among
-  them, assigning each task specifically to each thread.
-  Specifying `:static` is an error if used from inside another `@threads` loop
-  or from a thread other than 1.
+## Schedulers
 
-Without the scheduler argument, the exact scheduling is unspecified and varies across Julia releases.
+Without the scheduler argument, the exact scheduling is unspecified and varies across Julia
+releases. Currently, `:dynamic` is used when the scheduler is not specified.
 
 !!! compat "Julia 1.5"
     The `schedule` argument is available as of Julia 1.5.
 
+### `:dynamic` (default)
+
+`:dynamic` scheduler executes iterations dynamically to available worker threads, assuming
+that the workload for each iteration is uniform.
+
+This scheduling option is merely a hint to the underlying execution mechanism. However, a
+few properties can be assumed for understanding when to use this scheduling option. The
+number of `Task`s used by `:dynamic` scheduler is bounded by a small constant multiple of
+the number of available worker threads ([`nthreads()`](@ref Threads.nthreads)). Each task
+processes contiguous regions of the iteration space.
+
 !!! compat "Julia 1.8"
     The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.
 
-For example, an illustration of the different scheduling strategies where `busywait`
-is a non-yielding timed loop that runs for a number of seconds.
+### `:static`
+
+`:static` scheduler creates one task per thread and divides the iterations equally among
+them, assigning each task specifically to each thread. In particular, the value of
+[`threadid()`](@ref Threads.threadid) is guaranteed to be constant within one iteration.
+Specifying `:static` is an error if used from inside another `@threads` loop or from a
+thread other than 1.
+
+!!! note
+    `:static` scheduling exists for supporting transition of code written before Julia 1.3.
+    In newly written library functions, `:static` scheduling is discouraged because the
+    functions using this option cannot be called from arbitrary worker threads.
+
+## Example
+
+To illustrate the different scheduling strategies, consider the following function
+`busywait` containing a non-yielding timed loop that runs for a given number of seconds.
 
 ```julia-repl
 julia> function busywait(seconds)
@@ -166,10 +198,6 @@ julia> @time begin
 
 The `:dynamic` example takes 2 seconds since one of the non-occupied threads is able
 to run two of the 1-second iterations to complete the for loop.
-
-See also: [`@spawn`](@ref Threads.@spawn), [`nthreads()`](@ref Threads.nthreads),
-[`threadid()`](@ref Threads.threadid), `pmap` in [`Distributed`](@ref man-distributed),
-`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
 """
 macro threads(args...)
     na = length(args)

From 79288b464b11b705ef1c4e1cda0c1c74becc4f99 Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Mon, 14 Feb 2022 20:09:06 +0900
Subject: [PATCH 2/6] Update base/threadingconstructs.jl

Co-authored-by: Ian Butterworth
---
 base/threadingconstructs.jl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 051d4fc14d406..7d62801d07ca4 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -129,7 +129,7 @@ For example, the above conditions imply that:
 - Communicating between iterations using blocking primitives like `Channel`s is incorrect.
 - Write only to locations not shared across iterations (unless a lock or atomic operation is
   used).
-- The value of [`threadid()`](@ref Threads.threadid) may be changed even within a single
+- The value of [`threadid()`](@ref Threads.threadid) may change even within a single
   iteration.
 
 ## Schedulers

From 784686001e6d82019ff362467fffc43799ba17df Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Wed, 16 Feb 2022 00:50:16 -0800
Subject: [PATCH 3/6] Stop mentioning `BLAS.set_num_threads` for now

---
 base/threadingconstructs.jl | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 7d62801d07ca4..394211133f52a 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -103,9 +103,8 @@ A macro to execute a `for` loop in parallel. The iteration space is distributed
 coarse-grained tasks. This policy can be specified by the `schedule` argument. The
 execution of the loop waits for the evaluation of all iterations.
 
-See also: [`@spawn`](@ref Threads.@spawn),
-`pmap` in [`Distributed`](@ref man-distributed), and
-`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
+See also: [`@spawn`](@ref Threads.@spawn) and
+`pmap` in [`Distributed`](@ref man-distributed).
 
 # Extended help
 

From 8f207374225175c9b5a811b77091b14c46f0673f Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Wed, 16 Feb 2022 18:10:17 +0900
Subject: [PATCH 4/6] Explain `:dynamic` more

---
 base/threadingconstructs.jl | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 394211133f52a..9ea2fed461ba9 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -145,10 +145,14 @@ releases. Currently, `:dynamic` is used when the scheduler is not specified.
 that the workload for each iteration is uniform.
 
 This scheduling option is merely a hint to the underlying execution mechanism. However, a
-few properties can be assumed for understanding when to use this scheduling option. The
-number of `Task`s used by `:dynamic` scheduler is bounded by a small constant multiple of
-the number of available worker threads ([`nthreads()`](@ref Threads.nthreads)). Each task
-processes contiguous regions of the iteration space.
+few properties can be expected. The number of `Task`s used by `:dynamic` scheduler is
+bounded by a small constant multiple of the number of available worker threads
+([`nthreads()`](@ref Threads.nthreads)). Each task processes contiguous regions of the
+iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more
+efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly
+larger than the number of the worker threads and the run-time of `f(x)` is relatively
+smaller than the cost of spawning and synchronizing a task (which can be estimated roughly
+by `wait(@spawn nothing)`).
 
 !!! compat "Julia 1.8"
     The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.

From 0f74971467333b698e3211163054ce4aee6591db Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Wed, 23 Feb 2022 04:28:36 -0500
Subject: [PATCH 5/6] Clarify that `:dynamic` can be more dynamic in the future

---
 base/threadingconstructs.jl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 9ea2fed461ba9..048deddebab60 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -141,8 +141,9 @@ releases. Currently, `:dynamic` is used when the scheduler is not specified.
 
 ### `:dynamic` (default)
 
-`:dynamic` scheduler executes iterations dynamically to available worker threads, assuming
-that the workload for each iteration is uniform.
+`:dynamic` scheduler executes iterations dynamically to available worker threads. The
+current implementation assumes that the workload for each iteration is uniform. However,
+this assumption may be removed in the future.
 
 This scheduling option is merely a hint to the underlying execution mechanism. However, a
 few properties can be expected. The number of `Task`s used by `:dynamic` scheduler is

From c9ca5b18e98ae761ac10d3cc4857fb4aa6fa06b5 Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Wed, 23 Feb 2022 04:43:07 -0500
Subject: [PATCH 6/6] Put actual number for spawn/sync cost

---
 base/threadingconstructs.jl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 048deddebab60..a3413701fb7de 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -152,8 +152,8 @@ bounded by a small constant multiple of the number of available worker threads
 iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more
 efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly
 larger than the number of the worker threads and the run-time of `f(x)` is relatively
-smaller than the cost of spawning and synchronizing a task (which can be estimated roughly
-by `wait(@spawn nothing)`).
+smaller than the cost of spawning and synchronizing a task (typically less than 10
+microseconds).
 
 !!! compat "Julia 1.8"
     The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.
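The data-race rules listed under "Semantics" (write only to unshared locations unless a lock or atomic is used; release a lock in the same iteration that took it) can be illustrated with a minimal sketch. It is not part of the patch; the variable names are hypothetical, and it relies only on `Base.Threads` and `ReentrantLock`, so it runs with any number of threads.

```julia
using Base.Threads: @threads, Atomic, atomic_add!

# Each iteration writes only to its own element, so no location is shared
# across iterations and no synchronization is needed.
squares = zeros(Int, 1_000)
@threads for i in eachindex(squares)
    squares[i] = i^2
end

# A location shared by all iterations is updated atomically; a plain
# `total += i` on a shared variable here would be a data race.
total = Atomic{Int}(0)
@threads for i in 1:1_000
    atomic_add!(total, i)
end

# A lock taken in an iteration is released in the same iteration:
# `lock(f, l)` acquires `l`, runs `f`, and releases `l` even if `f` throws.
# The order of elements in `evens` is unspecified.
evens = Int[]
evens_lock = ReentrantLock()
@threads for i in 1:1_000
    if iseven(i)
        lock(evens_lock) do
            push!(evens, i)
        end
    end
end

println(total[])  # 500500
```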
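Because the value of `threadid()` may change within a single iteration under the default scheduler, indexing a per-thread buffer with `threadid()` inside a `@threads` loop is not a safe way to accumulate results. One common alternative, sketched below with a hypothetical helper `chunked_mapsum`, gives each task its own accumulator: split the indices into contiguous chunks, spawn one task per chunk, and combine the partial results with `fetch`. Using about `nthreads()` chunks is an assumption of this sketch, not something the docstring prescribes.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: computes sum(f, xs) with one accumulator per task.
# Assumes `xs` is a non-empty vector.
function chunked_mapsum(f, xs::AbstractVector)
    # Roughly one contiguous chunk of indices per available thread.
    chunk_len = max(1, cld(length(xs), nthreads()))
    tasks = map(Iterators.partition(eachindex(xs), chunk_len)) do idxs
        # Each task reduces its own chunk; no state is shared between tasks.
        @spawn sum(i -> f(xs[i]), idxs)
    end
    # `fetch` waits for each task and returns its partial sum.
    return sum(fetch, tasks)
end

chunked_mapsum(abs2, collect(1:100))  # 338350
```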
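The efficiency comparison added for `:dynamic` can be made concrete with a short sketch, assuming Julia 1.8 or later (for the explicit `:dynamic` argument); `work` is a hypothetical cheap function. Both loops compute the same result, but the `@threads :dynamic` loop runs a bounded number of coarse-grained tasks, while the `@sync`/`@spawn` loop creates one task per element and so pays the spawn and synchronization cost `length(xs)` times. The last lines measure one spawn-and-wait roughly; no particular timing is claimed here.

```julia
using Base.Threads: @threads, @spawn

# Hypothetical cheap per-element work, much cheaper than spawning a task.
work(x) = x + 1

xs = collect(1:100_000)
out_threads = similar(xs)
out_spawn = similar(xs)

# A bounded number of tasks, each processing a contiguous block of iterations.
@threads :dynamic for i in eachindex(xs)
    out_threads[i] = work(xs[i])
end

# One task per element: spawn and synchronization overhead is paid per element.
@sync for i in eachindex(xs)
    @spawn out_spawn[i] = work(xs[i])
end

# Rough cost of one spawn plus wait; the first call is discarded because it
# may include compilation time.
wait(@spawn nothing)
spawn_cost = @elapsed wait(@spawn nothing)
println("one spawn + wait: ", spawn_cost, " s")
```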