-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable_polyester_threads #86
Conversation
Benchmarks with and without it below: ``` using Base.Threads using BenchmarkTools using Revise using Polyester \## function inner(x,y,j) for i ∈ axes(x,1) y[i,j] = sin(x[i,j]) end end function inner_polyester(x,y,j) @Batch for i ∈ axes(x,1) y[i,j] = sin(x[i,j]) end end function inner_thread(x,y,j) @threads for i ∈ axes(x,1) y[i,j] = sin(x[i,j]) end end function sequential_sequential(x,y) for j ∈ axes(x,2) inner(x,y,j) end end function sequential_polyester(x,y) for j ∈ axes(x,2) inner_polyester(x,y,j) end end function sequential_thread(x,y) for j ∈ axes(x,2) inner_thread(x,y,j) end end function threads_of_polyester(x,y) @threads for j ∈ axes(x,2) inner_polyester(x,y,j) end end function threads_of_polyester_inner_disable(x,y) @threads for j ∈ axes(x,2) Polyester.disable_polyester_threads() do inner_polyester(x,y,j) end end end function threads_of_thread(x,y) @threads for j ∈ axes(x,2) inner_thread(x,y,j) end end function threads_of_thread(x,y) @threads for j ∈ axes(x,2) inner_thread(x,y,j) end end function threads_of_sequential(x,y) @threads for j ∈ axes(x,2) inner(x,y,j) end end # Big inner problem, repeated only a few times y = rand(10000000,4); x = rand(size(y)...); @Btime inner($x,$y,1) # 73.319 ms (0 allocations: 0 bytes) @Btime inner_polyester($x,$y,1) # 8.936 ms (0 allocations: 0 bytes) @Btime inner_thread($x,$y,1) # 11.206 ms (49 allocations: 4.56 KiB) @Btime sequential_sequential($x,$y) # 274.926 ms (0 allocations: 0 bytes) @Btime sequential_polyester($x,$y) # 36.963 ms (0 allocations: 0 bytes) @Btime sequential_thread($x,$y) # 49.373 ms (196 allocations: 18.25 KiB) @Btime threads_of_polyester($x,$y) # 78.828 ms (58 allocations: 4.84 KiB) @Btime threads_of_polyester_inner_disable($x,$y) # 70.182 ms (47 allocations: 4.50 KiB) @Btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 71.141 ms (47 allocations: 4.50 KiB) @Btime threads_of_sequential($x,$y) # 70.857 ms (46 allocations: 4.47 KiB) @Btime threads_of_thread($x,$y) # 45.116 ms (219 allocations: 22.00 KiB) # Small inner problem, repated many times y = rand(1000,1000); x = rand(size(y)...); @Btime inner($x,$y,1) # 7.028 μs (0 allocations: 0 bytes) @Btime inner_polyester($x,$y,1) # 1.917 μs (0 allocations: 0 bytes) @Btime inner_thread($x,$y,1) # 7.544 μs (45 allocations: 4.44 KiB) @Btime sequential_sequential($x,$y) # 6.790 ms (0 allocations: 0 bytes) @Btime sequential_polyester($x,$y) # 2.070 ms (0 allocations: 0 bytes) @Btime sequential_thread($x,$y) # 9.296 ms (49002 allocations: 4.46 MiB) @Btime threads_of_polyester($x,$y) # 2.090 ms (42 allocations: 4.34 KiB) @Btime threads_of_polyester_inner_disable($x,$y) # 1.065 ms (42 allocations: 4.34 KiB) @Btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 997.918 μs (49 allocations: 4.56 KiB) @Btime threads_of_sequential($x,$y) # 1.057 ms (48 allocations: 4.53 KiB) @Btime threads_of_thread($x,$y) # 4.105 ms (42059 allocations: 4.25 MiB) ```
Codecov Report
@@ Coverage Diff @@
## master #86 +/- ##
=======================================
Coverage 88.46% 88.46%
=======================================
Files 2 2
Lines 416 416
=======================================
Hits 368 368
Misses 48 48 Continue to review full report at Codecov.
|
src/utility.jl
Outdated
t, r = request_threads(num_threads()) | ||
f() | ||
foreach(free_threads!, r) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be
t, r = request_threads(num_threads())
try
f()
finally
foreach(free_threads!, r)
end
Also, maybe this should be in PolyesterWeave itself?
It could still be re-exported here.
Should also document that it turns off threading for LoopVectorization.@tturbo
and Octavian.matmul
, even though neither depend on Polyester (but they do depend on PolyesterWeave).
Do benchmark my suggestion though.
If it is slower, feel free to ignore it.
The advantage of my suggestion is that, if your code throws an error, it'll still free the threads. Otherwise, they'll be turned off until you manually call PolyesterWeave.reset_workers!()
.
Polyester.@batch
doesn't use try
/finally
either, meaning it requires the manual calls after you get errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do:
- documenting disabling threads in tturbo and matmul
- checking whether a try block is much slower
- move to PolyesterWeave after the above is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After moving to PolyesterWeave, you can of course import and reexport in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks did not show any slowdown with a try
block (especially and obviously if the disable command is executed once in the outermost scope).
Polyester.disable_polyester_threads() do | ||
inner_polyester(x,y,j) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@threads for j ∈ axes(x,2) | |
Polyester.disable_polyester_threads() do | |
inner_polyester(x,y,j) | |
end | |
end | |
Polyester.disable_polyester_threads() do | |
@threads for j ∈ axes(x,2) | |
inner_polyester(x,y,j) | |
end | |
end |
May as well hoist the disabling and re-enabling out of the loop, to avoid doing atomic operations to the same address in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although that is effectively what you're benchmarking with Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end;
, my concern is that people are just going to copy/paste the examples, which means they'll likely use this slightly worse form, because it is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! The two versions (inside and outside the loop) were given exactly so that this can be compared. I will add a note to the readme to warn people about using it outside the loop.
@btime Polyester.disable_polyester_threads() do; threads_of_polyester($x,$y) end; # 71.141 ms (47 allocations: 4.50 KiB) | ||
@btime threads_of_sequential($x,$y) # 70.857 ms (46 allocations: 4.47 KiB) | ||
@btime threads_of_thread($x,$y) # 45.116 ms (219 allocations: 22.00 KiB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is fastest because you have more than 4 threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I thought it is a reasonable edge case to show.
function inner_thread(x,y,j) | ||
@threads for i ∈ axes(x,1) | ||
y[i,j] = sin(x[i,j]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously just an example, but @tturbo
should win these benchmarks (and would also be single threaded from this!). ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! I will mention it as a caveat in the benchmarks readme.
The tests will probably fail due to the dependence of the new PolyesterWeave. See here JuliaSIMD/PolyesterWeave.jl#4 |
Great, thanks! |
As mentioned on discourse: https://discourse.julialang.org/t/nesting-threads-thread-and-polyester-batch-or-context-manager-to-limit-polyester-threads/84490/4
Benchmarks with and without it below: