-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Will @batch
effect the Threads.@threads
?
#82
Comments
By the way, if I call the two functions separately as follows, the increment of time consuming of for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
end
for i in 1:20
@timeit "testmul_batch" testmul_batch(A,B,C)
end |
This is what julia> using Polyester, LinearAlgebra, TimerOutputs
julia> BLAS.set_num_threads(1)
julia> function testmul_thread(A,B,C)
Threads.@threads for i in 1:96
mul!(C[i],A,B)
end
end
testmul_thread (generic function with 1 method)
julia> function testmul_batch(A,B,C)
@batch per=core for i in 1:96
mul!(C[i],A,B)
end
end
testmul_batch (generic function with 1 method)
julia> @show Threads.nthreads()
Threads.nthreads() = 10
10
julia> function testmuls()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
end
end
testmuls (generic function with 1 method)
julia> testmuls()
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 16.4s / 0.4% 49.9MiB / 0.2%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 52.2ms 86.1% 2.61ms 109KiB 98.6% 5.46KiB
testmul_batch 20 8.41ms 13.9% 420μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> using PolyesterWeave
julia> using ThreadingUtilities
julia> function testmuls2()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
ThreadingUtilities.sleep_all_tasks()
end
end
testmuls2 (generic function with 1 method)
julia> testmuls2()
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 3.36s / 2.9% 7.68MiB / 1.5%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_batch 20 53.3ms 53.8% 2.66ms 1.56KiB 1.3% 80.0B
testmul_th 20 45.8ms 46.2% 2.29ms 116KiB 98.7% 5.78KiB
────────────────────────────────────────────────────────────────────────── In this example, it does not seem to help much julia> function testmulsthread()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
#@timeit "testmul_batch" testmul_batch(A,B,C)
#ThreadingUtilities.sleep_all_tasks()
end
end
testmulsthread (generic function with 1 method)
julia> testmulsthread()
julia> reset_timer!();
julia> testmulsthread()
julia> TimerOutputs.get_defaulttimer()
───────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 4.40s / 0.7% 7.68MiB / 1.5%
Section ncalls time %tot avg alloc %tot avg
───────────────────────────────────────────────────────────────────────
testmul_th 20 30.2ms 100.0% 1.51ms 116KiB 100.0% 5.82KiB
─────────────────────────────────────────────────────────────────────── and it hurts the performance of |
Thank you very much. julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: AMD Ryzen 7 4800U with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, znver2) When use function testmuls()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
# @timeit "testmul_batch" testmul_batch(A,B,C)
# ThreadingUtilities.sleep_all_tasks()
end
end is ───────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 60.5ms / 82.3% 7.55MiB / 0.8%
Section ncalls time %tot avg alloc %tot avg
───────────────────────────────────────────────────────────────────────
testmul_th 20 49.8ms 100.0% 2.49ms 63.8KiB 100.0% 3.19KiB
─────────────────────────────────────────────────────────────────────── and the result of function function testmuls()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
# @timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
# ThreadingUtilities.sleep_all_tasks()
end
end is ──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 68.6ms / 64.2% 7.49MiB / 0.0%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_batch 20 44.0ms 100.0% 2.20ms 1.56KiB 100.0% 80.0B
────────────────────────────────────────────────────────────────────────── Finally, the result of function function testmuls()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
ThreadingUtilities.sleep_all_tasks()
end
end
is about ──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 97.0ms / 96.7% 7.56MiB / 0.8%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_batch 20 49.2ms 52.5% 2.46ms 1.56KiB 2.4% 80.0B
testmul_th 20 44.5ms 47.5% 2.23ms 63.3KiB 97.6% 3.17KiB
────────────────────────────────────────────────────────────────────────── I wonder why your result of first time run of |
Hmm, I'm not sure what went wrong earlier. julia> using Polyester, LinearAlgebra, TimerOutputs
[ Info: Precompiling Polyester [f517fe37-dbe3-4b94-8317-1923a5111588]
ROOT : 3.36 % 104349936
LOWERING : 5.43 % 168795352
PARSING : 4.35 % 135114722
INFERENCE : 0.67 % 20892370
CODEGEN : 0.66 % 20442216
METHOD_LOOKUP_SLOW : 0.01 % 265948
METHOD_LOOKUP_FAST : 0.08 % 2344030
LLVM_OPT : 0.84 % 26267292
LLVM_MODULE_FINISH : 6.51 % 202523080
METHOD_MATCH : 7.32 % 227515996
TYPE_CACHE_LOOKUP : 0.46 % 14337602
TYPE_CACHE_INSERT : 0.06 % 1908924
MACRO_INVOCATION : 0.03 % 805162
AST_COMPRESS : 0.11 % 3455434
AST_UNCOMPRESS : 0.11 % 3338902
SYSIMG_LOAD : 7.43 % 231040270
ADD_METHOD : 51.56 % 1602959664
LOAD_MODULE : 6.18 % 192237612
SAVE_MODULE : 3.93 % 122197110
INIT_MODULE : 0.91 % 28354874
julia> BLAS.set_num_threads(1)
julia> function testmul_thread(A,B,C)
Threads.@threads for i in 1:96
mul!(C[i],A,B)
end
end
testmul_thread (generic function with 1 method)
julia> function testmul_batch(A,B,C)
@batch per=core for i in 1:96
mul!(C[i],A,B)
end
end
testmul_batch (generic function with 1 method)
julia> @show Threads.nthreads()
Threads.nthreads() = 10
10
julia> function testmuls()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
end
end
testmuls (generic function with 1 method)
julia> testmuls()
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 654ms / 9.0% 49.2MiB / 0.2%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 52.2ms 89.0% 2.61ms 109KiB 98.6% 5.44KiB
testmul_batch 20 6.47ms 11.0% 323μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> using PolyesterWeave
julia> using ThreadingUtilities
julia> function testmuls2()
A = rand(100,100)
B = rand(100,100)
C = [rand(100,100) for i in 1:96]
for i in 1:20
@timeit "testmul_th" testmul_thread(A,B,C)
@timeit "testmul_batch" testmul_batch(A,B,C)
ThreadingUtilities.sleep_all_tasks()
end
end
testmuls2 (generic function with 1 method)
julia> testmuls2()
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 4.52s / 0.4% 9.63MiB / 1.2%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 8.14ms 50.1% 407μs 120KiB 98.7% 6.01KiB
testmul_batch 20 8.10ms 49.9% 405μs 1.56KiB 1.3% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 10.0s / 0.6% 8.18MiB / 1.3%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 50.3ms 83.4% 2.52ms 109KiB 98.6% 5.46KiB
testmul_batch 20 10.0ms 16.6% 500μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 5.98s / 0.3% 11.3MiB / 1.1%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_batch 20 9.81ms 50.4% 490μs 1.56KiB 1.3% 80.0B
testmul_th 20 9.64ms 49.6% 482μs 120KiB 98.7% 6.02KiB
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 71.6ms / 84.4% 7.63MiB / 1.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 50.3ms 83.2% 2.51ms 109KiB 98.6% 5.46KiB
testmul_batch 20 10.1ms 16.8% 507μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 23.3ms / 92.0% 7.64MiB / 1.5%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 11.6ms 54.3% 581μs 119KiB 98.7% 5.95KiB
testmul_batch 20 9.78ms 45.7% 489μs 1.56KiB 1.3% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 64.9ms / 92.7% 7.63MiB / 1.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 50.0ms 83.1% 2.50ms 109KiB 98.6% 5.46KiB
testmul_batch 20 10.2ms 16.9% 509μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 25.4ms / 83.7% 7.64MiB / 1.6%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 11.5ms 54.3% 576μs 120KiB 98.7% 6.00KiB
testmul_batch 20 9.71ms 45.7% 485μs 1.56KiB 1.3% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 65.3ms / 92.5% 7.63MiB / 1.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 50.3ms 83.3% 2.51ms 109KiB 98.6% 5.46KiB
testmul_batch 20 10.1ms 16.7% 504μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 25.6ms / 83.9% 7.64MiB / 1.6%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 11.7ms 54.5% 585μs 120KiB 98.7% 6.01KiB
testmul_batch 20 9.77ms 45.5% 489μs 1.56KiB 1.3% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 67.0ms / 90.4% 7.63MiB / 1.4%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 50.3ms 83.1% 2.52ms 109KiB 98.6% 5.46KiB
testmul_batch 20 10.2ms 16.9% 511μs 1.56KiB 1.4% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> reset_timer!();
julia> testmuls2()
julia> TimerOutputs.get_defaulttimer()
──────────────────────────────────────────────────────────────────────────
Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 23.7ms / 93.2% 7.64MiB / 1.6%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────
testmul_th 20 12.4ms 56.0% 618μs 120KiB 98.7% 6.00KiB
testmul_batch 20 9.70ms 44.0% 485μs 1.56KiB 1.3% 80.0B
──────────────────────────────────────────────────────────────────────────
julia> versioninfo()
Julia Version 1.9.0-DEV.824
Commit 5a0685a8a6* (2022-06-21 07:43 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 28 × Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.3 (ORCJIT, skylake-avx512)
Threads: 10 on 28 virtual cores Now I see a good improvement from My times are much faster because my CPU has >10 physical cores and AVX512 (14 physical cores, but I only used 10 threads so that we get 2 iterations/thread). |
May be not 2 iterations/thread? Because the |
Yes, 96. I should've looked at the code. |
When I used
@batch
andThreads.@threads
concurrently, I find a confusing problem.The number of threads is 6. When I only call the function
testmul_thread()
intestmuls()
, the time consumed only94.6ms
.But, when I call concurrently
testmul_thread()
andtestmul_batch()
, the time consumed oftestmul_thread()
will increase to417ms
:Why the run of function
testmul_batch
will effecttestmul_thread
? Dose@batch
change something about global configuration of julia threads?Thank you very much.
The text was updated successfully, but these errors were encountered: