
Improve block and thread calculations and invoke only if in range #76

Merged 1 commit into JuliaORNL:main on Apr 18, 2024

Conversation

PhilipFackler (Collaborator):
This only applies to the 1d parallel_for in CUDA. If this is acceptable to you all, a similar fix should be made wherever else it's applicable.

PhilipFackler (Collaborator, Author) commented Apr 18, 2024:

See #57. This is a problem in 1d as well. It's not actually "wrong"; it's just that if your N is greater than the threads per block and not divisible by it, you pass indices to the kernel that are out of bounds. Adding the conditional doesn't seem to affect performance, which makes sense, since all threads except those in the last block take the true branch.
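The arithmetic behind that comment can be sketched quickly. This is Python purely for illustration (the PR itself is Julia/CUDA.jl), and N and the 256-thread limit are made-up example values, not numbers from the PR:

```python
import math

# Hypothetical sizes: N work items, a device limit of 256 threads per
# block (in the real code this comes from CUDA.maxthreads).
N = 1000
threads = 256
blocks = math.ceil(N / threads)        # rounds up: 4 blocks

launched = blocks * threads            # 1024 global indices are generated
out_of_range = launched - N            # 24 of them satisfy i >= N

print(blocks, launched, out_of_range)  # 4 1024 24

# The fix under discussion: each thread checks its index before using it,
# so only the N in-bounds indices ever touch memory.
processed = [i for i in range(launched) if i < N]
assert len(processed) == N
```

Since 744 of the 1024 launched threads land in full blocks and another 232 in the last block still satisfy the guard, only the final 24 take the false branch, which is why the conditional is essentially free.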

williamfgc (Collaborator):
Thanks @PhilipFackler. We will eventually revisit this when doing performance testing with the apps we are targeting.

williamfgc (Collaborator):
Test this please

maxPossibleThreads = CUDA.maxthreads(parallel_kernel)  # per-kernel thread limit reported by CUDA.jl
threads = min(N, maxPossibleThreads)                    # never launch more threads per block than work items
blocks = ceil(Int, N / threads)                         # round up so all N indices are covered
parallel_kernel(parallel_kargs...; threads=threads, blocks=blocks)
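A CPU-side sketch of how this launch configuration interacts with the in-range guard, again in Python purely for illustration (launch_1d and saxpy_kernel are hypothetical stand-ins, assuming a flat global index i = block * threads + t as in a 1d CUDA launch):

```python
import math

def launch_1d(kernel, N, max_threads, *args):
    """Mimic the launch config above: cap threads at N, round blocks up."""
    threads = min(N, max_threads)
    blocks = math.ceil(N / threads)
    for block in range(blocks):          # sequential stand-in for the GPU grid
        for t in range(threads):
            i = block * threads + t      # 0-based global index
            kernel(i, N, *args)

def saxpy_kernel(i, N, a, x, y):
    # The guard from the PR: skip indices past the end of the arrays.
    if i < N:
        y[i] += a * x[i]

N = 1000
x = [1.0] * N
y = [0.0] * N
launch_1d(saxpy_kernel, N, 256, 2.0, x, y)
assert all(v == 2.0 for v in y)          # every element updated exactly once
```

Without the `if i < N` guard, the last block's trailing threads would index past the end of x and y, which is exactly the out-of-bounds access the PR fixes.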
Collaborator:

Is this guaranteed to be synchronized?

PhilipFackler (Collaborator, Author):

I don't know :)

williamfgc (Collaborator):

Also, are we missing the same for AMDGPU.jl?

@williamfgc merged commit 6a07365 into JuliaORNL:main on Apr 18, 2024. 6 checks passed.
PhilipFackler (Collaborator, Author):

> Also, are we missing the same for AMDGPU.jl?

Yes, but I don't have a way to check that out locally.
