[SYCL][CUDA] Default block size attempts to maximize occupancy #2724

Ruyk · 2020-11-03T10:47:51Z

Using the cuOccupancyMaxPotentialBlockSize function from the CUDA driver, this patch tries to find a better block size for the default configuration. It takes into account the kernel properties and the dynamic local memory size required by the kernel.
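
For readers unfamiliar with occupancy-driven block size selection, here is a rough, self-contained sketch of the idea in plain C++. It is a simplified model, not the driver's algorithm and not the code in this patch. The `DeviceLimits` and `KernelUsage` structs, their default values, and the `residentThreads` and `pickBlockSize` helpers are all hypothetical names introduced for illustration. The real `cuOccupancyMaxPotentialBlockSize` call queries these properties from the device and the compiled kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified per-multiprocessor limits. Illustrative values,
// not queried from a real device.
struct DeviceLimits {
    int warpSize = 32;
    int maxThreadsPerBlock = 1024;
    int maxThreadsPerSM = 2048;
    int maxBlocksPerSM = 32;
    int regsPerSM = 65536;
    std::size_t sharedMemPerSM = 96 * 1024;  // bytes
};

// Hypothetical per-kernel resource usage ("kernel properties").
struct KernelUsage {
    int regsPerThread;
    std::size_t staticSharedMem;   // bytes per block
    std::size_t dynamicSharedMem;  // bytes per block (dynamic local memory)
};

// Threads that can be resident on one multiprocessor for a given block size,
// limited by thread count, block count, registers, and shared memory.
int residentThreads(const DeviceLimits& d, const KernelUsage& k, int blockSize) {
    assert(blockSize > 0);
    int blocks = std::min(d.maxBlocksPerSM, d.maxThreadsPerSM / blockSize);
    if (k.regsPerThread > 0)
        blocks = std::min(blocks, d.regsPerSM / (k.regsPerThread * blockSize));
    const std::size_t smem = k.staticSharedMem + k.dynamicSharedMem;
    if (smem > 0)
        blocks = std::min(blocks, static_cast<int>(d.sharedMemPerSM / smem));
    return blocks * blockSize;
}

// Try block sizes in warp-sized steps, largest first, and keep the first one
// that gives the highest resident thread count (ties favor larger blocks).
int pickBlockSize(const DeviceLimits& d, const KernelUsage& k) {
    int best = 0;
    int bestThreads = 0;
    for (int bs = d.maxThreadsPerBlock; bs >= d.warpSize; bs -= d.warpSize) {
        const int threads = residentThreads(d, k, bs);
        if (threads > bestThreads) {
            bestThreads = threads;
            best = bs;
        }
    }
    return best;
}
```

In this model, a kernel's register count or dynamic local memory request can make a smaller block size the better default, which is why both are inputs to the selection.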

Ruyk · 2020-11-03T10:49:52Z

I have not seen any performance improvements on BabelStream with this, but I have a very old NVIDIA GPU on my desktop. Maybe @jeffhammond can run it on some other configurations and see if it's any better than the default?

EDIT: Got the wrong intel person initially :-)

jbrodman

LGTM. I tested this and it picks much better block sizes for nstream (256) and babelstream (1024) now.

Avoid deprecation warnings after LLVM commit 2f50b28 ("[DebugInfo] Enable deprecation of iterator-insertion methods (#102608)", 2024-09-20). Original commit: KhronosGroup/SPIRV-LLVM-Translator@374eb9d8a4ee0ec

Initial commit of local work group size optimization

11c0cec

Ruyk requested a review from a team as a code owner November 3, 2020 10:47

Ruyk requested a review from romanovvlad November 3, 2020 10:47

Ruyk mentioned this pull request Nov 3, 2020

add nd_range to SYCL UoB-HPC/BabelStream#83

Closed

jbrodman self-requested a review November 3, 2020 17:51

jbrodman approved these changes Nov 3, 2020

romanovvlad approved these changes Nov 6, 2020

romanovvlad merged commit 4fabfd1 into intel:sycl Nov 6, 2020

Ruyk deleted the guess_local_size branch November 9, 2020 10:54

bader added the cuda CUDA back-end label Apr 20, 2021

jingwan2 mentioned this pull request Feb 22, 2022

[CUDA Backend] sparse matrix multiplication 70% performance regression by https://github.com/intel/llvm/pull/2724 #5627

Closed
