
[SYCL][CUDA] Default block size attempts to maximize occupancy #2724

Merged
merged 1 commit into intel:sycl from guess_local_size on Nov 6, 2020

Conversation

@Ruyk Ruyk (Contributor) commented Nov 3, 2020

Using the cuOccupancyMaxPotentialBlockSize function from the CUDA driver API, this patch tries to find a better block size for the default configuration. It takes into account the kernel's properties and the dynamic local memory size the kernel requires.
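
For context, here is a minimal sketch, assuming only the standard CUDA driver API, of how such an occupancy query can pick a default block size. The function pickDefaultBlockSize and its parameter names are illustrative, not the patch's actual identifiers:

```cpp
// Hypothetical helper (not the PR's code): ask the CUDA driver for a
// block size that maximizes occupancy for `kernelFunc`, accounting for
// the dynamic local (shared) memory the kernel needs per block.
#include <cuda.h>

int pickDefaultBlockSize(CUfunction kernelFunc, size_t dynamicLocalMemSize) {
  int minGridSize = 0; // smallest grid that can reach full occupancy
  int blockSize = 0;   // suggested number of threads per block
  CUresult res = cuOccupancyMaxPotentialBlockSize(
      &minGridSize, &blockSize, kernelFunc,
      /*blockSizeToDynamicSMemSize=*/nullptr, // smem per block is constant
      /*dynamicSMemSize=*/dynamicLocalMemSize,
      /*blockSizeLimit=*/0); // 0 = no upper bound below the device limit
  if (res != CUDA_SUCCESS || blockSize == 0)
    blockSize = 256; // conservative fallback if the query fails
  return blockSize;
}
```

Passing nullptr for the block-size-to-shared-memory callback tells the driver that the dynamic allocation is the same for every candidate block size, and a blockSizeLimit of 0 leaves the upper bound at the device maximum.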

@Ruyk Ruyk requested a review from a team as a code owner November 3, 2020 10:47
@Ruyk Ruyk requested a review from romanovvlad November 3, 2020 10:47
@Ruyk Ruyk (Contributor, Author) commented Nov 3, 2020

I have not seen any performance improvements on BabelStream with this, but I have a very old NVIDIA GPU on my desktop. Maybe @jeffhammond can run it in some other configs and see if it's any better than the default?

EDIT: Got the wrong Intel person initially :-)

@jbrodman jbrodman (Contributor) left a comment


LGTM. I tested this and it picks much better block sizes for nstream (256) and babelstream (1024) now.

@romanovvlad romanovvlad merged commit 4fabfd1 into intel:sycl Nov 6, 2020
@Ruyk Ruyk deleted the guess_local_size branch November 9, 2020 10:54
@bader bader added the cuda CUDA back-end label Apr 20, 2021
jsji pushed a commit that referenced this pull request Oct 10, 2024
Avoid deprecation warnings after LLVM commit 2f50b28
("[DebugInfo] Enable deprecation of iterator-insertion methods
(#102608)", 2024-09-20).

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@374eb9d8a4ee0ec
Labels
cuda CUDA back-end
4 participants