Clusterization Bugfix, main branch (2024.09.21.) #708

Merged


krasznaa
Member

This PR is just here to help in the explanation of a bug that I'll open an issue about shortly.

@krasznaa krasznaa added the bug Something isn't working label Sep 21, 2024
@krasznaa krasznaa changed the title from "Clusterization Bug Demonstrator, main branch (2024.09.21.)" to "Clusterization Bugfix, main branch (2024.09.21.)" Sep 21, 2024
@krasznaa krasznaa marked this pull request as ready for review September 21, 2024 14:23
When the last-but-one thread block ends up claiming elements
until the very end of the cell container, the very last thread
block ends up not needing to do anything. This was not handled
correctly in the code so far.
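
To make the failure mode above concrete, here is a minimal host-side sketch in C++. It is not the project's device code: `cell`, `glued`, `claim_range`, `process_block` and the "never split a module across blocks" rule are all hypothetical simplifications, chosen only to show how a block can end up with an empty range and why that case needs an explicit guard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cell: only the owning module matters here.
struct cell {
    unsigned module;
};

// True if a partition must NOT be cut just before index i, i.e. cells
// i-1 and i belong to the same module. Always false at i == 0 and at
// i == cells.size(), so boundaries never move past the container.
bool glued(const std::vector<cell>& cells, std::size_t i) {
    return i > 0 && i < cells.size() &&
           cells[i - 1].module == cells[i].module;
}

// Half-open range [start, end) claimed by one "thread block". Each block
// nominally owns `target` cells, but both boundaries slide forward to the
// next module boundary (assumed rule, for illustration only).
std::pair<std::size_t, std::size_t> claim_range(
    const std::vector<cell>& cells, std::size_t block, std::size_t target) {
    const std::size_t n = cells.size();
    std::size_t start = std::min(block * target, n);
    std::size_t end = std::min(start + target, n);
    while (glued(cells, start)) { ++start; }
    while (glued(cells, end)) { ++end; }
    return std::make_pair(start, end);
}

// Number of cells a block processes. The early return is the point of the
// sketch: without it, code that reads cells[start] would run past the end
// of the container whenever an earlier block claimed everything.
std::size_t process_block(const std::vector<cell>& cells, std::size_t block,
                          std::size_t target) {
    const std::pair<std::size_t, std::size_t> r =
        claim_range(cells, block, target);
    assert(r.first <= r.second);
    if (r.first >= r.second) {
        return 0;  // Nothing left for this block to do.
    }
    const unsigned first_module = cells[r.first].module;  // Safe: range is non-empty.
    (void)first_module;
    return r.second - r.first;
}
```

With ten cells in modules `0,0,0,1,1,2,2,2,2,2` and a target of 4 cells per block, block 1 in this model claims `[5, 10)` and block 2 is left with the empty range `[10, 10)`, which is exactly the situation the commit message describes.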
@krasznaa krasznaa force-pushed the ClusterizationBug-main-20240921 branch from 6e7e34d to b11790e Compare September 21, 2024 14:25
@stephenswat
Member

Looks like you found it. 👍

@krasznaa
Member Author

Indeed. No thanks to cuda-gdb with that one... 😦 Even though in Debug mode we don't ask for any optimizations from nvcc, the compiled code kept behaving very weirdly still... 😕

It was the debug SYCL build, run on the host CPU, that finally let me understand the issue, since in that one gdb-oneapi was actually giving me meaningful values for the variables in the problematic thread. 🤔

So we may not want to get rid of SYCL all too soon, after everything that's been said recently.

@stephenswat
Member

Closes #709.

@stephenswat stephenswat linked an issue Sep 21, 2024 that may be closed by this pull request
@stephenswat
Member

> Indeed. No thanks to cuda-gdb with that one... 😦 Even though in Debug mode we don't ask for any optimizations from nvcc, the compiled code kept behaving very weirdly still... 😕

Yes I have been seeing this too, cuda-gdb (and to a lesser extent compute-sanitizer) have been very unhelpful lately. I wonder if some update on NVIDIA's end broke the tools or if we're somehow using the wrong compiler flags? 🤔

@krasznaa
Member Author

Maybe we need to use -O0 explicitly now? 🤔 Since CMake doesn't do that automatically.

Host compilers of course don't do any optimizations unless asked to, but maybe nvcc has now adopted icpx's behaviour, where if you don't ask for anything, it applies some aggressive optimization to "help you"... 😕

@stephenswat stephenswat merged commit 5a2015a into acts-project:main Sep 21, 2024
21 of 24 checks passed
@krasznaa krasznaa deleted the ClusterizationBug-main-20240921 branch September 21, 2024 15:36
Development

Successfully merging this pull request may close these issues.

Device Clusterization Bug, main branch (2024.09.21.)
2 participants