Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the robustness of device_atomic_ref #275

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

stephenswat
Copy link
Member

While working on acts-project/traccc#595, I found out that the vecmem implementation of atomic CAS is fundamentally broken on CUDA platforms 😟. Currently, the functionality is compare_exchange_strong is broken because it relies on the CUDA atomicCAS builtin which functions fundamentally differently from the C++ STL version of the equivalent code. Indeed, the C++ version returns true on a succesful swap and false otherwise. The CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, the compare_exchange_strong function will always appear to fail, even if it succeeded. This commit fixes the above issue.

I also removed the backup implementation of CAS as it was not atomic in any way and was basically lying to users about working atomically 😟.

@stephenswat stephenswat added the bug Something isn't working label Jun 6, 2024
@stephenswat stephenswat requested a review from krasznaa June 6, 2024 16:01
@krasznaa
Copy link
Member

krasznaa commented Jun 6, 2024

As you noticed, one cannot just use static_assert(...) in a "non-templated" function of a templated class. As soon as the class is instantiated, the assertion kicks in. It doesn't only happen when the function is called. 😦

If you forego the removal of the naive, non-atomic implementation, then I'll be happy to get this fix in. But I'd rather not open the can of worms with how vecmem::device_atomic_ref should behave on the host before C++20. 😦 I don't think that will lead us anywhere useful.

@stephenswat
Copy link
Member Author

My bad, I had hoped that any of the templates would have been on the function, not on the class, but sadly not. Anyway there will be a bit more work to do here anyway, so I'll come up with a more comprehensive solution.

@krasznaa
Copy link
Member

krasznaa commented Jun 6, 2024

Note that I've been thinking for a while now about introducing cuda::atomic_ref in this code. 🤔 Similar to how we use sycl::atomic_ref, "under the right circumstances" the code should just use cuda::atomic_ref, as is. You should check if you could make that happen.

@stephenswat stephenswat force-pushed the fix/atomiccas branch 2 times, most recently from eaf5991 to 2bc62ed Compare June 7, 2024 12:26
@stephenswat
Copy link
Member Author

Okay, the scope of this PR has grown a little bit to fix a whole bunch of other issues with the atomic references. Also adds additional compile-time checks on the functionality of atomic references as well as runtime tests.

@stephenswat stephenswat changed the title Fix atomic CAS functionality in CUDA Increase the robustness of device_atomic_ref Jun 7, 2024
@stephenswat stephenswat force-pushed the fix/atomiccas branch 2 times, most recently from 1e7e8c3 to 8f4fb2d Compare June 7, 2024 13:21
Copy link
Member

@krasznaa krasznaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm absolutely on board with making this code better. I'm very happy that you're looking into it.

Please fix up all the remaining issues, and then I'll be willing to push in this macro-hell. 🤔 But after that, I'll absolutely want to clean this up.

  • Instead of doing preprocessor magic everywhere, I'll want to have a few different classes called let's say vecmem::details::cuda::atomic_ref, vecmem::details::win32::atomic_ref, etc.
    • The idea being that that would hopefully result in more understandable compiler errors when some preprocessor decision inevitably goes wrong in the future.
  • At that point we could push the implementation of the "host versions" into .cpp files, to avoid exposing the user to let's say <windows.h>. We only want to provide this class for a short list of primitive types anyway.

If you want to give that setup a try, I won't stop you. 😉 But as I started, I'm willing to let the code in with this design as well as a first step. (After the actual issues have been fixed.)

@stephenswat
Copy link
Member Author

Okay, let's see what the MSVC CI thinks of this.

@stephenswat
Copy link
Member Author

Okay so MSVC doesn't support atomics on unsigned integers. 😆

@stephenswat stephenswat force-pushed the fix/atomiccas branch 12 times, most recently from 874ced1 to 53dcf2f Compare June 9, 2024 22:29
@stephenswat stephenswat force-pushed the fix/atomiccas branch 3 times, most recently from 0fbc9ce to 0eebea1 Compare June 9, 2024 22:57
@stephenswat
Copy link
Member Author

Someone explain to me how this commit breaks the synchronized memory resource on release builds in MSVC and in those builds alone.

@stephenswat
Copy link
Member Author

Ah, of course.

Including intrin.h breaks locks and mutexes.

https://github.com/stephenswat/vecmem/actions/runs/9440139525/job/25999074005

🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡

@stephenswat
Copy link
Member Author

I got rid of the MSVC intrinsics and replaced them by a non-atomic implementation of CAS, but at least now there is only one point of fake atomicity in this code.

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024
This commit moves `vecmem::memory_order` into its own header in order to
avoid circular dependencies in acts-project#275 and acts-project#276.
stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024
This commit moves `vecmem::memory_order` into its own header in order to
avoid circular dependencies in acts-project#275 and acts-project#276.
stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024
This commit moves `vecmem::memory_order` into its own header in order to
avoid circular dependencies in acts-project#275 and acts-project#276.
stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024
This commit moves `vecmem::memory_order` into its own header in order to
avoid circular dependencies in acts-project#275 and acts-project#276.
stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024
This commit moves `vecmem::memory_order` into its own header in order to
avoid circular dependencies in acts-project#275 and acts-project#276.
Currently, the functionality is `compare_exchange_strong` is broken
because it relies on the CUDA `atomicCAS` builtin which functions
fundamentally differently from the C++ STL version of the equivalent
code. Indeed, the C++ version returns true on a succesful swap and false
otherwise. The CUDA version always returns the old value. As such, if
the old value is false-like, e.g. 0, the `compare_exchange_strong`
function will always appear to fail, even if it succeeded. This commit
fixes the above issue. Also increases the robustness of other atomic
operations, adds new concepts, and adds new tests.
Copy link
Member

@krasznaa krasznaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in. Cleanup/improvements to come afterwards.

@krasznaa krasznaa merged commit 013d297 into acts-project:main Jun 12, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants