Increase the robustness of `device_atomic_ref` #275

stephenswat · 2024-06-06T16:01:11Z

While working on acts-project/traccc#595, I found out that the vecmem implementation of atomic CAS is fundamentally broken on CUDA platforms 😟. Currently, `compare_exchange_strong` is broken because it relies on the CUDA `atomicCAS` builtin, which behaves fundamentally differently from the equivalent C++ STL function. The C++ version returns true on a successful swap and false otherwise, whereas the CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, `compare_exchange_strong` will always appear to fail, even if it succeeded. This commit fixes the above issue.
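To make the failure mode concrete, here is a minimal host-side sketch. It assumes a hypothetical `emulated_atomic_cas` helper that mimics the documented contract of CUDA's `atomicCAS` (it returns the previously stored value whether or not the swap happened), emulated with `std::atomic` so it runs on the host. `cas_buggy` and `cas_fixed` are illustrative names, not vecmem's actual code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for CUDA's atomicCAS(address, compare, val): it
// always returns the value stored *before* the operation, whether or not
// the swap happened. Emulated with std::atomic so the sketch runs on a host.
int emulated_atomic_cas(std::atomic<int>* address, int compare, int val) {
    int observed = compare;
    // On failure, compare_exchange_strong writes the current value into
    // `observed`; on success `observed` still equals `compare`, which was
    // the old value. Either way `observed` ends up holding the old value.
    address->compare_exchange_strong(observed, val);
    return observed;
}

// Buggy pattern: treating the returned *old value* as a success flag.
// If the old value was 0, a successful swap is reported as a failure.
bool cas_buggy(std::atomic<int>* address, int& expected, int desired) {
    return emulated_atomic_cas(address, expected, desired);
}

// STL-style semantics: success iff the old value equals `expected`; on
// failure, `expected` is updated with the value that was actually observed.
bool cas_fixed(std::atomic<int>* address, int& expected, int desired) {
    const int old = emulated_atomic_cas(address, expected, desired);
    if (old == expected) {
        return true;
    }
    expected = old;
    return false;
}
```

The essential point is that success has to be derived by comparing the returned value with `expected`, never by converting the returned value to `bool`.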

I also removed the backup implementation of CAS, as it was not atomic in any way and was effectively lying to users about working atomically 😟.
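For illustration only, a load-compare-store sequence like the hypothetical `naive_cas` below (not the removed vecmem code verbatim) has the right single-threaded semantics, but it is not atomic: another thread can modify the value between the comparison and the store, and both threads can then report success.

```cpp
#include <cassert>

// NOT atomic: illustrative only. The read in the comparison and the
// subsequent write are separate operations, so a concurrent writer can
// slip in between them and its update is silently overwritten. In C++,
// such unsynchronized concurrent access is also a data race (undefined
// behaviour).
template <typename T>
bool naive_cas(T* ptr, T& expected, T desired) {
    if (*ptr == expected) {
        // <-- another thread may store to *ptr at this point
        *ptr = desired;
        return true;
    }
    expected = *ptr;
    return false;
}
```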

krasznaa · 2024-06-06T18:24:32Z

As you noticed, one cannot just use `static_assert(...)` in a "non-templated" function of a templated class. As soon as the class is instantiated, the assertion kicks in; it doesn't wait until the function is called. 😦
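A common general-purpose C++ workaround (not necessarily what this PR ended up using) is to make the assertion's condition depend on the template parameter, here through a hypothetical `dependent_false` helper. A condition that does not depend on `T` (such as a plain `false`) may be diagnosed as soon as the compiler sees the template, whether or not the function is ever used. A dependent condition is only evaluated when the member function's definition is instantiated, which implicit instantiation of the class alone does not do. `atomic_ref_sketch` is likewise a hypothetical class.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: always false, but the compiler cannot know that
// until T is known, so an assertion on it is deferred to instantiation.
template <typename T>
struct dependent_false : std::false_type {};

// Hypothetical stand-in for an atomic reference class template.
template <typename T>
class atomic_ref_sketch {
public:
    explicit atomic_ref_sketch(T& ref) : m_ptr(&ref) {}

    T load() const { return *m_ptr; }

    // Fires only if this member's definition is instantiated, i.e. if it is
    // called (or if the whole class is explicitly instantiated).
    bool compare_exchange_strong(T&, T) {
        static_assert(dependent_false<T>::value,
                      "compare_exchange_strong is not supported here");
        return false;
    }

private:
    T* m_ptr;
};
```

With this, `atomic_ref_sketch<int>` can be instantiated and `load()` used freely; only a call to `compare_exchange_strong` triggers the diagnostic.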

If you forgo the removal of the naive, non-atomic implementation, then I'll be happy to get this fix in. But I'd rather not open the can of worms of how `vecmem::device_atomic_ref` should behave on the host before C++20. 😦 I don't think that will lead us anywhere useful.

stephenswat · 2024-06-06T18:26:40Z

My bad, I had hoped that the template parameters would be on the function, not on the class, but sadly not. Anyway, there will be a bit more work to do here, so I'll come up with a more comprehensive solution.

krasznaa · 2024-06-06T18:31:06Z

Note that I've been thinking for a while now about introducing `cuda::atomic_ref` in this code. 🤔 Similar to how we use `sycl::atomic_ref`, "under the right circumstances" the code should just use `cuda::atomic_ref` as is. You should check if you could make that happen.

stephenswat · 2024-06-07T12:27:42Z

Okay, the scope of this PR has grown a little bit to fix a whole bunch of other issues with the atomic references. It also adds compile-time checks on the functionality of atomic references, as well as runtime tests.

krasznaa

I'm absolutely on board with making this code better. I'm very happy that you're looking into it.

Please fix up all the remaining issues, and then I'll be willing to push in this macro-hell. 🤔 But after that, I'll absolutely want to clean this up.

Instead of doing preprocessor magic everywhere, I'll want to have a few different classes called, let's say, `vecmem::details::cuda::atomic_ref`, `vecmem::details::win32::atomic_ref`, etc.
- The idea being that this would hopefully result in more understandable compiler errors when some preprocessor decision inevitably goes wrong in the future.
- At that point we could push the implementation of the "host versions" into .cpp files, to avoid exposing the user to, let's say, `<windows.h>`. We only want to provide this class for a short list of primitive types anyway.
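A rough sketch of that layout, under several assumptions: the namespace `sketch::details::host` and the alias `sketch::atomic_ref` are hypothetical names that only mirror the proposed `vecmem::details::<backend>::atomic_ref` pattern, only a host backend is shown, and it relies on the GCC/Clang `__atomic` builtins, so it will not compile with MSVC. A CUDA or Win32 backend would be a separate class chosen at a single switch point.

```cpp
#include <cassert>

namespace sketch {
namespace details {
namespace host {

// Hypothetical host backend built on the GCC/Clang __atomic builtins.
template <typename T>
class atomic_ref {
public:
    explicit atomic_ref(T& ref) : m_ptr(&ref) {}

    T load() const { return __atomic_load_n(m_ptr, __ATOMIC_SEQ_CST); }

    void store(T value) { __atomic_store_n(m_ptr, value, __ATOMIC_SEQ_CST); }

    // Same contract as std::atomic<T>::compare_exchange_strong: returns true
    // on success; on failure, writes the observed value into `expected`.
    bool compare_exchange_strong(T& expected, T desired) {
        return __atomic_compare_exchange_n(m_ptr, &expected, desired,
                                           /*weak=*/false, __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
    }

    // Returns the value held before the addition.
    T fetch_add(T arg) {
        return __atomic_fetch_add(m_ptr, arg, __ATOMIC_SEQ_CST);
    }

private:
    T* m_ptr;
};

}  // namespace host
}  // namespace details

// The single place where a backend is selected. A full version would pick,
// e.g., a details::cuda or details::win32 class here based on the platform,
// instead of scattering #ifdefs through every member function.
template <typename T>
using atomic_ref = details::host::atomic_ref<T>;

}  // namespace sketch
```

If a preprocessor decision goes wrong in such a layout, the error points at one concrete backend class. The backend member definitions could also be moved into a .cpp file and explicitly instantiated for the short list of supported primitive types, as suggested above.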

If you want to give that setup a try, I won't stop you. 😉 But as I said at the start, I'm willing to let the code in with this design as a first step (after the actual issues have been fixed).

core/include/vecmem/concepts/atomic_ref.hpp

core/include/vecmem/memory/device_atomic_ref.hpp

core/include/vecmem/memory/impl/device_atomic_ref.ipp

stephenswat · 2024-06-08T19:50:44Z

Okay, let's see what the MSVC CI thinks of this.

stephenswat · 2024-06-09T19:47:32Z

Okay so MSVC doesn't support atomics on unsigned integers. 😆

stephenswat · 2024-06-09T23:11:25Z

Someone explain to me how this commit breaks the synchronized memory resource on release builds in MSVC and in those builds alone.

stephenswat · 2024-06-09T23:13:24Z

Ah, of course.

Including intrin.h breaks locks and mutexes.

https://github.com/stephenswat/vecmem/actions/runs/9440139525/job/25999074005

🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡

core/include/vecmem/memory/impl/device_atomic_ref.ipp

stephenswat · 2024-06-10T09:09:19Z

I got rid of the MSVC intrinsics and replaced them with a non-atomic implementation of CAS, but at least now there is only one point of fake atomicity in this code.
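Having "only one point of fake atomicity" suggests that the other read-modify-write operations are expressed in terms of CAS. A minimal sketch of that general pattern follows, assuming a hypothetical `fetch_op` helper on top of `std::atomic<T>::compare_exchange_weak` (vecmem's own code may be structured differently). If the CAS primitive is truly atomic, every operation built on it is too; if it is not, the problem is confined to that single function.

```cpp
#include <atomic>
#include <cassert>

// Generic read-modify-write via a CAS loop: read the current value, compute
// the new one, and retry until no other thread changed the value in
// between. Returns the value held before the update, like fetch_add.
template <typename T, typename Op>
T fetch_op(std::atomic<T>& obj, Op op) {
    T expected = obj.load();
    // On failure, compare_exchange_weak reloads `expected` with the current
    // value, so the next iteration recomputes op() from fresh data. On
    // success, `expected` still holds the old value.
    while (!obj.compare_exchange_weak(expected, op(expected))) {
    }
    return expected;
}

// Hypothetical operations built on the single CAS-based primitive.
template <typename T>
T fetch_max(std::atomic<T>& obj, T arg) {
    return fetch_op(obj, [arg](T v) { return v < arg ? arg : v; });
}

template <typename T>
T fetch_and(std::atomic<T>& obj, T arg) {
    return fetch_op(obj, [arg](T v) { return static_cast<T>(v & arg); });
}
```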

This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.

Currently, `compare_exchange_strong` is broken because it relies on the CUDA `atomicCAS` builtin, which behaves fundamentally differently from the equivalent C++ STL function. The C++ version returns true on a successful swap and false otherwise, whereas the CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, the `compare_exchange_strong` function will always appear to fail, even if it succeeded. This commit fixes the above issue. It also increases the robustness of other atomic operations, adds new concepts, and adds new tests.

krasznaa

Let's get this in. Cleanup/improvements to come afterwards.

core/include/vecmem/memory/impl/device_atomic_ref.ipp

stephenswat added the bug Something isn't working label Jun 6, 2024

stephenswat requested a review from krasznaa June 6, 2024 16:01

stephenswat force-pushed the fix/atomiccas branch 2 times, most recently from eaf5991 to 2bc62ed June 7, 2024 12:26

stephenswat changed the title from "Fix atomic CAS functionality in CUDA" to "Increase the robustness of device_atomic_ref" Jun 7, 2024

stephenswat force-pushed the fix/atomiccas branch 2 times, most recently from 1e7e8c3 to 8f4fb2d June 7, 2024 13:21

stephenswat mentioned this pull request Jun 7, 2024

Add a simple spinlock mutex type acts-project/traccc#607

Merged

krasznaa requested changes Jun 8, 2024

stephenswat mentioned this pull request Jun 8, 2024

Add ability to reserve bulk space in device vector #274

Merged

stephenswat force-pushed the fix/atomiccas branch from 8f4fb2d to 9142ff2 June 8, 2024 19:50

stephenswat force-pushed the fix/atomiccas branch from 9142ff2 to f582428 June 8, 2024 21:12

stephenswat force-pushed the fix/atomiccas branch 12 times, most recently from 874ced1 to 53dcf2f June 9, 2024 22:29

stephenswat force-pushed the fix/atomiccas branch 3 times, most recently from 0fbc9ce to 0eebea1 June 9, 2024 22:57

krasznaa reviewed Jun 10, 2024

core/include/vecmem/memory/impl/device_atomic_ref.ipp

stephenswat force-pushed the fix/atomiccas branch from 0eebea1 to 7b02da0 June 10, 2024 09:08

stephenswat mentioned this pull request Jun 10, 2024

Add concept for atomic references #276

Merged

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024

Move vecmem::memory_order into its own header

f4e4242

stephenswat mentioned this pull request Jun 10, 2024

Move vecmem::memory_order into its own header #277

Closed

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024

Move vecmem::memory_order into its own header

797e3a7

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024

Move vecmem::memory_order into its own header

6ca23d4

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024

Move vecmem::memory_order into its own header

c5347fb

stephenswat added a commit to stephenswat/vecmem that referenced this pull request Jun 10, 2024

Move vecmem::memory_order into its own header

ef8a4b3

stephenswat force-pushed the fix/atomiccas branch from 7b02da0 to 2d4fe8b June 11, 2024 16:20

stephenswat requested a review from krasznaa June 12, 2024 11:35

krasznaa approved these changes Jun 12, 2024

core/include/vecmem/memory/impl/device_atomic_ref.ipp

krasznaa merged commit 013d297 into acts-project:main Jun 12, 2024
30 checks passed

This was referenced Aug 3, 2024

Atomic Failure on Windows with CUDA, main branch (2024.08.03.) #288

Open

Atomic Reference Reorganization, main branch (2024.08.08.) #291

Merged
