-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase the robustness of device_atomic_ref
#275
Conversation
As you noticed, one cannot just use If you forego the removal of the naive, non-atomic implementation, then I'll be happy to get this fix in. But I'd rather not open the can of worms with how |
My bad, I had hoped that any of the templates would have been on the function, not on the class, but sadly not. Anyway there will be a bit more work to do here anyway, so I'll come up with a more comprehensive solution. |
Note that I've been thinking for a while now about introducing |
eaf5991
to
2bc62ed
Compare
Okay, the scope of this PR has grown a little bit to fix a whole bunch of other issues with the atomic references. Also adds additional compile-time checks on the functionality of atomic references as well as runtime tests. |
device_atomic_ref
1e7e8c3
to
8f4fb2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm absolutely on board with making this code better. I'm very happy that you're looking into it.
Please fix up all the remaining issues, and then I'll be willing to push in this macro-hell. 🤔 But after that, I'll absolutely want to clean this up.
- Instead of doing preprocessor magic everywhere, I'll want to have a few different classes called let's say
vecmem::details::cuda::atomic_ref
,vecmem::details::win32::atomic_ref
, etc.- The idea being that that would hopefully result in more understandable compiler errors when some preprocessor decision inevitably goes wrong in the future.
- At that point we could push the implementation of the "host versions" into
.cpp
files, to avoid exposing the user to let's say<windows.h>
. We only want to provide this class for a short list of primitive types anyway.
If you want to give that setup a try, I won't stop you. 😉 But as I started, I'm willing to let the code in with this design as well as a first step. (After the actual issues have been fixed.)
8f4fb2d
to
9142ff2
Compare
Okay, let's see what the MSVC CI thinks of this. |
9142ff2
to
f582428
Compare
Okay so MSVC doesn't support atomics on unsigned integers. 😆 |
874ced1
to
53dcf2f
Compare
0fbc9ce
to
0eebea1
Compare
Someone explain to me how this commit breaks the synchronized memory resource on release builds in MSVC and in those builds alone. |
Ah, of course. Including https://github.com/stephenswat/vecmem/actions/runs/9440139525/job/25999074005 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 🤡 |
0eebea1
to
7b02da0
Compare
I got rid of the MSVC intrinsics and replaced them by a non-atomic implementation of CAS, but at least now there is only one point of fake atomicity in this code. |
This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.
This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.
This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.
This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.
This commit moves `vecmem::memory_order` into its own header in order to avoid circular dependencies in acts-project#275 and acts-project#276.
Currently, the functionality is `compare_exchange_strong` is broken because it relies on the CUDA `atomicCAS` builtin which functions fundamentally differently from the C++ STL version of the equivalent code. Indeed, the C++ version returns true on a succesful swap and false otherwise. The CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, the `compare_exchange_strong` function will always appear to fail, even if it succeeded. This commit fixes the above issue. Also increases the robustness of other atomic operations, adds new concepts, and adds new tests.
7b02da0
to
2d4fe8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this in. Cleanup/improvements to come afterwards.
While working on acts-project/traccc#595, I found out that the vecmem implementation of atomic CAS is fundamentally broken on CUDA platforms 😟. Currently, the functionality is
compare_exchange_strong
is broken because it relies on the CUDAatomicCAS
builtin which functions fundamentally differently from the C++ STL version of the equivalent code. Indeed, the C++ version returns true on a succesful swap and false otherwise. The CUDA version always returns the old value. As such, if the old value is false-like, e.g. 0, thecompare_exchange_strong
function will always appear to fail, even if it succeeded. This commit fixes the above issue.I also removed the backup implementation of CAS as it was not atomic in any way and was basically lying to users about working atomically 😟.