Refactor <atomic> and move implementation to libcxx #179
Partial review, only tests so far.
.upstream-tests/test/cuda/pipeline_memcpy_async_thread_scope_generic.h
4797649 to 51b2eb8
I've removed test changes in the interest of time. Unfortunately there is a bug blocking progress related to
2dab8e2 to 4c3ab20
include/cuda/std/atomic
using std::detail::thread_scope_block;
using std::detail::thread_scope_thread;

namespace detail {
It's embarrassing to admit it, but I never realized we had a namespace called detail. We should rename it to __detail throughout.
Fixed in a rename commit.
include/cuda/std/barrier
@@ -293,7 +293,7 @@ inline void __strided_memcpy(char * __destination, char const * __source, std::s
 }
 }

-#if __CUDA_ARCH__ >= 800
+#if __CUDA_MINIMUM_ARCH__ >= 800
The code down below still has __CUDA_ARCH__ in the codepath selection logic.
Fixed remaining __CUDA_ARCH__ dispatches.
I'm not seeing any further changes in this file. Line 389, for one, isn't using nvtarget.
Hmm... That's because barrier fixes are part of another patch set. I won't be pulling those ones into this refactor. Instead I can reset this file to main to avoid complicating things further.
barrier will need the same injection into libcxx treatment.
// END TODO

// Wrap host atomic implementations into a sub-namespace
namespace host {
__host
Will uglify!
#include "atomic_cuda_derived.h"

template <typename _Tp>
struct __skip_amt { enum {value = 1}; };
"amt"? "amount"? I understand what this is for because I know the __atomic intrinsics, but I'd like this to have a better name. "__difference_scale" or something? (I don't think that's a good name either, but it's more evocative of what it is used for.)
This is essentially the same hack that was in another layer. I'll try to use that one instead and get rid of the duplication.
Fixed with a rename to __atomic_ptr_inc. I think that conveys the idea that it's an increment for pointers.
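For readers less familiar with why such a trait exists: the GCC/Clang __atomic_fetch_add builtin does not scale its operand when the object is a pointer, so the caller has to multiply by sizeof(T) itself. Below is a minimal host-only sketch of that idea, assuming a GCC or Clang compiler. The names ptr_inc and fetch_add_scaled are hypothetical and chosen for illustration; they are not the identifiers used in this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the role played by __skip_amt / __atomic_ptr_inc:
// non-pointer types advance by 1 per unit of delta.
template <typename T>
struct ptr_inc { static constexpr std::ptrdiff_t value = 1; };

// Pointer types advance by sizeof(pointee) bytes per unit of delta,
// because the builtin below adds raw bytes to a pointer object.
template <typename T>
struct ptr_inc<T*> { static constexpr std::ptrdiff_t value = sizeof(T); };

// Sequentially consistent fetch-add that scales delta for pointer objects.
// Relies on the GCC/Clang __atomic_fetch_add builtin (an assumption of this sketch).
template <typename T, typename Delta>
T fetch_add_scaled(T* obj, Delta delta) {
    assert(obj != nullptr);
    return __atomic_fetch_add(obj, delta * ptr_inc<T>::value, __ATOMIC_SEQ_CST);
}
```

With this shape, fetch_add_scaled on an int adds the delta directly, while on an int* it advances the pointer by delta elements rather than delta bytes.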
}
__host__ __device__
inline int __cuda_memcmp(void const * __lhs, void const * __rhs, size_t __count) {
#ifdef __CUDA_ARCH__
Do we want to give this if target treatment? If not, why?
This should be fixed now.
@@ -435,4 +463,213 @@ _Type __host__ __atomic_fetch_min(_Type volatile *__ptr, _Delta __val, int __mem
return __expected;
}

_LIBCUDACXX_END_NAMESPACE_CUDA
template <typename _Tp, int _Sco>
The code from here onwards repeats for GCC and MSVC, right? Can we move it to some sort of a "derived" header that is just included in both of these headers to avoid repeating all of it?
Maybe if we had a from-scratch backend for MSVC this wouldn't be an issue, but seems we aren't really in such a place, right?
It would be 'easiest' to just strip the content from atomic_gcc.h and instead call that some kind of cxx_atomic_base.h
atomic_gcc.h need only include cxx_atomic_base.h since gcc/clang comes with the intrinsics.
atomic_msvc.h creates the back-end intrinsics and then includes cxx_atomic_base.h.
Thoughts on that method? I think that would accomplish reducing code duplication and wouldn't be too confusing. Though it still means everything relies on the GCC atomic symbols being defined.
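A rough single-file sketch of that layering, to make the proposal concrete. Everything here is illustrative: the struct name cxx_atomic_base is hypothetical, only one operation on long is shown, and the MSVC branch assumes that the _InterlockedExchangeAdd intrinsic from <intrin.h> is an acceptable stand-in for a sequentially consistent fetch-add. It is not the code in this PR.

```cpp
// ---- what a hypothetical atomic_msvc.h would contribute ----
// Only plain MSVC lacks the GCC-style builtins; clang-cl already has them.
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#define __ATOMIC_SEQ_CST 5
// Illustration only: a GCC-style entry point for long, ignoring the memory order
// argument and always using the full-barrier MSVC intrinsic.
inline long __atomic_fetch_add(long volatile* ptr, long val, int /*memorder*/) {
    return _InterlockedExchangeAdd(ptr, val);
}
#endif

// ---- what a hypothetical cxx_atomic_base.h would contain ----
// Written once against the GCC-style symbols, whoever provides them.
// With the MSVC shim above, this sketch only works for T = long.
template <typename T>
struct cxx_atomic_base {
    T value;

    // Returns the previous value, as the underlying builtin does.
    T fetch_add(T delta) {
        return __atomic_fetch_add(&value, delta, __ATOMIC_SEQ_CST);
    }
};
```

On GCC and Clang the first part compiles away and the base layer talks to the compiler builtins directly, which is the "atomic_gcc.h need only include cxx_atomic_base.h" half of the suggestion. The trade-off noted above still holds: the shared layer depends on the GCC atomic symbols being defined by someone.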
4c3ab20 to 303418a
1d1b951 to efa59b9
…n MSVC, fix a test that used preprocessor directives within NV_TARGET
…within a C++ context
…and creating back-end intrinsics in the MSVC header
…ime lock free check
…e introduction of <nv/target>
…arts of atomics are aligned
665e6f8 to 0d5fb0e
#ifndef _LIBCUDACXX_ATOMIC_BASE_H
#define _LIBCUDACXX_ATOMIC_BASE_H

#include <type_traits>
If a file including this file doesn't include type_traits, type_traits is included inside a namespace, right? This needs to stop being here.
👍 Thanks for the catch.
#endif

#ifndef __CUDACC_RTC__
#include <string.h>
Same comment as for type_traits above here.
LGTM, good job!
https://builds4u.nvidia.com/dvs/#/change/3025952755833026.2?eventType=Virtual Build is all clear.
Overview
This is a refactor of <cuda/std/atomic> and the underlying <atomic> headers. This moves the implementation of CUDA-specific atomics to the inner implementation. It also allows us to implement the underlying mechanics for atomic_ref in the near future and allows NVC++ to access atomic PTX intrinsics when support is finally enabled.
Motivation
There are several goals here:
atomic<struct> types.
atomic_ref, a feature we will support very soon.
Requirements
<nv/target> in the atomic backend and within relevant tests.
atomic_impl internals to support more generic host and device layering.
Design
#include a relevant host atomic backend into a host:: namespace.
atomic_ref implement a corresponding atomic implementation class that is compatible with __cxx_atomic_unwrap.
Test Plan
Performance Tests
Documentation
Complex Internal Systems