This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Refactor <atomic> and move implementation to libcxx #179

Merged — 34 commits merged into main on Aug 4, 2021

Conversation

wmaxey
Member

@wmaxey wmaxey commented Jul 7, 2021

Overview

This is a refactor of <cuda/std/atomic> and the underlying <atomic> headers. It moves the implementation of CUDA-specific atomics into the inner implementation layer. It also allows us to implement the underlying mechanics for atomic_ref in the near future, and allows NVC++ to access atomic PTX intrinsics when support is finally enabled.

Motivation

There are several goals here:

  • Eases the review burden on the nvcxx feature branch.
  • An opportunity to fix some alignment bugs with atomic<struct> types.
  • Allows us to prepare and test a backend that supports atomic_ref, a feature we will support very soon.

Requirements

  • Begin using <nv/target> in the atomic backend and within relevant tests.
  • Move atomic intrinsics out of <cuda/std/...> and into <libcxx/support/atomic>.
  • Refactor the atomic_impl internals to support more generic host and device layering.

Design

  • Move implementation specific atomic backends into individual headers.
  • When the CUDA backend is enabled #include a relevant host atomic backend into a host:: namespace.
  • For atomic_ref implement a corresponding atomic implementation class that is compatible with __cxx_atomic_unwrap.
  • Remove C11 support or continue to support it in host only code.

Test Plan

  • All currently existing tests should continue to function as expected.
  • Add a test for ensuring the correctness of differently aligned types.

Performance Tests

  • Compare generated code against previously generated code. The results should show no change.

Documentation

  • No documentation changes needed as of yet.

Complex Internal Systems

  • The internal namespace layering that will be used to break apart host and device atomics.
    • Controlled by macro switches for a given compiler.
    • Splits host and device atomics into separate namespaces so that the atomic implementation layer may remain unaware of which compiler/mode has been targeted.

Collaborator

@griwes griwes left a comment


Partial review, only tests so far.

@wmaxey wmaxey force-pushed the feature/atomic_refactor branch from 4797649 to 51b2eb8 Compare July 8, 2021 22:52
@wmaxey wmaxey added this to the 1.6.0 milestone Jul 9, 2021
@wmaxey
Member Author

wmaxey commented Jul 14, 2021

Partial review, only tests so far.

I've removed test changes in the interest of time.

Unfortunately there is a bug blocking progress related to <nv/target>: https://nvbugs/3341536

@wmaxey wmaxey force-pushed the feature/atomic_refactor branch 2 times, most recently from 2dab8e2 to 4c3ab20 Compare July 15, 2021 00:19
using std::detail::thread_scope_block;
using std::detail::thread_scope_thread;

namespace detail {
Collaborator


It's embarrassing to admit it, but I never realized we had a namespace called detail. We should rename it to __detail throughout.

Member Author


Fixed in a rename commit.

@@ -293,7 +293,7 @@ inline void __strided_memcpy(char * __destination, char const * __source, std::s
}
}

#if __CUDA_ARCH__ >= 800
#if __CUDA_MINIMUM_ARCH__ >= 800
Collaborator


The code down below still has __CUDA_ARCH__ in the codepath selection logic.

Member Author


Fixed remaining CUDA_ARCH dispatches.

Collaborator


I'm not seeing any further changes in this file. Line 389, for one, isn't using nvtarget.

Member Author

@wmaxey wmaxey Aug 4, 2021


Hmm... That's because barrier fixes are part of another patch set. I won't be pulling those ones into this refactor. Instead I can reset this file to main to avoid complicating things further.

barrier will need the same injection-into-libcxx treatment.

// END TODO

// Wrap host atomic implementations into a sub-namespace
namespace host {
Collaborator


__host

Member Author


Will uglify!

#include "atomic_cuda_derived.h"

template <typename _Tp>
struct __skip_amt { enum {value = 1}; };
Collaborator


"amt"? "amount"? I understand what this is for because I know the __atomic intrinsics, but I'd like this to have a better name. "__difference_scale" or something? (I don't think that's a good name either, but it's more evocative of what it is used for.)

Member Author


This is essentially the same hack that was in another layer. I'll try to use that one instead and get rid of the duplication.

Member Author


Fixed with a rename to __atomic_ptr_inc. I think that conveys the idea that it's an increment for pointers.

}
__host__ __device__
inline int __cuda_memcmp(void const * __lhs, void const * __rhs, size_t __count) {
#ifdef __CUDA_ARCH__
Collaborator


Do we want to give this if target treatment? If not, why?

Member Author


This should be fixed now.

@@ -435,4 +463,213 @@ _Type __host__ __atomic_fetch_min(_Type volatile *__ptr, _Delta __val, int __mem
return __expected;
}

_LIBCUDACXX_END_NAMESPACE_CUDA
template <typename _Tp, int _Sco>
Collaborator


The code from here onwards repeats for GCC and MSVC, right? Can we move it to some sort of a "derived" header that is just included in both of these headers to avoid repeating all of it?

Maybe if we had a from-scratch backend for MSVC this wouldn't be an issue, but seems we aren't really in such a place, right?

Member Author


It would be 'easiest' to just strip the content from atomic_gcc.h and instead call that some kind of cxx_atomic_base.h

atomic_gcc.h need only include cxx_atomic_base.h since gcc/clang comes with the intrinsics.

atomic_msvc.h creates the back-end intrinsics and then includes cxx_atomic_base.h.

Thoughts on that method? I think that would accomplish reducing code duplication and wouldn't be too confusing. Though it still means everything relies on the GCC atomic symbols being defined.

@wmaxey wmaxey force-pushed the feature/atomic_refactor branch from 4c3ab20 to 303418a Compare July 22, 2021 21:18
@wmaxey wmaxey changed the title WIP: Refactor <atomic> and move implementation to libcxx Refactor <atomic> and move implementation to libcxx Jul 26, 2021
@wmaxey wmaxey force-pushed the feature/atomic_refactor branch 2 times, most recently from 1d1b951 to efa59b9 Compare July 29, 2021 17:46
@wmaxey wmaxey force-pushed the feature/atomic_refactor branch from 665e6f8 to 0d5fb0e Compare July 29, 2021 19:49
@wmaxey wmaxey added the testing: internal ci passed Passed internal NVIDIA CI (DVS). label Aug 2, 2021
@wmaxey wmaxey requested a review from griwes August 3, 2021 22:14
@@ -293,7 +293,7 @@ inline void __strided_memcpy(char * __destination, char const * __source, std::s
}
}

#if __CUDA_ARCH__ >= 800
#if __CUDA_MINIMUM_ARCH__ >= 800
Collaborator


I'm not seeing any further changes in this file. Line 389, for one, isn't using nvtarget.

#ifndef _LIBCUDACXX_ATOMIC_BASE_H
#define _LIBCUDACXX_ATOMIC_BASE_H

#include <type_traits>
Collaborator


If a file including this file doesn't include type_traits, type_traits is included inside a namespace, right? This needs to stop being here.

Member Author


👍 Thanks for the catch.

#endif

#ifndef __CUDACC_RTC__
#include <string.h>
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for type_traits above here.

Collaborator

@griwes griwes left a comment


LGTM, good job!

@wmaxey
Member Author

wmaxey commented Aug 4, 2021

@wmaxey wmaxey merged commit ca45a79 into main Aug 4, 2021
@wmaxey wmaxey deleted the feature/atomic_refactor branch August 4, 2021 02:25