This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Expose support for <iterator> and <array> #243

Merged
merged 28 commits from feature/array into main
Feb 19, 2022

Conversation

wmaxey
Member

@wmaxey wmaxey commented Jan 21, 2022

<cuda/std/array> + <cuda/std/iterator> - Statically sized arrays and iterators.

Adds support for cuda::std::array and cuda::std::iterator.

Details

Motivation

  • Users often need statically sized arrays in CUDA and frequently implement their own array<T, N> class.
  • --expt-relaxed-constexpr is required to use std::array in device code.
  • Users on NVRTC have even fewer options.

Impact

  • Exposes <cuda/std/array> and <cuda/std/iterator> headers.
  • Silently exposes parts of <algorithm> as an implementation detail; full support will come later.
  • No test coverage impact.
  • Exceptions are disabled; invalid accesses abort on host and trap on device.

Checklists

Testing

  • Bug fixes have regression tests that would reproduce the bug.
  • New features should have correctness tests to validate behavior.
  • Benchmarks have been added to monitor performance of new features. (not required)

@wmaxey wmaxey added this to the 1.8.0 milestone Jan 21, 2022
@wmaxey wmaxey requested review from griwes and jrhemstad January 21, 2022 01:50
@wmaxey wmaxey self-assigned this Jan 21, 2022
@wmaxey
Member Author

wmaxey commented Jan 21, 2022

Closes #51

@wmaxey
Member Author

wmaxey commented Jan 21, 2022

@griwes / @jrhemstad If anyone wants to take a particular look at the changes for initializer_list: I think the current solution is brutally hacky. Potentially the best we can do is add an implicit constructor for cuda::std::initializer_list, but I'm uncertain whether we want to add something so confusing. It's honestly the same issue as tuple_size: we can't avoid conflicting with the host standard library.

@wmaxey
Member Author

wmaxey commented Jan 21, 2022

@wmaxey wmaxey added the "testing: internal ci in progress" label (Currently testing on internal NVIDIA CI (DVS)) Jan 21, 2022
@jrhemstad
Collaborator

@griwes / @jrhemstad If anyone wants to take a particular look at changes for initializer_list I think the current solution is brutally hacky.

Can you summarize the problem/solution? I'm guessing something to do with std::initializer_list gets special treatment from the compiler that a "user-defined" cuda::std::initializer_list won't get?

#include "iterator"
#include "utility"

#include <initializer_list>
Member Author


@jrhemstad This is mostly what I was talking about. We can't use cuda::std::initializer_list due to the implicit construction of a std::initializer_list.

The compiler may insert something like the code below, and if we are really set on using a cuda::std::initializer_list we would need to add an implicit conversion, which makes 'our' initializer_list behave weirdly: https://gcc.godbolt.org/z/EG739eEdP

http://eel.is/c++draft/dcl.init.list#5

Collaborator


Yeah, I don't see how we're going to get around this. initializer_list is a type with special rules that the compiler follows, which we can't really emulate with cuda::std::initializer_list.

@@ -1749,14 +1749,14 @@ reverse_iterator<_Tp*> rend(_Tp (&__array)[_Np])

template <class _Ep>
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR_AFTER_CXX14
reverse_iterator<const _Ep*> rbegin(initializer_list<_Ep> __il)
reverse_iterator<const _Ep*> rbegin(::std::initializer_list<_Ep> __il)
Member Author


@jrhemstad hacks here.

Collaborator


This seems fine to me.

@wmaxey wmaxey added the "testing: internal ci passed" label (Passed internal NVIDIA CI (DVS)) and removed the "testing: internal ci in progress" label Feb 3, 2022
@wmaxey wmaxey requested a review from jrhemstad February 3, 2022 07:40
@wmaxey wmaxey changed the title WIP: Expose support for <iterator> and <array> Expose support for <iterator> and <array> Feb 3, 2022
@wmaxey wmaxey force-pushed the feature/array branch 2 times, most recently from cfa48ad to 312aaf0 Compare February 8, 2022 05:33
@@ -7,7 +7,7 @@ root_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)
stdlib_headers=$(<"${root_dir}/maintenance/stdlib-headers")
header_replacements=$(echo "${stdlib_headers}" | sed 's#<\(.*\)>#-e s:<\1>:<cuda/std/\1>:g#')

find "${root_dir}/test" -name "*.cpp" |
find "${root_dir}/test/std/iterators" -name "*.cpp" |
Collaborator


I don't think you meant to keep this ;>

Member Author


This should be fixed now.

@@ -13,7 +13,7 @@
/*
algorithm synopsis

#include <initializer_list>
#include <::std::initializer_list>
Collaborator


Tell me you've just run search+replace without telling me... :D

Member Author


Also fixed. >.>

@@ -3302,6 +3336,7 @@ partition_copy(_InputIterator __first, _InputIterator __last,
// partition_point

template<class _ForwardIterator, class _Predicate>
_LIBCUDACXX_INLINE_VISIBILITY
Collaborator


We probably also want _LIBCUDACXX_EXECUTION_SPACE_SPECIFIER for the ones here and above? I assume this is where you invented the new macro.

Member Author


I'm wary of doing that: when building source files in libcxx there are places where some extra ABI flags are set that may matter. The ones I've marked are mainly internal and inline.

Putting _LIBCUDACXX_INLINE_VISIBILITY on those causes them to fail to build due to the linkage flags.

e.g.

#if __has_attribute(internal_linkage)
#  define _LIBCUDACXX_INTERNAL_LINKAGE __attribute__ ((internal_linkage))
#else
#  define _LIBCUDACXX_INTERNAL_LINKAGE _LIBCUDACXX_ALWAYS_INLINE
#endif


_Tp __elems_[_Size];

// No explicit construct/copy/destroy for aggregate type
_LIBCUDACXX_INLINE_VISIBILITY void fill(const value_type& __u) {
_CUDA_VSTD::fill_n(__elems_, _Size, __u);
fill_n(__elems_, _Size, __u);
Collaborator


This (and swap_ranges below) are now ADL calls, we need to keep them qualified.

Member Author


I've restored the _CUDA_VSTD:: qualification on this and below.

@@ -714,7 +714,7 @@ public:
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR_AFTER_CXX14
reference operator*() const {_Iter __tmp = current; return *--__tmp;}
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR_AFTER_CXX14
pointer operator->() const {return _CUDA_VSTD::addressof(operator*());}
pointer operator->() const {return addressof(operator*());}
Collaborator


Needs to be qualified to avoid ADL.

@@ -1168,7 +1168,7 @@ public:
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR_AFTER_CXX14
move_iterator& operator-=(difference_type __n) {__i -= __n; return *this;}
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR_AFTER_CXX14
reference operator[](difference_type __n) const { return static_cast<reference>(__i[__n]); }
reference operator[](difference_type __n) const { return move(__i[__n]); }
Collaborator


Ditto.


Comment on lines 234 to 238
NV_IF_ELSE_TARGET(
NV_IS_DEVICE,
__trap();,
_CUDA_VSTD::abort();
)
Collaborator


Somewhere above you've made a macro for this, though using if/else on __CUDA_ARCH__ instead of if-target. Should we replace that with what's in here, and just use the macro here to avoid repeating this construct?

Member Author


Which macro was this?

Member Author


Replaced with _LIBCUDACXX_UNREACHABLE

@wmaxey wmaxey force-pushed the feature/array branch 3 times, most recently from 8804caf to fe8f048 Compare February 18, 2022 19:58
@wmaxey wmaxey merged commit b4762a7 into main Feb 19, 2022
@wmaxey wmaxey deleted the feature/array branch February 19, 2022 00:40