This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
libcu++ 1.1.0 (CUDA Toolkit 11.0)
libcu++ 1.1.0 introduces the world's first implementation of the Standard C++20 synchronization library: <cuda/[std/]barrier>, <cuda/std/latch>, <cuda/std/semaphore>, cuda::[std::]atomic_flag::test, cuda::[std::]atomic::wait, and cuda::[std::]atomic::notify*. An extension for managing asynchronous local copies, cuda::memcpy_async, is introduced as well. It also adds <cuda/std/chrono>, <cuda/std/ratio>, and most of <cuda/std/functional>.
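As an illustration of the new atomic wait/notify operations, here is a minimal sketch of a one-shot handoff between two thread blocks. The kernel name handoff, the helper run_handoff, and the use of managed memory are assumptions made for this example and are not part of the release. It also assumes a device of compute capability 7.0 or newer and that both single-thread blocks are resident at the same time.

```cuda
#include <cuda/std/atomic>
#include <cstdio>
#include <new>

// Hypothetical kernel: block 1 publishes a value, block 0 waits for it.
__global__ void handoff(cuda::std::atomic<int>* ready, int* payload, int* result) {
    if (blockIdx.x == 0) {
        // Consumer: blocks while the flag still holds 0.
        ready->wait(0, cuda::std::memory_order_acquire);
        *result = *payload;
    } else {
        // Producer: write the payload, then publish and wake the waiter.
        *payload = 42;
        ready->store(1, cuda::std::memory_order_release);
        ready->notify_all();
    }
}

// Hypothetical host helper: returns the value the consumer observed, or -1 on error.
int run_handoff() {
    cuda::std::atomic<int>* ready = nullptr;
    int* payload = nullptr;
    int* result = nullptr;
    cudaMallocManaged(&ready, sizeof(*ready));
    cudaMallocManaged(&payload, sizeof(int));
    cudaMallocManaged(&result, sizeof(int));

    new (ready) cuda::std::atomic<int>(0);  // construct the flag in managed memory
    *payload = 0;
    *result = -1;

    handoff<<<2, 1>>>(ready, payload, result);  // two blocks, one thread each
    const cudaError_t err = cudaDeviceSynchronize();
    const int value = (err == cudaSuccess) ? *result : -1;

    cudaFree(ready);
    cudaFree(payload);
    cudaFree(result);
    return value;
}

int main() {
    std::printf("consumer observed %d\n", run_handoff());
    return 0;
}
```

The release/acquire pair on the flag is what makes the plain write to payload visible to the consumer; notify_all only wakes the waiter and carries no ordering of its own.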
ABI Breaking Changes
- ABI version 2 has been introduced and is now the default. A new ABI version was introduced because it is our policy to do so in every major CUDA toolkit release. ABI version 1 is no longer supported.
API Breaking Changes
- Atomics on Pascal + Windows are disabled: the platform does not support them, and its CUDA driver rejects binaries containing these operations.
New Features
- <cuda/[std/]barrier>: C++20's cuda::[std::]barrier, an asynchronous thread coordination mechanism whose lifetime consists of a sequence of barrier phases, where each phase allows at most an expected number of threads to block until the expected number of threads arrive at the barrier. It is backported to C++11. The cuda::barrier variant takes an additional cuda::thread_scope parameter.
- <cuda/barrier>: cuda::memcpy_async, asynchronous local copies. This facility is NOT for transferring data between threads or transferring data between host and device; it is not a cudaMemcpyAsync replacement or abstraction. It uses cuda::[std::]barrier objects to synchronize the copies.
- <cuda/std/functional>: common function objects, such as cuda::std::plus, cuda::std::minus, etc. cuda::std::function, cuda::std::bind, cuda::std::hash, and cuda::std::reference_wrapper are omitted.
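The three features above can be combined in one small kernel. The sketch below is illustrative only: the names mirror_sum, run_mirror_sum, kTile, and kBlocks are hypothetical, and the per-thread cuda::memcpy_async overload (destination, source, size in bytes, barrier) is assumed to match the form in the current libcu++ documentation. Each thread stages one element into shared memory, the block-scoped barrier waits for all arrivals and their pending copies, and cuda::std::plus then combines two staged elements.

```cuda
#include <cuda/barrier>
#include <cuda/std/functional>
#include <cstdio>
#include <vector>

constexpr int kTile = 128;   // threads per block and elements per tile (assumed)
constexpr int kBlocks = 4;   // number of tiles (assumed)

// Hypothetical kernel: out[i] = tile[i] + tile[kTile - 1 - i] within each tile.
__global__ void mirror_sum(const int* in, int* out) {
    __shared__ int tile[kTile];
    // nvcc may warn about dynamic initialization of this shared variable.
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;

    if (threadIdx.x == 0) {
        init(&bar, blockDim.x);  // expect one arrival per thread in the block
    }
    __syncthreads();

    const int base = blockIdx.x * kTile;
    // Each thread submits a local copy of its own element, bound to the barrier.
    cuda::memcpy_async(&tile[threadIdx.x], &in[base + threadIdx.x], sizeof(int), bar);
    // Completes the phase once all threads arrived and their copies landed.
    bar.arrive_and_wait();

    out[base + threadIdx.x] =
        cuda::std::plus<int>{}(tile[threadIdx.x], tile[kTile - 1 - threadIdx.x]);
}

// Hypothetical host helper: returns true if every output matches the expectation.
bool run_mirror_sum() {
    const int n = kTile * kBlocks;
    std::vector<int> host_in(n), host_out(n, 0);
    for (int i = 0; i < n; ++i) host_in[i] = i;

    int* dev_in = nullptr;
    int* dev_out = nullptr;
    cudaMalloc(&dev_in, n * sizeof(int));
    cudaMalloc(&dev_out, n * sizeof(int));
    cudaMemcpy(dev_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    mirror_sum<<<kBlocks, kTile>>>(dev_in, dev_out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(host_out.data(), dev_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);

    // With in[j] = j, each output equals 2 * base + kTile - 1 for its tile.
    for (int i = 0; i < n && ok; ++i) {
        const int base = (i / kTile) * kTile;
        ok = (host_out[i] == 2 * base + kTile - 1);
    }
    return ok;
}

int main() {
    std::printf("mirror_sum %s\n", run_mirror_sum() ? "passed" : "failed");
    return 0;
}
```

The barrier is needed here because each thread reads an element staged by a different thread; without the arrive_and_wait call, that read could race with the copy.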
Other Enhancements
- Upgraded to a newer version of upstream libc++.
- Standalone NVRTC support.
- C++17 support.
- NVCC + GCC 9 support.
- NVCC + Clang 9 support.
- Build with warnings-as-errors.
Issues Fixed
- Made __cuda_memcmp inline to fix ODR violations when compiling multiple translation units.