The most significant change in this version regards the way callbacks/host functions are supported. This change is motivated mostly as preparation for the upcoming introduction of CUDA graph support (not in this version), which will impose some stricter constraints on callbacks - precluding the hack we have been using so far.
So far, a callback was any object invokable with a cuda::stream_t parameter. From now on, we support two kinds of callback:
A plain function - not a closure, which may be invoked with a pointer to an arbitrary type: cuda::stream_t::enqueue_t::host_function_call(Argument * user_data)
An object invokable with no parameters - a closure, to which one cannot provide any additional information: cuda::stream_t::enqueue_t::host_invokable(Invokable& invokable)
This lets us avoid the combination of heap allocation at enqueue and deallocation at launch - which works well enough for now, but will not be possible when the same callback needs to be invoked multiple times. Also, it was in contradiction of our presumption not to add layers of abstraction over what CUDA itself provides.
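To illustrate why neither kind of callback needs a heap allocation, here is a minimal sketch. It does not use the library: mock_stream, enqueue_function_call, enqueue_invokable and run_all are hypothetical names made up for this example. The only thing taken from CUDA is the shape of a host function, void(*)(void* user_data), which is what cudaLaunchHostFunc accepts. An invokable is adapted to that shape by passing its address as the user data and calling it through a small trampoline.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same shape as the host-function type CUDA accepts: void(*)(void* user_data).
using host_fn_t = void (*)(void*);

// Hypothetical stand-in for a stream: it records (function, user_data) pairs
// and runs them in order on request. This is NOT the library's stream_t.
struct mock_stream {
    std::vector<std::pair<host_fn_t, void*>> pending;

    // Kind 1: a plain function plus a pointer to caller-owned data.
    void enqueue_function_call(host_fn_t fn, void* user_data) {
        pending.emplace_back(fn, user_data);
    }

    // Kind 2: an object invokable with no parameters. Only its address is
    // stored, so the caller must keep it alive until it has run.
    template <typename Invokable>
    void enqueue_invokable(Invokable& invokable) {
        // A captureless lambda converts to a plain function pointer.
        host_fn_t trampoline = [](void* p) { (*static_cast<Invokable*>(p))(); };
        pending.emplace_back(trampoline, &invokable);
    }

    // Runs everything enqueued so far. Entries are not consumed, so the same
    // callbacks can be invoked again later.
    void run_all() const {
        for (const auto& entry : pending) { entry.first(entry.second); }
    }
};

// A plain function: everything it needs arrives through user_data.
void add_ten(void* user_data) { *static_cast<int*>(user_data) += 10; }

int demo() {
    mock_stream stream;
    int counter = 0;
    stream.enqueue_function_call(add_ten, &counter);

    auto closure = [&counter] { counter *= 2; };
    stream.enqueue_invokable(closure);

    stream.run_all();   // counter: 0 -> 10 -> 20
    stream.run_all();   // replay:  20 -> 30 -> 60
    return counter;
}
```

Because nothing is allocated at enqueue time and nothing is freed when a callback runs, the second run_all() can safely replay the same entries. The price is that the caller owns the lifetime of both the user data and the invokable.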
Of course, the release also has the "usual" long list of minor fixes.
Changes to existing API
cuda::kernel::get() now takes a device, not a context, since it can't really do anything useful for non-primary contexts (which is where apriori-compiled kernels are available)

API additions
Support for cuda::memory::region_t's when enqueueing copy operations on streams (and thus also cuda::span<T>'s)
#466 Can now perform copies using cuda::memory::copy_parameters_t<N> (for N=2 or 3), a wrapper of the CUDA driver's richest parameters structure with multiple convenience functions, for maximum configurability of a copy operation. But this structure is not currently "fool-proof", so use with care and initialize all relevant fields.
cuda::pointer_t

Bug fixes
device::get() no longer incorrectly marked as noexcept
allocate_managed() in context.hpp
flush_remote_writes() operation on a stream (this is one of the "batch stream memory operations")
#449 apriori_compiled_kernel_t::get_attribute() was missing an inline decoration
cuda::profiling::mark::range_start() and range_end() were calling create_attributions() the wrong way

Cleanup and warning avoidance

Compatibility

Other changes
constexpr
This discussion was created from the release Version 0.6.2 RC1: Stream callback semantics change, bug fixes.