Adapting code to cuda::kernel::wrap API changes #486

Closed · fvanmaele opened this issue Apr 3, 2023 · 2 comments

fvanmaele commented Apr 3, 2023

I have some old code that uses the old API of cuda::kernel::wrap, which was changed in commit bc53844. The call looks as follows:

    return cuda::kernel::wrap(
        cuda::device::current::detail_::get_id(),
        kernel::reduce<M_ic,
                       block_dim_ic,
                       num_partitions_per_block_ic,
                       num_rhs_batch_ic,
                       pivoting,
                       int,
                       T,
                       TR,
                       typename make_scalar_type<T>::type>);

kernel::reduce above is a void function. See: https://mp-force.ziti.uni-heidelberg.de/fvanmaele/tridigpu/-/blob/master/include/tridigpu/reduction.h#L58-80 and https://mp-force.ziti.uni-heidelberg.de/fvanmaele/tridigpu/-/blob/master/include/tridigpu/reduction.h#L105

How can I adapt this to the new interface?
Unfortunately, I am not familiar with this project, and I could not find a migration guide for the API either. Changing the call to cuda::kernel::get was unsuccessful.

eyalroz commented Apr 3, 2023

tl;dr: cuda::kernel::get(my_device, my_plain_vanilla_kernel_function) is known to work; see the execution control example.
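
For illustration, a minimal sketch of that usage follows - the kernel itself, the way the device is obtained, and the header name are assumptions here; only the get() call mirrors the signature quoted below:

    // Assuming the library's umbrella header (e.g. <cuda/api.hpp>) is included
    // and this file is compiled as CUDA (.cu).
    __global__ void my_plain_vanilla_kernel_function(int* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { data[i] += 1; }
    }

    void wrap_it()
    {
        auto my_device = cuda::device::current::get();
        // get() wraps the __global__ function, taking care of the device's
        // primary context internally.
        auto my_kernel = cuda::kernel::get(my_device, my_plain_vanilla_kernel_function);
        (void) my_kernel;
    }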


The effect of that big change on the wrap() function is that it now requires a CUDA context. People who are used to the runtime API don't know about contexts - and that's ok - but wrap() is the lowest-level non-detail_ API call I offer, and it must really "not do anything" on its own, so it needs the context passed in explicitly. Under the hood, the CUDA runtime API uses something called the "primary context" on each device.

So, your options are:

  1. Getting the handle of the primary context on the device. This is not entirely trivial, since the primary context is reference-counted; but auto my_pcontext = my_device.primary_context(), keeping that object alive, and using my_pcontext.handle() does the trick.
  2. Using the convenience function template I offer to more runtime-API-oriented people, cuda::kernel::get(const device_t& device, KernelFunctionPtr function_ptr), where you only pass a device and a kernel function. That takes care of the context magic for you. Note: it takes a device, not a device id! Both options are sketched below.
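
Concretely, for the call in the original post, option 2 would look roughly like this (the template arguments are copied from above; how you obtain the device_t is up to you, and the closing comment about wrap() is only a sketch, not the exact signature):

    // Option 2: pass a device_t and let the library deal with the (primary) context.
    auto device = cuda::device::current::get();  // or any other way of getting a device_t
    return cuda::kernel::get(
        device,   // note: a device_t, not a device id
        kernel::reduce<M_ic,
                       block_dim_ic,
                       num_partitions_per_block_ic,
                       num_rhs_batch_ic,
                       pivoting,
                       int,
                       T,
                       TR,
                       typename make_scalar_type<T>::type>);

    // Option 1, sketched: hold on to the primary context yourself. It is
    // reference-counted, so my_pcontext must outlive any use of the wrapped
    // kernel; my_pcontext.handle() is then what you pass where wrap() expects
    // a context handle (the exact wrap() parameter list is not reproduced here).
    auto my_pcontext = device.primary_context();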

eyalroz self-assigned this Apr 3, 2023

eyalroz commented Apr 7, 2023

Assuming the question is answered to @fvanmaele's satisfaction... please comment again if that's not the case.

eyalroz closed this as completed Apr 7, 2023