Emit diagnostic when attempting to use extended device lambda in contexts that require querying the return type #1688

jrhemstad · 2022-05-12T23:34:43Z

A very common pitfall for Thrust users is passing a __device__ lambda to Thrust algorithms or iterators, which then fails in silent or obscure ways.

This is frequently due to the limitation that you cannot reliably query the return type of an extended lambda in host code. Specifically:

As described above, the CUDA compiler replaces a device extended lambda defined in a host function with a placeholder type defined in namespace scope. This placeholder type does not define an operator() function equivalent to the original lambda declaration. An attempt to determine the return type or parameter types of the operator() function may therefore work incorrectly in host code, as the code processed by the host compiler will be semantically different than the input code processed by the CUDA compiler. However, it is OK to introspect the return type or parameter types of the operator() function within device code. Note that this restriction does not apply to __host__ __device__ extended lambdas.

This means that any code in Thrust that relies on std::invoke_result_t or std::result_of or similar in host code will fail with a __device__ lambda, e.g.,

thrust/thrust/system/cuda/detail/transform_scan.h

Line 58 in d35f44f

using result_type = std::invoke_result_t<TransformOp, input_type>;

This will frequently fail silently or fail to compile with incredibly obscure compile errors. This is such a nuisance that I avoid __device__ lambdas altogether with Thrust.
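To illustrate the silent failure mode, here is a host-only analogy in plain C++. It is a sketch, not nvcc's actual behavior: placeholder_closure is a hypothetical stand-in for the placeholder type the host compiler sees, and the assumption that its operator() accepts anything and returns int is made purely for illustration. The names original_callable and result_of_op_t are also invented for this example.

```cpp
#include <type_traits>

// What the user conceptually wrote: a callable mapping int -> double.
struct original_callable {
  double operator()(int x) const { return 0.5 * x; }
};

// HYPOTHETICAL stand-in for the placeholder type seen by the host compiler.
// Its operator() is not equivalent to the original one: it accepts any
// arguments and returns int (an assumption made for this illustration).
struct placeholder_closure {
  template <typename... Ts>
  int operator()(Ts...) const { return 0; }
};

// A host-side return type query in the style of the transform_scan line.
template <typename Op, typename Input>
using result_of_op_t = std::invoke_result_t<Op, Input>;

// The query is correct for the original callable...
static_assert(std::is_same_v<result_of_op_t<original_callable, int>, double>);

// ...but for the placeholder it compiles without any error and yields the
// wrong type, which is how a mismatch can go unnoticed.
static_assert(std::is_same_v<result_of_op_t<placeholder_closure, int>, int>);
```

Nothing here warns the user: the query simply answers for a different type than the one that was written.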

However, I recently learned that nvcc provides intrinsic type traits to query if a type is an extended lambda.

The compiler provides type traits to detect closure types for extended lambdas at compile time:

__nv_is_extended_device_lambda_closure_type(type): If 'type' is the closure class created for an extended device lambda, then the trait is true, otherwise it is false.

__nv_is_extended_host_device_lambda_closure_type(type): If 'type' is the closure class created for an extended host device lambda, then the trait is true, otherwise it is false.

This means Thrust can emit a useful diagnostic when a user attempts to use a __device__ lambda in a situation where that would be problematic (e.g., std::invoke_result_t).
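As a rough sketch of how these traits could be wrapped so that the same header still compiles without nvcc: the macro and variable names below (EXAMPLE_IS_DEVICE_LAMBDA, example::is_device_lambda_v, and so on) are hypothetical and not part of Thrust. Falling back to false whenever __CUDACC_EXTENDED_LAMBDA__ is not defined is an assumption of this example.

```cpp
// The intrinsics are only available when nvcc compiles with extended lambda
// support enabled (--extended-lambda), which defines
// __CUDACC_EXTENDED_LAMBDA__. Everywhere else, assume the answer is false.
#if defined(__CUDACC__) && defined(__CUDACC_EXTENDED_LAMBDA__)
#  define EXAMPLE_IS_DEVICE_LAMBDA(T) \
     __nv_is_extended_device_lambda_closure_type(T)
#  define EXAMPLE_IS_HOST_DEVICE_LAMBDA(T) \
     __nv_is_extended_host_device_lambda_closure_type(T)
#else
#  define EXAMPLE_IS_DEVICE_LAMBDA(T) false
#  define EXAMPLE_IS_HOST_DEVICE_LAMBDA(T) false
#endif

namespace example {  // hypothetical namespace, not Thrust API

// True only for the closure type of an extended __device__ lambda.
template <typename F>
inline constexpr bool is_device_lambda_v = EXAMPLE_IS_DEVICE_LAMBDA(F);

// True only for the closure type of an extended __host__ __device__ lambda.
template <typename F>
inline constexpr bool is_host_device_lambda_v =
    EXAMPLE_IS_HOST_DEVICE_LAMBDA(F);

}  // namespace example
```

With wrappers like these, a check such as static_assert(!example::is_device_lambda_v<F>, "...") can be written once and reused wherever host code queries a return type.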

One easy way to do this in Thrust would be to introduce a consistent wrapper for invoke_result_t/result_of that simply static_asserts that the callable is not an extended __device__ lambda.

namespace thrust::detail {
template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {
  static_assert(!__nv_is_extended_device_lambda_closure_type(F), "Attempt to use an extended __device__ lambda in a context that requires querying its return type. Use a named function object or a __host__ __device__ lambda instead.");
};

template <typename F, typename... Args>
using invoke_result_t = typename invoke_result<F, Args...>::type;
} // namespace thrust::detail

There are likely many other useful ways Thrust could use these traits. Deducing (or at least verifying) execution space from the callable comes to mind.

gevtushenko · 2022-05-13T12:12:57Z

We are relying on the thrust/cub version of invoke_result_t.

This means that any code in Thrust that relies on std::invoke_result_t or std::result_of or similar will fail with a device lambda, e.g.

According to the documentation, it's not any code, it's host code:

it is OK to introspect the return type or parameter types of the operator() function within device code.

I'm using invoke_result_t to introspect the return type within device code. Should the code be something like the following one instead?

static_assert(__builtin_is_device_code() || !__nv_is_extended_device_lambda_closure_type(F), "Attempt to use an extended __device__ lambda in a context that requires querying its return type. Use a named function object or a __host__ __device__ lambda instead.");

Alternatively, we might provide thrust::invoke_device_result_t without this assert.
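A minimal sketch of how a checked and an unchecked variant could sit side by side. The example namespace and the square functor are hypothetical, invoke_device_result_t only mirrors the name suggested above, and none of this is existing Thrust API. The guard is written so that the check compiles away entirely on a plain host compiler.

```cpp
#include <type_traits>

namespace example {  // hypothetical namespace, not part of Thrust

// Checked variant: rejects extended __device__ lambdas, but only when nvcc
// is not compiling for a device architecture.
template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {
#if defined(__CUDACC_EXTENDED_LAMBDA__) && !defined(__CUDA_ARCH__)
  static_assert(!__nv_is_extended_device_lambda_closure_type(F),
                "Cannot query the return type of an extended __device__ "
                "lambda in host code.");
#endif
};

template <typename F, typename... Args>
using invoke_result_t = typename invoke_result<F, Args...>::type;

// Unchecked variant, intended to be used from device code only.
template <typename F, typename... Args>
using invoke_device_result_t = std::invoke_result_t<F, Args...>;

// A named function object is accepted by both variants.
struct square {
  long operator()(int x) const { return static_cast<long>(x) * x; }
};

static_assert(std::is_same_v<invoke_result_t<square, int>, long>);
static_assert(std::is_same_v<invoke_device_result_t<square, int>, long>);

}  // namespace example
```

The trade-off is that callers have to pick the right alias for their context; the unchecked one offers no protection if it is used from host code by mistake.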

alliepiper · 2022-05-13T13:54:05Z

Related to #779.

jrhemstad · 2022-05-13T14:12:59Z

I'm using invoke_result_t to introspect the return type within device code. Should the code be something like the following one instead?

That's a great point.

This looks to work the way we want:

template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {
#if defined(__NVCC__) && defined(__CUDACC_EXTENDED_LAMBDA__) && !defined(__CUDA_ARCH__)
    static_assert(!__nv_is_extended_device_lambda_closure_type(F),
                  "Attempt to use an extended __device__ lambda in a context "
                  "that requires querying its return type. Use a named "
                  "function object or a __host__ __device__ lambda instead.");
#endif
};

though those guards may be overkill compared to just using !defined(__CUDA_ARCH__). nvc++ doesn't have "extended" lambdas, so we don't have to worry about that.

https://godbolt.org/z/j6qYT9d8E

alliepiper · 2022-08-04T19:05:28Z

@senior-zero @jrhemstad Is there more to do for this? Can we close it?

gevtushenko · 2022-08-04T19:23:21Z

I can see std::invoke_result_t usage in Thrust; I'll probably have to change these places to use cuda::std::invoke_result_t instead.

jrhemstad added the P1: should have, type: enhancement, good first issue, compiler: nvcc, and backend: CUDA labels May 12, 2022

jrhemstad mentioned this issue May 13, 2022

Update cuda::std::invoke_result to emit diagnostic when attempting to query return type of extended lambda in host code NVIDIA/libcudacxx#277

Closed

alliepiper added this to the 2.0.0 milestone May 13, 2022

alliepiper assigned jrhemstad May 16, 2022

alliepiper added the release: breaking change Include in "Breaking Changes" section of release notes. label May 16, 2022

jrhemstad mentioned this issue Jul 11, 2022

Reduction functors (cuco::static_reduction_map refactoring 1/N) NVIDIA/cuCollections#187

Closed

alliepiper assigned gevtushenko and unassigned jrhemstad Jul 25, 2022

gevtushenko linked a pull request Aug 4, 2022 that will close this issue

Emit diagnostics for device lambdas #1760

Merged

gevtushenko closed this as completed in #1760 Aug 5, 2022

jrhemstad added this to CCCL Aug 11, 2022

jrhemstad removed this from CCCL Aug 11, 2022

gevtushenko mentioned this issue Dec 6, 2022

Unable to use transform_output_iterator for output of copy_if with CUDA #1650

Closed

ahendriksen mentioned this issue Jun 9, 2023

[FEA] Masked NN for connect_components rapidsai/raft#1445

Merged

tarang-jain mentioned this issue Jun 22, 2023

Add libcudacxx as dependency rapidsai/raft#1606

Open
