Description
Is there an equivalent of this optimization feature in SYCL?
If the address of a __global__ function parameter is taken, the compiler will ordinarily make a copy of the kernel parameter in thread local memory and use the address of the copy, to partially support C++ semantics, which allow each thread to modify its own local copy of function parameters.

Annotating a __global__ function parameter with __grid_constant__ ensures that the compiler will not create a copy of the kernel parameter in thread local memory, but will instead use the generic address of the parameter itself. Avoiding the local copy may result in improved performance.
#include <cuda_runtime.h>

struct S {
    char4 a;
    int x;
};

__device__ void unknown_function(S const&) {}

__global__ void kernel(const __grid_constant__ S s) {
    // s.x += threadIdx.x; // Undefined Behavior: tried to modify read-only memory
    // Compiler will _not_ create a per-thread thread local copy of "s":
    unknown_function(s);
}

int main() {
    S car{};                 // value-initialize so the kernel receives defined data
    kernel<<<1, 1>>>(car);
    cudaDeviceSynchronize(); // wait for the kernel to finish before exiting
    return 0;
}
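For comparison, the sketch below runs the same computation twice: once with a plain by-value parameter whose address is taken (the case where the guide says a per-thread local copy is made) and once with a __grid_constant__ parameter. It assumes nvcc from CUDA 11.7 or newer and a GPU of compute capability 7.0 or higher, which as far as I know is what __grid_constant__ requires. The names read_x, kernel_copy, kernel_grid_constant and param_paths_agree are hypothetical, chosen only for this illustration. Both kernels should write identical results; the difference is only where &s points, which could be checked by looking for local-memory loads/stores in the SASS (e.g. via cuobjdump -sass).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct S {
    char4 a;
    int x;
};

// Hypothetical helper that reads through a pointer. Marking it __noinline__
// is an assumption meant to keep the compiler from optimizing the address away.
__device__ __noinline__ int read_x(const S* p) { return p->x; }

// Plain by-value parameter: taking &s makes the compiler copy "s" into
// per-thread local memory and pass the address of that copy.
__global__ void kernel_copy(S s, int* out) {
    out[threadIdx.x] = read_x(&s) + static_cast<int>(threadIdx.x);
}

// __grid_constant__ parameter: it must be const, and &s is the generic
// address of the kernel parameter itself, so no local copy is needed.
__global__ void kernel_grid_constant(const __grid_constant__ S s, int* out) {
    out[threadIdx.x] = read_x(&s) + static_cast<int>(threadIdx.x);
}

// Launches both kernels and returns true if every element matches x + tid.
bool param_paths_agree() {
    constexpr int n = 32;
    S s{};
    s.x = 100;

    int h_copy[n] = {};
    int h_gc[n] = {};
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    kernel_copy<<<1, n>>>(s, d_out);
    cudaError_t e1 = cudaMemcpy(h_copy, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    kernel_grid_constant<<<1, n>>>(s, d_out);
    cudaError_t e2 = cudaMemcpy(h_gc, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_copy[i] != s.x + i || h_gc[i] != s.x + i) return false;
    }
    return true;
}

int main() {
    bool ok = param_paths_agree();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

The results are meant to be the same either way; __grid_constant__ changes where the parameter lives, not what the kernel computes, which is why the guide frames it purely as a performance annotation.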
Reference
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#grid-constant