NVPTX64 support for module splitting has been added with #4107
In llvm-test-suite, DeviceCodeSplit/split-per-kernel.cpp passes but DeviceCodeSplit/split-per-source-main.cpp fails.
Investigation showed that failure was due to the CUDA plugin being unable to find the kernel names in the binary.
This is because the current implementation uses a regex search, which cannot create a list of names for piProgramGetInfo with module splitting per source.
Proposed fix: #4565
Add piProgramHasKernel to PI and implement it with cuModuleGetFunction, so the plugin can check for a kernel by name directly instead of building a list of the function names owned by a program with a regex search.
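To illustrate the idea behind the proposed fix, here is a minimal sketch of a lookup-based membership query. It does not call the real CUDA driver: `MockModule`, `MockResult`, `mockModuleGetFunction` and `programHasKernel` are all hypothetical names that stand in for a loaded `CUmodule`, the `CUresult` codes, `cuModuleGetFunction`, and the plugin side of piProgramHasKernel respectively. The actual change in #4565 may look different.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a loaded device module (a CUmodule in the real
// plugin). It only records which function names were compiled into it.
struct MockModule {
  std::unordered_set<std::string> functions;
};

// Hypothetical stand-in for the driver's success / not-found result codes.
enum class MockResult { Success, NotFound };

// Stand-in for cuModuleGetFunction: asks one module whether it contains a
// function with the given name. No list of names is ever materialised.
MockResult mockModuleGetFunction(const MockModule &mod, const char *name) {
  return mod.functions.count(name) ? MockResult::Success
                                   : MockResult::NotFound;
}

// Assumed shape of a piProgramHasKernel-style query: a direct lookup by name
// instead of a regex scan over the program binary.
bool programHasKernel(const MockModule &program, const char *kernelName) {
  return mockModuleGetFunction(program, kernelName) == MockResult::Success;
}
```

With per-source splitting, each split image would be queried separately, so a kernel that lives in a different image is simply reported as absent rather than depending on a name list scraped from the binary.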
A secondary pull request to llvm-test-suite will be made once #4565 passes to enable DeviceCodeSplit/split-per-kernel.cpp and DeviceCodeSplit/split-per-source-main.cpp for CUDA and HIP backends.
…1560)
The pass was raising TODOs when a function had both a fir.boxproc<> argument
and a fir.type<> argument (even if the fir.type<> did not itself contain a
fir.boxproc).
Prevent the TODO from firing when a fir.type<> does not actually contain
a fir.boxproc. Add the location for the remaining TODO (it will be
needed when procedure pointer components are supported in lowering).
FYI, I actually tried to just implement the TODO, but there is a funny
issue. When creating the new fir::RecordType, since the name and context
are the same as those of the type being translated, fir::RecordType::get
just returns the existing type, and there is no way to change it (finalize()
does nothing since the type is already finalized). So this will require adding
the ability to mutate the existing type, and I am not sure what the
MLIR constraints are here, so I backed off and left the TODO for that case.
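
The uniquing problem described above can be sketched with a small mock. This is not the real fir::RecordType or MLIR API: `MockRecordType` and `MockContext` are hypothetical types, and the only assumption carried over from the message is that types are uniqued by name within a context and that finalizing an already finalized type is a no-op.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock of a record type that is identified by its name.
struct MockRecordType {
  std::string name;
  std::vector<std::string> memberTypes;
  bool finalized = false;

  // Sets the member list once; later calls are ignored, mirroring the
  // "finalize() does nothing since it is already finalized" behaviour.
  void finalize(std::vector<std::string> members) {
    if (finalized)
      return;
    memberTypes = std::move(members);
    finalized = true;
  }
};

// Hypothetical context that uniques record types by name.
class MockContext {
  std::map<std::string, std::unique_ptr<MockRecordType>> types;

public:
  // Returns the existing type for this name if there is one, otherwise
  // creates it. The same name always yields the same object.
  MockRecordType *getRecordType(const std::string &name) {
    auto &slot = types[name];
    if (!slot) {
      slot = std::make_unique<MockRecordType>();
      slot->name = name;
    }
    return slot.get();
  }
};
```

In this mock, requesting a "converted" type under the same name returns the original object, and a second finalize() with rewritten members leaves the original members in place, which is the situation the message describes.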
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D127633
Co-authored-by: Jean Perier <jperier@nvidia.com>
Only `off` value for `-fsycl-device-code-split` is compatible with `-fsycl-targets=` set for NVIDIA CUDA API.
Should we add support for `-fsycl-device-code-split`?
Revealed by KhronosGroup/SYCL-CTS#51.