Move CUDA sources from intel/llvm to UR #896

fabiomestre · 2023-09-26T15:05:10Z

This PR moves the CUDA adapter from the intel/llvm repository to the unified runtime repository. This was done using git subtrees to preserve the commit history.

A follow-up PR on intel/llvm will remove the sources from that repo.

This moves the CUDA plugin implementation to Unified Runtime and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR. This PR is based on top of the Level Zero adapter (#8744), so it will only be ready when that is merged.

---------

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>

Resolves the warnings as errors reported in [post merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360) as a result of merging #9512. Additionally move pre-processor guards to resolve unused global variables which would also fail in this build configuration (clang & SYCL_ENABLE_WERROR=ON).

…(#9938)

* A call to `hipEventElapsedTime` returns `hipErrorNotReady` when the timestamp has not yet been recorded on one or both events. Calling `hipEventSynchronize` blocks until the event is ready.
* The issue showed itself when profiling the sycl-blas benchmark.
* Enable support for cuda / hip in event_profiling_info.cpp
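The synchronize-before-query pattern described above can be sketched without a GPU. This is a minimal mock, not the adapter's code: `FakeEvent`, `fake_event_synchronize`, `fake_event_elapsed` and `elapsed_after_sync` are hypothetical stand-ins that only mimic the behaviour the commit message attributes to `hipEventSynchronize` and `hipEventElapsedTime`.

```cpp
#include <cassert>

// Hypothetical status codes standing in for hipSuccess / hipErrorNotReady.
enum fake_status { FAKE_OK, FAKE_NOT_READY };

// Hypothetical event: a timestamp plus a flag saying whether it was recorded.
struct FakeEvent {
  bool Ready;
  float TimestampMs;
};

// Mock of a blocking synchronize: once it returns, the event is ready.
fake_status fake_event_synchronize(FakeEvent &E) {
  E.Ready = true;
  return FAKE_OK;
}

// Mock of an elapsed-time query: fails if either timestamp is not recorded.
fake_status fake_event_elapsed(float *Ms, const FakeEvent &Start,
                               const FakeEvent &Stop) {
  assert(Ms != nullptr);
  if (!Start.Ready || !Stop.Ready)
    return FAKE_NOT_READY;
  *Ms = Stop.TimestampMs - Start.TimestampMs;
  return FAKE_OK;
}

// Sketch of the fix: synchronize both events first, then query.
fake_status elapsed_after_sync(float *Ms, FakeEvent &Start, FakeEvent &Stop) {
  fake_event_synchronize(Start);
  fake_event_synchronize(Stop);
  return fake_event_elapsed(Ms, Start, Stop);
}
```

Querying directly on unrecorded events yields the not-ready status in this mock, while going through `elapsed_after_sync` yields the time difference.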

…RSION (#9873) This should have been an obvious update of Unified Runtime tag to support UR_DEVICE_INFO_IP_VERSION, required in intel/llvm#9843 (just tag update, nothing else), but it also brought many API breaks caused by this patch: oneapi-src#536. So the current PR updates our codebase in accordance with changed UR API. --------- Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com> Co-authored-by: Callum Fare <callum@codeplay.com> Co-authored-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>

This fixes a change in `hasBeenSynchronized` accidentally introduced during the UR port

…(#10055) * Reverts back the update to calculation of threads per block for 0th dimension when primary ranges are involved. That could cause out of range access.

Includes a large number of missing `pi_result` mappings, so previously reported errors should no longer map to just `PI_ERROR_UNKNOWN`. NFCI for the adapters.

These checks are causing issues for very large USM allocations because the `MAX_MEM_ALLOC_SIZE` reported is lower than what CUDA actually supports. We will follow up with an update on the reported `MAX_MEM_ALLOC_SIZE`, but it makes sense to remove the checks either way, as the CUDA allocation functions will return an error if they can't allocate the memory.

# Level Zero Backend Support for SYCL Graphs

This is the second patch of a series that adds support for an [experimental command graph extension](intel/llvm#5626).

A snapshot of the complete work can be seen in draft PR #9375, which supports all the specification-defined ways of adding nodes and edges to the graph, including both Explicit and Record & Replay graph construction. The two types of nodes currently implemented are kernel execution and memcpy commands.

See https://github.com/reble/llvm#implementation-status for the status of our total work.

## Scope

This second patch focuses on the required PI/UR support for the experimental command-buffer feature in the Level Zero adapter:

* PI stubs for all adapters to enable compilation, no functionality.
* Command-buffer implementation for the Level Zero UR adapter.
* Stubs for the CUDA UR adapter to enable compilation, no functionality.

## Following Split PRs

Future follow-up PRs with the remainder of our work on the extension will include:

* Hooking up backend to graphs runtime, bugfixes and other feature additions, will add symbols but not break the ABI. (3/4)
* Add end-to-end tests for SYCL Graph extension. (4/4)
* NFC changes - Design doc and codeowner update.

## Authors

Co-authored-by: Pablo Reble <pablo.reble@intel.com>
Co-authored-by: Julian Miller <julian.miller@intel.com>
Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: Maxime France-Pillois <maxime.francepillois@codeplay.com>

---------

Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: Maxime France-Pillois <maxime.francepillois@codeplay.com>

The destructor should be calling `cuDevicePrimaryCtxRelease`

Fixes error found in [post-commit CI](https://github.com/intel/llvm/actions/runs/5454766342/jobs/9925392005) after the merge of intel/llvm#9992:

```
/__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5635:24: error: unused parameter 'sync_point' [-Werror,-Wunused-parameter]
 5635 |     pi_ext_sync_point *sync_point) {
      |                        ^
/__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5691:12: error: unused parameter 'dst_row_pitch' [-Werror,-Wunused-parameter]
 5691 |     size_t dst_row_pitch, size_t dst_slice_pitch,
      |            ^
/__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5691:34: error: unused parameter 'dst_slice_pitch' [-Werror,-Wunused-parameter]
 5691 |     size_t dst_row_pitch, size_t dst_slice_pitch,
      |                                  ^
3 errors generated.
```

This patch re-introduces the fix from intel/llvm#8765, which seems to have been accidentally dropped by the UR port.

… classes (#10104) This PR fixes an issue with overhead when calling the same kernel multiple times in a loop. Right now, some calls to CUDA API happen every time the kernel is invoked to query the same information. Those calls were moved to `device` and `kernel` constructors and the info was cached as private members of those classes.
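The caching idea can be illustrated with a small mock. This is a sketch under stated assumptions, not the adapter's actual classes: `fake_driver_query_max_threads` is a hypothetical stand-in for a CUDA driver query, `CachedDevice` is a hypothetical device class, and the value 1024 is illustrative only.

```cpp
#include <cassert>

// Hypothetical stand-in for a driver query. It counts its own invocations so
// the effect of caching is observable.
static int g_driver_calls = 0;

static int fake_driver_query_max_threads() {
  ++g_driver_calls;
  return 1024; // illustrative value only
}

// Sketch of a device object that performs the query once, at construction,
// and serves later requests from a private member.
class CachedDevice {
  int MaxThreadsPerBlock;

public:
  CachedDevice() : MaxThreadsPerBlock(fake_driver_query_max_threads()) {
    assert(MaxThreadsPerBlock > 0);
  }
  int getMaxThreadsPerBlock() const noexcept { return MaxThreadsPerBlock; }
};

// Simulates launching the same kernel `Launches` times and returns how many
// driver queries were issued in total.
int driver_calls_for_launches(int Launches) {
  g_driver_calls = 0;
  CachedDevice Dev;
  long Sum = 0;
  for (int I = 0; I < Launches; ++I)
    Sum += Dev.getMaxThreadsPerBlock(); // served from the cache
  (void)Sum;
  return g_driver_calls;
}
```

In this mock the number of driver queries stays at one regardless of how many launches are simulated, which is the overhead reduction the commit message describes.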

These checks are already performed by the UR validation layer

This implements the current extension doc from intel/llvm#6104 in the CUDA backend only. Fixes intel/llvm#7543. Fixes intel/llvm#6749. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com> Co-authored-by: Nicolas Miller <nicolas.miller@codeplay.com> Co-authored-by: JackAKirk <chezjakirk@gmail.com> Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>

Currently if `cuInit` fails while getting platforms it returns UR_RESULT_SUCCESS. Instead we use `UR_CHECK_ERROR` to return the correct error code.

This was preventing out-of-tree build of the adapter for standalone use with unified runtime. Signed-off-by: Piotr Balcer <piotr.balcer@intel.com>

…s images extension (#10112)

# Experimental Implementation of SYCL Bindless Images Extension

This commit stands as the second commit of four to make code review easier, implementing revision 4 of the [bindless images extension proposal](intel/llvm#9842).

## Scope

This PR covers changes made to the PI and the UR. This includes:

- Extending PI with extension functions
- Updating UR FetchContent commit and implementing [UR bindless images experimental features](https://oneapi-src.github.io/unified-runtime/core/EXP-BINDLESS-IMAGES.html) on the CUDA adapter

## Following Split PRs

- [3/4] Implement the user-facing SYCL extension
- [4/4] Add tests

## Authors

Co-authored-by: Isaac Ault <isaac.ault@codeplay.com>
Co-authored-by: Hugh Bird <hugh.bird@codeplay.com>
Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com>
Co-authored-by: Przemek Malon <przemek.malon@codeplay.com>
Co-authored-by: Chedy Najjar <chedy.najjar@codeplay.com>
Co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
Co-authored-by: Peter Zuzek <peter@codeplay.com>

…#10248)

In CUDA, objects are represented as integers rather than opaque handles. This patch fixes a segmentation fault when creating a device handle from a native handle by avoiding dereferencing a pointer which should be treated as an integer.
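A minimal sketch of the conversion at issue, assuming a pointer-typed native handle that merely carries an integer device ordinal. The names `fake_cu_device`, `fake_native_handle`, `to_native` and `from_native` are hypothetical; only the fact that CUDA's `CUdevice` is an `int` typedef is taken as given.

```cpp
#include <cassert>
#include <cstdint>

// CUDA's CUdevice is an integer ordinal; this alias is a stand-in for it.
using fake_cu_device = int;

// Hypothetical opaque native handle type: a pointer to an incomplete struct.
struct fake_native_handle_t_;
using fake_native_handle = fake_native_handle_t_ *;

// Pack the integer into the handle's bits.
fake_native_handle to_native(fake_cu_device Dev) {
  assert(Dev >= 0);
  return reinterpret_cast<fake_native_handle>(
      static_cast<std::uintptr_t>(Dev));
}

// Recover the integer from the handle's bits. The handle is never
// dereferenced: it does not point at memory, it *is* the value.
fake_cu_device from_native(fake_native_handle Handle) {
  return static_cast<fake_cu_device>(
      reinterpret_cast<std::uintptr_t>(Handle));
}
```

Dereferencing such a handle (for example `*reinterpret_cast<int *>(Handle)`) would read from an address equal to the device ordinal, which is the kind of access that produces a segmentation fault.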

- Fixed compiler errors/warnings related to unused and uninitialized variables and parameters. Post-commit fix for PR: intel/llvm#10112

After the recent device partition changes in the UR spec (i.e. ur_device_partition_property_t), level_zero, cuda and hip adapters are returning incorrect values and types for UR_DEVICE_INFO_SUPPORTED_PARTITIONS and UR_DEVICE_INFO_PARTITION_TYPE. This PR fixes these issues with the adapters and updates pi2ur to correctly convert between ur_device_partition_properties_t and pi_device_partition_property.

…#9294) This change adds a SYCL interface to the Level Zero APIs zexDriverImportExternalPointer and zexDriverReleaseImportedPointer. These functions are used for importing host memory into USM for the duration of data transfer to increase bandwidth.

Cache the max local mem size so that we can call fewer CUDA driver entry points at `urEnqueueKernelLaunch`. This also allows us to query the value set for `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` using `device.get_info<sycl::info::device::local_mem_size>()`.

During the port to UR the CUDA and HIP PI plugin ABIs were unintentionally changed. There do not appear to be symbol checks for these plugins, unlike the [Level Zero symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_level_zero_symbol_check.dump) and [OpenCL symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_opencl_symbol_check.dump). As such, the ABI change went unnoticed until intel/llvm#10490 was opened using the same approach for the OpenCL port, which [failed](https://github.com/intel/llvm/actions/runs/5610646255/job/15200624025?pr=10490) the OpenCL symbol check. This PR restores the expected ABI for the CUDA and HIP plugins and introduces new CUDA and HIP symbol check tests.

Bump the Unified Runtime commit, and make adapter changes needed for the newly added adapter handles (see oneapi-src#715 for details). This fixes #10066 by providing an implementation of `piPluginGetLastError` in pi2ur.

The `UR_RESULT_ADAPTER_SPECIFIC_ERROR` was not returning an error to the SYCL RT, which meant all errors were treated as warnings and ignored unless `SYCL_RT_WARNING_LEVEL` was set to 2 or higher. This changes things so the adapter specific error is now reported as such, meaning all uses of `UR_RESULT_ADAPTER_SPECIFIC_ERROR` meant as warnings are now caught as errors.

---------

Co-authored-by: Hugh Delaney <hughd94@gmail.com>

In the CUDA/HIP adapters `urKernelSetArgValue` was being used to implement both `urKernelSetArgValue` & `urKernelSetArgLocal`. However, if the validation layer is enabled in UR then the path to set local arg is never taken since it includes a check that `pArgValue` is not null.

This PR:

* Implements `urKernelSetArgLocal` for CUDA/HIP adapters
* Changes `pi2ur` to call `urKernelSetArgLocal` when `arg_value` is `nullptr`
* Implements `urKernelSetArgLocal` for L0 adapter - this just calls back into `urKernelSetArgValue`.
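The dispatch rule for the translation layer can be sketched as follows. This is a mock under stated assumptions: `set_arg_value`, `set_arg_local` and `pi_set_arg` are hypothetical stand-ins for `urKernelSetArgValue`, `urKernelSetArgLocal` and the pi2ur entry point, and their signatures are simplified for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Records which path a call took, so the dispatch can be observed.
enum class ArgPath { Value, Local };

// Hypothetical stand-in for the value-setting entry point. The assert models
// the validation-layer check that the value pointer is not null.
ArgPath set_arg_value(std::size_t Size, const void *ArgValue) {
  (void)Size;
  assert(ArgValue != nullptr);
  (void)ArgValue;
  return ArgPath::Value;
}

// Hypothetical stand-in for the local-memory entry point: only a size is
// needed, there is no host value to copy.
ArgPath set_arg_local(std::size_t Size) {
  (void)Size;
  return ArgPath::Local;
}

// Sketch of the translation layer: a null value pointer means the argument
// is a local-memory allocation of `Size` bytes.
ArgPath pi_set_arg(std::size_t Size, const void *ArgValue) {
  return ArgValue ? set_arg_value(Size, ArgValue) : set_arg_local(Size);
}
```

Routing the null case to a separate entry point keeps the non-null check on the value path intact, which is why the validation layer no longer blocks local arguments.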

Fix the license headers at the top of each source file in the unified runtime directory. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>

This PR adds missing functions in the hip backend to allow for interoperability in programs that create sycl objects from native hip objects. The new function implementations are: - `make_device` - `make_queue` - `make_event` Note that it would really make sense for intel/llvm#10491 to be merged first because this PR makes the same code change in pi2ur, for a fix that is attributed to #10491. --------- Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>

intel/llvm#10691 missed these extra cases. This PR adds the needed ifdefs

… (#11023) Resolve same issue fixed in [PR](intel/llvm#10034) Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>

Lots of hip/cu driver API calls were wrapped in `ur::assertion(res == CU_SUCCESS)` etc., which:

- Means that any native error messages returned from the affected driver API calls were lost.
- Since these APIs report errors asynchronously, such that they are thrown from the last API call rather than the call which led to the error, previous asynchronous error messages from different APIs to the ones wrapped by the `ur::assertion` could also be lost depending on user code.

These problems are fixed by swapping these assertions with `UR_CHECK_ERROR`.

Note that in the future UR may want to adjust `UR_CHECK_ERROR` so that it throws `UR_RESULT_ERROR_ADAPTER_SPECIFIC` instead of using `std::cerr` to report the error. But I think it makes sense to still use `UR_CHECK_ERROR` to wrap driver API calls because it means that the `__LINE__`, `__FUNCTION__` etc. info can be correctly passed to native error reporting.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
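Why a checking macro preserves more context than a bare assertion can be shown with a small sketch. Everything here is hypothetical and is not the real `UR_CHECK_ERROR`: `SKETCH_CHECK_ERROR`, `check_driver_result`, `fake_driver_call` and `describe_failure` are illustrative names, and throwing `std::runtime_error` is an assumption made for the example.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Builds a message from the failing expression and its source location, then
// throws. A zero code is treated as success.
inline void check_driver_result(int Code, const char *Expr, const char *Func,
                                int Line, const char *File) {
  assert(Expr && Func && File);
  if (Code == 0)
    return;
  std::ostringstream Msg;
  Msg << File << ":" << Line << " in " << Func << ": `" << Expr
      << "` failed with code " << Code;
  throw std::runtime_error(Msg.str());
}

// The macro captures the call site, which a plain function could not do.
#define SKETCH_CHECK_ERROR(Expr)                                               \
  check_driver_result((Expr), #Expr, __func__, __LINE__, __FILE__)

// Hypothetical driver call that simply returns the code it is given.
int fake_driver_call(int Code) { return Code; }

// Returns the error text for a failing call, or an empty string on success.
std::string describe_failure(int Code) {
  try {
    SKETCH_CHECK_ERROR(fake_driver_call(Code));
  } catch (const std::runtime_error &E) {
    return E.what();
  }
  return "";
}
```

An assertion on `res == CU_SUCCESS` only says that some comparison failed, whereas a wrapper of this shape carries the native code, the expression, and the location through to whatever reports the error.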

- Support was added for the following image channel types:
  - `unorm_int8`
  - `unorm_int16`
  - `snorm_int8`
  - `snorm_int16`
- Reading these types through `read_image` returns the denormalized floating point data.
- A test was added for these new types.
- Support for the following packed normalized image channel types was removed from the proposal:
  - `unorm_short_565`
  - `unorm_short_555`
  - `unorm_int_101010`
- This was done due to lack of device support. If the need for these types arises in the future, we can revisit support for these types.

Fetch the latest revision of Unified Runtime.

Notable changes:

* Some command buffer entry-points have been renamed, and two have been added: membufferFill and USMFill.
* `urInit`/`urTearDown` have been removed and replaced with loader-only versions.

This change adds a new aspect for esimd, `ext_intel_esimd`, and annotates the two fundamental esimd classes, `simd_obj_impl` and `simd_view_impl`, with the `uses_aspect` attribute.

`simd_obj_impl` is the base class of `simd` and `simd_mask`, which are the fundamental user-facing classes. `simd_view_impl` is the base of only `simd_view`. `simd_obj_impl` is not a base of `simd_view_impl`, but every `simd_view_impl` requires a reference to a `simd` or `simd_mask` at construction time, so I am not sure if we truly need to annotate `simd_view_impl`, but I added it to be safe.

It also adds a new PI device info query, `PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT`, that is used to query at runtime if a device supports ESIMD. For UR-based plugins, we map that to `UR_DEVICE_INFO_ESIMD_SUPPORT`. The implementation simply returns false for cuda, hip and native_cpu. For l0 and opencl, we check that the device is an intel gpu by checking that the device type is gpu and the vendor id is `0x8086`. For ESIMD emulator we simply return true. I would appreciate careful review on the plugin changes in particular, as I am not an expert.

This change also updates the esimd spec to document the new aspect.

In a future change, I plan to use the new aspect to remove the requirement for the `SYCL_ESIMD_FUNCTION` and `SYCL_ESIMD_KERNEL` macros that set function attributes, but I am not doing that as part of this PR as it requires more investigation.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for the past few days.

- Return correct error code in urContextCreateWithNativeHandle
- Add UR_DEVICE_INFO_VIRTUAL_MEMORY_SUPPORT to urDeviceGetInfo
- Move asserts from urEnqueueMemBufferFill to UR validation layers (commit 2c533e6 on UR repository)
- Make behaviour of urEventSetCallback consistent with other unsupported entrypoints.
- urProgramGetInfo and urQueueGetInfo now return UR_RESULT_ERROR_UNSUPPORTED_ENUMERATION for unsupported cases.
- Add entrypoints for urSamplerGetNativeHandle and urSamplerCreateWithNativeHandle
- Fix UR_USM_ALLOC_INFO_BASE_PTR and UR_USM_ALLOC_INFO_SIZE implementation.

`UR_CHECK_ERROR` was designed to return `ur_result_t`; however, in practice it was guaranteed to only ever return `UR_RESULT_SUCCESS`, as other paths would either terminate, abort or throw. This in turn leads to poor quality, error-prone code, as the codebase was littered with:

* statements not checking the return value, which generate a warning depending on the compiler,
* extra checks on the return value, which were only ever going to be true.

Some care was required, as the codebase has a habit of accumulating error codes across branches, so depending on the use case the initial value of each `ur_result_t Result` had to be set accordingly (now that `UR_CHECK_ERROR` does not return).
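The consequence for callers can be sketched with a void checker. This is a minimal mock, not the adapter's code: `fake_result`, `fake_driver_status`, `check_error` and `do_work` are hypothetical names, and throwing the result enum is an assumption made for the example.

```cpp
#include <cassert>

// Hypothetical result and driver status codes.
enum fake_result { FAKE_SUCCESS = 0, FAKE_ERROR_UNKNOWN = 1 };
enum fake_driver_status { DRV_OK = 0, DRV_FAIL = 1 };

// Void checker: returns normally on success and throws otherwise, so there
// is no return value for callers to ignore or redundantly test.
inline void check_error(fake_driver_status Status) {
  if (Status != DRV_OK)
    throw FAKE_ERROR_UNKNOWN;
}

// Caller sketch: `Result` must start out as success, because the checker no
// longer produces a value to assign to it.
fake_result do_work(fake_driver_status First, fake_driver_status Second) {
  fake_result Result = FAKE_SUCCESS;
  try {
    check_error(First);
    check_error(Second);
    // Reaching this point means both calls succeeded.
    assert(Result == FAKE_SUCCESS);
  } catch (fake_result Err) {
    Result = Err;
  }
  return Result;
}
```

With a value-returning checker, code of the form `Result = check(...)` would silently rely on the checker for the initial value; with a void checker the initialization has to be explicit, which is the care the description refers to.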

source/adapters/cuda/CMakeLists.txt

source/adapters/cuda/ur_interface_loader.cpp

veselypeta · 2023-09-27T12:54:01Z

source/adapters/cuda/CMakeLists.txt

 endif()

+if(UMF_ENABLE_POOL_TRACKING)
+  target_compile_definitions("ur_adapter_cuda" PRIVATE UMF_ENABLE_POOL_TRACKING)


Should we add this option in the readme?

as far as I know the UMF team are reworking stuff so that this option isn't needed, hopefully its days are numbered

This is already an option in UR. It was just not being used in the adapter

veselypeta · 2023-09-27T12:55:10Z

source/adapters/cuda/adapter.cpp

@@ -0,0 +1,89 @@
+//===--------- adapter.cpp - CUDA Adapter ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.


These licenses will need updating too, but a follow up PR is fine.

I will add this in a follow up PR

fabiomestre closed this Sep 26, 2023

fabiomestre reopened this Sep 26, 2023

callumfare and others added 28 commits September 26, 2023 16:44

[SYCL] Fix self-build compilation after PR #9873 (#10018)

47b134a

[SYCL][CUDA] Fix broken synchronization of CUstreams (#10031)

b992d69

This fixes a change in `hasBeenSynchronized` accidentally introduced during the UR port

Partially reverts back the update on guess local work size function. …

c71ee7c

…(#10055) * Reverts back the update to calculation of threads per block for 0th dimension when primary ranges are involved. That could cause out of range access.

[SYCL][UR] Bump UR version and add missing values to pi2ur (#10049)

7381044

Includes a large number of missing `pi_result` mappings, so previously reported errors should no longer map to just `PI_ERROR_UNKNOWN`. NFCI for the adapters.

[SYCL][UR][CUDA] Fix typo in ur_device_handle_t (#10130)

0123900

The destructor should be calling `cuDevicePrimaryCtxRelease`

[SYCL][CUDA] Fix cuDeviceGetUuid for older CUDA version (#10205)

8558159

This patch re-introduces the fix from intel/llvm#8765, which seems to have been accidentally dropped by the UR port.

[SYCL][UR][CUDA] Remove unnecessary UR entry point validation (#9969)

470aaf1

These checks are already performed by the UR validation layer

[PI][UR] Report error from cuInit (#10353)

5967f17

Currently if `cuInit` fails while getting platforms it returns UR_RESULT_SUCCESS. Instead we use `UR_CHECK_ERROR` to return the correct error code.

[SYCL][CUDA] Remove SYCL dependencies from CUDA UR adapter (#10135)

4701e1e

This was preventing out-of-tree build of the adapter for standalone use with unified runtime. Signed-off-by: Piotr Balcer <piotr.balcer@intel.com>

[SYCL][UR][CUDA] Fix some issues in cuda adapter's urDeviceGetInfo. (…

26eba89

…#10248)

[SYCL][Bindless] Fix compiler Werrors (#10464)

e1ae1e2

- Fixed compiler errors/warnings related to unused and uninitialized variables and parameters. Post-commit fix for PR: intel/llvm#10112

[SYCL][UR] Bump UR and implement adapter handles (#10349)

4c12a32

Bump the Unified Runtime commit, and make adapter changes needed for the newly added adapter handles (see oneapi-src#715 for details). This fixes #10066 by providing an implementation of `piPluginGetLastError` in pi2ur.

veselypeta and others added 14 commits September 26, 2023 16:44

[UR][CUDA][HIP][L0] Cleanup licence header (#10824)

a75b944

Fix the license headers at the top of each source file in the unified runtime directory. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>

[UR][CUDA] Ifdef remaining CUDA 11 datatypes (#10922)

da60f90

intel/llvm#10691 missed these extra cases. This PR adds the needed ifdefs

[SYCL][CUDA] Reintroduce removing of size checks from USM allocations…

901787e

… (#11023) Resolve same issue fixed in [PR](intel/llvm#10034) Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>

[SYCL][UR][CUDA] Mirror SYCL_PI env vars with UR prefix (#10045)

54cead7

Revert "[PI][UR][HIP][CUDA][L0][NCPU] Pull in latest UR fixes." (#11227)

40e0901

Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for the past few days.

[SYCL][NFC] Remove unused lambda capture (#11269)

9c3acb1

fabiomestre force-pushed the fabio/move_cuda_adapter branch 5 times, most recently from 8879199 to 245afd0 Compare September 27, 2023 11:26

[L0] Add missing loader interfaces to L0 adapter

8985991

fabiomestre force-pushed the fabio/move_cuda_adapter branch from 245afd0 to 3accba7 Compare September 27, 2023 12:25

fabiomestre marked this pull request as ready for review September 27, 2023 12:31

callumfare approved these changes Sep 27, 2023


source/adapters/cuda/CMakeLists.txt (resolved)

source/adapters/cuda/ur_interface_loader.cpp (resolved)

veselypeta reviewed Sep 27, 2023


veselypeta approved these changes Sep 27, 2023


fabiomestre force-pushed the fabio/move_cuda_adapter branch from a5c8d9e to 9d35c99 Compare September 27, 2023 13:16

[CUDA] Move adapter sources into the cuda adapter folder

e04e2f9

fabiomestre force-pushed the fabio/move_cuda_adapter branch from e48c9b8 to e04e2f9 Compare September 27, 2023 14:05

fabiomestre merged commit 9461c79 into oneapi-src:adapters Sep 27, 2023

kbenzie mentioned this pull request Sep 29, 2023

[SYCL][CUDA] Implement ext_oneapi_queue_priority intel/llvm#11297

Merged
