[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903

veselypeta · 2023-09-28T12:13:58Z

This PR:

Moves the adapter code from intel/llvm to unified runtime
Updates licenses to unified-runtime
Includes image.cpp in the cmake
Adds the virtual memory loader interfaces

This moves the HIP plugin implementation to Unified Runtime; and changes the pi_hip plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR. This PR is based on top of the CUDA adapter (intel/llvm#9512) so will only be ready when that is merged. --------- Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com> Co-authored-by: Petr Vesely <veselypeta@gmail.com> Co-authored-by: Callum Fare <callum@codeplay.com> Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>

fixes a post commit failure for hip porting [PR](intel/llvm#9617) Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>

After the recent device partition changes in the UR spec (i.e. ur_device_partition_property_t), level_zero, cuda and hip adapters are returning incorrect values and types for UR_DEVICE_INFO_SUPPORTED_PARTITIONS and UR_DEVICE_INFO_PARTITION_TYPE. This PR fixes this issues with the adapters and updates pi2ur to correctly convert between ur_device_partition_properties_t and pi_device_partition_property.

…#9294) This change adds a SYCL interface to the Level Zero APIs zexDriverImportExternalPointer and zexDriverReleaseImportedPointer. These functions are used for importing host memory into USM for the duration of data transfer to increase bandwidth.

The primary context has been default for a while in CUDA PI/Adapter. See intel/llvm#8197. This PR brings the HIP adapter up to speed. It also changes the scoped context to only take a `ur_device_handle_t` since this is coupled with a native primary context in HIP

During the port to UR the CUDA and HIP PI plugin ABI's were unintentionally changed. There does not appear to be symbol checks for these plugins, unlike the [Level Zero symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_level_zero_symbol_check.dump) and [OpenCL symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_opencl_symbol_check.dump). As such, the ABI change went unnoticed until intel/llvm#10490 was opened using the same approach for the OpenCL port, which [failed](https://github.com/intel/llvm/actions/runs/5610646255/job/15200624025?pr=10490) the OpenCL symbol check. This PR restores the expected ABI for the CUDA and HIP plugins and introduces new CUDA and HIP symbol check tests.

Bump the Unified Runtime commit, and make adapter changes needed for the newly added adapter handles (see oneapi-src#715 for details) This fixes #10066 by providing an implementation of `piPluginGetLastError` in pi2ur.

event->record was being called before event->start, which resulted in event->record failing silently. This fixes that.

This change is necessary to workaround a delightful bug in either HIP runtime, or the HIP spec. It's discussed at length in github.com/intel/llvm/issues/7252 but for the purposes of this patch, it suffices to say that a call to `hipMemPrefetchAsync` is *required* for correctness in the face of global atomic operations on (*at least*) shared USM allocations. The architecture of this change is slightly strange on first sight in that we reduntantly track allocation information in several places. The context now keeps track of all USM mappings. We require a mapping of pointers to the allocated size, but these allocations aren't pinned to any particular queue or HIP stream. The `hipMemPrefetchAsync`, however, requires the associated HIP stream object, and the size of the allocation. The stream comes hot-off-the-queue *only* just before a kernel is launched, so we need to defer the prefetch until we have that information. Finally, the kernel itself keeps track of pointer arguments in a more accessible way so we can determine which of the kernel's pointer arguments do, in-fact, point to USM allocations.

In the CUDA/HIP adapters `urKernelSetArgValue` was being used to implement both `urKernelSetArgValue` & `urKernelSetArgLocal`. However, if the validation layer is enabled in UR then the path to set local arg is never taken since it includes a check that `pArgValue` is not null. This PR: * Implements `urKernelSetArgLocal` for CUDA/HIP adapters * Changes `pi2ur` to call `urKernelSetArgLocal` when `arg_value` is `nullptr` * Implements `urKernelSetArgLocal` for L0 adapter - this just calls back into `urKernelSetArgValue`.

In the HIP adapter the HIP module is not set until the program is built with `urProgramBuild`, therefore we should check that the module actually needs to be unloaded in `urProgramRelease`. The following will result in a failure, but should still be valid UR trace: ```cpp uint8_t *source = "<some ptx>"; ur_program_handle_t prog; urProgramCreate(context, device, sizeof(source), ptxSource, nullptr, &prog); urProgramRelease(prog); // fails when it tries to unload the module. ```

Fix the license headers at the top of each source file in the unified runtime directory. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>

This PR adds missing functions in the hip backend to allow for interoperability in programs that create sycl objects from native hip objects. The new function implementations are: - `make_device` - `make_queue` - `make_event` Note that it would really make sense for intel/llvm#10491 to be merged first because this PR makes the same code change in pi2ur, for a fix that is attributed to #10491. --------- Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>

Adds `hipDeviceAttributeManagedMemory` attribute check to verify the HIP device supports managed memory in order to use the prefetch API as well as checking the pointer to migrate is actually a managed allocation (USM/SVM). If the check fails we return early and set a warning message (with `UR_RESULT_SUCCESS` as return code). As a follow-up. we may need to detect system-allocated memory and either continue with just a warning message or throw a more targeted error message and exit.

Lots of hip/cu driver API calls were wrapped in `ur::assertion(res == CU_SUCCESS)` etc which: - Means that any native error messages returned from the affected driver api calls were lost. - Since these APIs report errors asynchronously, such that they are thrown from the last API call rather than the call which led to the error, previous asynchronous error messages from different APIs to the ones wrapped by the `ur::assertion` could also be lost depending on user code. These problems are fixed by swapping these assertions with `UR_CHECK_ERROR`. Note that in the future UR may want to adjust `UR_CHECK_ERROR` so that it throws `UR_RESULT_ERROR_ADAPTER_SPECIFIC` etc instead of using `std::cerr` etc to report the error etc. But I think it makes sense to still use `UR_CHECK_ERROR` to wrap driver API calls because it means that the __LINE__, __FUNCTION__ etc info can be correctly passed to native error reporting. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Based on #10758 Co-authored-by: Ronan Keryell <ronan@keryell.fr>

…parameter 'Alignment' warning (#11154) Fixes https://github.com/intel/llvm/actions/runs/6154418223/job/16699759620

Fetch the latest revision of unified runtime: Notable changes * Some command buffer entry-points have been renamed, also 2 additions of membufferFill and USMFill * UrInit/urTearDown have been removed - replaced with loader only versions.

This change adds a new aspect for esimd, `ext_intel_esimd`, and annotates the two fundamental esimd classes, `simd_obj_impl` and `simd_view_impl` with the `uses_aspect` attribute. `simd_obj_impl` is the base class of `simd` and `simd_mask` which are the fundamental user-facing classes. `simd_view_impl` is the base of only `simd_view`. `simd_obj_impl` is not a base of `simd_view_impl`, but every `simd_view_impl` requires a reference to a `simd` or `simd_mask` at construction time, so I am not sure if we truly need to annotate `simd_view_impl`, but I added it to be safe. It also adds a new PI device info query, `PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT` that is used to query at runtime if a device supports ESIMD. For UR-based plugins, we map that to `UR_DEVICE_INFO_ESIMD_SUPPOR`. The implementation simply returns false for cuda, hip and native_cpu. For l0 and opencl, we check that the device is an intel gpu by querying the device type is gpu and the vendor id is `0x8086`. For ESIMD emulator we simply return true. I would appreciate careful review on the plugin changes in particular, as I am not an expert. This change also updates the esimd spec to document the new aspect. In a future change, I plan to use the new aspect to remove the requirement for the `SYCL_ESIMD_FUNCTION` and `SYCL_ESIMD_KERNEL` macros that set function attributes, but I am not doing that as part of this PR as it requires more investigation. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for the past few days.

The main fix here is to enable prefetch functionality when the HIP backend is built with rocm versions < 5. The change to the prefix/mem_advise UR hip adapter APIs fixes four e2e-tests that fail test-e2e only for HIP 4.x versions (These failures don't come up in the CI since it tests using rocm 5.x). The change to Tracing/image_printers.cpp is effectively re-XFailing this test for rocm 4 only. rocm 4 doesn't support `hipCreateSurfaceObject` which is called by this test. I'm not sure this legacy image functionality (Which as I understand it will be replaced by bindless images) is really working for rocm 5 in a meaningful way, since most other legacy image e2e tests are XFAIL for hip. But this test can still be useful for us for ROCM 5 compatibility testing. The XFAIL was recently removed here intel/llvm@745febe which led to us finding the rocm 4 fail. I guess that the CI at some point switched from testing rocm 4 to testing rocm 5, which meant this test stopped failing in CI. --------- Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>

`UR_CHECK_ERROR` was designed to return `ur_result_t`, however in practice it was guaranteed to only ever return `UR_RESULT_SUCCESS`, as other paths would either terminate, abort or throw. This in turns leads to poor quality/error prone code, as the codebase was littered with: * statements not checking the return value - depending on the compiler generating a warning, * extra check on the return which was only ever going to be true. Some care was required, as the codebase has a habit of accumulating err codes across branches, so depending on the use case the initial value of `ur_result_t Result`s had to be set accordingly (now that `UR_CHECK_ERROR` does not return).

fabiomestre

LGTM

[Optional] Since this PR is going to be rebased, it could be cleaner to manually squash the 5 commits related to the code move.

Removes the adapter code for the HIP plugin which is now available as part of unified runtime as part of oneapi-src/unified-runtime#903

omarahmed1111 and others added 27 commits July 13, 2023 16:01

[SYCL][UR][HIP] remove unused parameter in ur hip (#10357)

e56d66b

fixes a post commit failure for hip porting [PR](intel/llvm#9617) Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>

[SYCL][UR] Bump UR and implement adapter handles (#10349)

3ca5fba

Bump the Unified Runtime commit, and make adapter changes needed for the newly added adapter handles (see oneapi-src#715 for details) This fixes #10066 by providing an implementation of `piPluginGetLastError` in pi2ur.

[CUDA][HIP][UR] Fix race condition in adapters (#10826)

c3280d9

event->record was being called before event->start, which resulted in event->record failing silently. This fixes that.

[UR][CUDA][HIP][L0] Cleanup licence header (#10824)

e0c87a5

Fix the license headers at the top of each source file in the unified runtime directory. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>

[SYCL][UR] Implement usm memory pool for hip adapter. (#11065)

73527bc

Based on #10758 Co-authored-by: Ronan Keryell <ronan@keryell.fr>

[SYCL][UR] Remove unused MaxAllocSize member variable and fix unused …

e08d63b

…parameter 'Alignment' warning (#11154) Fixes https://github.com/intel/llvm/actions/runs/6154418223/job/16699759620

Revert "[PI][UR][HIP][CUDA][L0][NCPU] Pull in latest UR fixes." (#11227)

eb86b33

Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for the past few days.

[SYCL] Fix uninitialized variables (#11289)

f5e0ed5

Merge branch 'tmp_hip' into petr/move_hip

899f831

[UR][HIP] Copy adapter code from intel/llvm

2cc31e5

[UR][HIP] Fixup licenses and loader interface

20297af

[UR] Fix include path

fbd14e1

veselypeta mentioned this pull request Sep 28, 2023

[HIP][UR] Remove hip adapter source code from intel/llvm intel/llvm#11343

Merged

[HIP][UR] Fix hip include dir

48a03ac

callumfare approved these changes Sep 29, 2023

View reviewed changes

al42and mentioned this pull request Sep 29, 2023

[SYCL][CUDA] Implement ext_oneapi_queue_priority intel/llvm#11297

Merged

fabiomestre approved these changes Oct 2, 2023

View reviewed changes

omarahmed1111 approved these changes Oct 2, 2023

View reviewed changes

kbenzie approved these changes Oct 2, 2023

View reviewed changes

veselypeta merged commit 1c6fa3c into oneapi-src:adapters Oct 2, 2023

veselypeta deleted the petr/move_hip branch October 2, 2023 14:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903

[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903

Uh oh!

veselypeta commented Sep 28, 2023 •

edited

Loading

Uh oh!

fabiomestre left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903

[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903

Uh oh!

Conversation

veselypeta commented Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fabiomestre left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

veselypeta commented Sep 28, 2023 •

edited

Loading