-
Notifications
You must be signed in to change notification settings - Fork 124
[UR][HIP] Move HIP adapter sourcer to Unified Runtime #903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This moves the HIP plugin implementation to Unified Runtime; and changes the pi_hip plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR. This PR is based on top of the CUDA adapter (intel/llvm#9512) so will only be ready when that is merged. --------- Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com> Co-authored-by: Petr Vesely <veselypeta@gmail.com> Co-authored-by: Callum Fare <callum@codeplay.com> Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
fixes a post commit failure for hip porting [PR](intel/llvm#9617) Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
After the recent device partition changes in the UR spec (i.e. ur_device_partition_property_t), level_zero, cuda and hip adapters are returning incorrect values and types for UR_DEVICE_INFO_SUPPORTED_PARTITIONS and UR_DEVICE_INFO_PARTITION_TYPE. This PR fixes this issues with the adapters and updates pi2ur to correctly convert between ur_device_partition_properties_t and pi_device_partition_property.
…#9294) This change adds a SYCL interface to the Level Zero APIs zexDriverImportExternalPointer and zexDriverReleaseImportedPointer. These functions are used for importing host memory into USM for the duration of data transfer to increase bandwidth.
The primary context has been default for a while in CUDA PI/Adapter. See intel/llvm#8197. This PR brings the HIP adapter up to speed. It also changes the scoped context to only take a `ur_device_handle_t` since this is coupled with a native primary context in HIP
During the port to UR the CUDA and HIP PI plugin ABI's were unintentionally changed. There does not appear to be symbol checks for these plugins, unlike the [Level Zero symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_level_zero_symbol_check.dump) and [OpenCL symbol check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_opencl_symbol_check.dump). As such, the ABI change went unnoticed until intel/llvm#10490 was opened using the same approach for the OpenCL port, which [failed](https://github.com/intel/llvm/actions/runs/5610646255/job/15200624025?pr=10490) the OpenCL symbol check. This PR restores the expected ABI for the CUDA and HIP plugins and introduces new CUDA and HIP symbol check tests.
Bump the Unified Runtime commit, and make adapter changes needed for the newly added adapter handles (see oneapi-src#715 for details) This fixes #10066 by providing an implementation of `piPluginGetLastError` in pi2ur.
event->record was being called before event->start, which resulted in event->record failing silently. This fixes that.
This change is necessary to workaround a delightful bug in either HIP runtime, or the HIP spec. It's discussed at length in github.com/intel/llvm/issues/7252 but for the purposes of this patch, it suffices to say that a call to `hipMemPrefetchAsync` is *required* for correctness in the face of global atomic operations on (*at least*) shared USM allocations. The architecture of this change is slightly strange on first sight in that we reduntantly track allocation information in several places. The context now keeps track of all USM mappings. We require a mapping of pointers to the allocated size, but these allocations aren't pinned to any particular queue or HIP stream. The `hipMemPrefetchAsync`, however, requires the associated HIP stream object, and the size of the allocation. The stream comes hot-off-the-queue *only* just before a kernel is launched, so we need to defer the prefetch until we have that information. Finally, the kernel itself keeps track of pointer arguments in a more accessible way so we can determine which of the kernel's pointer arguments do, in-fact, point to USM allocations.
In the CUDA/HIP adapters `urKernelSetArgValue` was being used to implement both `urKernelSetArgValue` & `urKernelSetArgLocal`. However, if the validation layer is enabled in UR then the path to set local arg is never taken since it includes a check that `pArgValue` is not null. This PR: * Implements `urKernelSetArgLocal` for CUDA/HIP adapters * Changes `pi2ur` to call `urKernelSetArgLocal` when `arg_value` is `nullptr` * Implements `urKernelSetArgLocal` for L0 adapter - this just calls back into `urKernelSetArgValue`.
In the HIP adapter the HIP module is not set until the program is built with `urProgramBuild`, therefore we should check that the module actually needs to be unloaded in `urProgramRelease`. The following will result in a failure, but should still be valid UR trace: ```cpp uint8_t *source = "<some ptx>"; ur_program_handle_t prog; urProgramCreate(context, device, sizeof(source), ptxSource, nullptr, &prog); urProgramRelease(prog); // fails when it tries to unload the module. ```
Fix the license headers at the top of each source file in the unified runtime directory. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>
This PR adds missing functions in the hip backend to allow for interoperability in programs that create sycl objects from native hip objects. The new function implementations are: - `make_device` - `make_queue` - `make_event` Note that it would really make sense for intel/llvm#10491 to be merged first because this PR makes the same code change in pi2ur, for a fix that is attributed to #10491. --------- Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>
Adds `hipDeviceAttributeManagedMemory` attribute check to verify the HIP device supports managed memory in order to use the prefetch API as well as checking the pointer to migrate is actually a managed allocation (USM/SVM). If the check fails we return early and set a warning message (with `UR_RESULT_SUCCESS` as return code). As a follow-up. we may need to detect system-allocated memory and either continue with just a warning message or throw a more targeted error message and exit.
Lots of hip/cu driver API calls were wrapped in `ur::assertion(res == CU_SUCCESS)` etc which: - Means that any native error messages returned from the affected driver api calls were lost. - Since these APIs report errors asynchronously, such that they are thrown from the last API call rather than the call which led to the error, previous asynchronous error messages from different APIs to the ones wrapped by the `ur::assertion` could also be lost depending on user code. These problems are fixed by swapping these assertions with `UR_CHECK_ERROR`. Note that in the future UR may want to adjust `UR_CHECK_ERROR` so that it throws `UR_RESULT_ERROR_ADAPTER_SPECIFIC` etc instead of using `std::cerr` etc to report the error etc. But I think it makes sense to still use `UR_CHECK_ERROR` to wrap driver API calls because it means that the __LINE__, __FUNCTION__ etc info can be correctly passed to native error reporting. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Based on #10758 Co-authored-by: Ronan Keryell <ronan@keryell.fr>
…parameter 'Alignment' warning (#11154) Fixes https://github.com/intel/llvm/actions/runs/6154418223/job/16699759620
Fetch the latest revision of unified runtime: Notable changes * Some command buffer entry-points have been renamed, also 2 additions of membufferFill and USMFill * UrInit/urTearDown have been removed - replaced with loader only versions.
This change adds a new aspect for esimd, `ext_intel_esimd`, and annotates the two fundamental esimd classes, `simd_obj_impl` and `simd_view_impl` with the `uses_aspect` attribute. `simd_obj_impl` is the base class of `simd` and `simd_mask` which are the fundamental user-facing classes. `simd_view_impl` is the base of only `simd_view`. `simd_obj_impl` is not a base of `simd_view_impl`, but every `simd_view_impl` requires a reference to a `simd` or `simd_mask` at construction time, so I am not sure if we truly need to annotate `simd_view_impl`, but I added it to be safe. It also adds a new PI device info query, `PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT` that is used to query at runtime if a device supports ESIMD. For UR-based plugins, we map that to `UR_DEVICE_INFO_ESIMD_SUPPOR`. The implementation simply returns false for cuda, hip and native_cpu. For l0 and opencl, we check that the device is an intel gpu by querying the device type is gpu and the vendor id is `0x8086`. For ESIMD emulator we simply return true. I would appreciate careful review on the plugin changes in particular, as I am not an expert. This change also updates the esimd spec to document the new aspect. In a future change, I plan to use the new aspect to remove the requirement for the `SYCL_ESIMD_FUNCTION` and `SYCL_ESIMD_KERNEL` macros that set function attributes, but I am not doing that as part of this PR as it requires more investigation. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for the past few days.
The main fix here is to enable prefetch functionality when the HIP backend is built with rocm versions < 5. The change to the prefix/mem_advise UR hip adapter APIs fixes four e2e-tests that fail test-e2e only for HIP 4.x versions (These failures don't come up in the CI since it tests using rocm 5.x). The change to Tracing/image_printers.cpp is effectively re-XFailing this test for rocm 4 only. rocm 4 doesn't support `hipCreateSurfaceObject` which is called by this test. I'm not sure this legacy image functionality (Which as I understand it will be replaced by bindless images) is really working for rocm 5 in a meaningful way, since most other legacy image e2e tests are XFAIL for hip. But this test can still be useful for us for ROCM 5 compatibility testing. The XFAIL was recently removed here intel/llvm@745febe which led to us finding the rocm 4 fail. I guess that the CI at some point switched from testing rocm 4 to testing rocm 5, which meant this test stopped failing in CI. --------- Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>
`UR_CHECK_ERROR` was designed to return `ur_result_t`, however in practice it was guaranteed to only ever return `UR_RESULT_SUCCESS`, as other paths would either terminate, abort or throw. This in turns leads to poor quality/error prone code, as the codebase was littered with: * statements not checking the return value - depending on the compiler generating a warning, * extra check on the return which was only ever going to be true. Some care was required, as the codebase has a habit of accumulating err codes across branches, so depending on the use case the initial value of `ur_result_t Result`s had to be set accordingly (now that `UR_CHECK_ERROR` does not return).
callumfare
approved these changes
Sep 29, 2023
fabiomestre
approved these changes
Oct 2, 2023
Contributor
fabiomestre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[Optional] Since this PR is going to be rebased, it could be cleaner to manually squash the 5 commits related to the code move.
omarahmed1111
approved these changes
Oct 2, 2023
kbenzie
approved these changes
Oct 2, 2023
steffenlarsen
pushed a commit
to intel/llvm
that referenced
this pull request
Oct 9, 2023
Removes the adapter code for the HIP plugin which is now available as part of unified runtime as part of oneapi-src/unified-runtime#903
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR: