Skip to content

Conversation

@veselypeta
Copy link
Contributor

@veselypeta veselypeta commented Sep 28, 2023

This PR:

  • Moves the adapter code from intel/llvm to unified runtime
  • Updates licenses to unified-runtime
  • Includes image.cpp in the cmake
  • Adds the virtual memory loader interfaces

omarahmed1111 and others added 27 commits July 13, 2023 16:01
This moves the HIP plugin implementation to Unified Runtime; and changes
the pi_hip plugin to use pi2ur to implement PI. The changes to the
implementation have been kept to a minimum and should be functionally
the same. Documentation and comments have been moved verbatim, other
than changing PI references to UR.

This PR is based on top of the CUDA adapter
(intel/llvm#9512) so will only be ready when
that is merged.

---------

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
fixes a post commit failure for hip porting
[PR](intel/llvm#9617)

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
After the recent device partition changes in the UR spec (i.e.
ur_device_partition_property_t), level_zero, cuda and hip adapters are
returning incorrect values and types for
UR_DEVICE_INFO_SUPPORTED_PARTITIONS and UR_DEVICE_INFO_PARTITION_TYPE.

This PR fixes this issues with the adapters and updates pi2ur to
correctly convert between ur_device_partition_properties_t and
pi_device_partition_property.
…#9294)

This change adds a SYCL interface to the Level Zero APIs
zexDriverImportExternalPointer and zexDriverReleaseImportedPointer.
These functions are used for importing host memory into USM for the
duration of data transfer to increase bandwidth.
The primary context has been default for a while in CUDA PI/Adapter. See
intel/llvm#8197.

This PR brings the HIP adapter up to speed.

It also changes the scoped context to only take a `ur_device_handle_t`
since this is coupled with a native primary context in HIP
During the port to UR the CUDA and HIP PI plugin ABI's were
unintentionally changed. There does not appear to be symbol checks for
these plugins, unlike the [Level Zero symbol
check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_level_zero_symbol_check.dump)
and [OpenCL symbol
check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_opencl_symbol_check.dump).
As such, the ABI change went unnoticed until
intel/llvm#10490 was opened using the same
approach for the OpenCL port, which
[failed](https://github.com/intel/llvm/actions/runs/5610646255/job/15200624025?pr=10490)
the OpenCL symbol check.

This PR restores the expected ABI for the CUDA and HIP plugins and
introduces new CUDA and HIP symbol check tests.
Bump the Unified Runtime commit, and make adapter changes needed for the
newly added adapter handles (see
oneapi-src#715 for details)

This fixes #10066 by providing an implementation of
`piPluginGetLastError` in pi2ur.
event->record was being called before event->start, which resulted in
event->record failing silently. This fixes that.
This change is necessary to workaround a delightful bug in either HIP
runtime, or the HIP spec.

It's discussed at length in github.com/intel/llvm/issues/7252 but for
the purposes of this patch, it suffices to say that a call to
`hipMemPrefetchAsync` is *required* for correctness in the face of
global atomic operations on (*at least*) shared USM allocations.

The architecture of this change is slightly strange on first sight in
that we reduntantly track allocation information in several places. The
context now keeps track of all USM mappings. We require a mapping of
pointers to the allocated size, but these allocations aren't pinned to
any particular queue or HIP stream.
The `hipMemPrefetchAsync`, however, requires the associated HIP stream
object, and the size of the allocation. The stream comes
hot-off-the-queue *only* just before a kernel is launched, so we need to
defer the prefetch until we have that information.

Finally, the kernel itself keeps track of pointer arguments in a more
accessible way so we can determine which of the kernel's pointer
arguments do, in-fact, point to USM allocations.
In the CUDA/HIP adapters `urKernelSetArgValue` was being used to
implement both `urKernelSetArgValue` & `urKernelSetArgLocal`. However,
if the validation layer is enabled in UR then the path to set local arg
is never taken since it includes a check that `pArgValue` is not null.

This PR:
 * Implements `urKernelSetArgLocal` for CUDA/HIP adapters
* Changes `pi2ur` to call `urKernelSetArgLocal` when `arg_value` is
`nullptr`
* Implements `urKernelSetArgLocal` for L0 adapter - this just calls back
into `urKernelSetArgValue`.
In the HIP adapter the HIP module is not set until the program is built
with `urProgramBuild`, therefore we should check that the module
actually needs to be unloaded in `urProgramRelease`.

The following will result in a failure, but should still be valid UR
trace:

```cpp
uint8_t *source = "<some ptx>";
ur_program_handle_t prog;
urProgramCreate(context, device, sizeof(source), ptxSource, nullptr, &prog);
urProgramRelease(prog); // fails when it tries to unload the module.
```
Fix the license headers at the top of each source file in the unified
runtime directory.

---------

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
This PR adds missing functions in the hip backend to allow for
interoperability in programs that create sycl objects from native hip
objects. The new function implementations are:

- `make_device`
- `make_queue`
- `make_event`

Note that it would really make sense for
intel/llvm#10491 to be merged first because this
PR makes the same code change in pi2ur, for a fix that is attributed to
#10491.

---------

Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>
Adds `hipDeviceAttributeManagedMemory` attribute check to verify the HIP
device supports managed memory in order to use the prefetch API as well
as checking the pointer to migrate is actually a managed allocation
(USM/SVM).

If the check fails we return early and set a warning message (with
`UR_RESULT_SUCCESS` as return code).

As a follow-up. we may need to detect system-allocated memory and either
continue with just a warning message or throw a more targeted error
message and exit.
Lots of hip/cu driver API calls were wrapped in `ur::assertion(res ==
CU_SUCCESS)` etc which:

- Means that any native error messages returned from the affected driver
api calls were lost.
- Since these APIs report errors asynchronously, such that they are
thrown from the last API call rather than the call which led to the
error, previous asynchronous error messages from different APIs to the
ones wrapped by the `ur::assertion` could also be lost depending on user
code.

These problems are fixed by swapping these assertions with
`UR_CHECK_ERROR`.

Note that in the future UR may want to adjust `UR_CHECK_ERROR` so that
it throws `UR_RESULT_ERROR_ADAPTER_SPECIFIC` etc instead of using
`std::cerr` etc to report the error etc. But I think it makes sense to
still use `UR_CHECK_ERROR` to wrap driver API calls because it means
that the __LINE__, __FUNCTION__ etc info can be correctly passed to
native error reporting.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Based on #10758

Co-authored-by: Ronan Keryell <ronan@keryell.fr>
Fetch the latest revision of unified runtime:

Notable changes
* Some command buffer entry-points have been renamed, also 2 additions
of membufferFill and USMFill
* UrInit/urTearDown have been removed - replaced with loader only
versions.
This change adds a new aspect for esimd, `ext_intel_esimd`, and
annotates the two fundamental esimd classes, `simd_obj_impl` and
`simd_view_impl` with the `uses_aspect` attribute.

`simd_obj_impl` is the base class of `simd` and `simd_mask` which are
the fundamental user-facing classes.
`simd_view_impl` is the base of only `simd_view`. `simd_obj_impl` is not
a base of `simd_view_impl`, but every `simd_view_impl` requires a
reference to a `simd` or `simd_mask` at construction time, so I am not
sure if we truly need to annotate `simd_view_impl`, but I added it to be
safe.

It also adds a new PI device info query,
`PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT` that is used to query at
runtime if a device supports ESIMD. For UR-based plugins, we map that to
`UR_DEVICE_INFO_ESIMD_SUPPOR`.

The implementation simply returns false for cuda, hip and native_cpu.
For l0 and opencl, we check that the device is an intel gpu by querying
the device type is gpu and the vendor id is `0x8086`.
For ESIMD emulator we simply return true.

I would appreciate careful review on the plugin changes in particular,
as I am not an expert.

This change also updates the esimd spec to document the new aspect.

In a future change, I plan to use the new aspect to remove the
requirement for the `SYCL_ESIMD_FUNCTION` and `SYCL_ESIMD_KERNEL` macros
that set function attributes, but I am not doing that as part of this PR
as it requires more investigation.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for
the past few days.
The main fix here is to enable prefetch functionality when the HIP
backend is built with rocm versions < 5.
The change to the prefix/mem_advise UR hip adapter APIs fixes four
e2e-tests that fail test-e2e only for HIP 4.x versions (These failures
don't come up in the CI since it tests using rocm 5.x).

The change to Tracing/image_printers.cpp is effectively re-XFailing this
test for rocm 4 only. rocm 4 doesn't support `hipCreateSurfaceObject`
which is called by this test. I'm not sure this legacy image
functionality (Which as I understand it will be replaced by bindless
images) is really working for rocm 5 in a meaningful way, since most
other legacy image e2e tests are XFAIL for hip. But this test can still
be useful for us for ROCM 5 compatibility testing. The XFAIL was
recently removed here
intel/llvm@745febe
which led to us finding the rocm 4 fail. I guess that the CI at some
point switched from testing rocm 4 to testing rocm 5, which meant this
test stopped failing in CI.

---------

Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>
`UR_CHECK_ERROR` was designed to return `ur_result_t`, however in
practice it was guaranteed to only ever return `UR_RESULT_SUCCESS`, as
other paths would either terminate, abort or throw.

This in turns leads to poor quality/error prone code, as the codebase
was littered with:
* statements not checking the return value - depending on the compiler
generating a warning,
* extra check on the return which was only ever going to be true.

Some care was required, as the codebase has a habit of accumulating err
codes across branches, so depending on the use case the initial value of
`ur_result_t Result`s had to be set accordingly (now that
`UR_CHECK_ERROR` does not return).
Copy link
Contributor

@fabiomestre fabiomestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

[Optional] Since this PR is going to be rebased, it could be cleaner to manually squash the 5 commits related to the code move.

@veselypeta veselypeta merged commit 1c6fa3c into oneapi-src:adapters Oct 2, 2023
@veselypeta veselypeta deleted the petr/move_hip branch October 2, 2023 14:41
steffenlarsen pushed a commit to intel/llvm that referenced this pull request Oct 9, 2023
Removes the adapter code for the HIP plugin which is now available as
part of unified runtime as part of
oneapi-src/unified-runtime#903
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.