Skip to content

Conversation

@fabiomestre
Copy link
Contributor

@fabiomestre fabiomestre commented Sep 26, 2023

This PR moves the CUDA adapter for the intel/llvm repository to the unified runtime repository. This was done using git subtrees to preserve the commit history.

A follow-up PR on intel/llvm will remove the sources from that repo.

@fabiomestre fabiomestre reopened this Sep 26, 2023
callumfare and others added 28 commits September 26, 2023 16:44
This moves the CUDA plugin implementation to Unified Runtime; and
changes the pi_cuda plugin to use pi2ur to implement PI. The changes to
the implementation have been kept to a minimum and should be
functionally the same. Documentation and comments have been moved
verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (#8744) so will only
be ready when that is merged.

---------

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
Resolves the warnings as errors reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
as a result of merging #9512. Additionally move pre-processor guards to
resolve unused global variables which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
…(#9938)

* Call to `hipEventElapsedTime` return `hipErrorNotReady` when the
timestamp has not yet been `recorded` on one or both events. Calling
`hipEventSynchronize` block until the event is ready.
* The issue showed itself when profiling sycl-blas benchmark.
* Enable support for cuda / hip in event_profiling_info.cpp
…RSION (#9873)

This should have been an obvious update of Unified Runtime tag to
support UR_DEVICE_INFO_IP_VERSION, required in
intel/llvm#9843 (just tag update, nothing else),
but it also brought many API breaks caused by this patch:
oneapi-src#536. So the current
PR updates our codebase in accordance with changed UR API.

---------

Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
This fixes a change in `hasBeenSynchronized` accidentally introduced
during the UR port
…(#10055)

* Reverts back the update to calculation of threads per block for 0th
dimension when primary ranges are involved. That could cause out of
range access.
Includes a large number of missing `pi_result` mappings, so previously
reported errors should no longer map to just `PI_ERROR_UNKNOWN`. NFCI
for the adapters.
These checks are causing issues for very large USM allocations because
the `MAX_MEM_ALLOC_SIZE` reported is lower than what CUDA actually
supports.

We will follow up with an update on the reported `MAX_MEM_ALLOC_SIZE`,
but it makes sense to remove the checks either way, as the CUDA
allocation functions will return an error if they can't allocate the
memory.
# Level Zero Backend Support for SYCL Graphs
This is the second patch of a series that adds support for an
[experimental command graph
extension](intel/llvm#5626)

A snapshot of the complete work can be seen in draft PR #9375 which has
support all the specification defined ways of
adding nodes and edges to the graph, including both Explicit and Record
& Replay graph construction. The two types of nodes currently
implemented are kernel execution and memcpy commands.

See https://github.com/reble/llvm#implementation-status for the status
of our total work.

## Scope
This second patch focuses on the required PI/UR support for the
experimental command-buffer feature in the Level Zero adapter:
* PI stubs for all adapters to enable compilation, no functionality.
* Command-buffer implementation for the Level Zero UR adapter.
* Stubs for the CUDA UR adapter to enable compilation, no functionality.

## Following Split PRs
Future follow-up PRs with the remainder of our work on the extension
will include:
* Hooking up backend to graphs runtime, bugfixes and other feature
additions, will add symbols but not break the ABI. (3/4)
* Add end-to-end tests for SYCL Graph extension. (4/4)
* NFC changes - Design doc and codeowner update.

## Authors
Co-authored-by: Pablo Reble <pablo.reble@intel.com>
Co-authored-by: Julian Miller <julian.miller@intel.com>
Co-authored-by: Ben Tracy <ben.tracy@codeplay.com>
Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: Maxime France-Pillois
<maxime.francepillois@codeplay.com>

---------

Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: Maxime France-Pillois <maxime.francepillois@codeplay.com>
The destructor should be calling `cuDevicePrimaryCtxRelease`
Fixes error found in [post-commit
CI](https://github.com/intel/llvm/actions/runs/5454766342/jobs/9925392005)
after the merge of intel/llvm#9992

```
 /__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5635:24: error: unused parameter 'sync_point' [-Werror,-Wunused-parameter]
 5635 |     pi_ext_sync_point *sync_point) {
      |                        ^
/__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5691:12: error: unused parameter 'dst_row_pitch' [-Werror,-Wunused-parameter]
 5691 |     size_t dst_row_pitch, size_t dst_slice_pitch,
      |            ^
/__w/llvm/llvm/src/sycl/plugins/hip/pi_hip.cpp:5691:34: error: unused parameter 'dst_slice_pitch' [-Werror,-Wunused-parameter]
 5691 |     size_t dst_row_pitch, size_t dst_slice_pitch,
      |                                  ^
3 errors generated.
```
This patch re-introduces the fix from
intel/llvm#8765

Which seems to have been accidentally dropped by the UR port.
… classes (#10104)

This PR fixes an issue with overhead when calling the same kernel
multiple times in a loop.

Right now, some calls to CUDA API happen every time the kernel is
invoked to query the same information. Those calls were moved to
`device` and `kernel` constructors and the info was cached as private
members of those classes.
These checks are already performed by the UR validation layer
This implements the current extension doc from
intel/llvm#6104 in the CUDA backend only.

Fixes intel/llvm#7543.
Fixes intel/llvm#6749.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Co-authored-by: Nicolas Miller <nicolas.miller@codeplay.com>
Co-authored-by: JackAKirk <chezjakirk@gmail.com>
Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>
Currently if `cuInit` fails while getting platforms it returns
UR_RESULT_SUCCESS. Instead we use `UR_CHECK_ERROR` to return the correct
error code.
This was preventing out-of-tree build of the adapter for standalone use
with unified runtime.

Signed-off-by: Piotr Balcer <piotr.balcer@intel.com>
…s images extension (#10112)

# Experimental Implementation of SYCL Bindless Images Extension

This commit stands as the second commit of four to make code review
easier, implementing revision 4 of the [bindless images extension
proposal](intel/llvm#9842).

## Scope

This PR covers changes made to the PI and the UR. This includes

- Extending PI with extension functions
- Updating UR FetchContent commit and implementing [UR bindless images
experimental
features](https://oneapi-src.github.io/unified-runtime/core/EXP-BINDLESS-IMAGES.html)
on the CUDA adaptor

## Following Split PRs

- [3/4] Implement the user-facing SYCL extension
- [4/4] Add tests

## Authors

Co-authored-by: Isaac Ault <isaac.ault@codeplay.com>
Co-authored-by: Hugh Bird <hugh.bird@codeplay.com>
Co-authored-by: Duncan Brawley <duncan.brawley@codeplay.com>
Co-authored-by: Przemek Malon <przemek.malon@codeplay.com>
Co-authored-by: Chedy Najjar <chedy.najjar@codeplay.com>
Co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
Co-authored-by: Peter Zuzek <peter@codeplay.com>
In CUDA objects are represented as integers rather than opaque handles.
This patch fixes a segmentation fault when creating a device handle from
a native handle by avoiding dereferencing a pointer which should be
treated as an integer.
- Fixed compiler errors/warnings related to unused and uninitialized
variables and parameters.

Post-commit fix for PR: intel/llvm#10112
After the recent device partition changes in the UR spec (i.e.
ur_device_partition_property_t), level_zero, cuda and hip adapters are
returning incorrect values and types for
UR_DEVICE_INFO_SUPPORTED_PARTITIONS and UR_DEVICE_INFO_PARTITION_TYPE.

This PR fixes this issues with the adapters and updates pi2ur to
correctly convert between ur_device_partition_properties_t and
pi_device_partition_property.
…#9294)

This change adds a SYCL interface to the Level Zero APIs
zexDriverImportExternalPointer and zexDriverReleaseImportedPointer.
These functions are used for importing host memory into USM for the
duration of data transfer to increase bandwidth.
Cache the max local mem size so that we can call less CUDA driver entry
points at `urEnqueueKernelLaunch`. Also allows us to query the value set
for `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` using
`device.get_info<sycl::info::device::local_mem_size>()`
During the port to UR the CUDA and HIP PI plugin ABI's were
unintentionally changed. There does not appear to be symbol checks for
these plugins, unlike the [Level Zero symbol
check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_level_zero_symbol_check.dump)
and [OpenCL symbol
check](https://github.com/intel/llvm/blob/sycl/sycl/test/abi/pi_opencl_symbol_check.dump).
As such, the ABI change went unnoticed until
intel/llvm#10490 was opened using the same
approach for the OpenCL port, which
[failed](https://github.com/intel/llvm/actions/runs/5610646255/job/15200624025?pr=10490)
the OpenCL symbol check.

This PR restores the expected ABI for the CUDA and HIP plugins and
introduces new CUDA and HIP symbol check tests.
Bump the Unified Runtime commit, and make adapter changes needed for the
newly added adapter handles (see
oneapi-src#715 for details)

This fixes #10066 by providing an implementation of
`piPluginGetLastError` in pi2ur.
The `UR_RESULT_ADAPTER_SPECIFIC_ERROR` was not returning an error to the
SYCL RT which meant all errors were treated as warnings and ignored
unless `SYCL_RT_WARNING_LEVEL` is set to geq 2. This changes things so
the adapter specific error is now reported as such, meaning all uses
`UR_RESULT_ADAPTER_SPECIFIC_ERROR` meant as warnings are now caught as
errors.

---------

Co-authored-by: Hugh Delaney <hughd94@gmail.com>
veselypeta and others added 14 commits September 26, 2023 16:44
In the CUDA/HIP adapters `urKernelSetArgValue` was being used to
implement both `urKernelSetArgValue` & `urKernelSetArgLocal`. However,
if the validation layer is enabled in UR then the path to set local arg
is never taken since it includes a check that `pArgValue` is not null.

This PR:
 * Implements `urKernelSetArgLocal` for CUDA/HIP adapters
* Changes `pi2ur` to call `urKernelSetArgLocal` when `arg_value` is
`nullptr`
* Implements `urKernelSetArgLocal` for L0 adapter - this just calls back
into `urKernelSetArgValue`.
Fix the license headers at the top of each source file in the unified
runtime directory.

---------

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
This PR adds missing functions in the hip backend to allow for
interoperability in programs that create sycl objects from native hip
objects. The new function implementations are:

- `make_device`
- `make_queue`
- `make_event`

Note that it would really make sense for
intel/llvm#10491 to be merged first because this
PR makes the same code change in pi2ur, for a fix that is attributed to
#10491.

---------

Signed-off-by: Jack Kirk <jack.kirk@codeplay.com>
intel/llvm#10691 missed these extra cases. This
PR adds the needed ifdefs
… (#11023)

Resolve same issue fixed in
[PR](intel/llvm#10034)

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Lots of hip/cu driver API calls were wrapped in `ur::assertion(res ==
CU_SUCCESS)` etc which:

- Means that any native error messages returned from the affected driver
api calls were lost.
- Since these APIs report errors asynchronously, such that they are
thrown from the last API call rather than the call which led to the
error, previous asynchronous error messages from different APIs to the
ones wrapped by the `ur::assertion` could also be lost depending on user
code.

These problems are fixed by swapping these assertions with
`UR_CHECK_ERROR`.

Note that in the future UR may want to adjust `UR_CHECK_ERROR` so that
it throws `UR_RESULT_ERROR_ADAPTER_SPECIFIC` etc instead of using
`std::cerr` etc to report the error etc. But I think it makes sense to
still use `UR_CHECK_ERROR` to wrap driver API calls because it means
that the __LINE__, __FUNCTION__ etc info can be correctly passed to
native error reporting.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
- Support was added for the following image channel types:
  - `unorm_int8`
  - `unorm_int16`
  - `snorm_int8`
  - `snorm_int16`

- Reading these types through `read_image` returns the denormalized
floating point data.

- A test was added for these new types.

- Support for the following packed normalized image channel types was
removed from the proposal:
  - `unorm_short_565`
  - `unorm_short_555`
  - `unorm_int_101010`

- This was done due to lack of device support. If the need for these
types arises in the future, we can revisit support for these types.
Fetch the latest revision of unified runtime:

Notable changes
* Some command buffer entry-points have been renamed, also 2 additions
of membufferFill and USMFill
* UrInit/urTearDown have been removed - replaced with loader only
versions.
This change adds a new aspect for esimd, `ext_intel_esimd`, and
annotates the two fundamental esimd classes, `simd_obj_impl` and
`simd_view_impl` with the `uses_aspect` attribute.

`simd_obj_impl` is the base class of `simd` and `simd_mask` which are
the fundamental user-facing classes.
`simd_view_impl` is the base of only `simd_view`. `simd_obj_impl` is not
a base of `simd_view_impl`, but every `simd_view_impl` requires a
reference to a `simd` or `simd_mask` at construction time, so I am not
sure if we truly need to annotate `simd_view_impl`, but I added it to be
safe.

It also adds a new PI device info query,
`PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT` that is used to query at
runtime if a device supports ESIMD. For UR-based plugins, we map that to
`UR_DEVICE_INFO_ESIMD_SUPPOR`.

The implementation simply returns false for cuda, hip and native_cpu.
For l0 and opencl, we check that the device is an intel gpu by querying
the device type is gpu and the vendor id is `0x8086`.
For ESIMD emulator we simply return true.

I would appreciate careful review on the plugin changes in particular,
as I am not an expert.

This change also updates the esimd spec to document the new aspect.

In a future change, I plan to use the new aspect to remove the
requirement for the `SYCL_ESIMD_FUNCTION` and `SYCL_ESIMD_KERNEL` macros
that set function attributes, but I am not doing that as part of this PR
as it requires more investigation.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Reverts intel/llvm#11155. I'm almost sure this is what broke our CI for
the past few days.
- Return correct error code in urContextCreateWithNativeHandle
- Add UR_DEVICE_INFO_VIRTUAL_MEMORY_SUPPORT to urDeviceGetInfo
- Move asserts from urEnqueueMemBufferFill to UR validation layers
(commit 2c533e6 on UR repository)
- Make behaviour of urEventSetCallback consistent with other unsupported
entrypoints.
- urProgramGetInfo and urQueueGetInfo now return
UR_RESULT_ERROR_UNSUPPORTED_ENUMERATION for unsupported cases.
- Add entrypoints for urSamplerGetNativeHandle and
urSamplerCreateWithNativeHandle
- Fix UR_USM_ALLOC_INFO_BASE_PTR and UR_USM_ALLOC_INFO_SIZE
implementation.
`UR_CHECK_ERROR` was designed to return `ur_result_t`, however in
practice it was guaranteed to only ever return `UR_RESULT_SUCCESS`, as
other paths would either terminate, abort or throw.

This in turns leads to poor quality/error prone code, as the codebase
was littered with:
* statements not checking the return value - depending on the compiler
generating a warning,
* extra check on the return which was only ever going to be true.

Some care was required, as the codebase has a habit of accumulating err
codes across branches, so depending on the use case the initial value of
`ur_result_t Result`s had to be set accordingly (now that
`UR_CHECK_ERROR` does not return).
@fabiomestre fabiomestre force-pushed the fabio/move_cuda_adapter branch 5 times, most recently from 8879199 to 245afd0 Compare September 27, 2023 11:26
@fabiomestre fabiomestre force-pushed the fabio/move_cuda_adapter branch from 245afd0 to 3accba7 Compare September 27, 2023 12:25
@fabiomestre fabiomestre marked this pull request as ready for review September 27, 2023 12:31
endif()

if(UMF_ENABLE_POOL_TRACKING)
target_compile_definitions("ur_adapter_cuda" PRIVATE UMF_ENABLE_POOL_TRACKING)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add this option in the readme?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I know the UMF team are reworking stuff so that this option isn't needed, hopefully its days are numbered

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already an option in UR. It was just not being used in the adapter

@@ -0,0 +1,89 @@
//===--------- adapter.cpp - CUDA Adapter ---------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These licenses will need updating too, but a follow up PR is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this in a follow up PR

@fabiomestre fabiomestre force-pushed the fabio/move_cuda_adapter branch from a5c8d9e to 9d35c99 Compare September 27, 2023 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.