Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 103 additions & 2 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,8 @@ yet been implemented.
Implementation of UR command-buffers
for each of the supported SYCL 2020 backends.

Currently Level Zero and CUDA backends are implemented.
More sub-sections will be added here as other backends are supported.
Backends which are implemented currently are: [Level Zero](#level-zero),
[CUDA](#cuda), and partial support for [OpenCL](#opencl).

### Level Zero

Expand Down Expand Up @@ -246,3 +246,104 @@ the executable CUDA Graph that represent this series of operations.
An executable CUDA Graph, which contains all commands and synchronization
information, is saved in the UR command-buffer to allow for efficient
graph resubmission.

### OpenCL

SYCL-Graph is only enabled for an OpenCL backend when the
[cl_khr_command_buffer](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer)
extension is available, however this information isn't available until runtime
due to OpenCL implementations being loaded through an ICD.

The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
to indicate that the graph extension should be enabled. This is information
is propagated to the SYCL user via the
`device.get_info<info::device::graph_support>()` query for graph extension
support.

#### Limitations

Due to the API mapping gaps documented in the following section, OpenCL as a
SYCL backend cannot fully support the graph API. Instead, there are
limitations in the types of nodes which a user can add to a graph, using
an unsupported node type will cause a sycl exception to be thrown in graph
finalization with error code `sycl::errc::feature_not_supported` and a message
mentioning the unsupported command. For example,

```
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): USM copy command not supported by graph backend
```

The types of commands which are unsupported, and lead to this exception are:
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
This corresponds to a memory buffer read command.
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
This corresponds to a memory buffer write command.
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
`dest` are USM pointers. This corresponds to a USM copy command.

Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
is supported, as a memory buffer copy command exists in the OpenCL extension.

#### UR API Mapping

There are some gaps in both the OpenCL and UR specifications for Command
Buffers shown in the list below. There are implementations in the UR OpenCL
adapter where there is matching support for each function in the list.

| UR | OpenCL | Supported |
| --- | --- | --- |
| urCommandBufferCreateExp | clCreateCommandBufferKHR | Yes |
| urCommandBufferRetainExp | clRetainCommandBufferKHR | Yes |
| urCommandBufferReleaseExp | clReleaseCommandBufferKHR | Yes |
| urCommandBufferFinalizeExp | clFinalizeCommandBufferKHR | Yes |
| urCommandBufferAppendKernelLaunchExp | clCommandNDRangeKernelKHR | Yes |
| urCommandBufferAppendUSMMemcpyExp | | No |
| urCommandBufferAppendUSMFillExp | | No |
| urCommandBufferAppendMembufferCopyExp | clCommandCopyBufferKHR | Yes |
| urCommandBufferAppendMemBufferWriteExp | | No |
| urCommandBufferAppendMemBufferReadExp | | No |
| urCommandBufferAppendMembufferCopyRectExp | clCommandCopyBufferRectKHR | Yes |
| urCommandBufferAppendMemBufferWriteRectExp | | No |
| urCommandBufferAppendMemBufferReadRectExp | | No |
| urCommandBufferAppendMemBufferFillExp | clCommandFillBufferKHR | Yes |
| urCommandBufferEnqueueExp | clEnqueueCommandBufferKHR | Yes |
| | clCommandBarrierWithWaitListKHR | No |
| | clCommandCopyImageKHR | No |
| | clCommandCopyImageToBufferKHR | No |
| | clCommandFillImageKHR | No |
| | clGetCommandBufferInfoKHR | No |
| | clCommandSVMMemcpyKHR | No |
| | clCommandSVMMemFillKHR | No |

We are looking to address these gaps in the future so that SYCL-Graph can be
fully supported on a `cl_khr_command_buffer` backend.

#### UR Command-Buffer Implementation

Many of the OpenCL functions take a `cl_command_queue` parameter which is not
present in most of the UR functions. Instead, when a new command buffer is
created in `urCommandBufferCreateExp` we also create and maintain a new
internal `ur_queue_handle_t` with a reference stored inside of the
`ur_exp_command_buffer_handle_t_` struct. The internal queue is retained and
released whenever the owning command buffer is retained or released.

With command buffers being an OpenCL extension, each function is accessed by
loading a function pointer to its implementation. These are defined in a common
header file in the UR OpenCL adapter. The symbols for the functions are however
defined in [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_ext.h)
but it is not known at this time what version of the headers will be used in
the UR GitHub CI configuration, so loading the function pointers will be used
until this can be verified. A future piece of work would be replacing the
custom defined symbols with the ones from OpenCL-Headers.

#### Available OpenCL Command-Buffer Implementations

Publicly available implementations of `cl_khr_command_buffer` that can be used
to enable the graph extension in OpenCL:

- [OneAPI Construction Kit](https://github.com/codeplaysoftware/oneapi-construction-kit) (must enable `OCL_EXTENSION_cl_khr_command_buffer` when building)
- [PoCL](http://portablecl.org/)
- [Command-Buffer Emulation Layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/efeae73139ddf064fafce565cc39640af10d900f/layers/10_cmdbufemu)

2 changes: 1 addition & 1 deletion sycl/doc/design/images/SYCL-Graph-Architecture.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
47 changes: 18 additions & 29 deletions sycl/plugins/native_cpu/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,38 +1,27 @@
# Plugin for SYCL Native CPU
# Create shared library for libpi_nativecpu.so

# Get the Native CPU adapter sources so they can be shared with the Native CPU PI plugin
get_target_property(UR_NATIVE_CPU_ADAPTER_SOURCES ur_adapter_native_cpu SOURCES)

add_sycl_plugin(native_cpu
SOURCES
"pi_native_cpu.cpp"
${UR_NATIVE_CPU_ADAPTER_SOURCES}
# Some code is shared with the UR adapter
"../unified_runtime/pi2ur.hpp"
"../unified_runtime/pi2ur.cpp"
"../unified_runtime/ur/ur.hpp"
"../unified_runtime/ur/ur.cpp"
"../unified_runtime/ur/adapters/native_cpu/adapter.cpp"
"../unified_runtime/ur/adapters/native_cpu/command_buffer.cpp"
"../unified_runtime/ur/adapters/native_cpu/common.cpp"
"../unified_runtime/ur/adapters/native_cpu/common.hpp"
"../unified_runtime/ur/adapters/native_cpu/context.cpp"
"../unified_runtime/ur/adapters/native_cpu/context.hpp"
"../unified_runtime/ur/adapters/native_cpu/device.cpp"
"../unified_runtime/ur/adapters/native_cpu/device.hpp"
"../unified_runtime/ur/adapters/native_cpu/enqueue.cpp"
"../unified_runtime/ur/adapters/native_cpu/event.cpp"
"../unified_runtime/ur/adapters/native_cpu/image.cpp"
"../unified_runtime/ur/adapters/native_cpu/kernel.cpp"
"../unified_runtime/ur/adapters/native_cpu/kernel.hpp"
"../unified_runtime/ur/adapters/native_cpu/memory.cpp"
"../unified_runtime/ur/adapters/native_cpu/memory.hpp"
"../unified_runtime/ur/adapters/native_cpu/platform.cpp"
"../unified_runtime/ur/adapters/native_cpu/platform.hpp"
"../unified_runtime/ur/adapters/native_cpu/program.cpp"
"../unified_runtime/ur/adapters/native_cpu/program.hpp"
"../unified_runtime/ur/adapters/native_cpu/queue.cpp"
"../unified_runtime/ur/adapters/native_cpu/queue.hpp"
"../unified_runtime/ur/adapters/native_cpu/sampler.cpp"
"../unified_runtime/ur/adapters/native_cpu/ur_interface_loader.cpp"
"../unified_runtime/ur/adapters/native_cpu/usm.cpp"
"../unified_runtime/ur/adapters/native_cpu/usm_p2p.cpp"
"${sycl_inc_dir}/sycl/detail/pi.h"
"${sycl_inc_dir}/sycl/detail/pi.hpp"
"pi_native_cpu.cpp"
"pi_native_cpu.hpp"
INCLUDE_DIRS
${CMAKE_CURRENT_SOURCE_DIR}/../unified_runtime
${sycl_inc_dir}
${CMAKE_CURRENT_SOURCE_DIR}/../unified_runtime # for Unified Runtime
${UNIFIED_RUNTIME_SOURCE_DIR}/source/ # for adapters/native_cpu
LIBRARIES
sycl
UnifiedRuntime-Headers
UnifiedRuntimeCommon
)

set_target_properties(pi_native_cpu PROPERTIES LINKER_LANGUAGE CXX)
14 changes: 7 additions & 7 deletions sycl/plugins/native_cpu/pi_native_cpu.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,13 @@

#include <pi2ur.hpp>

#include <ur/adapters/native_cpu/context.hpp>
#include <ur/adapters/native_cpu/device.hpp>
#include <ur/adapters/native_cpu/kernel.hpp>
#include <ur/adapters/native_cpu/memory.hpp>
#include <ur/adapters/native_cpu/platform.hpp>
#include <ur/adapters/native_cpu/program.hpp>
#include <ur/adapters/native_cpu/queue.hpp>
#include <adapters/native_cpu/context.hpp>
#include <adapters/native_cpu/device.hpp>
#include <adapters/native_cpu/kernel.hpp>
#include <adapters/native_cpu/memory.hpp>
#include <adapters/native_cpu/platform.hpp>
#include <adapters/native_cpu/program.hpp>
#include <adapters/native_cpu/queue.hpp>

struct _pi_context : ur_context_handle_t_ {
using ur_context_handle_t_::ur_context_handle_t_;
Expand Down
64 changes: 13 additions & 51 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,12 @@ if("hip" IN_LIST SYCL_ENABLE_PLUGINS)
endif()
if("opencl" IN_LIST SYCL_ENABLE_PLUGINS)
set(UR_BUILD_ADAPTER_OPENCL ON)
set(UR_OPENCL_ICD_LOADER_LIBRARY OpenCL-ICD)
set(UR_OPENCL_ICD_LOADER_LIBRARY OpenCL-ICD CACHE FILEPATH
"Path of the OpenCL ICD Loader library" FORCE)
endif()
if("native_cpu" IN_LIST SYCL_ENABLE_PLUGINS)
set(UR_BUILD_ADAPTER_NATIVE_CPU ON)
endif()
# TODO: Set UR_BUILD_ADAPTER_NATIVE_CPU once adapter moved

# Disable errors from warnings while building the UR.
# And remember origin flags before doing that.
Expand All @@ -54,13 +57,11 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
include(FetchContent)

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit ec7982bac6cb3a6b9ed610cd6b7cb41fcbc780dc
# Merge: 62e6d2f9 5fb82924
# commit aaa4661f5c32e6dcb43248ed7575de6971852cc3
# Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
# Date: Wed Nov 8 13:32:46 2023 +0000
# Merge pull request #1022 from 0x12CC/l0_usm_error_checking_2
# [UR][L0] Propagate OOM errors from `USMAllocationMakeResident`
set(UNIFIED_RUNTIME_TAG ec7982bac6cb3a6b9ed610cd6b7cb41fcbc780dc)
# Date: Fri Dec 15 16:05:36 2023 +0000
# Set version to v0.8.2
set(UNIFIED_RUNTIME_TAG aaa4661f5c32e6dcb43248ed7575de6971852cc3)

if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
Expand Down Expand Up @@ -159,49 +160,6 @@ endif()

add_sycl_plugin(unified_runtime ${UNIFIED_RUNTIME_PLUGIN_ARGS})

if("native_cpu" IN_LIST SYCL_ENABLE_PLUGINS)
add_sycl_library("ur_adapter_native_cpu" SHARED
SOURCES
"ur/ur.cpp"
"ur/ur.hpp"
"ur/adapters/native_cpu/adapter.cpp"
"ur/adapters/native_cpu/command_buffer.cpp"
"ur/adapters/native_cpu/common.cpp"
"ur/adapters/native_cpu/common.hpp"
"ur/adapters/native_cpu/context.cpp"
"ur/adapters/native_cpu/context.hpp"
"ur/adapters/native_cpu/device.cpp"
"ur/adapters/native_cpu/device.hpp"
"ur/adapters/native_cpu/enqueue.cpp"
"ur/adapters/native_cpu/event.cpp"
"ur/adapters/native_cpu/image.cpp"
"ur/adapters/native_cpu/kernel.cpp"
"ur/adapters/native_cpu/kernel.hpp"
"ur/adapters/native_cpu/memory.cpp"
"ur/adapters/native_cpu/memory.hpp"
"ur/adapters/native_cpu/platform.cpp"
"ur/adapters/native_cpu/platform.hpp"
"ur/adapters/native_cpu/program.cpp"
"ur/adapters/native_cpu/program.hpp"
"ur/adapters/native_cpu/queue.cpp"
"ur/adapters/native_cpu/queue.hpp"
"ur/adapters/native_cpu/sampler.cpp"
"ur/adapters/native_cpu/ur_interface_loader.cpp"
"ur/adapters/native_cpu/usm.cpp"
"ur/adapters/native_cpu/usm_p2p.cpp"
LIBRARIES
UnifiedRuntime-Headers
Threads::Threads
OpenCL-Headers
)

set_target_properties("ur_adapter_native_cpu" PROPERTIES
VERSION "0.0.0"
SOVERSION "0"
)
endif()


if(TARGET UnifiedRuntimeLoader)
set_target_properties(hello_world PROPERTIES EXCLUDE_FROM_ALL 1 EXCLUDE_FROM_DEFAULT_BUILD 1)
# Install the UR loader.
Expand Down Expand Up @@ -238,3 +196,7 @@ endif()
if ("opencl" IN_LIST SYCL_ENABLE_PLUGINS)
add_dependencies(sycl-runtime-libraries ur_adapter_opencl)
endif()

if ("native_cpu" IN_LIST SYCL_ENABLE_PLUGINS)
add_dependencies(sycl-runtime-libraries ur_adapter_native_cpu)
endif()
Loading