From ddf916a3bd8c5361203d5e26e744e61862d5b4cb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Peter=20=C5=BDu=C5=BEek?= Date: Fri, 10 Feb 2023 16:13:10 +0000 Subject: [PATCH 1/6] [SYCL][Doc] Bindless Images proposal revision 1 Initial proposal for adding support for Bindless Images in SYCL. Co-authored-by: Przemek Malon Co-authored-by: Isaac Ault Co-authored-by: Sean Stirling Co-authored-by: Duncan Brawley --- .../sycl_ext_oneapi_bindless_images.asciidoc | 471 ++++++++++++++++++ 1 file changed, 471 insertions(+) create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc new file mode 100644 index 0000000000000..5e9658621940c --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc @@ -0,0 +1,471 @@ +# sycl_ext_oneapi_bindless_images + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +## Notice + +[%hardbreaks] +Copyright (C) Codeplay Software Limited. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + +## Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + +## Dependencies + +This extension is written against the SYCL 2020 revision 6 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + +## Status + +This is an experimental extension specification, intended to provide early +access to features and gather community feedback. Interfaces defined in this +specification are implemented in {dpcpp}, but they are not finalized and may +change incompatibly in future versions of {dpcpp} without prior notice. +*Shipping software products should not rely on APIs defined in this +specification.* + +## Backend support status + +This extension is currently implemented in DPC++ only when using the CUDA backend. +Attempting to use this extension in kernels +that run on other devices or backends may result in undefined behavior. +Be aware that the compiler is not able to issue a diagnostic to warn you if this happens. +Work is ongoing to support other backends. + +## Overview + +Images in SYCL 1.2.1 were designed to work with OpenCL. +SYCL 2020 tried to make them work with other backends as well +by splitting the image type into sampled and unsampled images. +However, SYCL 2020 images still didn't quite meet user expectations. +There was feedback about various use cases where the current model falls short +(see examples at the end of this document for some of the use cases). + +The main issue is requesting access to each individual image. +There are many use cases where there's a large number of images, +potentially a number that's not known at compile time, +which is very difficult or even impossible to do with the current model. + +That's why we propose in this document a new extension for SYCL 2020 images. +Per the proposal, users would be able to separate memory allocation for the image +from the actual image creation. +An image is represented by an opaque handle that can be passed directly into a kernel +without requesting access. +In many ways this model resembles more the USM model when accessing data on the device, +but it's specialized for dealing with images. + +The proposed model does not replace SYCL 2020 images, +it is instead meant as building blocks for implementing SYCL 2020 images on top of it. + +## Specification + +### Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_ONEAPI_BINDLESS_IMAGES` to one of the values defined in the +table below. Applications can test for the existence of this macro to +determine if the implementation supports this feature, or applications can test +the macro's value to determine which of the extension's features the +implementation supports. + +[%header,cols="10,90"] +|=== +| Value | Description +| 202302 | The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. +|=== + +### Image descriptor + +```cpp +namespace sycl::ext::oneapi { + +struct image_descriptor { + unsigned int width; + unsigned int height; + unsigned int depth; + image_channel_type channel_type; + image_channel_order channel_order; + unsigned int row_pitch; +}; + +} +``` + +The image descriptor represents the image dimensions, channel type, channel order, and row pitch. + +Note that `image_channel_type` and `image_channel_order` existed in SYCL 1.2.1, +but were removed in SYCL 2020 in favor of a single, unified enum class. +We propose separating them again to enable better flexibility +and to avoid combinatorial complexity. + +### Allocating image memory + +```cpp +namespace sycl::ext::oneapi { + +void* allocate_image(const context& syclContext, image_descriptor desc); + +void free_image(const context& syclContext, void* memory_handle); + +} +``` + +The process of creating an image is two-fold: +allocate an image, then create an image handle from the allocation. + +The first step is to allocate memory for an image. +`allocate_image` can be used for this, +as it allocates memory based on the requirements outlined by the image descriptor. +The function returns a pointer to the newly allocated memory. +Note that the image is an opaque type, so the pointer cannot be dereferenced on the host. + +After we're done with the image, we need to cleanup the memory using `free_image`. + + +[NOTE] +==== +In the DPC++ CUDA backend, this will allocate a `CUarray` type. +==== + +An alternative is to allocate memory using device USM allocations. +In addition to the existing USM allocation functions, a new USM pitched allocation function is proposed: + +```cpp +namespace sycl::ext::oneapi { + +void* pitched_alloc_device(size_t* ret_pitch, size_t width_in_bytes, size_t height + unsigned int element_size_bytes, const queue& queue); + +} +``` + +This function will allocate a memory region aimed to be used for two-dimensional images. +It places appropriate padding at the end of image rows and returns the pitch value used. + +For one-dimensional images, existing USM allocation functions are sufficient. + +Three-dimensional images do not support USM. + +The next step, creating the image, is the same regardless of how the memory was allocated. + +### Obtaining a handle to the image + +```cpp +namespace sycl::ext::oneapi { + +using unsampled_image_handle = /* Implementation defined */; +using sampled_image_handle = /* Implementation defined */; + +unsampled_image_handle create_image(const context& syclContext, void* devPtr); +sampled_image_handle create_image(const context& syclContext, void* devPtr, + sampler& sampler); + +void destroy_image_handle(const context& syclContext, + sampled_image_handle& imageHandle); +void destroy_image_handle(const context& syclContext, + unsampled_image_handle& imageHandle); + +} +``` + +Once we have allocated memory, we can pass it into the `create_image` function +to obtain a `sampled_image_handle` or `unsampled_image_handle`. +These objects are opaque types that represent an image object. +They can be captured by value into a SYCL kernel, +or they can be passed in a buffer as a dynamic array of images +(see examples below). + +We can either provide a sampler or not when creating the image. + +[NOTE] +==== +In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture. +An unsampled image will usually correspond to a CUDA surface, unless the memory the +texture is created from is USM, then it will correspond to a CUDA texture. +==== + +After we're done with the image, we need to destroy the handle using `destroy_image_handle`. + +### Explicit copies + +```cpp +namespace sycl::ext::oneapi { + +enum image_copy_flags : unsigned int { + HtoD = 0, // Host to device + DtoH = 1, // Device to host + DtoD = 2, // Device to device +}; + +void copy_image(const sycl::queue& syclQueue, void* dst_ptr, void* src_ptr, + image_descriptor desc, image_copy_flags flags); + +} +``` + +`copy_image` can be used to copy data between host and device, +or alternatively between two devices. +`dst_ptr` and `src_ptr` can represent host or device pointers, depending on the value +of `image_copy_flags`. +The direction of the copy is determined by passing in a value of `image_copy_flags`. +The image descriptor is used to determine the dimensions of the image, and the image +row pitch, when copying the data. + + +[NOTE] +==== +The copy operation should be made a method on the queue in the future, +and the copy flags might be made redundant as we should be able to determine +whether given pointers are device, host, or USM pointers +==== + +### Reading and writing inside the kernel + +```cpp +namespace sycl::ext::oneapi { + +template +DataT read_image(const image_handle &imageHandle, const CoordT &coords); + +template +void write_image(const image_handle &imageHandle, const CoordT &Coords, + const DataT &Color); + +} +``` + +Inside a kernel it's possible to read an image using `read_image` via the image handle, +which returns a value of the user specified type `DataT`. +Similarly, it's possible to write to an image via a handle using `write_image`. + +`write_image` is not available for sampled images, or images created from USM memory. + +`DataT` must correspond to the type specified in the `image_descriptor` +when the image was created. + +The coordinates are specified as an `int` for 1D images, `int2` for 2D images, +and `int4` for 3D images (not `int3`). + + +[NOTE] +==== +Sampling will likely require passing a sampler handle to the `read_image` function. +The current proposal does not require this, as CUDA ties samplers and textures at time of creation. +==== + +## Examples + +### 1D image read/write + +```cpp +queue q; +auto ctxt = q.get_context(); + +constexpr size_t N = 512; +std::vector dataIn(N); +float exp = 512; +for (int i = 0; i < N; i++) { + dataIn[i] = float4(i, i, i, i); +} + +// Image descriptor - can use the same for both images +ext::oneapi::image_descriptor desc({N, 0, 0}, image_channel_order::rgba, + image_channel_type::fp32); + +// Extension: returns the device pointer to the allocated memory +auto imgMemoryIn = ext::oneapi::allocate_image(ctxt, desc); +auto imgMemoryOut = ext::oneapi::allocate_image(ctxt, desc); + +// Extension: copy over data to device +ext::oneapi::copy_image(q, imgMemoryIn, dataIn.data(), desc, + ext::oneapi::image_copy_flags::HtoD); + +// Extension: create the image and return the handle +ext::oneapi::image_handle imgIn = + ext::oneapi::create_image(ctxt, imgMemoryIn); +ext::oneapi::image_handle imgOut = + ext::oneapi::create_image(ctxt, imgMemoryOut); + +q.submit([&](handler &cgh) { + // No need to request access, handles captured by value + + cgh.parallel_for(N, [=](id<1> id) { + // Extension: read image data from handle + float4 px1 = + ext::oneapi::read_image(imgIn, int(id[0])); + + // Extension: write to image data using handle + sycl::ext::oneapi::write_image(imgOut, int(id[0]), px1); + }); +}); + +// Using image handles requires manual synchronization +q.wait_and_throw(); + +// Cleanup +ext::oneapi::destroy_image_handle(ctxt, imgIn); +ext::oneapi::destroy_image_handle(ctxt, imgOut); +ext::oneapi::free_image(ctxt, imgMemoryIn); +ext::oneapi::free_image(ctxt, imgMemoryOut); +``` + +### Reading from a dynamically sized array of 2D images + +```cpp +device dev; +queue q(dev); +auto ctxt = q.get_context(); + +// declare image data +size_t numImages = 5; +size_t width = 7; +size_t height = 3; +size_t N = width * height; +std::vector out(N); +std::vector expected(N); +std::vector dataIn(N); +for (int i = 0; i < width; i++) { + for (int j = 0; j < height; j++) { + expected[j + (height * i)] = (j + (height * i)) * numImages; + dataIn[j + (height * i)] = {j + (height * i), 0, 0, 0}; + } +} + +// Image descriptor - can use the same for all images +sycl::ext::oneapi::image_descriptor desc( + {width, height}, image_channel_order::rgba, image_channel_type::fp32); + +// Allocate each image and save the device ptrs +std::vector imgAllocations; +for (int i = 0; i < numImages; i++) { +// Extension: returns the device pointer to the allocated memory +auto device_ptr = sycl::ext::oneapi::allocate_image(ctxt, desc); +if (device_ptr == nullptr) { + std::cout << "Error allocating image!" << std::endl; + return 1; +} +imgAllocations.push_back(device_ptr); +} + +// Copy over data to device for each image +for (int i = 0; i < numImages; i++) { +// Extension: copy over data to device +sycl::ext::oneapi::copy_image(q, imgAllocations[i], dataIn.data(), desc, + sycl::ext::oneapi::image_copy_flags::HtoD); +} + +// Create the images and return the handles +std::vector imgHandles; +for (int i = 0; i < numImages; i++) { +// Extension: create the image and return the handle +sycl::ext::oneapi::unsampled_image_handle imgHandle = + sycl::ext::oneapi::create_image(ctxt, imgAllocations[i]); +imgHandles.push_back(imgHandle); +} + +// Cuda stores data in column-major fashion +// SYCL deals with indexing in row-major fashion +// Reverse output buffer dimensions and access to convert +// the cuda column-major data back to row-major +buffer buf((float *)out.data(), range<2>{height, width}); +buffer imgHandlesBuf( + imgHandles.data(), range<1>{numImages}); +q.submit([&](handler &cgh) { + auto outAcc = + buf.get_access(cgh, range<2>{height, width}); + + auto imgHandleAcc = + imgHandlesBuf.get_access(cgh, range<1>{numImages}); + + cgh.parallel_for( + nd_range<2>{{width, height}, {width, height}}, [=](nd_item<2> it) { + size_t dim0 = it.get_local_id(0); + size_t dim1 = it.get_local_id(1); + + // Sum each image by reading their handle + float sum = 0; + for (int i = 0; i < numImages; i++) { + // Extension: read image data from handle + sum += (sycl::ext::oneapi::read_image( + imgHandleAcc[i], int2(dim0, dim1)))[0]; + } + outAcc[id<2>{dim1, dim0}] = sum; + }); +}); + +// Using image handles requires manual synchronization +q.wait_and_throw(); + +// Cleanup +for (int i = 0; i < numImages; i++) { + sycl::ext::oneapi::destroy_image_handle(ctxt, imgHandles[i]); + sycl::ext::oneapi::free_image(ctxt, imgAllocations[i]); +} +``` + +## Implementation notes + +The current DPC++ prototype only implements the proposal for the CUDA backend, +however we are actively exploring Level Zero with SPIR-V. +We are looking at other backend as well in order to ensure the extension can work across different backends. + +## Issues + +### No dependency tracking + +Because this extension allows images to work in a USM-like model, +there are similar limitations to using USM for non-images, +mainly the lack of dependency tracking and the need for users to manually synchronize operations. + +### Limitations when using USM as image memory + +When USM is used as the image memory, these are the general limitations: + +* Not possible to write images, just read. + +Then there are dimension specific limitations: + +* 1D - Linear interpolation not possible in the CUDA backend. + A workaround is to allocate 2D pitched memory with a height of 1. +* 2D - Only works with `pitched_alloc_device`. +* 3D - No support at the moment. + +### Not supported yet + +These features still need to be handled: + +* Level Zero and SPIR-V support +* Mipmapping +* Interop with various backends +* etc. + +## Revision History + +[%header,cols="10,15,75"] +|=== +| Rev | Date | Changes +| 1 | 2023-02-03 | Initial draft +|=== From c1e92c1d5f843329330536c3a1e30ee870d4f6b5 Mon Sep 17 00:00:00 2001 From: Przemek Malon Date: Wed, 22 Feb 2023 16:54:43 +0000 Subject: [PATCH 2/6] [SYCL][Doc] Bindless Images revision 2 * Converted to ASCIIDOC * Added `image_mem_handle` for image memory allocated with `allocate_image` * Added ability to create images from USM * Added new way to copy images, removed requirement for copy direction * Added image memory information getters to reflect `cuArray3DGetDescriptor` functionality --- .../sycl_ext_oneapi_bindless_images.asciidoc | 360 +++++++++++------- 1 file changed, 215 insertions(+), 145 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc index 5e9658621940c..781c275d08b5b 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc @@ -1,7 +1,8 @@ -# sycl_ext_oneapi_bindless_images +# Bindless Images :source-highlighter: coderay :coderay-linenums-mode: table +:dpcpp: pass:[DPC++] // This section needs to be after the document title. :doctype: book @@ -9,35 +10,32 @@ :toc: left :encoding: utf-8 :lang: en -:dpcpp: pass:[DPC++] +:blank: pass:[ +] // Set the default source code type in this document to C++, // for syntax highlighting purposes. This is needed because // docbook uses c++ and html5 uses cpp. :language: {basebackend@docbook:c++:cpp} -## Notice + +== Notice [%hardbreaks] -Copyright (C) Codeplay Software Limited. All rights reserved. +Copyright (C) Codeplay. All rights reserved. Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks -of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. -## Contact -To report problems with this extension, please open a new issue at: +== Dependencies -https://github.com/intel/llvm/issues - -## Dependencies - -This extension is written against the SYCL 2020 revision 6 specification. All +This extension is written against the SYCL 2020 revision 6 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision. -## Status + +== Status This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this @@ -46,15 +44,16 @@ change incompatibly in future versions of {dpcpp} without prior notice. *Shipping software products should not rely on APIs defined in this specification.* -## Backend support status -This extension is currently implemented in DPC++ only when using the CUDA backend. -Attempting to use this extension in kernels -that run on other devices or backends may result in undefined behavior. -Be aware that the compiler is not able to issue a diagnostic to warn you if this happens. -Work is ongoing to support other backends. +== Backend support status -## Overview +This extension is currently implemented in {dpcpp} only for GPU devices and +only when using the CUDA backend. Attempting to use this extension in +kernels that run on other devices or backends will not work. +Be aware that the compiler may not be able to issue a diagnostic to +warn you if this happens. + +== Overview Images in SYCL 1.2.1 were designed to work with OpenCL. SYCL 2020 tried to make them work with other backends as well @@ -69,19 +68,21 @@ potentially a number that's not known at compile time, which is very difficult or even impossible to do with the current model. That's why we propose in this document a new extension for SYCL 2020 images. -Per the proposal, users would be able to separate memory allocation for the image -from the actual image creation. -An image is represented by an opaque handle that can be passed directly into a kernel -without requesting access. -In many ways this model resembles more the USM model when accessing data on the device, -but it's specialized for dealing with images. +Per the proposal, users would be able to separate memory allocation for the +image from the actual image creation. +An image is represented by an opaque handle that can be passed directly into +a kernel without requesting access. +In many ways this model resembles more the USM model when accessing data on +the device, but it's specialized for dealing with images. The proposed model does not replace SYCL 2020 images, -it is instead meant as building blocks for implementing SYCL 2020 images on top of it. +it is instead meant as building blocks for implementing SYCL 2020 images on +top of it. + -## Specification +== Specification -### Feature test macro +=== Feature test macro This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the @@ -91,13 +92,13 @@ determine if the implementation supports this feature, or applications can test the macro's value to determine which of the extension's features the implementation supports. -[%header,cols="10,90"] -|=== -| Value | Description -| 202302 | The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. -|=== +[frame="none",options="header"] +|====================== +|Value |Description +|202302 |Initial version of this extension +|====================== -### Image descriptor +=== Image descriptor ```cpp namespace sycl::ext::oneapi { @@ -114,21 +115,23 @@ struct image_descriptor { } ``` -The image descriptor represents the image dimensions, channel type, channel order, and row pitch. +The image descriptor represents the image dimensions, channel type, channel +order, and row pitch. Note that `image_channel_type` and `image_channel_order` existed in SYCL 1.2.1, but were removed in SYCL 2020 in favor of a single, unified enum class. We propose separating them again to enable better flexibility and to avoid combinatorial complexity. -### Allocating image memory +=== Allocating image memory ```cpp namespace sycl::ext::oneapi { -void* allocate_image(const context& syclContext, image_descriptor desc); +image_mem_handle allocate_image(const context& syclContext, + image_descriptor desc); -void free_image(const context& syclContext, void* memory_handle); +void free_image(const context& syclContext, void *memory_handle); } ``` @@ -138,40 +141,44 @@ allocate an image, then create an image handle from the allocation. The first step is to allocate memory for an image. `allocate_image` can be used for this, -as it allocates memory based on the requirements outlined by the image descriptor. -The function returns a pointer to the newly allocated memory. -Note that the image is an opaque type, so the pointer cannot be dereferenced on the host. - -After we're done with the image, we need to cleanup the memory using `free_image`. +as it allocates memory based on the requirements outlined by the image +descriptor. The function returns a handle to the newly allocated memory. +Note that the image is an opaque type, so the handle cannot be dereferenced on +the host. The layout of the allocated memory is backend-specific, and may be an +optimized layout, e.g. tile swizzle patterns. +After we're done with the image, we need to cleanup the memory using +`free_image`. -[NOTE] -==== -In the DPC++ CUDA backend, this will allocate a `CUarray` type. -==== +_In the DPC++ CUDA backend, these will allocate/deallocate a `CUarray` type._ An alternative is to allocate memory using device USM allocations. -In addition to the existing USM allocation functions, a new USM pitched allocation function is proposed: +In addition to the existing USM allocation functions, a new USM pitched +allocation function is proposed. ```cpp namespace sycl::ext::oneapi { -void* pitched_alloc_device(size_t* ret_pitch, size_t width_in_bytes, size_t height - unsigned int element_size_bytes, const queue& queue); +void *pitched_alloc_device(size_t* ret_pitch, size_t width_in_bytes, + size_t height unsigned int element_size_bytes, + const queue &queue); } ``` -This function will allocate a memory region aimed to be used for two-dimensional images. -It places appropriate padding at the end of image rows and returns the pitch value used. +This function will allocate a memory region aimed to be used for +two-dimensional images. +It places appropriate padding at the end of image rows and returns the pitch +value used. For one-dimensional images, existing USM allocation functions are sufficient. Three-dimensional images do not support USM. -The next step, creating the image, is the same regardless of how the memory was allocated. +The next step, creating the image, is the same regardless of how the memory +was allocated. -### Obtaining a handle to the image +=== Obtaining a handle to the image ```cpp namespace sycl::ext::oneapi { @@ -179,9 +186,15 @@ namespace sycl::ext::oneapi { using unsampled_image_handle = /* Implementation defined */; using sampled_image_handle = /* Implementation defined */; -unsampled_image_handle create_image(const context& syclContext, void* devPtr); -sampled_image_handle create_image(const context& syclContext, void* devPtr, - sampler& sampler); +unsampled_image_handle create_image(const context &syclContext, + image_mem_handle mem_handle); +sampled_image_handle create_image(const context &syclContext, + image_mem_handle mem_handle, + sampler &sampler); + +unsampled_image_handle create_image(const context &syclContext, void *usm_ptr); +sampled_image_handle create_image(const context &syclContext, void *usm_ptr, + sampler &sampler); void destroy_image_handle(const context& syclContext, sampled_image_handle& imageHandle); @@ -194,74 +207,86 @@ void destroy_image_handle(const context& syclContext, Once we have allocated memory, we can pass it into the `create_image` function to obtain a `sampled_image_handle` or `unsampled_image_handle`. These objects are opaque types that represent an image object. -They can be captured by value into a SYCL kernel, -or they can be passed in a buffer as a dynamic array of images -(see examples below). +They can be captured by value into a SYCL kernel, or they can be passed in a +buffer as a dynamic array of images (see examples below). We can either provide a sampler or not when creating the image. -[NOTE] -==== -In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture. -An unsampled image will usually correspond to a CUDA surface, unless the memory the -texture is created from is USM, then it will correspond to a CUDA texture. -==== +_In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture. +An unsampled image will usually correspond to a CUDA surface, unless the memory +the texture is created from USM, then it will correspond to a CUDA texture._ -After we're done with the image, we need to destroy the handle using `destroy_image_handle`. +After we're done with the image, we need to destroy the handle using +`destroy_image_handle`. -### Explicit copies +=== Explicit copies ```cpp -namespace sycl::ext::oneapi { - -enum image_copy_flags : unsigned int { - HtoD = 0, // Host to device - DtoH = 1, // Device to host - DtoD = 2, // Device to device +namespace sycl { +class queue { +public: + event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, + const ext::oneapi::image_descriptor &Desc); + event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, + const ext::oneapi::image_descriptor &Desc, + event DepEvent); + event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, + const ext::oneapi::image_descriptor &Desc, + const std::vector &DepEvents); + + event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, + const ext::oneapi::image_descriptor &Desc); + event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, + const ext::oneapi::image_descriptor &Desc, + event DepEvent); + event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, + const ext::oneapi::image_descriptor &Desc, + const std::vector &DepEvents); + + event ext_image_memcpy(void *Dest, void *Src, + const ext::oneapi::image_descriptor &Desc); + event ext_image_memcpy(void *Dest, void *Src, + const ext::oneapi::image_descriptor &Desc, + event DepEvent); + event ext_image_memcpy(void *Dest, void *Src, + const ext::oneapi::image_descriptor &Desc, + const std::vector &DepEvents); }; - -void copy_image(const sycl::queue& syclQueue, void* dst_ptr, void* src_ptr, - image_descriptor desc, image_copy_flags flags); - } ``` -`copy_image` can be used to copy data between host and device, -or alternatively between two devices. -`dst_ptr` and `src_ptr` can represent host or device pointers, depending on the value -of `image_copy_flags`. -The direction of the copy is determined by passing in a value of `image_copy_flags`. -The image descriptor is used to determine the dimensions of the image, and the image -row pitch, when copying the data. +An `ext_image_memcpy` function is proposed as a method of the queue, to +copy image memory. It can be used to copy image memory from host to device, or +device to host. The image descriptor is used to determine the dimensions of +the image, and the image row pitch, when copying the data. +For images allocated using USM, existing SYCL functionality can used to +copy their memory, but we also provide `ext_image_copy` functions that take +USM pointers. -[NOTE] -==== -The copy operation should be made a method on the queue in the future, -and the copy flags might be made redundant as we should be able to determine -whether given pointers are device, host, or USM pointers -==== - -### Reading and writing inside the kernel +=== Reading and writing inside the kernel ```cpp namespace sycl::ext::oneapi { -template -DataT read_image(const image_handle &imageHandle, const CoordT &coords); +DataT read_image(const unsampled_image_handle &imageHandle, + const CoordT &coords); +DataT read_image(const sampled_image_handle &imageHandle, + const CoordT &coords); template -void write_image(const image_handle &imageHandle, const CoordT &Coords, - const DataT &Color); +void write_image(const unsampled_image_handle &imageHandle, + const CoordT &Coords, const DataT &Color); } ``` -Inside a kernel it's possible to read an image using `read_image` via the image handle, -which returns a value of the user specified type `DataT`. +Inside a kernel it's possible to read an image using `read_image` via the image +handle, which returns a value of the user specified type `DataT`. Similarly, it's possible to write to an image via a handle using `write_image`. -`write_image` is not available for sampled images, or images created from USM memory. +`write_image` is not available for sampled images, or images created from USM +memory. `DataT` must correspond to the type specified in the `image_descriptor` when the image was created. @@ -269,16 +294,43 @@ when the image was created. The coordinates are specified as an `int` for 1D images, `int2` for 2D images, and `int4` for 3D images (not `int3`). +Note that all images must be used in either read-only or write-only fashion +within a single kernel invocation; read/write images are not supported. -[NOTE] -==== -Sampling will likely require passing a sampler handle to the `read_image` function. -The current proposal does not require this, as CUDA ties samplers and textures at time of creation. -==== +_Sampling will likely require passing a sampler handle to the `read_image` +function. The current proposal does not require this, as CUDA can tie samplers +and textures at time of creation._ -## Examples +=== Getting image information from non-USM image memory -### 1D image read/write +Extension functions are provided to retrieve information about images allocated +using the `image_allocate` function. + +```cpp +namespace sycl::ext::oneapi { + +sycl::range<3> get_image_range(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +unsigned int get_image_flags(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +sycl::image_channel_type +get_image_channel_type(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +unsigned int get_image_num_channels(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +} +``` + +_These functions provide the same information that CUDA's +`cuArray3DGetDescriptor` provides._ + +== Examples + +=== 1D image read/write ```cpp queue q; @@ -299,15 +351,17 @@ ext::oneapi::image_descriptor desc({N, 0, 0}, image_channel_order::rgba, auto imgMemoryIn = ext::oneapi::allocate_image(ctxt, desc); auto imgMemoryOut = ext::oneapi::allocate_image(ctxt, desc); +// Extension: create the image and return the handle +ext::oneapi::image_handle imgIn = + ext::oneapi::create_image(ctxt, imgMemoryIn, desc); +ext::oneapi::image_handle imgOut = + ext::oneapi::create_image(ctxt, imgMemoryOut, desc); + // Extension: copy over data to device -ext::oneapi::copy_image(q, imgMemoryIn, dataIn.data(), desc, - ext::oneapi::image_copy_flags::HtoD); +q.ext_image_copy(imgMemoryIn, dataIn.data(), desc); -// Extension: create the image and return the handle -ext::oneapi::image_handle imgIn = - ext::oneapi::create_image(ctxt, imgMemoryIn); -ext::oneapi::image_handle imgOut = - ext::oneapi::create_image(ctxt, imgMemoryOut); +// Wait for copy operation to finish +q.wait(); q.submit([&](handler &cgh) { // No need to request access, handles captured by value @@ -332,7 +386,8 @@ ext::oneapi::free_image(ctxt, imgMemoryIn); ext::oneapi::free_image(ctxt, imgMemoryOut); ``` -### Reading from a dynamically sized array of 2D images +=== Reading from a dynamically sized array of 2D images + ```cpp device dev; @@ -358,32 +413,34 @@ for (int i = 0; i < width; i++) { sycl::ext::oneapi::image_descriptor desc( {width, height}, image_channel_order::rgba, image_channel_type::fp32); -// Allocate each image and save the device ptrs -std::vector imgAllocations; +// Allocate each image and save the handles +std::vector imgAllocations; for (int i = 0; i < numImages; i++) { -// Extension: returns the device pointer to the allocated memory -auto device_ptr = sycl::ext::oneapi::allocate_image(ctxt, desc); -if (device_ptr == nullptr) { - std::cout << "Error allocating image!" << std::endl; - return 1; -} -imgAllocations.push_back(device_ptr); + // Extension: returns the handle to the allocated memory + auto img_mem = sycl::ext::oneapi::allocate_image(ctxt, desc); + if (img_mem == nullptr) { + std::cout << "Error allocating image!" << std::endl; + return 1; + } + imgAllocations.push_back(img_mem); } // Copy over data to device for each image for (int i = 0; i < numImages; i++) { -// Extension: copy over data to device -sycl::ext::oneapi::copy_image(q, imgAllocations[i], dataIn.data(), desc, - sycl::ext::oneapi::image_copy_flags::HtoD); + // Extension: copy over data to device + q.ext_image_copy(imgAllocations[i], dataIn.data(), desc); } +// Wait for copy operations to finish +q.wait(); + // Create the images and return the handles std::vector imgHandles; for (int i = 0; i < numImages; i++) { -// Extension: create the image and return the handle -sycl::ext::oneapi::unsampled_image_handle imgHandle = - sycl::ext::oneapi::create_image(ctxt, imgAllocations[i]); -imgHandles.push_back(imgHandle); + // Extension: create the image and return the handle + sycl::ext::oneapi::unsampled_image_handle imgHandle = + sycl::ext::oneapi::create_image(ctxt, imgAllocations[i], desc); + imgHandles.push_back(imgHandle); } // Cuda stores data in column-major fashion @@ -426,21 +483,23 @@ for (int i = 0; i < numImages; i++) { } ``` -## Implementation notes +== Implementation notes The current DPC++ prototype only implements the proposal for the CUDA backend, however we are actively exploring Level Zero with SPIR-V. -We are looking at other backend as well in order to ensure the extension can work across different backends. +We are looking at other backend as well in order to ensure the extension can +work across different backends. -## Issues +== Issues -### No dependency tracking +=== No dependency tracking Because this extension allows images to work in a USM-like model, there are similar limitations to using USM for non-images, -mainly the lack of dependency tracking and the need for users to manually synchronize operations. +mainly the lack of dependency tracking and the need for users to manually +synchronize operations. -### Limitations when using USM as image memory +=== Limitations when using USM as image memory When USM is used as the image memory, these are the general limitations: @@ -453,7 +512,7 @@ Then there are dimension specific limitations: * 2D - Only works with `pitched_alloc_device`. * 3D - No support at the moment. -### Not supported yet +=== Not supported yet These features still need to be handled: @@ -462,10 +521,21 @@ These features still need to be handled: * Interop with various backends * etc. -## Revision History +== Revision History + +[frame="none",options="header"] +|====================== +|Rev |Date |Changes +|1 |2023-02-03 | Initial draft +|2 |2023-02-23 | - Added `image_mem_handle` for image memory allocated with + `allocate_image` + + - Added ability to create images from USM + + - Added new way to copy images, removed requirement for copy + direction + + - Added image memory information getters to reflect + `cuArray3DGetDescriptor` functionality +|====================== -[%header,cols="10,15,75"] -|=== -| Rev | Date | Changes -| 1 | 2023-02-03 | Initial draft -|=== From 4b117f8928f56a351503dbea1efb3a0300f99370 Mon Sep 17 00:00:00 2001 From: Przemek Malon Date: Fri, 17 Mar 2023 13:28:09 +0000 Subject: [PATCH 3/6] [SYCL][Doc] Bindless Images revision 3 Co-authored-by: Przemyslaw Malon Co-authored-by: Sean Stirling - Some text clarifications. - Unsampled images can no longer be created from USM. - Added SYCL 1.2.1 `image_channel_order` and `image_channel_type` structs. - Added `image_type` to enable construction of layered, mipmap, and cubemap images in the future. - Added device information descriptors for querying pitched allocation size and alignment requirement. - Added `ext_oneapi_copy` methods for the `sycl::handler`. - `ext_oneapi_copy` functions now take the `Src` as the first parameter. - Created `image_mem` as a RAII style class. - Renamed `allocate_image` to `alloc_image_mem` - `pitched_alloc_device` can now take an `image_descriptor`. - Added interoperability features - Added support to query bindless image and interoperability capabilities - Added mipmap support --- .../sycl_ext_oneapi_bindless_images.asciidoc | 1805 ++++++++++++++--- 1 file changed, 1545 insertions(+), 260 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc index 781c275d08b5b..d4b66def79e44 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc @@ -55,30 +55,51 @@ warn you if this happens. == Overview -Images in SYCL 1.2.1 were designed to work with OpenCL. -SYCL 2020 tried to make them work with other backends as well -by splitting the image type into sampled and unsampled images. +Images in SYCL 1.2.1 were designed to work with OpenCL. SYCL 2020 tried to make +them more versatile by splitting the image type into sampled and unsampled +images. This enabled SYCL 2020 images to work better with other backends. However, SYCL 2020 images still didn't quite meet user expectations. There was feedback about various use cases where the current model falls short (see examples at the end of this document for some of the use cases). -The main issue is requesting access to each individual image. -There are many use cases where there's a large number of images, -potentially a number that's not known at compile time, -which is very difficult or even impossible to do with the current model. - -That's why we propose in this document a new extension for SYCL 2020 images. -Per the proposal, users would be able to separate memory allocation for the -image from the actual image creation. -An image is represented by an opaque handle that can be passed directly into -a kernel without requesting access. -In many ways this model resembles more the USM model when accessing data on -the device, but it's specialized for dealing with images. +One of the key issues is requesting access to arbitrary images through handles, +and not accessors. Accessing images through handles instead of accessors grants +much more flexibility to the user, at the expense of automatic data dependency +tracking. Bypassing accessors allows users to implement programs where the +number of images is not known at compile-time, such as a texture atlas where one +image holds references to other images. This kind of feature is impossible to +implement with the accessor model outlined in the core specification. + +These shortcomings are why we propose a new extension for SYCL 2020 images. +Per our proposal, users would be able to separate memory allocation for the +image from the actual image creation. Images will be represented by opaque +handle types that can be passed directly into a kernel without requesting +access. In many ways, this model more closely resembles the USM model when +accessing data on the device, but it's specialized for dealing with images. The proposed model does not replace SYCL 2020 images, it is instead meant as building blocks for implementing SYCL 2020 images on top of it. +In addition to bindless images, this document also proposes an interoperability +extension providing functionality to allow users to import external memory and +semaphore objects from other APIs, such as Vulkan or DirectX. + +Importing memory allows it to be shared between APIs without the need to +duplicate allocations and perform multiple copies between host and device to +ensure that said memory is kept uniform across those APIs at all times. + +Importing semaphores will also allow SYCL to schedule command groups and queue +operations that depend on completion of GPU commands submitted by external APIs. + +[NOTE] +==== +The interoperability outlined in this document concerns only the importing of +external API objects into the SYCL runtime. We do not expose exportation of SYCL +objects to external APIs. Interoperability capabilities vary between APIs. For +example, CUDA allows the import of external memory and semaphores, but does not +allow export of its own resources. +==== == Specification @@ -94,111 +115,406 @@ implementation supports. [frame="none",options="header"] |====================== -|Value |Description -|202302 |Initial version of this extension +|Rev |Description +|1 |Initial draft of the proposal +|2 |Second revision of the proposal +|3 |Third revision of the proposal |====================== +See the revision history at the bottom of this document for features added in +each revision. + +=== Querying bindless image support + +We provide the following device queries to retrieve information on whether a +SYCL implementation provides support for various bindless image features. + +The device information descriptors for these queries are: + +[frame="none",options="header"] +|====================== +|Device descriptor |Return type |Description +|`info::device::ext_bindless_image_support` |`bool` | +Returns `true` if the device supports creation of bindless images backed by +the `image_mem` and `image_mem_handle` APIs. +|`info::device::ext_bindless_image_1D_USM_support` |`bool` | +Returns `true` if the device supports creation of 1D bindless images backed by +USM. +|`info::device::ext_bindless_image_2D_USM_support` |`bool` | +Returns `true` if the device supports creation of 2D bindless images backed by +USM. +|`info::device::ext_bindless_image_3D_USM_support` |`bool` | +Returns `true` if the device supports creation of 3D bindless images backed by +USM. +|====================== + +[NOTE] +==== +Not all SYCL backends may provide support for bindless images constructed from USM +memory with all dimensions (1D, 2D, 3D). As an example, CUDA does not have +native support for 3D image resources constructed from USM. +==== + === Image descriptor ```cpp -namespace sycl::ext::oneapi { +namespace sycl::ext::oneapi::experimental { + +enum class image_channel_order : unsigned int { + a = 0, + r = 1, + rx = 2, + rg = 3, + rgx = 4, + ra = 5, + rgb = 6, + rgbx = 7, + rgba = 8, + argb = 9, + bgra = 10, + intensity = 11, + luminance = 12, + abgr = 13, +}; + +enum class image_channel_type : unsigned int { + snorm_int8 = 0, + snorm_int16 = 1, + unorm_int8 = 2, + unorm_int16 = 3, + unorm_short_565 = 4, + unorm_short_555 = 5, + unorm_int_101010 = 6, + signed_int8 = 7, + signed_int16 = 8, + signed_int32 = 9, + unsigned_int8 = 10, + unsigned_int16 = 11, + unsigned_int32 = 12, + fp16 = 13, + fp32 = 14, +}; + +enum class image_type : unsigned int { + standard = 0, + mipmap = 1, + interop = 2, +}; struct image_descriptor { - unsigned int width; - unsigned int height; - unsigned int depth; + size_t width; + size_t height; + size_t depth; image_channel_type channel_type; image_channel_order channel_order; - unsigned int row_pitch; + image_type type; + unsigned int num_levels; + + image_descriptor(sycl::range<1> dims, sycl::image_channel_order channel_order, + sycl::image_channel_type channel_type, + image_type type = image_type::standard, + unsigned int num_levels = 1); + + image_descriptor(sycl::range<2> dims, sycl::image_channel_order channel_order, + sycl::image_channel_type channel_type, + image_type type = image_type::standard, + unsigned int num_levels = 1); + + image_descriptor(sycl::range<3> dims, sycl::image_channel_order channel_order, + sycl::image_channel_type channel_type, + image_type type = image_type::standard, + unsigned int num_levels = 1); }; } ``` -The image descriptor represents the image dimensions, channel type, channel -order, and row pitch. +The image descriptor represents the image dimensions, channel type, and channel +order. A `type` member is also present to allow for implementation of mipmapped +and interop images. + +The `image_descriptor` shall be default constructible and follow by-value +semantics. + +[NOTE] +==== +Additional future `image_type`s _may_ include "layered" and/or "cubemap". +==== Note that `image_channel_type` and `image_channel_order` existed in SYCL 1.2.1, but were removed in SYCL 2020 in favor of a single, unified enum class. We propose separating them again to enable better flexibility and to avoid combinatorial complexity. +For the `standard` image type, the value of `num_levels` must be `1`. + +Only mipmap image types support more than one level. + +The `type` member will inform the implementation of the type of image to +create, allocate, or free. + + === Allocating image memory +The process of creating an image is two-fold: +allocate an image's memory, then create an image handle from the allocation. +Allocation of image memory can be achieved in two ways. + +==== Allocating non-USM image memory + ```cpp -namespace sycl::ext::oneapi { +namespace sycl::ext::oneapi::experimental { + +struct image_mem_handle { + using raw_handle_type = /* implementation defined */; + raw_handle_type raw_handle; +} -image_mem_handle allocate_image(const context& syclContext, - image_descriptor desc); +class image_mem { -void free_image(const context& syclContext, void *memory_handle); + image_mem(); + image_mem(const image_mem &) = delete; // no copy-construct + image_mem(image_mem &&rhs) noexcept; // move-constructor is allowed + image_mem(const sycl::context &syclContext, const image_descriptor &desc); + + ~image_mem(); + + image_mem &operator=(image_mem &&); // move-assignment is allowed + image_mem &operator=(image_mem &) = delete; // no copy-assignment + + image_mem_handle get_handle() const; + image_descriptor get_descriptor() const; + sycl::context get_context() const; + + sycl::range<3> get_range() const; + sycl::image_channel_type get_image_channel_type() const; + sycl::image_channel_type get_image_channel_order() const; + unsigned int get_image_num_channels() const; + image_type get_type() const; + + image_mem_handle get_mip_level() const; +}; + +image_mem_handle alloc_image_mem(const context& syclContext, + const image_descriptor &desc); +void free_image_mem(const context& syclContext, + image_mem_handle memHandle + image_type imgType); } ``` -The process of creating an image is two-fold: -allocate an image, then create an image handle from the allocation. +The first method of allocating device memory for images is through +`alloc_image_mem`. This takes an `image_descriptor` and `sycl::context` to +allocate device memory, appropriately sized based on the `image_descriptor`. + +Memory allocated in this way requires the user to free that memory after all +operations using the memory are completed and no more operations operating on +the memory will be scheduled. This is done using `free_image_mem`. An +`image_type` should be passed to `free_image_mem` to inform the implementation +of the type of memory to be freed. + +The `image_mem` class is a RAII class wrapper that performs allocation and +deallocation of device memory. -The first step is to allocate memory for an image. -`allocate_image` can be used for this, -as it allocates memory based on the requirements outlined by the image -descriptor. The function returns a handle to the newly allocated memory. -Note that the image is an opaque type, so the handle cannot be dereferenced on -the host. The layout of the allocated memory is backend-specific, and may be an -optimized layout, e.g. tile swizzle patterns. +The default constructor does not allocate any memory on the device and the +resulting `image_mem` object is in an uninitialized state. -After we're done with the image, we need to cleanup the memory using -`free_image`. +The constructor `image_mem(const sycl::context &, const image_descriptor &)` +is a wrapper for `alloc_image_mem` functionality. -_In the DPC++ CUDA backend, these will allocate/deallocate a `CUarray` type._ +The destructor is a wrapper for `free_image_mem` functionality. -An alternative is to allocate memory using device USM allocations. -In addition to the existing USM allocation functions, a new USM pitched -allocation function is proposed. +`image_mem` also provides some functions to get various properties of the +image memory allocation such as the range, channel type, channel order, number +of channels, number of levels, and image type. + +In the case where a mipmap has been allocated, `get_mip_level` can be used to +return an `image_mem_handle` to a specific level of the mipmap. This can then +be used to copy data to that specific level or create an image handle based on +that level. + +Note that the handle type `image_mem_handle::raw_handle_type` is an opaque type, +and the handle cannot be dereferenced on the host. The layout of the memory is +backend-specific, and may be an optimized layout, e.g. tile swizzle patterns. + +`image_mem` shall not be copy-constructible, copy-assignable, or +device-copyable. +`image_mem` shall be default-constructible, move-constructible, as well as +provide a move-assignment operator. + +`image_mem_handle` shall be default-constructible and copy-constructible. +`image_mem_handle` shall not be device-copyable. + +If the construction or destruction of the `image_mem` class fails, a +`sycl::exception` with error code `sycl::errc::memory_allocation` will be +thrown. + +Similarly, if `alloc_image_mem` or `free_image_mem` fail, a `sycl::exception` +with error code `sycl::errc::memory_allocation` will be thrown. + +[NOTE] +==== +In the DPC++ CUDA backend, `image_mem` will allocate/deallocate a +`CUarray` type (or `CUmipmappedArray` in the case of mipmap images). +==== + +===== Getting image information from `image_mem_handle` + +Extension functions are provided to retrieve information about images allocated +using the `image_mem_alloc` function. These are similar to the member functions +provided by `image_mem`. However, since the `image_mem_handle` is a minimal +struct representing just the opaque handle the underlying memory object, there +is some information that we cannot retrieve from it, namely the `image_type`, +`image_channel_order`, the `sycl::context` the memory was allocated in, and the +`image_descriptor` used to allocate the memory. ```cpp namespace sycl::ext::oneapi { -void *pitched_alloc_device(size_t* ret_pitch, size_t width_in_bytes, - size_t height unsigned int element_size_bytes, - const queue &queue); +sycl::range<3> get_image_range(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +sycl::image_channel_type +get_image_channel_type(const sycl::context &syclContext, + const image_mem_handle mem_handle); + +unsigned int get_image_num_channels(const sycl::context &syclContext, + const image_mem_handle mem_handle); +} +``` + +For `get_image_range` where the underlying image memory was allocated with one +or two dimensions, the returned `sycl::range<3>` will contain zero values for +the dimensions unused by the underlying image memory object. + +==== Allocating USM image memory + +The second way to allocate image memory is to use USM allocations. SYCL already +provides a number of USM allocation functions. This proposal would add another, +pitched memory allocation, through `pitched_alloc_device`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +void *pitched_alloc_device(size_t *ret_pitch, const image_descriptor &desc, + const sycl::queue &queue); +void *pitched_alloc_device(size_t *ret_pitch, const image_descriptor &desc, + const sycl::device &dev, const sycl::context &ctxt); + +void *pitched_alloc_device(size_t *ret_pitch, size_t width_in_bytes, + size_t height, unsigned int element_size_bytes, + const sycl::queue &queue); +void *pitched_alloc_device(size_t *ret_pitch, size_t width_in_bytes, + size_t height, unsigned int element_size_bytes, + const sycl::device &dev, const sycl::context &ctxt); } ``` This function will allocate a memory region aimed to be used for -two-dimensional images. -It places appropriate padding at the end of image rows and returns the pitch -value used. +two-dimensional images. It allocates memory that is guaranteed to adhere to the +device's alignment requirements for 2D USM images. + +If the user does not wish to use `pitched_alloc_device` to allocate USM 2D +images, but prefers to use another USM allocation function instead, then that +allocation must adhere to some alignment restrictions. These restrictions are +device specific, and queries for them can be found in the "Pitch alignment +restrictions and queries" section below. + +If the allocation of pitched memory fails, `pitched_alloc_device` will throw a +`sycl::exception` with error code `sycl::errc::memory_allocation`. -For one-dimensional images, existing USM allocation functions are sufficient. +=== Pitch alignment restrictions and queries -Three-dimensional images do not support USM. +For the purposes of this document, the pitch of an image memory allocation is +the distance in bytes between the first elements of adjacent rows of the image. +Some devices may require two-dimensional images to be allocated with specific +alignments for their width and pitch. The `pitched_alloc_device` API intends to +make allocation of USM memory adhering to these restrictions easy, returning an +appropriate pitch value to the user. However, if a user wishes to use another +USM allocation function, they must be aware of these restrictions, and query +the device to ensure the allocations they wish to use adhere to those +restrictions. -The next step, creating the image, is the same regardless of how the memory -was allocated. +This proposal provides a number of additional device queries that enable the +user to allocate appropriate USM memory for two-dimensional images. + +The device information descriptors for these queries are: + +[frame="none",options="header"] +|====================== +|Device descriptor |Return type |Description +|`info::device::ext_oneapi_texture_pitch_align` |`uint32_t` | +Returns the required alignment of the pitch between two rows of an image in +bytes. +|`info::device::ext_oneapi_max_texture_linear_width` |`size_t` | +Returns the maximum linear width allowed for images allocated using USM. +|`info::device::ext_oneapi_max_texture_linear_height` |`size_t` | +Returns the maximum linear height allowed for images allocated using USM. +|`info::device::ext_oneapi_max_texture_linear_pitch` |`size_t` | +Returns the maximum linear pitch allowed for images allocated using USM. +|====================== === Obtaining a handle to the image +The next step is to create the image, and obtain the handle. + ```cpp -namespace sycl::ext::oneapi { +namespace sycl::ext::oneapi::experimental { -using unsampled_image_handle = /* Implementation defined */; -using sampled_image_handle = /* Implementation defined */; +/// Opaque unsampled image handle type. +struct unsampled_image_handle { + using raw_image_handle_type = /* Implementation defined */; -unsampled_image_handle create_image(const context &syclContext, - image_mem_handle mem_handle); -sampled_image_handle create_image(const context &syclContext, - image_mem_handle mem_handle, - sampler &sampler); + raw_image_handle_type get_image_handle() const; + raw_image_handle_type image_handle; +}; -unsampled_image_handle create_image(const context &syclContext, void *usm_ptr); -sampled_image_handle create_image(const context &syclContext, void *usm_ptr, - sampler &sampler); +/// Opaque sampled image handle type. +struct sampled_image_handle { + using raw_image_handle_type = /* Implementation defined */; + using raw_sampler_handle_type = /* Implementation defined */ -void destroy_image_handle(const context& syclContext, + raw_image_handle_type image_handle; + raw_sampler_handle_type sampler_handle; + + raw_image_handle_type get_image_handle() const; + raw_sampler_handle_type get_sampler_handle() const; +}; + +// Creating an unsampled image from an `image_mem_handle` +unsampled_image_handle create_image(const sycl::context &syclContext, + image_mem_handle memHandle, + const image_descriptor &desc); + +// Creating a sampled image from an `image_mem_handle` +sampled_image_handle create_image(const sycl::context &syclContext, + image_mem_handle memHandle, + const image_descriptor &desc, + const sampler &sampler); + +// Creating an unsampled image from an `image_mem` object +unsampled_image_handle create_image(const sycl::context &syclContext, + const image_mem &memHandle, + const image_descriptor &desc); + +// Creating a sampled image from an `image_mem` object +sampled_image_handle create_image(const sycl::context &syclContext, + const image_mem &memHandle, + const image_descriptor &desc, + const sampler &sampler); + +// Creating a sampled image from a USM allocation and pitch +sampled_image_handle create_image(const sycl::context &syclContext, + const void *usmPtr, size_t pitch, + const image_descriptor &desc, + const sampler &sampler); + +// Destroying an image handle +void destroy_image_handle(const sycl::context& syclContext, sampled_image_handle& imageHandle); -void destroy_image_handle(const context& syclContext, +void destroy_image_handle(const sycl::context& syclContext, unsampled_image_handle& imageHandle); } @@ -210,276 +526,1218 @@ These objects are opaque types that represent an image object. They can be captured by value into a SYCL kernel, or they can be passed in a buffer as a dynamic array of images (see examples below). -We can either provide a sampler or not when creating the image. +We can either provide a `sampler` or not when creating the image. Doing +so will create a `sampled_image_handle`, where otherwise an +`unsampled_image_handle` would be returned. A `sampled_image_handle` should +contain a raw sampler handle that will be used when sampling an image. + +Whether an `image_descriptor` or `void *` USM allocation was passed to +`create_image`, it must have been allocated in the same context as the one +passed to `create_image`. + +If we choose to create a 2D image from a USM allocation by passing a `void *`, +we must also pass the pitch of the memory allocation. If the memory was +allocated using `pitched_alloc_device`, the pitch passed must be the one which +was returned by `pitched_alloc_device`. If the user did not use +`pitched_alloc_device` to allocate this memory, then that memory must still +adhere to device specific alignment restrictions. These restrictions and their +queries are outlined in the section "Pitch alignment restrictions and queries" +below. + +The pitch is ignored for 1D images. + +If the creation of an image fails, `create_image` will throw a `sycl::exception` +with error code `sycl::errc::runtime`. -_In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture. -An unsampled image will usually correspond to a CUDA surface, unless the memory -the texture is created from USM, then it will correspond to a CUDA texture._ +The `unsampled_image_handle` and `sampled_image_handle` types shall be +default-constructible, copy-constructible, and device-copyable. + +[NOTE] +==== +In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture, +whereas an unsampled image will correspond to a CUDA surface. +==== After we're done with the image, we need to destroy the handle using -`destroy_image_handle`. +`destroy_image_handle`. Destroying an image handle does not deallocate the +underlying image memory. The user is responsible for deallocation, either +through `free_image_mem`, or destroying the `image_mem` object, if one was used. === Explicit copies ```cpp namespace sycl { + +class handler { +public: + // Host to Device using `image_mem_handle` + void ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + + // Host to Device using `image_mem` + void ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + + // Device to host using `image_mem_handle` + void ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + + // Device to host using `image_mem` + void ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + + // Host to device OR device to host using USM device memory + void ext_oneapi_copy(void *Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + size_t Pitch); +}; + class queue { public: - event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, - const ext::oneapi::image_descriptor &Desc); - event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, - const ext::oneapi::image_descriptor &Desc, - event DepEvent); - event ext_image_memcpy(ext::oneapi::image_mem_handle Dest, void *Src, - const ext::oneapi::image_descriptor &Desc, - const std::vector &DepEvents); - - event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, - const ext::oneapi::image_descriptor &Desc); - event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, - const ext::oneapi::image_descriptor &Desc, - event DepEvent); - event ext_image_memcpy(void *Dest, ext::oneapi::image_mem_handle Src, - const ext::oneapi::image_descriptor &Desc, - const std::vector &DepEvents); - - event ext_image_memcpy(void *Dest, void *Src, - const ext::oneapi::image_descriptor &Desc); - event ext_image_memcpy(void *Dest, void *Src, - const ext::oneapi::image_descriptor &Desc, - event DepEvent); - event ext_image_memcpy(void *Dest, void *Src, - const ext::oneapi::image_descriptor &Desc, - const std::vector &DepEvents); + // Host to Device using `image_mem` + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + event DepEvent); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + const std::vector &DepEvents); + + // Host to Device using `image_mem_handle` + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + event DepEvent); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + const std::vector &DepEvents); + + // Device to host using `image_mem` + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + event DepEvent); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + const std::vector &DepEvents); + + // Device to host using `image_mem_handle` + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + event DepEvent); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + const std::vector &DepEvents); + + // Host to device OR device to host using USM device memory + event ext_oneapi_copy(void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + size_t DevicePitch); + event ext_oneapi_copy(void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + size_t DevicePitch, + event DepEvent); + event ext_oneapi_copy(void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + size_t DevicePitch, + const std::vector &DepEvents); + }; } ``` -An `ext_image_memcpy` function is proposed as a method of the queue, to -copy image memory. It can be used to copy image memory from host to device, or -device to host. The image descriptor is used to determine the dimensions of -the image, and the image row pitch, when copying the data. - -For images allocated using USM, existing SYCL functionality can used to -copy their memory, but we also provide `ext_image_copy` functions that take -USM pointers. +To enable the copying of images an `ext_oneapi_copy` function is proposed as a +method of the queue and handler. It can be used to copy image memory, whether +allocated through USM or the `image_mem` API, from host to +device, or device to host. The image descriptor passed to the `ext_oneapi_copy` +API is used to determine the dimensions and size in memory of the image to copy. + +For images allocated using USM, existing SYCL functionality can be used to +copy their memory, but we also provide `ext_oneapi_copy` functions that take +USM pointers. If the image memory was allocated using `pitched_alloc_device`, +then the `DevicePitch` parameter must match that which was returned from +`pitched_alloc_device`. If the user opted to use another allocation function +then the `DevicePitch` parameter should be set to the width of the memory +allocation in bytes, and adhere to the alignment restrictions outlined in the +"Pitch alignment restrictions and queries" section. + +When using `ext_oneapi_copy`, the user must ensure that the memory regions +accessed through `Dest` and `Src` have the same capacity. `ext_oneapi_copy` only +supports copying images of the same dimensions. Whether copying image memory to +the device through a USM `Dest` pointer, or an `image_mem`, the host +memory is always assumed to be tightly packed. Similarly, the host memory is +assumed to be packed when copying from device to host. + +For the forms that take `image_mem`, the handle must have been allocated +with the same context used to create the `queue`. + +For the forms that take a USM pointer, the image memory must have been allocated +with the same context used to create the `queue` and the USM memory must be +accessible on the queue's device. + +If copying of an image fails, `ext_oneapi_copy` will throw a `sycl::exception` +with error code `sycl::errc::runtime`. === Reading and writing inside the kernel ```cpp -namespace sycl::ext::oneapi { +namespace sycl::ext::oneapi::experimental { -DataT read_image(const unsampled_image_handle &imageHandle, - const CoordT &coords); -DataT read_image(const sampled_image_handle &imageHandle, - const CoordT &coords); +template +DataT read_image(const unsampled_image_handle &ImageHandle, + const CoordT &Coords); +template +DataT read_image(const sampled_image_handle &ImageHandle, + const CoordT &Coords); template -void write_image(const unsampled_image_handle &imageHandle, +void write_image(unsampled_image_handle &ImageHandle, const CoordT &Coords, const DataT &Color); - } ``` -Inside a kernel it's possible to read an image using `read_image` via the image -handle, which returns a value of the user specified type `DataT`. -Similarly, it's possible to write to an image via a handle using `write_image`. +Inside a kernel, it's possible to read an image via `read_image`, passing +the image handle. For the form that takes `unsampled_image_handle`, image data +will be fetched exactly as is in device memory. For the form that takes a +`sampled_image_handle`, the image will be sampled according to the +`sampler` that was passed to the image upon construction, and is included +in the `sampled_image_handle` as `sampled_image_handle::raw_sampler_handle`. +The returned data will be of type `DataT`, which is specified by the user, and +should map to the type that the image was created with (a combination of +`image_channel_type` and `image_channel_order`). An example of a type resulting +from these two is shown below. + +For reads or writes, `DataT` must correspond to the type specified in the +`image_descriptor` when the image was created. For multi-channel types, the +resultant `DataT` should be a `sycl::vec` type. E.g., for a channel order of +`image_channel_order::rg` and channel type of `image_channel_type::fp16`, the +resultant `DataT` should be `sycl::vec`. + +It's possible to write to an image via `write_image` passing the +handle of the image to be written to, along with the coordinates to write to and +the data. + +Sampled images cannot be written to using `write_image`. + +For unsampled images, coordinates are specified by `int`, `sycl::vec`, +and `sycl::vec` for 1D, 2D, and 3D images respectively. -`write_image` is not available for sampled images, or images created from USM -memory. +Sampled image reads take `float`, `sycl::vec`, and +`sycl::vec` coordinate types for 1D, 2D, and 3D images respectively. -`DataT` must correspond to the type specified in the `image_descriptor` -when the image was created. +In the case of 3D reads or writes, the fourth element in the coordinate vector +is ignored. -The coordinates are specified as an `int` for 1D images, `int2` for 2D images, -and `int4` for 3D images (not `int3`). +Note that coordinates for 3D images take a vector of size 4, not 3. -Note that all images must be used in either read-only or write-only fashion +Note also that all images must be used in either read-only or write-only fashion within a single kernel invocation; read/write images are not supported. -_Sampling will likely require passing a sampler handle to the `read_image` -function. The current proposal does not require this, as CUDA can tie samplers -and textures at time of creation._ +== Mipmapped images -=== Getting image information from non-USM image memory +So far, we have described how to create and operate on standard bindless images. +Another type of image we propose support for is a mipmapped image. Mipmapped +images are an image type with multiple levels. Each consecutive dimension of a +mipmapped image level is smaller than the previous level. The dimensions of a +succeeding mip level is half that of the preceding level. As an example, a +two-dimensional mipmapped image where the top-most level (`level==0`) image has +a `width==16` and `height==16`, the succeeding level (`level==1`) in the mipmap +will have sizes `width==8` and `height==8`. This pattern continues until either +the final level has sizes of `width==1` and `height==1`, or the user-specified +maximum mip level has been reached. + +=== Querying mipmap support + +We provide the following device queries to retrieve information on whether a +SYCL implementation provides support for various mipmap features. + +The device information descriptors for these queries are: + +[frame="none",options="header"] +|====================== +|Device descriptor |Return type |Description +|`info::device::ext_oneapi_mipmap_support` |`bool` | +Return `true`` if the device supports allocating mipmap resources. +|`info::device::ext_oneapi_anisotropic_support` |`bool` | +Return `true`` if the device supports sampling mipmap images with anisotropic +filtering +|`info::device::ext_oneapi_max_anisotropy` |`float` | +Return the maximum anisotropic ratio supported by the device +|`info::device::ext_oneapi_mipmap_level_reference` |`bool` | +Return `true` if the device supports using images created from individual +mipmap levels +|====================== + +=== Allocation of mipmapped images + +Mipmaps are allocated in a similar manner to standard images, however, mipmaps +do not support USM backed memory. + +Mipmap memory is allocated through `alloc_image_mem`. The user should populate +the `image_descriptor` with the image type of `image_type::mipmap`, and provide +the number of mipmaps levels they wish to allocate. The value of `num_levels` +must be greater than `1`. + +Mipmap memory allocated this way requires the user to free that memory after all +operations using the memory are completed and no more operations operating on +the memory will be scheduled. This is done using `free_image_mem`, passing `image_type::mipmap`. Importantly, individual levels of a mipmap must not be +freed before calling `free_image_mem`. + +The `image_mem` class acts as a RAII class wrapper that performs allocation and +deallocation of mipmap device memory. The constructor and destructor act as a +wrapper for the functions `alloc_image_mem` and `free_image_mem` respectively. + +When the underlying memory of `image_mem` is a mipmap, `get_mip_level` can be +used to return an `image_mem_handle` to a specific level of the mipmap. This can +then be used to copy data to that specific level or create an image based on +that level. + +=== Obtaining a handle to a mipmap image + +A handle to a mipmap image is acquired in the same way as a +`sampled_image_handle`. Mipmaps can only be sampled image types. We can create a +`sampled_image_handle` to the allocated mipmap through the `create_image` +functions which take a sampler. + +Below we describe an extension to the existing SYCL `sampler` object which +allows for the sampling of mipmaps with support for level of detail(LOD) and +anisotropic filtering. To sample a mipmap correctly, the extended attributes of +this sampler must be defined. + +Attempting to create an `unsampled_image_handle` to a mipmap will result in a +`sycl::exception` with error code `sycl::errc::runtime` being thrown. + +=== Extended sampler object + +To represent the different sampling capabilities that mipmaps provide as opposed +to standard sampled images, we propose to extend the core SYCL `sampler` +class. + +We propose to add a `mipmap_filtering_mode` enum, mipmap filtering properties' +values, getter functions, and a new constructor to the SYCL `sampler` +class. + +```c++ +namespace sycl { + +enum class mipmap_filtering_mode : /* unspecified */ { + nearest, + linear, +}; + +class sampler { +public: + + sampler(coordinate_normalization_mode normalizationMode, + addressing_mode addressingMode, filtering_mode filteringMode, + mipmap_filtering_mode mipmapFilteringMode, float minMipmapLevelClamp, + float maxMipmapLevelClamp, float maxAnisotropy, + const property_list &propList = {}); + + + mipmap_filtering_mode get_mipmap_filtering_mode() const; + float get_min_mipmap_level_clamp() const; + float get_max_mipmap_level_clamp() const; + float get_max_anisotropy() const; +}; +} +``` + +`mipmapFilteringMode` dictates the method in which sampling between mipmap +levels is performed. + +`minMipmapLevelClamp` defines the minimum mipmap level from which we can sample, +with the minimum value being 0. + +`maxMipmapLevelClamp` defines the maximum mipmap level from which we can sample. +This cannot value cannot be higher than the number of allocated levels. + +`maxAnisotropy` dictates the anisotropic ratio used when samplling the mipmap +with anisotropic filtering. + +=== Copying mipmap image data + +We provide two ways in which image data can be copied to mipmap levels. + +The first method is to retrieve the handle to an individual mipmap level using +`get_mip_level`, provided by `image_mem`, and copy via the previously described +copy functions + +The user must retrieve the `image_mem_handle` themselves for +the particular level and provide an `image_descriptor` which describes that +level. For instance, copying to the second mipmap level must take a descriptor +whose width, height, depth, are all half of the first mipmap level. Providing an +incorrect descriptor can lead to undefined behaviour or throw a +`sycl::exception` with error code `sycl::errc::runtime`. + +The second method is to use one of the provided `ext_oneapi_copy` functions on +the queue or handler, which take a `Level` parameter specifying which level the +user wishes to copy from device to host, or from host to device. + +For this method, the user only needs to provide an additional `Level` +parameter to the `ext_oneapi_copy` function. The descriptor passed to +`ext_oneapi_copy` would be used to calculate the appropriate mipmap level +dimensions. + +```c++ +namespace sycl { + +class handler { +public: + // Host to Device using `image_mem_handle` and a mipmap level + void ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + + // Host to Device using `image_mem` and a mipmap level + void ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + + // Device to host using `image_mem_handle` and a mipmap level + void ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + + // Device to host using `image_mem` and a mipmap level + void ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + + +}; + +class queue { + // Host to Device using `image_mem` and a mipmap level + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + event DepEvent); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem &Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + const std::vector &DepEvents); + + // Host to Device using `image_mem_handle` + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + event DepEvent); + event ext_oneapi_copy(void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + const std::vector &DepEvents); + + // Device to host using `image_mem` and a mipmap level + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + event DepEvent); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + const std::vector &DepEvents); + + // Device to host using `image_mem_handle` and a mipmap level + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + event DepEvent); + event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + unsigned int Level, + const std::vector &DepEvents); +}; +} +``` + +=== Reading a mipmap + +Inside the kernel, it's possible to read a mipmap via `read_image`, passing the +`sampled_image_handle`, the coordinates, and either the level or anisotropic +gradient values. + +The method of sampling a mipmap is different based on which `read_image` +function is used, and the sampler attributes passed upon creation of the +mipmap. + +```c++ +// Nearest/linear filtering between mip levels +template +DataT read_image(const sample_image_handle &ImageHandle, + const CoordT &Coords, + const float Level); + +// Anisotropic filtering +template +DataT read_image(const sampled_image_handle &ImageHandle, + const CoordT &Coords, + const CoordT &Dx, const CoordT &Dy); +``` + +== Interoperability + +=== Querying interoperability support + +We provide the following device queries to retrieve information on whether a +SYCL implementation provides support for various interoperability features. + +The device information descriptors for these queries are: + +[frame="none",options="header"] +|====================== +|Device descriptor |Return type |Description +|`info::device::ext_oneapi_interop_memory_import_support` |`bool` | +Return true if the device supports importing external memory resources. +|`info::device::ext_oneapi_interop_memory_export_support` |`bool` | +Return true if the device supports exporting internal memory resources. +|`info::device::ext_oneapi_interop_semaphore_import_support` |`bool` | +Return true if the device supports importing external semaphore resources. +|`info::device::ext_oneapi_interop_semaphore_export_support` |`bool` | +Return true if the device supports exporting internal event resources. +|====================== -Extension functions are provided to retrieve information about images allocated -using the `image_allocate` function. + +[NOTE] +==== +Not all SYCL backends may provide support for importing or exporting native +memory or semaphore objects. CUDA for example only supports importation of +external memory and semaphores, but provides no support for their exportation. +==== + +=== Importing external memory objects + +In order to import a memory object, an external API must provide an appropriate +handle to that memory. The exact structure and type of this handle can depend on +the external API, and the operating system the application is running on. + +In order to facilitate a number of different external memory handle types, we +propose the following structures. + +[NOTE] +==== +We only show two examples of external memory handle types here, but the +`external_mem_descriptor` struct could be templated by any number of handle +types, provided that the SYCL implementation provides support for them. +==== ```cpp -namespace sycl::ext::oneapi { +namespace sycl::ext::oneapi::experimental { -sycl::range<3> get_image_range(const sycl::context &syclContext, - const image_mem_handle mem_handle); +// POSIX file descriptor memory handle type +struct external_mem_fd { + int file_descriptor; +}; -unsigned int get_image_flags(const sycl::context &syclContext, - const image_mem_handle mem_handle); +// Windows NT memory handle type +struct external_mem_win32 { + void *handle; + const void *name; +}; -sycl::image_channel_type -get_image_channel_type(const sycl::context &syclContext, - const image_mem_handle mem_handle); +// Descriptor templated on specific external memory handle type +template +struct external_mem_handle_type { + external_mem_handle_type external_handle; + size_t size_in_bytes; +}; -unsigned int get_image_num_channels(const sycl::context &syclContext, - const image_mem_handle mem_handle); +} +``` + +The user should create an `external_mem_descriptor` templated on the appropriate +handle type for their purposes, e.g. `external_mem_fd` to describe a POSIX file +descriptor resource on Linux systems, or an `external_mem_win32` for Windows NT +resource handles. + +Once the user populates the `external_mem_descriptor` with the appropriate +`external_mem_handle_type` values, and the size of the external memory in bytes, +they can then import that memory into SYCL through `import_external_memory`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +struct interop_mem_handle { + using raw_handle_type = /* Implementation defined */; + raw_handle_type raw_handle; +}; + +template +interop_mem_handle import_external_memory( + const sycl::context &syclContext, + external_mem_descriptor externalMemDescriptor); + + +image_mem_handle map_external_memory_array( + const sycl::context &syclContext, + interop_mem_handle interopMemHandle, + const image_descriptor &imageDescriptor); + +void *map_external_memory_buffer( + const sycl::context &syclContext, + interop_mem_handle interopMemHandle); } ``` -_These functions provide the same information that CUDA's -`cuArray3DGetDescriptor` provides._ +The resulting `interop_mem_handle` can then be mapped to two different memory +types, a buffer, or an array. + +If mapped to a buffer, the resulting `void*` can +be used in USM like fashion. It can be passed as an argument to a kernel and +therein manipulated. + +If mapped to an array, the resulting type is an `image_mem_handle`, which can be +used to construct images in the same way as memory allocated through +`alloc_image_mem`. The `ext_oneapi_copy` operations also work with imported +memory mapped to `image_mem_handle` types. + +When calling `create_image` with an `image_mem_handle` mapped from an external +memory object, the user must ensure that the image descriptor they pass to +`create_image` has members that match or map to those of the external API. +A mismatch between any of the `width`, `height`, `depth`, `image_channel_type`, +or `image_channel_order` members will result in undefined behavior. The +`image_type` member must be set to `image_type::interop`. + +Once a user has finished operating on imported memory, they must ensure that +they destroy the imported memory handle through `destroy_external_memory`. + +`destroy_external_memory` can only accept `interop_mem_handles` that were +created through `import_external_memory`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +void +destroy_external_memory(const sycl::context &syclContext, + interop_mem_handle interopMem); + +} +``` + +Destroying or freeing any imported memory through `image_mem_free` or +`sycl::free` will result in undefined behavior. + +=== Importing external semaphores + +In addition to proposing importation of external memory resources, we also +propose importation of synchronization primitives. Just like the sharing of +memory between APIs described above, any external APIs must provide a valid a +handle to a valid semaphore resource they wish to share, and just as external +memory resources handles can take different forms of structure and type +depending on the API and operating system, so do external semaphore resource +handles. + +In order to facilitate a number of different external semaphore handle types, we +propose the following structures. + +[NOTE] +==== +We only show two examples of external semaphore resource handle types here, but +the `external_semaphore_descriptor` struct could be templated by any number of +handle types, provided that the SYCL implementation provides support for them. +==== + +```cpp +namespace sycl::ext::oneapi::experimental { + +// POSIX file descriptor semaphore handle +struct external_semaphore_fd { + int file_descriptor; +}; + +// Windows NT semaphore handle +struct external_semaphore_win32 { + void *handle; + const void *name; +}; + +// Descriptor templated on specific external semaphore handle type +template +struct external_semaphore_descriptor { + external_semaphore_handle_type external_handle; +}; + +} +``` + +The user should create an `external_semaphore_descriptor` templated on the +appropriate handle type for their purposes, e.g. `external_semaphore_fd` to +describe a POSIX file descriptor resource on Linux systems, or an +`external_mem_win32` for Windows NT resource handles. + +Once the user populates the `external_semaphore_descriptor` with the appropriate +`external_semaphore_handle_type` values, they can then import that semaphore +into SYCL through `import_external_semaphore`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +struct interop_semaphore_handle { + using raw_handle_type = /* Implementation defined */; + raw_handle_type raw_handle; +}; + +template +interop_semaphore_handle import_external_semaphore( + const sycl::context &syclContext, + external_semaphore_descriptor + externalSemaphoreDescriptor); + +} +``` + +The resulting `interop_semaphore_handle` can then be used in a SYCL command +group, to either wait until the semaphore is in the signaled state, or set the +semaphore to a signaled state. + +```cpp +namespace sycl::ext::oneapi::experimental { + +sycl::event +wait_external_semaphore(const sycl::queue &syclQueue, + interop_semaphore_handle semaphoreHandle); + +sycl::event +signal_external_semaphore(const sycl::queue &syclQueue, + interop_semaphore_handle semaphoreHandle); + +} +``` + +`wait_external_semaphore` and `signal_external_semaphore` are non-blocking, +asynchronous operations. + +Any operations submitted to the queue after a `wait_external_semaphore` call +will not begin until the imported semaphore is in a signaled state. + +When `signal_external_semaphore` is called, the external semaphore will be set +to the signaled state after all commands submitted to the queue prior to the +`signal_external_semaphore` call complete. + +The user must ensure to destroy all external semaphore objects once they are no +longer required through `destroy_external_semaphore`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +void +destroy_external_semaphore(const sycl::context &syclContext, + interop_semaphore_handle semaphoreHandle); + +} +``` + +In addition to the extension functions described above, we also propose to +extend the SYCL queue and handler classes with semaphore waiting and signalling +operations. + +```cpp +namespace sycl { + +class handler { +public: + void ext_oneapi_wait_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle); + + void ext_oneapi_signal_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle); +}; + +class queue { +public: + event ext_oneapi_wait_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle); + event ext_oneapi_wait_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle, + event DepEvent); + event ext_oneapi_wait_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle, + const std::vector &DepEvents); + + event ext_oneapi_signal_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle); + event ext_oneapi_signal_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle, + event DepEvent); + event ext_oneapi_signal_external_semaphore( + ext::oneapi::experimental::interop_semaphore_handle + interop_semaphore_handle, + const std::vector &DepEvents); +}; +} +``` == Examples === 1D image read/write ```cpp -queue q; -auto ctxt = q.get_context(); - -constexpr size_t N = 512; -std::vector dataIn(N); -float exp = 512; -for (int i = 0; i < N; i++) { - dataIn[i] = float4(i, i, i, i); +// Set up device, queue, and context +sycl::device dev; +sycl::queue q(dev); +sycl::context ctxt = q.get_context(); + +// Initialize input data +constexpr size_t width = 512; +std::vector dataIn(width); +std::vector dataOut(width); +for (int i = 0; i < width; i++) { + dataIn[i] = static_cast(i); } // Image descriptor - can use the same for both images -ext::oneapi::image_descriptor desc({N, 0, 0}, image_channel_order::rgba, - image_channel_type::fp32); +sycl::ext::oneapi::experimental::image_descriptor desc( + sycl::range{width}, sycl::image_channel_order::r, + sycl::image_channel_type::fp32); -// Extension: returns the device pointer to the allocated memory -auto imgMemoryIn = ext::oneapi::allocate_image(ctxt, desc); -auto imgMemoryOut = ext::oneapi::allocate_image(ctxt, desc); +try { + // Extension: returns the device pointer to the allocated memory + sycl::ext::oneapi::experimental::image_mem imgMemoryIn(ctxt, desc); + sycl::ext::oneapi::experimental::image_mem imgMemoryOut(ctxt, desc); -// Extension: create the image and return the handle -ext::oneapi::image_handle imgIn = - ext::oneapi::create_image(ctxt, imgMemoryIn, desc); -ext::oneapi::image_handle imgOut = - ext::oneapi::create_image(ctxt, imgMemoryOut, desc); + // Extension: create the image and return the handle + sycl::ext::oneapi::experimental::unsampled_image_handle imgIn = + sycl::ext::oneapi::experimental::create_image(ctxt, imgMemoryIn, desc); + sycl::ext::oneapi::experimental::unsampled_image_handle imgOut = + sycl::ext::oneapi::experimental::create_image(ctxt, imgMemoryOut, desc); -// Extension: copy over data to device -q.ext_image_copy(imgMemoryIn, dataIn.data(), desc); + // Extension: copy over data to device + q.ext_oneapi_copy(dataIn.data(), imgMemoryIn, desc); -// Wait for copy operation to finish -q.wait(); + // Bindless images require manual synchronization + // Wait for copy operation to finish + q.wait_and_throw(); -q.submit([&](handler &cgh) { - // No need to request access, handles captured by value + q.submit([&](sycl::handler &cgh) { + // No need to request access, handles captured by value - cgh.parallel_for(N, [=](id<1> id) { - // Extension: read image data from handle - float4 px1 = - ext::oneapi::read_image(imgIn, int(id[0])); + cgh.parallel_for(width, [=](sycl::id<1> id) { + // Extension: read image data from handle + float pixel = sycl::ext::oneapi::experimental::read_image( + imgIn, int(id[0])); - // Extension: write to image data using handle - sycl::ext::oneapi::write_image(imgOut, int(id[0]), px1); + // Extension: write to image data using handle + sycl::ext::oneapi::experimental::write_image(imgOut, int(id[0]), pixel); + }); }); -}); -// Using image handles requires manual synchronization -q.wait_and_throw(); + // Using image handles requires manual synchronization + q.wait_and_throw(); + + // Copy data written to imgOut to host + q.ext_oneapi_copy(imgMemoryOut, dataOut.data(), desc); + + // Cleanup + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, imgIn); + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, imgOut); +} catch (sycl::exception e) { + std::cerr << "SYCL exception caught: " << e.what(); + exit(-1); +} -// Cleanup -ext::oneapi::destroy_image_handle(ctxt, imgIn); -ext::oneapi::destroy_image_handle(ctxt, imgOut); -ext::oneapi::free_image(ctxt, imgMemoryIn); -ext::oneapi::free_image(ctxt, imgMemoryOut); +// Validate that `dataIn` correctly transferred to `dataOut` +bool validated = (dataIn == dataOut); ``` === Reading from a dynamically sized array of 2D images ```cpp -device dev; -queue q(dev); -auto ctxt = q.get_context(); +// Set up device, queue, and context +sycl::device dev; +sycl::queue q(dev); +sycl::context ctxt = q.get_context(); // declare image data size_t numImages = 5; -size_t width = 7; -size_t height = 3; +size_t width = 8; +size_t height = 8; size_t N = width * height; -std::vector out(N); -std::vector expected(N); -std::vector dataIn(N); +std::vector dataIn(N); +std::vector dataOut(N); +std::vector dataExpected(N); for (int i = 0; i < width; i++) { for (int j = 0; j < height; j++) { - expected[j + (height * i)] = (j + (height * i)) * numImages; - dataIn[j + (height * i)] = {j + (height * i), 0, 0, 0}; + int index = j + (height * i); + dataIn[index] = index; + dataExpected[index] = index * numImages; } } // Image descriptor - can use the same for all images -sycl::ext::oneapi::image_descriptor desc( - {width, height}, image_channel_order::rgba, image_channel_type::fp32); - -// Allocate each image and save the handles -std::vector imgAllocations; -for (int i = 0; i < numImages; i++) { - // Extension: returns the handle to the allocated memory - auto img_mem = sycl::ext::oneapi::allocate_image(ctxt, desc); - if (img_mem == nullptr) { - std::cout << "Error allocating image!" << std::endl; - return 1; +sycl::ext::oneapi::experimental::image_descriptor desc( + {width, height}, sycl::image_channel_order::r, + sycl::image_channel_type::fp32); + +try { + + // Allocate each image and save the handles + std::vector imgAllocations; + for (int i = 0; i < numImages; i++) { + // Extension: move-construct device allocated memory + imgAllocations.emplace_back( + sycl::ext::oneapi::experimental::image_mem{ctxt, desc}); + } + + // Copy over data to device for each image + for (int i = 0; i < numImages; i++) { + // Extension: copy over data to device + q.ext_oneapi_copy(dataIn.data(), imgAllocations[i], desc); + } + + // Wait for copy operations to finish + q.wait_and_throw(); + + // Create the images and return the handles + std::vector + imgHandles; + for (int i = 0; i < numImages; i++) { + // Extension: create the image and return the handle + sycl::ext::oneapi::experimental::unsampled_image_handle imgHandle = + sycl::ext::oneapi::experimental::create_image(ctxt, imgAllocations[i], + desc); + imgHandles.push_back(imgHandle); + } + + sycl::buffer outBuf{dataOut.data(), sycl::range{height, width}}; + sycl::buffer imgHandlesBuf{imgHandles.data(), sycl::range{numImages}}; + q.submit([&](sycl::handler &cgh) { + sycl::accessor outAcc{outBuf, cgh, sycl::write_only}; + sycl::accessor imgHandleAcc{imgHandlesBuf, cgh, sycl::read_only}; + + cgh.parallel_for( + sycl::nd_range<2>{{width, height}, {width, height}}, + [=](sycl::nd_item<2> it) { + size_t dim0 = it.get_local_id(0); + size_t dim1 = it.get_local_id(1); + + // Sum each image by reading via its handle + float sum = 0; + for (int i = 0; i < numImages; i++) { + // Extension: read image data from handle + sum += (sycl::ext::oneapi::experimental::read_image( + imgHandleAcc[i], sycl::vec(dim0, dim1))); + } + outAcc[sycl::id{dim1, dim0}] = sum; + }); + }); + + // Using image handles requires manual synchronization + q.wait_and_throw(); + + // Cleanup + for (int i = 0; i < numImages; i++) { + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, + imgHandles[i]); } - imgAllocations.push_back(img_mem); +} catch (sycl::exception e) { + std::cerr << "SYCL exception caught: " << e.what(); + exit(-1); } -// Copy over data to device for each image -for (int i = 0; i < numImages; i++) { - // Extension: copy over data to device - q.ext_image_copy(imgAllocations[i], dataIn.data(), desc); +// Validate that `dataOut` is correct +bool validated = (dataOut == dataExpected); +``` + +=== Reading a 1D mipmap with anisotropic filtering and levels +```cpp +// Set up device, queue, and context +sycl::device dev; +sycl::queue q(dev); +sycl::context ctxt = q.get_context(); + +// declare image data +constexpr size_t N = 16; +constexpr size_t width = N; +unsigned int num_levels = 2; +std::vector out(N); +std::vector expected(N); +std::vector dataIn1(N); +std::vector dataIn2(N / 2); +int j = 0; +for (int i = 0; i < N; i++) { + expected[i] = i + (j + 10); + if (i % 2) + j++; + dataIn1[i] = float4(i, i, i, i); + if (i < (N / 2)) + dataIn2[i] = float4(i + 10, i + 10, i + 10, i + 10); } -// Wait for copy operations to finish -q.wait(); +try { + + // Image descriptor -- number of levels + sycl::ext::oneapi::experimental::image_descriptor desc( + {width}, image_channel_order::rgba, image_channel_type::fp32, + sycl::ext::oneapi::experimental::image_type::Mipmap, num_levels); + + // Allocate the mipmap + sycl::ext::oneapi::experimental::image_mem mip_mem(ctxt, desc); + + // Retrieve level 0 + sycl::ext::oneapi::experimental::image_mem_handle img_mem1 = + sycl::ext::oneapi::experimental::get_mip_level(ctxt, mip_mem, 0); + + // Copy over data to level 0 + q.ext_oneapi_copy(dataIn1.data(), img_mem1, desc); + + // Copy over data to level 1 + // Copy function handles sizing + unsigned int level = 1; + q.ext_oneapi_copy(dataIn2.data(), mip_mem, desc, level); + q.wait_and_throw(); + + // Extended sampler object to take in mipmap attributes + sampler samp(coordinate_normalization_mode::normalized, + addressing_mode::mirrored_repeat, filtering_mode::nearest, + mipmap_filtering_mode::nearest, 0.0f, (float)num_levels, 8.0f); + + // Create a sampled image handle to represent the mipmap + sycl::ext::oneapi::experimental::sampled_image_handle mipHandle = + sycl::ext::oneapi::experimental::create_image(ctxt, mip_mem, samp, + desc); + q.wait_and_throw(); + + buffer buf((float *)out.data(), N); + q.submit([&](handler &cgh) { + auto outAcc = buf.get_access(cgh, N); + + cgh.parallel_for(N, [=](id<1> id) { + float sum = 0; + float x = float(id[0] + 0.5) / (float)N; + // Read mipmap level 0 with anisotropic filtering + // and level 1 with level filtering + float4 px1 = sycl::ext::oneapi::experimental::read_image( + mipHandle, x, 0.0f, 0.0f); + float4 px2 = sycl::ext::oneapi::experimental::read_image( + mipHandle, x, 1.0f); + + sum = px1[0] + px2[0]; + outAcc[id] = sum; + }); + }); -// Create the images and return the handles -std::vector imgHandles; -for (int i = 0; i < numImages; i++) { - // Extension: create the image and return the handle - sycl::ext::oneapi::unsampled_image_handle imgHandle = - sycl::ext::oneapi::create_image(ctxt, imgAllocations[i], desc); - imgHandles.push_back(imgHandle); + q.wait_and_throw(); + + // Cleanup + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, mipHandle); + +} catch (sycl::exception e) { + std::cerr << "SYCL exception caught! : " << e.what() << "\n"; + exit(-1); +} catch (...) { + std::cerr << "Unknown exception caught!\n"; + exit(-1); } -// Cuda stores data in column-major fashion -// SYCL deals with indexing in row-major fashion -// Reverse output buffer dimensions and access to convert -// the cuda column-major data back to row-major -buffer buf((float *)out.data(), range<2>{height, width}); -buffer imgHandlesBuf( - imgHandles.data(), range<1>{numImages}); -q.submit([&](handler &cgh) { - auto outAcc = - buf.get_access(cgh, range<2>{height, width}); - - auto imgHandleAcc = - imgHandlesBuf.get_access(cgh, range<1>{numImages}); - - cgh.parallel_for( - nd_range<2>{{width, height}, {width, height}}, [=](nd_item<2> it) { - size_t dim0 = it.get_local_id(0); - size_t dim1 = it.get_local_id(1); - - // Sum each image by reading their handle - float sum = 0; - for (int i = 0; i < numImages; i++) { - // Extension: read image data from handle - sum += (sycl::ext::oneapi::read_image( - imgHandleAcc[i], int2(dim0, dim1)))[0]; - } - outAcc[id<2>{dim1, dim0}] = sum; - }); -}); - -// Using image handles requires manual synchronization -q.wait_and_throw(); - -// Cleanup -for (int i = 0; i < numImages; i++) { - sycl::ext::oneapi::destroy_image_handle(ctxt, imgHandles[i]); - sycl::ext::oneapi::free_image(ctxt, imgAllocations[i]); +// Validate that `out` is correct +bool validated = (dataOut == dataExpected); +``` + +=== Using imported memory and semaphore objects + +```c++ +// Set up device, queue, and context +sycl::device dev; +sycl::queue q(dev); +auto ctxt = q.get_context(); + +size_t width = /* passed from external API */; +size_t height = /* passed from external API */; + +sycl::image_channel_order channel_order = /* mapped from external API */ + /* we assume sycl::image_channel_order::rgba */; + +sycl::image_channel_type channel_type = /* mapped from external API */ + /* we assume sycl::image_channel_type::unsigned_int32 */;; + +// Image descriptor - mapped to external API image layout +// with `image_type::interop` +sycl::ext::oneapi::experimental::image_descriptor desc( + {width, height}, channel_order, channel_type, + sycl::ext::oneapi::experimental::image_type::interop); + +size_t img_size_in_bytes = width * height * sizeof(sycl::vec); + +int external_input_image_file_descriptor = /* passed from external API */ +int external_output_image_file_descriptor = /* passed from external API */ + +// Extension: populate external memory descriptors +sycl::ext::oneapi::experimental::external_mem_descriptor< + sycl::ext::oneapi::experimental::external_mem_fd> + input_ext_mem_desc{external_input_image_file_descriptor, + img_size_in_bytes}; + +sycl::ext::oneapi::experimental::external_mem_descriptor< + sycl::ext::oneapi::experimental::external_mem_fd> + output_ext_mem_desc{external_output_image_file_descriptor, + img_size_in_bytes}; + +// An external API semaphore will signal this semaphore before our SYCL commands +// can begin execution +int wait_semaphore_file_descriptor = /* passed from external API */; + +// An external API will wait on this semaphore to be signalled by us before it +// can execute some commands +int done_semaphore_file_descriptor = /* passed from external API */; + +// Extension: populate external semaphore descriptor. +// We assume POSIX file descriptor resource types +sycl::ext::oneapi::experimental::external_semaphore_descriptor< + sycl::ext::oneapi::experimental::external_semaphore_fd> + wait_external_semaphore_desc{wait_semaphore_file_descriptor}; + +sycl::ext::oneapi::experimental::external_semaphore_descriptor< + sycl::ext::oneapi::experimental::external_semaphore_fd> + done_external_semaphore_desc{done_semaphore_file_descriptor}; + +try { + // Extension: import external semaphores + sycl::ext::oneapi::experimental::interop_semaphore_handle + wait_interop_semaphore_handle = + sycl::ext::oneapi::experimental::import_external_semaphore( + ctxt, wait_external_semaphore_desc); + + sycl::ext::oneapi::experimental::interop_semaphore_handle + done_interop_semaphore_handle = + sycl::ext::oneapi::experimental::import_external_semaphore( + ctxt, done_external_semaphore_desc); + + // Extension: import external memory from descriptors + sycl::ext::oneapi::experimental::interop_mem_handle + input_interop_mem_handle = + sycl::ext::oneapi::experimental::import_external_memory( + ctxt, input_ext_mem_desc); + + sycl::ext::oneapi::experimental::interop_mem_handle + output_interop_mem_handle = + sycl::ext::oneapi::experimental::import_external_memory( + ctxt, output_ext_mem_desc); + + // Extension: map imported external memory to image memory + sycl::ext::oneapi::experimental::image_mem_handle input_mapped_mem_handle = + sycl::ext::oneapi::experimental::map_external_memory_array( + ctxt, input_interop_mem_handle, desc); + sycl::ext::oneapi::experimental::image_mem_handle output_mapped_mem_handle = + sycl::ext::oneapi::experimental::map_external_memory_array( + ctxt, output_interop_mem_handle, desc); + + // Extension: create images from mapped memory and return the handles + sycl::ext::oneapi::experimental::unsampled_image_handle img_input = + sycl::ext::oneapi::experimental::create_image( + ctxt, input_mapped_mem_handle, desc); + sycl::ext::oneapi::experimental::unsampled_image_handle img_output = + sycl::ext::oneapi::experimental::create_image( + ctxt, output_mapped_mem_handle, desc); + + // Extension: wait for imported semaphore + q.ext_oneapi_wait_external_semaphore(wait_interop_semaphore_handle) + + // Submit our kernel that depends on imported "wait_semaphore_file_descriptor" + q.submit([&](sycl::handler &cgh) { + cgh.parallel_for<>( + sycl::nd_range<2>{{width, height}, {32, 32}}, + [=](sycl::nd_item<2> it) { + size_t dim0 = it.get_global_id(0); + size_t dim1 = it.get_global_id(1); + + // Extension: read image data from handle to imported image + sycl::uint4 pixel = + sycl::ext::oneapi::experimental::read_image( + img_input, sycl::int2(dim0, dim1)); + + // Modify the data before writing back + pixel *= 10; + + // Extension: write image data using handle to imported image + sycl::ext::oneapi::experimental::write_image( + img_output, sycl::int2(dim0, dim1), pixel); + }); + }); + + // Extension: signal imported semaphore + q.ext_oneapi_signal_external_semaphore(done_interop_semaphore_handle) + + // The external API can now use the semaphore it exported to + // "done_semaphore_file_descriptor" to schedule its own command submissions + + q.wait_and_throw(); + + // Extension: destroy all external resources + sycl::ext::oneapi::experimental::destroy_external_memory( + ctxt, input_interop_mem_handle); + sycl::ext::oneapi::experimental::destroy_external_memory( + ctxt, output_interop_mem_handle); + sycl::ext::oneapi::experimental::destroy_external_semaphore( + ctxt, wait_interop_semaphore_handle); + sycl::ext::oneapi::experimental::destroy_external_semaphore( + ctxt, done_interop_semaphore_handle); + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, img_input); + sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, img_output); +} catch (sycl::exception e) { + std::cerr << "SYCL exception caught! : " << e.what() << "\n"; + exit(-1); +} catch (...) { + std::cerr << "Unknown exception caught!\n"; + exit(-1); } ``` @@ -501,16 +1759,15 @@ synchronize operations. === Limitations when using USM as image memory -When USM is used as the image memory, these are the general limitations: - -* Not possible to write images, just read. - -Then there are dimension specific limitations: +There are dimension specific limitations: * 1D - Linear interpolation not possible in the CUDA backend. - A workaround is to allocate 2D pitched memory with a height of 1. -* 2D - Only works with `pitched_alloc_device`. -* 3D - No support at the moment. + A workaround is to allocate 2D pitched memory with a height of 1. +* 2D - There are some alignment restrictions. See the "Pitch alignment + restrictions and queries" section, or use `pitched_alloc_device` to + allocate 2D USM image memory. +* 3D - No support at the moment. Possible support in non CUDA backends in the + future. A query is present to check 3D USM capabilities === Not supported yet @@ -518,7 +1775,6 @@ These features still need to be handled: * Level Zero and SPIR-V support * Mipmapping -* Interop with various backends * etc. == Revision History @@ -537,5 +1793,34 @@ These features still need to be handled: - Added image memory information getters to reflect `cuArray3DGetDescriptor` functionality -|====================== +|3 |2023-03-30 | - Some text clarifications. + + - Unsampled images can no longer be created from USM. + + - Added SYCL 1.2.1 `image_channel_order` and + `image_channel_type` structs. + + - Added `image_type` to enable construction of layered, + mipmap, and cubemap images in the future. + + - Added device information descriptors for querying pitched + allocation size and alignment requirement. + + - Added `ext_oneapi_copy` methods for the `sycl::handler`. + - `ext_oneapi_copy` functions now take the `Src` as the first + parameter. + + - Created `image_mem` as a RAII style class. + + - Renamed `allocate_image` to `alloc_image_mem` + + - `pitched_alloc_device` can now take an `image_descriptor`. + + - Added interoperability features + + - Added support to query bindless image and interoperability + capabilities + + - Added mipmap support +|====================== From b83425717745b8ab9e28bb3c1d866c02a59052f1 Mon Sep 17 00:00:00 2001 From: Przemek Malon Date: Thu, 22 Jun 2023 14:53:26 +0100 Subject: [PATCH 4/6] [SYCL][Doc] Bindless images proposal revision 4 Co-authored-by: Isaac Ault isaac.ault@codeplay.com Co-authored-by: Przemek Malon przemek.malon@codeplay.com Co-authored-by: Sean Stirling sean.stirling@codeplay.com Co-authored-by: Peter Zuzek peter@codeplay.com - Added `sycl::device` parameter to multiple functions to clarify that images must be created and used on the same device. - Changed naming and order of some parameters to be consistent throughout the proposal and with core SYCL. - Added variants of functions that take a `sycl::queue` instead of both `sycl::device` and `sycl::context`. - Removed standalone wait and signal semaphore functions. These should always go through the queue or handler methods. - Removed `get_image_handle` and `get_sampler_handle` functions from sampled and unsampled image handle structs. The structs have public handle members that can be retrieved without getters. - Made all enum types and values unspecified - Moved support queries to device aspects, improved naming of queries for better consistency, and moved device info queries to the experimental namespace. - Added `get_mip_level_desc` member function to `image_descriptor` - Fixed `get_mip_level_mem_handle` prototype in `image_mem`, and added a standalone function. - Removed `ext_oneapi_copy` variants that take `image_mem`, the user should retrieve the raw handle and pass that themselves. - Removed `ext_oneapi_copy` variants that take a mip level, the user should retrieve individual mip level image handles themselves and pass that. - Added `ext_oneapi_copy` variants that take offsets and the extent, to enable sub-region copy. - Created a list of failure scenarios for `ext_oneapi_copy`, changed the failure error code to `errc::invalid`, and specified that the implementation should relay the reason for the failure back to the user. - Added notes on the exception of using the SYCL 1.2.1 `sampler` class, and reason for not using the SYCL 2020 `image_sampler` struct. - Updated code samples - Other general improvements --- .../sycl_ext_oneapi_bindless_images.asciidoc | 1272 ++++++++++------- 1 file changed, 717 insertions(+), 555 deletions(-) rename sycl/doc/extensions/{experimental => proposed}/sycl_ext_oneapi_bindless_images.asciidoc (57%) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc similarity index 57% rename from sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc rename to sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc index d4b66def79e44..1dde07e83f9b2 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc @@ -1,4 +1,4 @@ -# Bindless Images +# sycl_ext_oneapi_bindless_images :source-highlighter: coderay :coderay-linenums-mode: table @@ -34,6 +34,12 @@ This extension is written against the SYCL 2020 revision 6 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision. +[NOTE] +==== +One exception to this is use of the SYCL 1.2.1 `sycl::sampler` class. Please +refer to the issues section at the bottom of this document for reasons why we +do not yet use the SYCL 2020 `sycl::image_sampler` struct. +==== == Status @@ -119,6 +125,7 @@ implementation supports. |1 |Initial draft of the proposal |2 |Second revision of the proposal |3 |Third revision of the proposal +|4 |Fourth revision of the proposal |====================== See the revision history at the bottom of this document for features added in @@ -129,30 +136,29 @@ each revision. We provide the following device queries to retrieve information on whether a SYCL implementation provides support for various bindless image features. -The device information descriptors for these queries are: +The device aspects for these queries are: [frame="none",options="header"] |====================== -|Device descriptor |Return type |Description -|`info::device::ext_bindless_image_support` |`bool` | -Returns `true` if the device supports creation of bindless images backed by -the `image_mem` and `image_mem_handle` APIs. -|`info::device::ext_bindless_image_1D_USM_support` |`bool` | -Returns `true` if the device supports creation of 1D bindless images backed by -USM. -|`info::device::ext_bindless_image_2D_USM_support` |`bool` | -Returns `true` if the device supports creation of 2D bindless images backed by -USM. -|`info::device::ext_bindless_image_3D_USM_support` |`bool` | -Returns `true` if the device supports creation of 3D bindless images backed by -USM. +|Device descriptor |Description +|`aspect::ext_oneapi_bindless_images` | Indicates if the device supports +creation of bindless images backed by the `image_mem` and `image_mem_handle` +APIs. +|`aspect::ext_oneapi_bindless_images_shared_usm` | Indicates if the device +supports the creation of bindless images backed by shared USM memory. +|`aspect::ext_oneapi_bindless_images_1d_usm` | Indicates if the device supports +creation of 1D bindless images backed by USM. +|`aspect::ext_oneapi_bindless_images_2d_usm` | Indicates if the device supports +creation of 2D bindless images backed by USM. |====================== [NOTE] ==== -Not all SYCL backends may provide support for bindless images constructed from USM -memory with all dimensions (1D, 2D, 3D). As an example, CUDA does not have -native support for 3D image resources constructed from USM. +Not all SYCL backends may provide support for bindless images constructed from +USM memory with all dimensions. As an example, CUDA does not have +native support for 3D image resources constructed from USM. In the future, some +backends may support this, and this proposal may be updated to allow creation +of 3D USM images. ==== === Image descriptor @@ -160,45 +166,45 @@ native support for 3D image resources constructed from USM. ```cpp namespace sycl::ext::oneapi::experimental { -enum class image_channel_order : unsigned int { - a = 0, - r = 1, - rx = 2, - rg = 3, - rgx = 4, - ra = 5, - rgb = 6, - rgbx = 7, - rgba = 8, - argb = 9, - bgra = 10, - intensity = 11, - luminance = 12, - abgr = 13, +enum class image_channel_order : /* unspecified */ { + a, + r, + rx, + rg, + rgx, + ra, + rgb, + rgbx, + rgba, + argb, + bgra, + intensity, + luminance, + abgr, }; -enum class image_channel_type : unsigned int { - snorm_int8 = 0, - snorm_int16 = 1, - unorm_int8 = 2, - unorm_int16 = 3, - unorm_short_565 = 4, - unorm_short_555 = 5, - unorm_int_101010 = 6, - signed_int8 = 7, - signed_int16 = 8, - signed_int32 = 9, - unsigned_int8 = 10, - unsigned_int16 = 11, - unsigned_int32 = 12, - fp16 = 13, - fp32 = 14, +enum class image_channel_type : /* unspecified */ { + snorm_int8, + snorm_int16, + unorm_int8, + unorm_int16, + unorm_short_565, + unorm_short_555, + unorm_int_101010, + signed_int8, + signed_int16, + signed_int32, + unsigned_int8, + unsigned_int16, + unsigned_int32, + fp16, + fp32, }; -enum class image_type : unsigned int { - standard = 0, - mipmap = 1, - interop = 2, +enum class image_type : /* unspecified */ { + standard, + mipmap, + interop, }; struct image_descriptor { @@ -210,28 +216,30 @@ struct image_descriptor { image_type type; unsigned int num_levels; - image_descriptor(sycl::range<1> dims, sycl::image_channel_order channel_order, - sycl::image_channel_type channel_type, + image_descriptor(sycl::range<1> dims, image_channel_order channel_order, + image_channel_type channel_type, image_type type = image_type::standard, unsigned int num_levels = 1); - image_descriptor(sycl::range<2> dims, sycl::image_channel_order channel_order, - sycl::image_channel_type channel_type, + image_descriptor(sycl::range<2> dims, image_channel_order channel_order, + image_channel_type channel_type, image_type type = image_type::standard, unsigned int num_levels = 1); - image_descriptor(sycl::range<3> dims, sycl::image_channel_order channel_order, - sycl::image_channel_type channel_type, + image_descriptor(sycl::range<3> dims, image_channel_order channel_order, + image_channel_type channel_type, image_type type = image_type::standard, unsigned int num_levels = 1); + + image_descriptor get_mip_level_desc(unsigned int level) const; }; } ``` The image descriptor represents the image dimensions, channel type, and channel -order. A `type` member is also present to allow for implementation of mipmapped -and interop images. +order. An `image_type` member is also present to allow for implementation of +mipmapped and interop images. The `image_descriptor` shall be default constructible and follow by-value semantics. @@ -248,11 +256,13 @@ and to avoid combinatorial complexity. For the `standard` image type, the value of `num_levels` must be `1`. -Only mipmap image types support more than one level. - The `type` member will inform the implementation of the type of image to create, allocate, or free. +Only mipmap image types support more than one level. For mipmap images, the +member function `get_mip_level_desc` will return an `image_descriptor` for +a given level of a mipmap, with valid dimension values for that level, and the +type of the returned `image_descriptor` will be `image_type::standard`. === Allocating image memory @@ -271,20 +281,25 @@ struct image_mem_handle { } class image_mem { - +public: image_mem(); - image_mem(const image_mem &) = delete; // no copy-construct + image_mem(const image_mem &) = delete; - image_mem(image_mem &&rhs) noexcept; // move-constructor is allowed - image_mem(const sycl::context &syclContext, const image_descriptor &desc); + image_mem(image_mem &&rhs) noexcept; + image_mem(const image_descriptor &imageDesc, + const sycl::device &syclDevice, + const sycl::context &syclContext); + image_mem(const image_descriptor &imageDesc, + const sycl::queue &syclQueue); ~image_mem(); - image_mem &operator=(image_mem &&); // move-assignment is allowed - image_mem &operator=(image_mem &) = delete; // no copy-assignment + image_mem &operator=(image_mem &&); + image_mem &operator=(image_mem &) = delete; image_mem_handle get_handle() const; image_descriptor get_descriptor() const; + sycl::device get_device() const; sycl::context get_context() const; sycl::range<3> get_range() const; @@ -293,20 +308,30 @@ class image_mem { unsigned int get_image_num_channels() const; image_type get_type() const; - image_mem_handle get_mip_level() const; + image_mem_handle get_mip_level_mem_handle(unsigned int level) const; }; -image_mem_handle alloc_image_mem(const context& syclContext, - const image_descriptor &desc); -void free_image_mem(const context& syclContext, - image_mem_handle memHandle - image_type imgType); +image_mem_handle alloc_image_mem(const image_descriptor &imageDesc, + const sycl::device &syclDevice, + const sycl::context &syclContext); +image_mem_handle alloc_image_mem(const image_descriptor &imageDesc, + const sycl::queue &syclQueue); + +void free_image_mem(image_mem_handle memHandle, + image_type imageType, + const sycl::device &syclDevice, + const sycl::context &syclContext); +void free_image_mem(image_mem_handle memHandle, + image_type imageType, + const sycl::queue &syclQueue); } ``` The first method of allocating device memory for images is through -`alloc_image_mem`. This takes an `image_descriptor` and `sycl::context` to -allocate device memory, appropriately sized based on the `image_descriptor`. +`alloc_image_mem`. This takes a `sycl::device`, `sycl::context`, +and `image_descriptor` to allocate device memory, appropriately sized +based on the `image_descriptor`. Alternatively, we can also pass a +`sycl::queue` instead of both `sycl::device` and `sycl::context`. Memory allocated in this way requires the user to free that memory after all operations using the memory are completed and no more operations operating on @@ -314,25 +339,23 @@ the memory will be scheduled. This is done using `free_image_mem`. An `image_type` should be passed to `free_image_mem` to inform the implementation of the type of memory to be freed. -The `image_mem` class is a RAII class wrapper that performs allocation and -deallocation of device memory. +The second method involves the `image_mem` class, which is a RAII class wrapper +that performs allocation and deallocation of device memory. The default constructor does not allocate any memory on the device and the resulting `image_mem` object is in an uninitialized state. -The constructor `image_mem(const sycl::context &, const image_descriptor &)` -is a wrapper for `alloc_image_mem` functionality. - -The destructor is a wrapper for `free_image_mem` functionality. +the constructor is a wrapper for `alloc_image_mem` functionality. +The destructor is a wrapper for `free_image_mem` functionality. `image_mem` also provides some functions to get various properties of the -image memory allocation such as the range, channel type, channel order, number -of channels, number of levels, and image type. +image memory allocation such as the image range, channel type, channel order, +number of channels, number of levels, and image type. -In the case where a mipmap has been allocated, `get_mip_level` can be used to -return an `image_mem_handle` to a specific level of the mipmap. This can then -be used to copy data to that specific level or create an image handle based on -that level. +In the case where a mipmap has been allocated, `get_mip_level_mem_handle` can +be used to return an `image_mem_handle` to a specific level of the mipmap. This +can then be used to copy data to that specific level or create an image handle +based on that level. Note that the handle type `image_mem_handle::raw_handle_type` is an opaque type, and the handle cannot be dereferenced on the host. The layout of the memory is @@ -346,7 +369,7 @@ provide a move-assignment operator. `image_mem_handle` shall be default-constructible and copy-constructible. `image_mem_handle` shall not be device-copyable. -If the construction or destruction of the `image_mem` class fails, a +If the construction of the `image_mem` class fails, a `sycl::exception` with error code `sycl::errc::memory_allocation` will be thrown. @@ -366,21 +389,29 @@ using the `image_mem_alloc` function. These are similar to the member functions provided by `image_mem`. However, since the `image_mem_handle` is a minimal struct representing just the opaque handle the underlying memory object, there is some information that we cannot retrieve from it, namely the `image_type`, -`image_channel_order`, the `sycl::context` the memory was allocated in, and the -`image_descriptor` used to allocate the memory. +`image_channel_order`, the `sycl::context` or `sycl::device` the memory was +allocated in, and the `image_descriptor` used to allocate the memory. ```cpp namespace sycl::ext::oneapi { -sycl::range<3> get_image_range(const sycl::context &syclContext, - const image_mem_handle mem_handle); +sycl::range<3> get_image_range(const image_mem_handle memHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); sycl::image_channel_type -get_image_channel_type(const sycl::context &syclContext, - const image_mem_handle mem_handle); - -unsigned int get_image_num_channels(const sycl::context &syclContext, - const image_mem_handle mem_handle); +get_image_channel_type(const image_mem_handle memHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); + +unsigned int get_image_num_channels(const image_mem_handle memHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); + +image_mem_handle get_mip_level_mem_handle(const image_mem_handle mipMemHandle, + unsigned int level, + const sycl::device &syclDevice, + const sycl::context &syclContext); } ``` @@ -397,63 +428,71 @@ pitched memory allocation, through `pitched_alloc_device`. ```cpp namespace sycl::ext::oneapi::experimental { -void *pitched_alloc_device(size_t *ret_pitch, const image_descriptor &desc, +void *pitched_alloc_device(size_t *retRowPitch, + size_t widthInBytes, size_t height, + unsigned int elementSizeBytes, + const sycl::queue &syclQueue); +void *pitched_alloc_device(size_t *retRowPitch + size_t widthInBytes, size_t height, + unsigned int elementSizeBytes, + const sycl::device &syclDevice, + const sycl::context &syclContext); + +void *pitched_alloc_device(size_t *resultPitch, + const image_descriptor &desc, const sycl::queue &queue); -void *pitched_alloc_device(size_t *ret_pitch, const image_descriptor &desc, - const sycl::device &dev, const sycl::context &ctxt); - -void *pitched_alloc_device(size_t *ret_pitch, size_t width_in_bytes, - size_t height, unsigned int element_size_bytes, - const sycl::queue &queue); -void *pitched_alloc_device(size_t *ret_pitch, size_t width_in_bytes, - size_t height, unsigned int element_size_bytes, - const sycl::device &dev, const sycl::context &ctxt); +void *pitched_alloc_device(size_t *resultPitch, + const image_descriptor &desc, + const sycl::device &syclDevice, + const sycl::context &syclContext); } ``` This function will allocate a memory region aimed to be used for -two-dimensional images. It allocates memory that is guaranteed to adhere to the -device's alignment requirements for 2D USM images. +two-dimensional images. It allocates memory that is guaranteed to +adhere to the device's alignment requirements for USM images. -If the user does not wish to use `pitched_alloc_device` to allocate USM 2D -images, but prefers to use another USM allocation function instead, then that -allocation must adhere to some alignment restrictions. These restrictions are -device specific, and queries for them can be found in the "Pitch alignment -restrictions and queries" section below. +If the user does not wish to use `pitched_alloc_device` to allocate +two-dimensional USM images, but prefers to use another USM allocation +function instead, then that allocation must adhere to some alignment +restrictions. These restrictions are device specific, and queries for them can +be found in the "Pitch alignment restrictions and queries" section below. If the allocation of pitched memory fails, `pitched_alloc_device` will throw a `sycl::exception` with error code `sycl::errc::memory_allocation`. === Pitch alignment restrictions and queries -For the purposes of this document, the pitch of an image memory allocation is -the distance in bytes between the first elements of adjacent rows of the image. -Some devices may require two-dimensional images to be allocated with specific -alignments for their width and pitch. The `pitched_alloc_device` API intends to -make allocation of USM memory adhering to these restrictions easy, returning an -appropriate pitch value to the user. However, if a user wishes to use another -USM allocation function, they must be aware of these restrictions, and query -the device to ensure the allocations they wish to use adhere to those -restrictions. +For the purposes of this document, the row pitch of an image memory allocation +is the distance in bytes between the first elements of adjacent rows of the +image. Some devices may require two-dimensional USM images to be allocated with +specific alignments for their width and pitch values. The `pitched_alloc_device` +API intends to make allocation of USM memory adhering to these restrictions +easy, returning the appropriate pitch value to the user. However, if a user +wishes to use another USM allocation function, they must be aware of these +restrictions, and query the device to ensure the allocations they wish to use +adhere to those restrictions. This proposal provides a number of additional device queries that enable the -user to allocate appropriate USM memory for two-dimensional images. +user to allocate appropriate pitched USM memory for two-dimensional +images. One-dimensional images do not require any pitch values. The device information descriptors for these queries are: [frame="none",options="header"] |====================== |Device descriptor |Return type |Description -|`info::device::ext_oneapi_texture_pitch_align` |`uint32_t` | +|`ext::oneapi::experimental::info::device::image_row_pitch_align` |`uint32_t` | Returns the required alignment of the pitch between two rows of an image in -bytes. -|`info::device::ext_oneapi_max_texture_linear_width` |`size_t` | +bytes for images allocated using USM. +|`ext::oneapi::experimental::info::device::max_image_linear_width` |`size_t` | Returns the maximum linear width allowed for images allocated using USM. -|`info::device::ext_oneapi_max_texture_linear_height` |`size_t` | +|`ext::oneapi::experimental::info::device::max_image_linear_height` |`size_t` | Returns the maximum linear height allowed for images allocated using USM. -|`info::device::ext_oneapi_max_texture_linear_pitch` |`size_t` | -Returns the maximum linear pitch allowed for images allocated using USM. +|`ext::oneapi::experimental::info::device::max_image_linear_row_pitch` +|`size_t` | Returns the maximum linear row pitch allowed for images allocated +using USM. |====================== === Obtaining a handle to the image @@ -466,8 +505,6 @@ namespace sycl::ext::oneapi::experimental { /// Opaque unsampled image handle type. struct unsampled_image_handle { using raw_image_handle_type = /* Implementation defined */; - - raw_image_handle_type get_image_handle() const; raw_image_handle_type image_handle; }; @@ -478,45 +515,71 @@ struct sampled_image_handle { raw_image_handle_type image_handle; raw_sampler_handle_type sampler_handle; - - raw_image_handle_type get_image_handle() const; - raw_sampler_handle_type get_sampler_handle() const; }; // Creating an unsampled image from an `image_mem_handle` -unsampled_image_handle create_image(const sycl::context &syclContext, - image_mem_handle memHandle, - const image_descriptor &desc); +unsampled_image_handle create_image(image_mem_handle memHandle, + const image_descriptor &desc, + const sycl::device &syclDevice, + const sycl::context &syclContext); +unsampled_image_handle create_image(image_mem_handle memHandle, + const image_descriptor &desc, + const sycl::queue &syclQueue); // Creating a sampled image from an `image_mem_handle` -sampled_image_handle create_image(const sycl::context &syclContext, - image_mem_handle memHandle, +sampled_image_handle create_image(image_mem_handle memHandle, const image_descriptor &desc, - const sampler &sampler); + const sycl::sampler &syclSampler, + const sycl::device &syclDevice, + const sycl::context &syclContext); +sampled_image_handle create_image(image_mem_handle memHandle, + const image_descriptor &desc, + const sycl::sampler &syclSampler, + const sycl::queue &syclQueue); // Creating an unsampled image from an `image_mem` object -unsampled_image_handle create_image(const sycl::context &syclContext, - const image_mem &memHandle, - const image_descriptor &desc); +unsampled_image_handle create_image(const image_mem &memHandle, + const image_descriptor &desc, + const sycl::device &syclDevice, + const sycl::context &syclContext); +unsampled_image_handle create_image(const image_mem &memHandle, + const image_descriptor &desc, + const sycl::queue &syclQueue); // Creating a sampled image from an `image_mem` object -sampled_image_handle create_image(const sycl::context &syclContext, - const image_mem &memHandle, +sampled_image_handle create_image(const image_mem &memHandle, + const image_descriptor &desc, + const sycl::sampler &syclSampler, + const sycl::device &syclDevice, + const sycl::context &syclContext); +sampled_image_handle create_image(const image_mem &memHandle, const image_descriptor &desc, - const sampler &sampler); + const sycl::sampler &syclSampler, + const sycl::queue &syclQueue); // Creating a sampled image from a USM allocation and pitch -sampled_image_handle create_image(const sycl::context &syclContext, - const void *usmPtr, size_t pitch, +sampled_image_handle create_image(const void *usmPtr, size_t pitch, + const image_descriptor &desc, + const sycl::sampler &syclSampler, + const sycl::device &syclDevice, + const sycl::context &syclContext); +sampled_image_handle create_image(const void *usmPtr, size_t pitch, const image_descriptor &desc, - const sampler &sampler); + const sycl::sampler &syclSampler, + const sycl::queue &syclQueue); // Destroying an image handle -void destroy_image_handle(const sycl::context& syclContext, - sampled_image_handle& imageHandle); -void destroy_image_handle(const sycl::context& syclContext, - unsampled_image_handle& imageHandle); - +void destroy_image_handle(sampled_image_handle &imageHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); +void destroy_image_handle(sampled_image_handle &imageHandle, + const sycl::queue &syclQueue); + +void destroy_image_handle(unsampled_image_handle &imageHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); +void destroy_image_handle(unsampled_image_handle &imageHandle, + const sycl::queue &syclQueue); } ``` @@ -524,16 +587,17 @@ Once we have allocated memory, we can pass it into the `create_image` function to obtain a `sampled_image_handle` or `unsampled_image_handle`. These objects are opaque types that represent an image object. They can be captured by value into a SYCL kernel, or they can be passed in a -buffer as a dynamic array of images (see examples below). +buffer as a dynamic array of images (see examples at the bottom of this +document). -We can either provide a `sampler` or not when creating the image. Doing +We can either provide a `sycl::sampler` or not when creating the image. Doing so will create a `sampled_image_handle`, where otherwise an `unsampled_image_handle` would be returned. A `sampled_image_handle` should contain a raw sampler handle that will be used when sampling an image. Whether an `image_descriptor` or `void *` USM allocation was passed to -`create_image`, it must have been allocated in the same context as the one -passed to `create_image`. +`create_image`, it must have been allocated in the same context and on the same +device as the one passed to `create_image`. If we choose to create a 2D image from a USM allocation by passing a `void *`, we must also pass the pitch of the memory allocation. If the memory was @@ -544,7 +608,7 @@ adhere to device specific alignment restrictions. These restrictions and their queries are outlined in the section "Pitch alignment restrictions and queries" below. -The pitch is ignored for 1D images. +The pitch is ignored for 1D USM images. If the creation of an image fails, `create_image` will throw a `sycl::exception` with error code `sycl::errc::runtime`. @@ -570,135 +634,267 @@ namespace sycl { class handler { public: - // Host to Device using `image_mem_handle` - void ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - - // Host to Device using `image_mem` - void ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - - // Device to host using `image_mem_handle` - void ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - // Device to host using `image_mem` - void ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - - // Host to device OR device to host using USM device memory - void ext_oneapi_copy(void *Src, + // Simple host to device copy + void ext_oneapi_copy( + void *Src, + ext::oneapi::experimental::image_mem_handle Dest, + const ext::oneapi::experimental:image_descriptor &DestImgDesc); + + // Host to device copy with offsets and extent + void ext_oneapi_copy( + void *Src, + sycl::range<3> SrcOffset, + sycl::range<3> SrcExtent, + ext::oneapi::experimental::image_mem_handle Dest, + sycl::range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, + sycl::range<3> CopyExtent); + + // Simple device to host copy + void ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &SrcImgDesc); + + // Device to host copy with offsets and extent + void ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + sycl::range<3> SrcOffset, + const ext::oneapi::experimental::image_descriptor &SrcImgDesc, + void *Dest, + sycl::range<3> DestOffset, + sycl::range<3> DestExtent, + sycl::range<3> CopyExtent); + + // Simple HtoD or DtoH copy with USM device memory + void ext_oneapi_copy(void *Src, void *Dest, const ext::oneapi::experimental::image_descriptor &Desc, - size_t Pitch); + size_t DeviceRowPitch); + + // HtoD or DtoH copy with USM device memory, using offsets, extent + void ext_oneapi_copy( + void *Src, + sycl::range<3> SrcOffset, + void *Dest, + sycl::range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + sycl::range<3> HostExtent, + sycl::range<3> CopyExtent); }; class queue { public: - // Host to Device using `image_mem` - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - event DepEvent); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - const std::vector &DepEvents); - // Host to Device using `image_mem_handle` - event ext_oneapi_copy(void *Src, + // Simple host to device copy + event ext_oneapi_copy(void *Src, ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc); + const ext::oneapi::experimental::image_descriptor &DestImgDesc); event ext_oneapi_copy(void *Src, ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, event DepEvent); event ext_oneapi_copy(void *Src, ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, const std::vector &DepEvents); - // Device to host using `image_mem` - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - event DepEvent); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - const std::vector &DepEvents); + // Host to device copy with offsets and extent + event ext_oneapi_copy( + void *Src, + range<3> SrcOffset, + range<3> SrcExtent, + ext::oneapi::experimental::image_mem_handle Dest, + range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, + range<3> CopyExtent); + event ext_oneapi_copy( + void *Src, + range<3> SrcOffset, + range<3> SrcExtent, + ext::oneapi::experimental::image_mem_handle Dest, + range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, + range<3> Extent, event DepEvent); + event ext_oneapi_copy( + void *Src, + range<3> SrcOffset, + range<3> SrcExtent, + ext::oneapi::experimental::image_mem_handle Dest, + range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DestImgDesc, + range<3> CopyExtent, const std::vector &DepEvents); + + // Simple device to host copy + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc); + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + event DepEvent); + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + void *Dest, + const ext::oneapi::experimental::image_descriptor &Desc, + const std::vector &DepEvents); - // Device to host using `image_mem_handle` - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - event DepEvent); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - const std::vector &DepEvents); + // Device to host copy with offsets and extent + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + range<3> SrcOffset, + const ext::oneapi::experimental::image_descriptor &SrcImgDesc, + void *Dest, + range<3> DestOffset, + range<3> DestExtent, + range<3> CopyExtent); + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + range<3> SrcOffset, + const ext::oneapi::experimental::image_descriptor &SrcImgDesc, + void *Dest, + range<3> DestOffset, + range<3> DestExtent, + range<3> CopyExtent, event DepEvent); + event ext_oneapi_copy( + ext::oneapi::experimental::image_mem_handle Src, + range<3> SrcOffset, + const ext::oneapi::experimental::image_descriptor &SrcImgDesc, + void *Dest, + range<3> DestOffset, + range<3> DestExtent, + range<3> CopyExtent, const std::vector &DepEvents); // Host to device OR device to host using USM device memory - event ext_oneapi_copy(void *Src, void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - size_t DevicePitch); - event ext_oneapi_copy(void *Src, void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - size_t DevicePitch, - event DepEvent); - event ext_oneapi_copy(void *Src, void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - size_t DevicePitch, - const std::vector &DepEvents); + event ext_oneapi_copy( + void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch); + event ext_oneapi_copy( + void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + event DepEvent); + event ext_oneapi_copy( + void *Src, void *Dest, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + const std::vector &DepEvents); + // Host to device OR device to host using USM device memory, + // with control over sub-region + event ext_oneapi_copy( + void *Src, sycl::range<3> SrcOffset, + void *Dest, sycl::range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + sycl::range<3> HostExtent, + sycl::range<3> CopyExtent); + event ext_oneapi_copy( + void *Src, sycl::range<3> SrcOffset, + void *Dest, sycl::range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + sycl::range<3> HostExtent, + sycl::range<3> CopyExtent); + event ext_oneapi_copy( + void *Src, sycl::range<3> SrcOffset, + void *Dest, sycl::range<3> DestOffset, + const ext::oneapi::experimental::image_descriptor &DeviceImgDesc, + size_t DeviceRowPitch, + sycl::range<3> HostExtent, + sycl::range<3> CopyExtent); }; } ``` To enable the copying of images an `ext_oneapi_copy` function is proposed as a method of the queue and handler. It can be used to copy image memory, whether -allocated through USM or the `image_mem` API, from host to -device, or device to host. The image descriptor passed to the `ext_oneapi_copy` -API is used to determine the dimensions and size in memory of the image to copy. +allocated through USM or using an `image_mem_handle`, from host to +device, or device to host. For the `ext_oneapi_copy` variants that do not take +offsets and extents, the image descriptor passed to the `ext_oneapi_copy` API +is used to determine the pixel size, dimensions, and extent in memory of the +image to copy. If performing sub-region copy, the size of the memory region is +also determined by the offsets and extent passed. For images allocated using USM, existing SYCL functionality can be used to copy their memory, but we also provide `ext_oneapi_copy` functions that take USM pointers. If the image memory was allocated using `pitched_alloc_device`, -then the `DevicePitch` parameter must match that which was returned from -`pitched_alloc_device`. If the user opted to use another allocation function -then the `DevicePitch` parameter should be set to the width of the memory -allocation in bytes, and adhere to the alignment restrictions outlined in the +then the source and destination, row pitch parameter passed must match that +which was returned from `pitched_alloc_device`. If the user opted to +use another allocation function then the device pitch parameters must adhere to +the alignment restrictions outlined in the "Pitch alignment restrictions and queries" section. -When using `ext_oneapi_copy`, the user must ensure that the memory regions -accessed through `Dest` and `Src` have the same capacity. `ext_oneapi_copy` only -supports copying images of the same dimensions. Whether copying image memory to -the device through a USM `Dest` pointer, or an `image_mem`, the host -memory is always assumed to be tightly packed. Similarly, the host memory is -assumed to be packed when copying from device to host. +Unless performing a sub-region copy, the user must ensure that the memory +regions accessed through `Dest` and `Src` have the same capacity. + +Whether copying image memory to the device through a USM `Dest` pointer, or an +`image_mem_handle`, the host memory is always assumed to be tightly packed. +Similarly, the host memory is assumed to be packed when copying from device to +host. + +For the functions that take an `image_mem_handle`, the handle must have been +allocated within the same context and device of the `queue`. + +For the forms that take a USM pointer, the image memory must also have been +allocated within the same context and device of the `queue`. The USM memory +must be accessible on the queue's device. + +The `ext_oneapi_copy` function variants that don't take offsets and extents may +fail in the following scenarios: -For the forms that take `image_mem`, the handle must have been allocated -with the same context used to create the `queue`. +1. The `Src` and `Dest` memory was not allocated on the same device and +context of the queue. -For the forms that take a USM pointer, the image memory must have been allocated -with the same context used to create the `queue` and the USM memory must be -accessible on the queue's device. +2. The `Src` and `Dest` memory regions, where `Src` or `Dest` can be either +on the host or device, do not have the same memory capacity, where the capacity +is calculate from the `width`, `height`, `depth`, `channel_order`, and +`channel_type` members of the `image_descriptor` parameter. + +The `ext_oneapi_copy` function variants that do take offsets and extents may +fail in the following scenarios: + +1. The `Src` and `Dest` memory was not allocated on the same device and +context of the queue. + +2. The image descriptor passed does not match the image descriptor used to +allocate the image on the device. + +3. the `CopyExtent` describes a memory region larger than that which was +allocated on either the host or the device. + +4. The `HostExtent` describes a memory region larger than that which was +allocated on the host. + +5. The `SrcExtent` describes a memory region larger than that which was +allocated, where `Src` can be either the host or device. + +6. The `DestExtent` describes a memory region larger than that which was +allocated, where `Dest` can be either the host or device. + +7. If `SrcOffset + CopyExtent` moves the memory sub-region outside the bounds +of the memory described by `Src`, irrespective of whether `Src` is on the host +or the device. + +8. If `DestOffset + CopyExtent` moves the memory sub-region outside the bounds +of the memory described by `Dest`, irrespective of whether `Dest` is on the +host or the device. + +9. The `DeviceRowPitch` does not adhere to the alignment requirements +outlined in section "Pitch alignment restrictions and queries" + +10. The value of `DeviceRowPitch` is smaller than the width of the image on +the device. If copying of an image fails, `ext_oneapi_copy` will throw a `sycl::exception` -with error code `sycl::errc::runtime`. +with error code `sycl::errc::invalid`, and relay an error message back to the +user through `sycl::exception::what()`, describing which of the scenarios +listed above caused the failure. === Reading and writing inside the kernel @@ -722,12 +918,14 @@ Inside a kernel, it's possible to read an image via `read_image`, passing the image handle. For the form that takes `unsampled_image_handle`, image data will be fetched exactly as is in device memory. For the form that takes a `sampled_image_handle`, the image will be sampled according to the -`sampler` that was passed to the image upon construction, and is included -in the `sampled_image_handle` as `sampled_image_handle::raw_sampler_handle`. -The returned data will be of type `DataT`, which is specified by the user, and -should map to the type that the image was created with (a combination of -`image_channel_type` and `image_channel_order`). An example of a type resulting -from these two is shown below. +`sycl::sampler` that was passed to the image upon construction. The sampler +handle and is included in the `sampled_image_handle` as +`sampled_image_handle::raw_sampler_handle`. + +The returned data will be of templated type `DataT`, which is specified by the +user, and should map to the type that the image was created with (a combination +of `image_channel_type` and `image_channel_order`). An example of a type +resulting from these two is shown below. For reads or writes, `DataT` must correspond to the type specified in the `image_descriptor` when the image was created. For multi-channel types, the @@ -755,6 +953,11 @@ Note that coordinates for 3D images take a vector of size 4, not 3. Note also that all images must be used in either read-only or write-only fashion within a single kernel invocation; read/write images are not supported. +Note also that read-after-write functionality is not supported. Unsampled +images may be read from and written back to within the same kernel, however, +reading from that same image again will result in undefined behaviour. A new +kernel must be submitted for the written data to be accessible. + == Mipmapped images So far, we have described how to create and operate on standard bindless images. @@ -766,28 +969,34 @@ two-dimensional mipmapped image where the top-most level (`level==0`) image has a `width==16` and `height==16`, the succeeding level (`level==1`) in the mipmap will have sizes `width==8` and `height==8`. This pattern continues until either the final level has sizes of `width==1` and `height==1`, or the user-specified -maximum mip level has been reached. +maximum mip level has been reached (described by the `num_levels` member of +`image_descriptor`). === Querying mipmap support -We provide the following device queries to retrieve information on whether a -SYCL implementation provides support for various mipmap features. +We provide the following device queries to retrieve information on a SYCL +implementation of various mipmap features. + +The device aspect descriptors for these queries are: + +[frame="none",options="header"] +|====================== +|Device descriptor |Description +|`aspect::ext_oneapi_mipmap` | Indicates if the device supports allocating +mipmap resources. +|`aspect::ext_oneapi_mipmap_anisotropy` | Indicates if the device supports +sampling mipmap images with anisotropic filtering +|`aspect::ext_oneapi_mipmap_level_reference` | Indicates if the device supports +using images created from individual mipmap levels +|====================== The device information descriptors for these queries are: [frame="none",options="header"] |====================== |Device descriptor |Return type |Description -|`info::device::ext_oneapi_mipmap_support` |`bool` | -Return `true`` if the device supports allocating mipmap resources. -|`info::device::ext_oneapi_anisotropic_support` |`bool` | -Return `true`` if the device supports sampling mipmap images with anisotropic -filtering -|`info::device::ext_oneapi_max_anisotropy` |`float` | +|`ext::oneapi::experimental::info::device::mipmap_max_anisotropy` |`float` | Return the maximum anisotropic ratio supported by the device -|`info::device::ext_oneapi_mipmap_level_reference` |`bool` | -Return `true` if the device supports using images created from individual -mipmap levels |====================== === Allocation of mipmapped images @@ -802,17 +1011,18 @@ must be greater than `1`. Mipmap memory allocated this way requires the user to free that memory after all operations using the memory are completed and no more operations operating on -the memory will be scheduled. This is done using `free_image_mem`, passing `image_type::mipmap`. Importantly, individual levels of a mipmap must not be +the memory will be scheduled. This is done using `free_image_mem`, passing +`image_type::mipmap`. Importantly, individual levels of a mipmap must not be freed before calling `free_image_mem`. -The `image_mem` class acts as a RAII class wrapper that performs allocation and +The RAII class `image_mem` may also be used to perform allocation and deallocation of mipmap device memory. The constructor and destructor act as a wrapper for the functions `alloc_image_mem` and `free_image_mem` respectively. -When the underlying memory of `image_mem` is a mipmap, `get_mip_level` can be -used to return an `image_mem_handle` to a specific level of the mipmap. This can -then be used to copy data to that specific level or create an image based on -that level. +When the underlying memory of `image_mem` is a mipmap, +`get_mip_level_mem_handle` can be used to return an `image_mem_handle` to a +specific level of the mipmap. This can then be used to copy data to that +specific level or create an image based on that level. === Obtaining a handle to a mipmap image @@ -827,7 +1037,7 @@ anisotropic filtering. To sample a mipmap correctly, the extended attributes of this sampler must be defined. Attempting to create an `unsampled_image_handle` to a mipmap will result in a -`sycl::exception` with error code `sycl::errc::runtime` being thrown. +`sycl::exception` with error code `sycl::errc::runtime` being thrown. === Extended sampler object @@ -879,127 +1089,13 @@ with anisotropic filtering. === Copying mipmap image data -We provide two ways in which image data can be copied to mipmap levels. - -The first method is to retrieve the handle to an individual mipmap level using -`get_mip_level`, provided by `image_mem`, and copy via the previously described -copy functions - -The user must retrieve the `image_mem_handle` themselves for -the particular level and provide an `image_descriptor` which describes that -level. For instance, copying to the second mipmap level must take a descriptor -whose width, height, depth, are all half of the first mipmap level. Providing an -incorrect descriptor can lead to undefined behaviour or throw a -`sycl::exception` with error code `sycl::errc::runtime`. - -The second method is to use one of the provided `ext_oneapi_copy` functions on -the queue or handler, which take a `Level` parameter specifying which level the -user wishes to copy from device to host, or from host to device. - -For this method, the user only needs to provide an additional `Level` -parameter to the `ext_oneapi_copy` function. The descriptor passed to -`ext_oneapi_copy` would be used to calculate the appropriate mipmap level -dimensions. - -```c++ -namespace sycl { - -class handler { -public: - // Host to Device using `image_mem_handle` and a mipmap level - void ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - - // Host to Device using `image_mem` and a mipmap level - void ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - - // Device to host using `image_mem_handle` and a mipmap level - void ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - - // Device to host using `image_mem` and a mipmap level - void ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - - -}; - -class queue { - // Host to Device using `image_mem` and a mipmap level - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - event DepEvent); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem &Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - const std::vector &DepEvents); - - // Host to Device using `image_mem_handle` - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - event DepEvent); - event ext_oneapi_copy(void *Src, - ext::oneapi::experimental::image_mem_handle Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - const std::vector &DepEvents); - - // Device to host using `image_mem` and a mipmap level - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - event DepEvent); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem &Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - const std::vector &DepEvents); - - // Device to host using `image_mem_handle` and a mipmap level - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - event DepEvent); - event ext_oneapi_copy(ext::oneapi::experimental::image_mem_handle Src, - void *Dest, - const ext::oneapi::experimental::image_descriptor &Desc, - unsigned int Level, - const std::vector &DepEvents); -}; -} -``` +In order to copy to or from mipmaps, the user should retrieve an individual +level's `image_mem_handle` through `image_mem::get_mip_level_mem_handle`, which +can then be passed to `ext_oneapi_copy`. The user must ensure that the image +descriptor passed to `ext_oneapi_copy` is dimensioned correctly for the mip +level being copied to/from. The provided `image_descriptor::get_mip_level_desc` +allows the user to retrieve a correctly dimensioned image descriptor for any +level of a given top-level descriptor. === Reading a mipmap @@ -1014,7 +1110,7 @@ mipmap. ```c++ // Nearest/linear filtering between mip levels template -DataT read_image(const sample_image_handle &ImageHandle, +DataT read_image(const sampled_image_handle &ImageHandle, const CoordT &Coords, const float Level); @@ -1032,19 +1128,19 @@ DataT read_image(const sampled_image_handle &ImageHandle, We provide the following device queries to retrieve information on whether a SYCL implementation provides support for various interoperability features. -The device information descriptors for these queries are: +The device aspect descriptors for these queries are: [frame="none",options="header"] |====================== -|Device descriptor |Return type |Description -|`info::device::ext_oneapi_interop_memory_import_support` |`bool` | -Return true if the device supports importing external memory resources. -|`info::device::ext_oneapi_interop_memory_export_support` |`bool` | -Return true if the device supports exporting internal memory resources. -|`info::device::ext_oneapi_interop_semaphore_import_support` |`bool` | -Return true if the device supports importing external semaphore resources. -|`info::device::ext_oneapi_interop_semaphore_export_support` |`bool` | -Return true if the device supports exporting internal event resources. +|Device descriptor |Description +|`aspect::ext_oneapi_interop_memory_import` | Indicates if the device supports +importing external memory resources. +|`aspect::ext_oneapi_interop_memory_export` | Indicates if the device supports +exporting internal memory resources. +|`aspect::ext_oneapi_interop_semaphore_import`` | Indicates if the device +supports importing external semaphore resources. +|`aspect::ext_oneapi_interop_semaphore_export` | Indicates if the device +supports exporting internal event resources. |====================== @@ -1114,28 +1210,40 @@ struct interop_mem_handle { template interop_mem_handle import_external_memory( - const sycl::context &syclContext, - external_mem_descriptor externalMemDescriptor); + external_mem_descriptor externalMemDescriptor, + const sycl::device &syclDevice, + const sycl::context &syclContext); +template +interop_mem_handle import_external_memory( + external_mem_descriptor externalMemDescriptor, + const sycl::queue &syclQueue); image_mem_handle map_external_memory_array( - const sycl::context &syclContext, interop_mem_handle interopMemHandle, - const image_descriptor &imageDescriptor); + const image_descriptor &imageDescriptor, + const sycl::device &syclDevice, + const sycl::context &syclContext); +image_mem_handle map_external_memory_array( + interop_mem_handle interopMemHandle, + const image_descriptor &imageDescriptor, + const sycl::queue &syclQueue); void *map_external_memory_buffer( - const sycl::context &syclContext, - interop_mem_handle interopMemHandle); - + interop_mem_handle interopMemHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); +void *map_external_memory_buffer( + interop_mem_handle interopMemHandle, + const sycl::queue &syclQueue); } ``` The resulting `interop_mem_handle` can then be mapped to two different memory types, a buffer, or an array. -If mapped to a buffer, the resulting `void*` can -be used in USM like fashion. It can be passed as an argument to a kernel and -therein manipulated. +If mapped to a buffer, the resulting `void*` can be used in USM like fashion. +It can be passed as an argument to a kernel and therein manipulated. If mapped to an array, the resulting type is an `image_mem_handle`, which can be used to construct images in the same way as memory allocated through @@ -1158,10 +1266,11 @@ created through `import_external_memory`. ```cpp namespace sycl::ext::oneapi::experimental { -void -destroy_external_memory(const sycl::context &syclContext, - interop_mem_handle interopMem); - +void destroy_external_memory(interop_mem_handle interopMem, + const sycl::device &syclDevice, + const sycl::context &syclContext); +void destroy_external_memory(interop_mem_handle interopMem, + const sycl::queue &syclQueue); } ``` @@ -1230,10 +1339,17 @@ struct interop_semaphore_handle { template interop_semaphore_handle import_external_semaphore( - const sycl::context &syclContext, external_semaphore_descriptor - externalSemaphoreDescriptor); + externalSemaphoreDescriptor, + const sycl::device &syclDevice, + const sycl::context &syclContext); +} +template +interop_semaphore_handle import_external_semaphore( + external_semaphore_descriptor + externalSemaphoreDescriptor, + const sycl::queue &syclQueue); } ``` @@ -1241,46 +1357,8 @@ The resulting `interop_semaphore_handle` can then be used in a SYCL command group, to either wait until the semaphore is in the signaled state, or set the semaphore to a signaled state. -```cpp -namespace sycl::ext::oneapi::experimental { - -sycl::event -wait_external_semaphore(const sycl::queue &syclQueue, - interop_semaphore_handle semaphoreHandle); - -sycl::event -signal_external_semaphore(const sycl::queue &syclQueue, - interop_semaphore_handle semaphoreHandle); - -} -``` - -`wait_external_semaphore` and `signal_external_semaphore` are non-blocking, -asynchronous operations. - -Any operations submitted to the queue after a `wait_external_semaphore` call -will not begin until the imported semaphore is in a signaled state. - -When `signal_external_semaphore` is called, the external semaphore will be set -to the signaled state after all commands submitted to the queue prior to the -`signal_external_semaphore` call complete. - -The user must ensure to destroy all external semaphore objects once they are no -longer required through `destroy_external_semaphore`. - -```cpp -namespace sycl::ext::oneapi::experimental { - -void -destroy_external_semaphore(const sycl::context &syclContext, - interop_semaphore_handle semaphoreHandle); - -} -``` - -In addition to the extension functions described above, we also propose to -extend the SYCL queue and handler classes with semaphore waiting and signalling -operations. +We propose to extend the SYCL queue and handler classes with semaphore waiting +and signalling operations. ```cpp namespace sycl { @@ -1325,15 +1403,42 @@ public: } ``` +Any operations submitted to the queue after a +`ext_oneapi_wait_external_semaphore` call will not begin until the imported +semaphore is in a signaled state. + +When `ext_oneapi_signal_external_semaphore` is called, the external semaphore +will be set to the signaled state after all commands submitted to the queue +prior to the `ext_oneapi_signal_external_semaphore` call complete. + +`ext_oneapi_wait_external_semaphore` and `ext_oneapi_signal_external_semaphore` +are non-blocking, asynchronous operations. + +The user must ensure to destroy all external semaphore objects once they are no +longer required through `destroy_external_semaphore`. + +```cpp +namespace sycl::ext::oneapi::experimental { + +void destroy_external_semaphore(interop_semaphore_handle semaphoreHandle, + const sycl::device &syclDevice, + const sycl::context &syclContext); + +void destroy_external_semaphore(interop_semaphore_handle semaphoreHandle, + const sycl::queue &syclQueue); + +} +``` + == Examples === 1D image read/write ```cpp // Set up device, queue, and context -sycl::device dev; -sycl::queue q(dev); -sycl::context ctxt = q.get_context(); +sycl::device device; +sycl::queue queue(device); +sycl::context context = queue.get_context(); // Initialize input data constexpr size_t width = 512; @@ -1345,19 +1450,19 @@ for (int i = 0; i < width; i++) { // Image descriptor - can use the same for both images sycl::ext::oneapi::experimental::image_descriptor desc( - sycl::range{width}, sycl::image_channel_order::r, - sycl::image_channel_type::fp32); + sycl::range{width}, sycl::ext::oneapi::experimental::image_channel_order::r, + sycl::ext::oneapi::experimental::image_channel_type::fp32); try { // Extension: returns the device pointer to the allocated memory - sycl::ext::oneapi::experimental::image_mem imgMemoryIn(ctxt, desc); - sycl::ext::oneapi::experimental::image_mem imgMemoryOut(ctxt, desc); + sycl::ext::oneapi::experimental::image_mem imgMemoryIn(desc, queue); + sycl::ext::oneapi::experimental::image_mem imgMemoryOut(desc, queue); // Extension: create the image and return the handle sycl::ext::oneapi::experimental::unsampled_image_handle imgIn = - sycl::ext::oneapi::experimental::create_image(ctxt, imgMemoryIn, desc); + sycl::ext::oneapi::experimental::create_image(imgMemoryIn, desc, queue); sycl::ext::oneapi::experimental::unsampled_image_handle imgOut = - sycl::ext::oneapi::experimental::create_image(ctxt, imgMemoryOut, desc); + sycl::ext::oneapi::experimental::create_image(imgMemoryOut, desc, queue); // Extension: copy over data to device q.ext_oneapi_copy(dataIn.data(), imgMemoryIn, desc); @@ -1386,8 +1491,8 @@ try { q.ext_oneapi_copy(imgMemoryOut, dataOut.data(), desc); // Cleanup - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, imgIn); - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, imgOut); + sycl::ext::oneapi::experimental::destroy_image_handle(imgIn, queue); + sycl::ext::oneapi::experimental::destroy_image_handle(imgOut, queue); } catch (sycl::exception e) { std::cerr << "SYCL exception caught: " << e.what(); exit(-1); @@ -1402,18 +1507,18 @@ bool validated = (dataIn == dataOut); ```cpp // Set up device, queue, and context -sycl::device dev; -sycl::queue q(dev); -sycl::context ctxt = q.get_context(); +sycl::device device; +sycl::queue queue(device); +sycl::context context = queue.get_context(); // declare image data size_t numImages = 5; size_t width = 8; size_t height = 8; -size_t N = width * height; -std::vector dataIn(N); -std::vector dataOut(N); -std::vector dataExpected(N); +size_t numPixels = width * height; +std::vector dataIn(numPixels); +std::vector dataOut(numPixels); +std::vector dataExpected(numPixels); for (int i = 0; i < width; i++) { for (int j = 0; j < height; j++) { int index = j + (height * i); @@ -1424,8 +1529,8 @@ for (int i = 0; i < width; i++) { // Image descriptor - can use the same for all images sycl::ext::oneapi::experimental::image_descriptor desc( - {width, height}, sycl::image_channel_order::r, - sycl::image_channel_type::fp32); + {width, height}, sycl::ext::oneapi::experimental::image_channel_order::r, + sycl::ext::oneapi::experimental::image_channel_type::fp32); try { @@ -1434,7 +1539,7 @@ try { for (int i = 0; i < numImages; i++) { // Extension: move-construct device allocated memory imgAllocations.emplace_back( - sycl::ext::oneapi::experimental::image_mem{ctxt, desc}); + sycl::ext::oneapi::experimental::image_mem{desc, queue}); } // Copy over data to device for each image @@ -1452,8 +1557,8 @@ try { for (int i = 0; i < numImages; i++) { // Extension: create the image and return the handle sycl::ext::oneapi::experimental::unsampled_image_handle imgHandle = - sycl::ext::oneapi::experimental::create_image(ctxt, imgAllocations[i], - desc); + sycl::ext::oneapi::experimental::create_image(imgAllocations[i], + desc, queue); imgHandles.push_back(imgHandle); } @@ -1485,8 +1590,7 @@ try { // Cleanup for (int i = 0; i < numImages; i++) { - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, - imgHandles[i]); + sycl::ext::oneapi::experimental::destroy_image_handle(imgHandles[i], queue); } } catch (sycl::exception e) { std::cerr << "SYCL exception caught: " << e.what(); @@ -1500,77 +1604,75 @@ bool validated = (dataOut == dataExpected); === Reading a 1D mipmap with anisotropic filtering and levels ```cpp // Set up device, queue, and context -sycl::device dev; -sycl::queue q(dev); -sycl::context ctxt = q.get_context(); +sycl::device device; +sycl::queue queue(device); +sycl::context context = q.get_context(); // declare image data -constexpr size_t N = 16; -constexpr size_t width = N; +constexpr size_t width = 16; unsigned int num_levels = 2; -std::vector out(N); -std::vector expected(N); -std::vector dataIn1(N); -std::vector dataIn2(N / 2); +std::vector dataIn1(width); +std::vector dataIn2(width / 2); +std::vector dataOut(width); +std::vector dataExpected(width); int j = 0; -for (int i = 0; i < N; i++) { - expected[i] = i + (j + 10); +for (int i = 0; i < width; i++) { + dataExpected[i] = static_cast(i + (j + 10)); if (i % 2) j++; - dataIn1[i] = float4(i, i, i, i); + dataIn1[i] = static_cast(i); if (i < (N / 2)) - dataIn2[i] = float4(i + 10, i + 10, i + 10, i + 10); + dataIn2[i] = static_cast(i + 10); } try { // Image descriptor -- number of levels sycl::ext::oneapi::experimental::image_descriptor desc( - {width}, image_channel_order::rgba, image_channel_type::fp32, - sycl::ext::oneapi::experimental::image_type::Mipmap, num_levels); + {width}, sycl::ext::oneapi::experimental::image_channel_order::r, + sycl::ext::oneapi::experimental::image_channel_type::fp32, + sycl::ext::oneapi::experimental::image_type::mipmap, num_levels); // Allocate the mipmap - sycl::ext::oneapi::experimental::image_mem mip_mem(ctxt, desc); + sycl::ext::oneapi::experimental::image_mem mip_mem(desc, queue); // Retrieve level 0 - sycl::ext::oneapi::experimental::image_mem_handle img_mem1 = - sycl::ext::oneapi::experimental::get_mip_level(ctxt, mip_mem, 0); + sycl::ext::oneapi::experimental::image_mem_handle img_mem1 = + mip_mem.get_mip_level_mem_handle(0) // Copy over data to level 0 q.ext_oneapi_copy(dataIn1.data(), img_mem1, desc); // Copy over data to level 1 - // Copy function handles sizing - unsigned int level = 1; - q.ext_oneapi_copy(dataIn2.data(), mip_mem, desc, level); + q.ext_oneapi_copy(dataIn2.data(), mip_mem.get_mip_level_mem_handle(1), + desc.get_mip_level_desc(1)); q.wait_and_throw(); // Extended sampler object to take in mipmap attributes - sampler samp(coordinate_normalization_mode::normalized, + sycl::sampler samp(coordinate_normalization_mode::normalized, addressing_mode::mirrored_repeat, filtering_mode::nearest, mipmap_filtering_mode::nearest, 0.0f, (float)num_levels, 8.0f); // Create a sampled image handle to represent the mipmap sycl::ext::oneapi::experimental::sampled_image_handle mipHandle = - sycl::ext::oneapi::experimental::create_image(ctxt, mip_mem, samp, - desc); + sycl::ext::oneapi::experimental::create_image(mip_mem, samp, desc, queue); q.wait_and_throw(); - buffer buf((float *)out.data(), N); + sycl::buffer buf((float *)dataOut.data(), width); q.submit([&](handler &cgh) { - auto outAcc = buf.get_access(cgh, N); + auto outAcc = buf.get_access(cgh, width); - cgh.parallel_for(N, [=](id<1> id) { + cgh.parallel_for(width, [=](id<1> id) { float sum = 0; - float x = float(id[0] + 0.5) / (float)N; + float x = (static_cast(id[0]) + 0.5f) / static_cast(width); // Read mipmap level 0 with anisotropic filtering // and level 1 with level filtering - float4 px1 = sycl::ext::oneapi::experimental::read_image( + float px1 = sycl::ext::oneapi::experimental::read_image( mipHandle, x, 0.0f, 0.0f); - float4 px2 = sycl::ext::oneapi::experimental::read_image( + float px2 = sycl::ext::oneapi::experimental::read_image( mipHandle, x, 1.0f); - sum = px1[0] + px2[0]; + sum = px1 + px2; outAcc[id] = sum; }); }); @@ -1578,7 +1680,7 @@ try { q.wait_and_throw(); // Cleanup - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, mipHandle); + sycl::ext::oneapi::experimental::destroy_image_handle(mipHandle, queue); } catch (sycl::exception e) { std::cerr << "SYCL exception caught! : " << e.what() << "\n"; @@ -1588,7 +1690,7 @@ try { exit(-1); } -// Validate that `out` is correct +// Validate that `dataOut` is correct bool validated = (dataOut == dataExpected); ``` @@ -1596,18 +1698,20 @@ bool validated = (dataOut == dataExpected); ```c++ // Set up device, queue, and context -sycl::device dev; -sycl::queue q(dev); -auto ctxt = q.get_context(); +sycl::device device; +sycl::queue queue(device); +sycl::context context = queue.get_context(); size_t width = /* passed from external API */; size_t height = /* passed from external API */; -sycl::image_channel_order channel_order = /* mapped from external API */ - /* we assume sycl::image_channel_order::rgba */; +sycl::ext::oneapi::experimental::image_channel_order channel_order = + /* mapped from external API */ + /* we assume sycl::image_channel_order::r */; -sycl::image_channel_type channel_type = /* mapped from external API */ - /* we assume sycl::image_channel_type::unsigned_int32 */;; +sycl::ext::oneapi::experimental::image_channel_type channel_type = + /* mapped from external API */ + /* we assume sycl::image_channel_type::unsigned_int32 */; // Image descriptor - mapped to external API image layout // with `image_type::interop` @@ -1615,7 +1719,7 @@ sycl::ext::oneapi::experimental::image_descriptor desc( {width, height}, channel_order, channel_type, sycl::ext::oneapi::experimental::image_type::interop); -size_t img_size_in_bytes = width * height * sizeof(sycl::vec); +size_t img_size_in_bytes = width * height * sizeof(uint32_t); int external_input_image_file_descriptor = /* passed from external API */ int external_output_image_file_descriptor = /* passed from external API */ @@ -1654,39 +1758,39 @@ try { sycl::ext::oneapi::experimental::interop_semaphore_handle wait_interop_semaphore_handle = sycl::ext::oneapi::experimental::import_external_semaphore( - ctxt, wait_external_semaphore_desc); + wait_external_semaphore_desc, queue); sycl::ext::oneapi::experimental::interop_semaphore_handle done_interop_semaphore_handle = sycl::ext::oneapi::experimental::import_external_semaphore( - ctxt, done_external_semaphore_desc); + done_external_semaphore_desc, queue); // Extension: import external memory from descriptors sycl::ext::oneapi::experimental::interop_mem_handle input_interop_mem_handle = sycl::ext::oneapi::experimental::import_external_memory( - ctxt, input_ext_mem_desc); + input_ext_mem_desc, queue); sycl::ext::oneapi::experimental::interop_mem_handle output_interop_mem_handle = sycl::ext::oneapi::experimental::import_external_memory( - ctxt, output_ext_mem_desc); + output_ext_mem_desc, queue); // Extension: map imported external memory to image memory sycl::ext::oneapi::experimental::image_mem_handle input_mapped_mem_handle = sycl::ext::oneapi::experimental::map_external_memory_array( - ctxt, input_interop_mem_handle, desc); + input_interop_mem_handle, desc, queue); sycl::ext::oneapi::experimental::image_mem_handle output_mapped_mem_handle = sycl::ext::oneapi::experimental::map_external_memory_array( - ctxt, output_interop_mem_handle, desc); + output_interop_mem_handle, desc, queue); // Extension: create images from mapped memory and return the handles sycl::ext::oneapi::experimental::unsampled_image_handle img_input = sycl::ext::oneapi::experimental::create_image( - ctxt, input_mapped_mem_handle, desc); + input_mapped_mem_handle, desc, queue); sycl::ext::oneapi::experimental::unsampled_image_handle img_output = sycl::ext::oneapi::experimental::create_image( - ctxt, output_mapped_mem_handle, desc); + output_mapped_mem_handle, desc, queue); // Extension: wait for imported semaphore q.ext_oneapi_wait_external_semaphore(wait_interop_semaphore_handle) @@ -1700,16 +1804,16 @@ try { size_t dim1 = it.get_global_id(1); // Extension: read image data from handle to imported image - sycl::uint4 pixel = - sycl::ext::oneapi::experimental::read_image( - img_input, sycl::int2(dim0, dim1)); + uint32_t pixel = + sycl::ext::oneapi::experimental::read_image( + img_input, sycl::vec(dim0, dim1)); // Modify the data before writing back pixel *= 10; // Extension: write image data using handle to imported image sycl::ext::oneapi::experimental::write_image( - img_output, sycl::int2(dim0, dim1), pixel); + img_output, sycl::vec(dim0, dim1), pixel); }); }); @@ -1723,15 +1827,15 @@ try { // Extension: destroy all external resources sycl::ext::oneapi::experimental::destroy_external_memory( - ctxt, input_interop_mem_handle); + input_interop_mem_handle, queue); sycl::ext::oneapi::experimental::destroy_external_memory( - ctxt, output_interop_mem_handle); + output_interop_mem_handle, queue); sycl::ext::oneapi::experimental::destroy_external_semaphore( - ctxt, wait_interop_semaphore_handle); + wait_interop_semaphore_handle, queue); sycl::ext::oneapi::experimental::destroy_external_semaphore( - ctxt, done_interop_semaphore_handle); - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, img_input); - sycl::ext::oneapi::experimental::destroy_image_handle(ctxt, img_output); + done_interop_semaphore_handle, queue); + sycl::ext::oneapi::experimental::destroy_image_handle(img_input, queue); + sycl::ext::oneapi::experimental::destroy_image_handle(img_output, queue); } catch (sycl::exception e) { std::cerr << "SYCL exception caught! : " << e.what() << "\n"; exit(-1); @@ -1750,6 +1854,13 @@ work across different backends. == Issues +== Use of SYCL 1.2.1 `sycl::sampler` + +At the time of writing the extended `sycl::sampler` class, DPC++ had no support +for the SYCL 2020 `sycl::image_sampler` struct. Hence the proposal and current +experimental implementation both use the old SYCL 1.2.1 `sycl::sampler`. Both +will be updated to use the SYCL 2020 `sycl::image_sampler` in the future. + === No dependency tracking Because this extension allows images to work in a USM-like model, @@ -1767,15 +1878,15 @@ There are dimension specific limitations: restrictions and queries" section, or use `pitched_alloc_device` to allocate 2D USM image memory. * 3D - No support at the moment. Possible support in non CUDA backends in the - future. A query is present to check 3D USM capabilities + future. === Not supported yet These features still need to be handled: * Level Zero and SPIR-V support -* Mipmapping -* etc. +* Layered images +* Cubemap images == Revision History @@ -1823,4 +1934,55 @@ These features still need to be handled: capabilities - Added mipmap support +|4 |2023-06-23 | - Added `sycl::device` parameter to multiple functions to + clarify that images must be created and used on the same + device. + + - Changed naming and order of some parameters to be consistent + throughout the proposal and with core SYCL. + + - Added variants of functions that take a `sycl::queue` + instead of both `sycl::device` and `sycl::context`. + + - Removed standalone wait and signal semaphore functions. These + should always go through the queue or handler methods. + + - Removed `get_image_handle` and `get_sampler_handle` functions + from sampled and unsampled image handle structs. The structs + have public handle members that can be retrieved without + getters. + + - Made all enum types and values unspecified + + - Moved support queries to device aspects, improved naming of + queries for better consistency, and moved device info queries + to the experimental namespace. + + - Added `get_mip_level_desc` member function to + `image_descriptor` + + - Fixed `get_mip_level_mem_handle` prototype in `image_mem`, + and added a standalone function. + + - Removed `ext_oneapi_copy` variants that take `image_mem`, + the user should retrieve the raw handle and pass that + themselves. + + - Removed `ext_oneapi_copy` variants that take a mip level, + the user should retrieve individual mip level image handles + themselves and pass that. + + - Added `ext_oneapi_copy` variants that take offsets and the + extent, to enable sub-region copy. + + - Created a list of failure scenarios for `ext_oneapi_copy`, + changed the failure error code to `errc::invalid`, and + specified that the implementation should relay the reason + for the failure back to the user. + + - Added notes on the exception of using the SYCL 1.2.1 + `sampler` class, and reason for not using the SYCL 2020 + `image_sampler` struct. + + - Updated code samples |====================== From 75cd62ebfe4430047c8a26aef63715ad74325c3b Mon Sep 17 00:00:00 2001 From: Przemek Malon Date: Mon, 3 Jul 2023 09:33:34 +0100 Subject: [PATCH 5/6] [SYCL][Doc] Add bindless_image_sampler struct Added a bindless_image_sampler struct to the proposal. We no longer extend any existing SYCL samplers but use our own instead to avoid ABI breaks. --- .../sycl_ext_oneapi_bindless_images.asciidoc | 154 ++++++++---------- 1 file changed, 68 insertions(+), 86 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc index 1dde07e83f9b2..d0a7965d1dac9 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc @@ -34,13 +34,6 @@ This extension is written against the SYCL 2020 revision 6 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision. -[NOTE] -==== -One exception to this is use of the SYCL 1.2.1 `sycl::sampler` class. Please -refer to the issues section at the bottom of this document for reasons why we -do not yet use the SYCL 2020 `sycl::image_sampler` struct. -==== - == Status This is an experimental extension specification, intended to provide early @@ -529,12 +522,12 @@ unsampled_image_handle create_image(image_mem_handle memHandle, // Creating a sampled image from an `image_mem_handle` sampled_image_handle create_image(image_mem_handle memHandle, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::device &syclDevice, const sycl::context &syclContext); sampled_image_handle create_image(image_mem_handle memHandle, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::queue &syclQueue); // Creating an unsampled image from an `image_mem` object @@ -549,23 +542,23 @@ unsampled_image_handle create_image(const image_mem &memHandle, // Creating a sampled image from an `image_mem` object sampled_image_handle create_image(const image_mem &memHandle, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::device &syclDevice, const sycl::context &syclContext); sampled_image_handle create_image(const image_mem &memHandle, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::queue &syclQueue); // Creating a sampled image from a USM allocation and pitch sampled_image_handle create_image(const void *usmPtr, size_t pitch, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::device &syclDevice, const sycl::context &syclContext); sampled_image_handle create_image(const void *usmPtr, size_t pitch, const image_descriptor &desc, - const sycl::sampler &syclSampler, + const bindless_image_sampler &sampler, const sycl::queue &syclQueue); // Destroying an image handle @@ -590,10 +583,11 @@ They can be captured by value into a SYCL kernel, or they can be passed in a buffer as a dynamic array of images (see examples at the bottom of this document). -We can either provide a `sycl::sampler` or not when creating the image. Doing -so will create a `sampled_image_handle`, where otherwise an -`unsampled_image_handle` would be returned. A `sampled_image_handle` should -contain a raw sampler handle that will be used when sampling an image. +We can either provide a `bindless_image_sampler` (defined in section below) or +not when creating the image. Doing so will create a `sampled_image_handle`, +where otherwise an `unsampled_image_handle` would be returned. A +`sampled_image_handle` should contain a raw sampler handle that will be used +when sampling an image. Whether an `image_descriptor` or `void *` USM allocation was passed to `create_image`, it must have been allocated in the same context and on the same @@ -627,6 +621,54 @@ After we're done with the image, we need to destroy the handle using underlying image memory. The user is responsible for deallocation, either through `free_image_mem`, or destroying the `image_mem` object, if one was used. +=== Image sampler struct + +The `bindless_image_sampler` struct shown below is used to set the sampling +properties of `sampled_images` upon image creation. It can be used to set +sampling properties that exist in the SYCL 2020 `image_sampler` as well as +extra properties used for sampling mipmaps including level-of-detail (LOD) and +anisotropic filtering. + +```cpp +namespace sycl::ext::oneapi::experimental { + +struct bindless_image_sampler { + + bindless_image_sampler(sycl::addressing_mode addressing, + sycl::coordinate_normalization_mode coordinate, + sycl::filtering_mode filtering); + + bindless_image_sampler(sycl::addressing_mode addressing, + sycl::coordinate_normalization_mode coordinate, + sycl::filtering_mode filtering, + sycl::filtering_mode mipFiltering, + float minMipmapLevelClamp, float maxMipmapLevelClamp, + float maxAnisotropy); + + sycl::addressing_mode addressing; + sycl::coordinate_normalization_mode coordinate; + sycl::filtering_mode filtering; + sycl::filtering_mode mipmapFiltering; + float minMipmapLevelClamp; + float maxMipmapLevelClamp; + float maxAnisotropy; +}; + +} +``` + +`mipmapFiltering` dictates the method in which sampling between mipmap +levels is performed. + +`minMipmapLevelClamp` defines the minimum mipmap level from which we can sample, +with the minimum value being 0. + +`maxMipmapLevelClamp` defines the maximum mipmap level from which we can sample. +This cannot value cannot be higher than the number of allocated levels. + +`maxAnisotropy` dictates the anisotropic ratio used when samplling the mipmap +with anisotropic filtering. + === Explicit copies ```cpp @@ -918,8 +960,8 @@ Inside a kernel, it's possible to read an image via `read_image`, passing the image handle. For the form that takes `unsampled_image_handle`, image data will be fetched exactly as is in device memory. For the form that takes a `sampled_image_handle`, the image will be sampled according to the -`sycl::sampler` that was passed to the image upon construction. The sampler -handle and is included in the `sampled_image_handle` as +`bindless_image_sampler` that was passed to the image upon construction. The +sampler handle is included in the `sampled_image_handle` as `sampled_image_handle::raw_sampler_handle`. The returned data will be of templated type `DataT`, which is specified by the @@ -1029,64 +1071,12 @@ specific level or create an image based on that level. A handle to a mipmap image is acquired in the same way as a `sampled_image_handle`. Mipmaps can only be sampled image types. We can create a `sampled_image_handle` to the allocated mipmap through the `create_image` -functions which take a sampler. - -Below we describe an extension to the existing SYCL `sampler` object which -allows for the sampling of mipmaps with support for level of detail(LOD) and -anisotropic filtering. To sample a mipmap correctly, the extended attributes of -this sampler must be defined. +functions which take a `bindless_image_sampler`. To sample a mipmap correctly, +the mipmap attributes of this sampler must be defined. Attempting to create an `unsampled_image_handle` to a mipmap will result in a `sycl::exception` with error code `sycl::errc::runtime` being thrown. -=== Extended sampler object - -To represent the different sampling capabilities that mipmaps provide as opposed -to standard sampled images, we propose to extend the core SYCL `sampler` -class. - -We propose to add a `mipmap_filtering_mode` enum, mipmap filtering properties' -values, getter functions, and a new constructor to the SYCL `sampler` -class. - -```c++ -namespace sycl { - -enum class mipmap_filtering_mode : /* unspecified */ { - nearest, - linear, -}; - -class sampler { -public: - - sampler(coordinate_normalization_mode normalizationMode, - addressing_mode addressingMode, filtering_mode filteringMode, - mipmap_filtering_mode mipmapFilteringMode, float minMipmapLevelClamp, - float maxMipmapLevelClamp, float maxAnisotropy, - const property_list &propList = {}); - - - mipmap_filtering_mode get_mipmap_filtering_mode() const; - float get_min_mipmap_level_clamp() const; - float get_max_mipmap_level_clamp() const; - float get_max_anisotropy() const; -}; -} -``` - -`mipmapFilteringMode` dictates the method in which sampling between mipmap -levels is performed. - -`minMipmapLevelClamp` defines the minimum mipmap level from which we can sample, -with the minimum value being 0. - -`maxMipmapLevelClamp` defines the maximum mipmap level from which we can sample. -This cannot value cannot be higher than the number of allocated levels. - -`maxAnisotropy` dictates the anisotropic ratio used when samplling the mipmap -with anisotropic filtering. - === Copying mipmap image data In order to copy to or from mipmaps, the user should retrieve an individual @@ -1649,9 +1639,10 @@ try { q.wait_and_throw(); // Extended sampler object to take in mipmap attributes - sycl::sampler samp(coordinate_normalization_mode::normalized, - addressing_mode::mirrored_repeat, filtering_mode::nearest, - mipmap_filtering_mode::nearest, 0.0f, (float)num_levels, 8.0f); + sycl::ext::oneapi::experimental::bindless_image_sampler samp( + addressing_mode::mirrored_repeat, + coordinate_normalization_mode::normalized, filtering_mode::nearest, + mipmap_filtering_mode::nearest, 0.0f, (float)num_levels, 8.0f); // Create a sampled image handle to represent the mipmap sycl::ext::oneapi::experimental::sampled_image_handle mipHandle = @@ -1854,13 +1845,6 @@ work across different backends. == Issues -== Use of SYCL 1.2.1 `sycl::sampler` - -At the time of writing the extended `sycl::sampler` class, DPC++ had no support -for the SYCL 2020 `sycl::image_sampler` struct. Hence the proposal and current -experimental implementation both use the old SYCL 1.2.1 `sycl::sampler`. Both -will be updated to use the SYCL 2020 `sycl::image_sampler` in the future. - === No dependency tracking Because this extension allows images to work in a USM-like model, @@ -1980,9 +1964,7 @@ These features still need to be handled: specified that the implementation should relay the reason for the failure back to the user. - - Added notes on the exception of using the SYCL 1.2.1 - `sampler` class, and reason for not using the SYCL 2020 - `image_sampler` struct. + - Added a `bindless_image_sampler` struct - Updated code samples |====================== From e8fc688a567552a6ab8352b600a2ec0e1d9394fd Mon Sep 17 00:00:00 2001 From: Przemek Malon Date: Mon, 3 Jul 2023 21:09:42 +0100 Subject: [PATCH 6/6] [SYCL][Doc] Specify that `image_mem` must follow CRS `image_mem` has been specified that it now must follow Common Reference Semantics as outlined by the core SYCL 2020 specification. --- .../sycl_ext_oneapi_bindless_images.asciidoc | 27 ++++++++++--------- 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc index d0a7965d1dac9..98f3e213eed40 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_bindless_images.asciidoc @@ -276,9 +276,9 @@ struct image_mem_handle { class image_mem { public: image_mem(); - image_mem(const image_mem &) = delete; - + image_mem(const image_mem &rhs); image_mem(image_mem &&rhs) noexcept; + image_mem(const image_descriptor &imageDesc, const sycl::device &syclDevice, const sycl::context &syclContext); @@ -287,8 +287,11 @@ public: ~image_mem(); - image_mem &operator=(image_mem &&); - image_mem &operator=(image_mem &) = delete; + image_mem &operator=(image_mem &rhs); + image_mem &operator=(image_mem &&rhs) noexcept; + + bool operator==(const image_mem &rhs) const; + bool operator!=(const image_mem &rhs) const; image_mem_handle get_handle() const; image_descriptor get_descriptor() const; @@ -354,13 +357,10 @@ Note that the handle type `image_mem_handle::raw_handle_type` is an opaque type, and the handle cannot be dereferenced on the host. The layout of the memory is backend-specific, and may be an optimized layout, e.g. tile swizzle patterns. -`image_mem` shall not be copy-constructible, copy-assignable, or -device-copyable. -`image_mem` shall be default-constructible, move-constructible, as well as -provide a move-assignment operator. +The `image_mem` class must follow Common Reference Semantics as outlined by the +core SYCL 2020 specification. -`image_mem_handle` shall be default-constructible and copy-constructible. -`image_mem_handle` shall not be device-copyable. +The `image_mem` class is not a valid kernel argument. If the construction of the `image_mem` class fails, a `sycl::exception` with error code `sycl::errc::memory_allocation` will be @@ -1964,7 +1964,10 @@ These features still need to be handled: specified that the implementation should relay the reason for the failure back to the user. - - Added a `bindless_image_sampler` struct + - Added a `bindless_image_sampler` struct. + + - Specified that `image_mem` must follow Common Reference + Semantics. - - Updated code samples + - Updated code samples. |======================