From f5fb7599ea70feadd527fd2a91f9724fa9b7ac87 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 9 Dec 2022 10:52:36 -0500 Subject: [PATCH] [SYCL] Extension spec for queue index (#7520) Add a proposed extension specification that allows a queue to be tied to a "queue index". --- .../sycl_ext_intel_queue_index.asciidoc | 214 ++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc diff --git a/sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc new file mode 100644 index 0000000000000..245583276aabc --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc @@ -0,0 +1,214 @@ += sycl_ext_intel_queue_index + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2022-2022 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 6 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. Interfaces defined in this specification may not be implemented yet +or may be in a preliminary state. The specification itself may also change in +incompatible ways before it is finalized. *Shipping software products should +not rely on APIs defined in this specification.* + + +== Overview + +Backends such as Level Zero and OpenCL expose an "index" to a device's work +submission queue, which allows the application to fine tune the way work is +submitted to a device. This extension exposes that same concept to SYCL +applications. + +Most SYCL applications should not need to use this extension because the SYCL +implementation automatically selects an efficient way to submit work to a +device, including automatic selection of a queue index when necessary. +Therefore, this extension is aimed at advanced users who understand the device +hardware and think they can outperform the default implementation by specifying +an explicit queue index. + +Note that this extension can be supported on any backend, even if the backend +has no notion of a "queue index". Backends that have no native support for a +queue index can report that a device has only a single available queue index. +Applications can then only request one possible queue index, and the backend +can treat this as the default behavior (i.e. the backend can ignore the index). + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_INTEL_QUEUE_INDEX` to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if +the implementation supports this feature, or applications can test the macro's +value to determine which of the extension's features the implementation +supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|Initial version of this extension. +|=== + +=== New device information descriptor + +This extension adds the following new device information descriptor which +allows the application to query the number of available queue indices for the +device. + +``` +namespace sycl::ext::intel::info::device { + +struct max_compute_queue_indices; + +} // namespace sycl::ext::intel::info::device +``` + +The return type for this information descriptor is `int`, and the value is a +positive integer telling the number queue indices that are available for the +device. These indices are numbered sequentially starting at `0`. + +=== New queue property + +This extension adds the following new queue property which can be specified to +the queue constructor via the `property_list` parameter. + +``` +namespace sycl::ext::intel::property::queue { + +class compute_index { + public: + compute_index(int idx); + int get_index(); +}; + +} // namespace sycl::ext::intel::property::queue +``` + +The `compute_index` property is a hint to the implementation which can affect +work submission concurrency. When two queues for the same device have +different queue indices, there is a greater chance that commands submitted to +the two queues will be concurrently submitted to the device. + +It is an error to specify a queue index that is out of range for the queue's +device. The `queue` constructor throws an `exception` with `errc::invalid` if +the index is less than `0` or if the index is greater than or equal to the +value returned by `max_compute_queue_indices` for the queue's device. + +The constructor and member functions of the `compute_index` property have the +following semantics. + +[%header,cols="1,3"] +|=== +|Function +|Description + +|`compute_index(int idx)` +|Constructs a property with the given queue index. + +|`int get_index()` +|Returns the queue index associated with the property. +|=== + + +== Example usage + +The following code snippet shows how to create a SYCL queue using a specific +queue index. + +``` +#include + +using sycl; +using sycl::ext::intel; + +void foo(device d) { + int max_index = d.get_info(); + int index = /* choose value between 0 and max_index-1 */; + queue q{d, property::queue::compute_index{index}}; +} +``` + + +== Behavior on Intel GPU devices + +:multi-CCS: https://github.com/intel/compute-runtime/blob/master/level_zero/doc/experimental_extensions/MULTI_CCS_MODES.md +:sycl_ext_intel_cslice: https://github.com/intel/llvm/pull/7513 + +This non-normative section describes the behavior of the `compute_index` +property for some specific Intel GPU devices when using {dpcpp}. These details +are not part of the extension specification, and this behavior may not apply to +other devices. + +On many Intel devices, there is just one available queue index, and there is +therefore no advantage to using the `compute_index` property. However, this +property can sometimes be useful when running on Data Center GPU Flex series +devices (aka ATS-M) or Data Center GPU Max series devices (aka PVC). + +Some models of ATS-M support multiple queue indices with the semantics +described in the sections above. When a single process submits kernels to +different queue indices, there is a greater likelihood that the kernels will +be submitted concurrently. + +PVC also supports multiple queue indices on each tile, but these queue indices +have a different semantic. In order to expose multiple queue indices on PVC, +the device driver must be configured in {multi-CCS}[multi-CCS] mode. In this +mode, the PVC root device still has just one queue index, however each "tile" +has multiple queue indices. Therefore, the application must first create +sub-devices to access each tile, and then the application can construct a queue +on these sub-devices using the `compute_index` property. + +The semantics of these PVC queue indices is different, though. On PVC, each +queue index corresponds to a fixed subset of the execution units. Queues using +different indices still have a greater likelihood of submitting kernels +concurrently, but each kernel also runs on its own partition of the execution +units. Therefore, the `compute_index` property is just an alternate way to +run on a partition of the device, exactly the same as creating a "cslice" +sub-device via the {sycl_ext_intel_cslice}[sycl_ext_intel_cslice] extension. + +In both the ATS-M case and the PVC case, constructing a SYCL queue with +`compute_index` causes the runtime to submit kernels exclusively to that index +on the underlying Level Zero or OpenCL driver. Without this property, the +runtime is free to distribute kernels across the available queue indices.