-
Notifications
You must be signed in to change notification settings - Fork 750
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SYCL] Extension spec for queue index (#7520)
Add a proposed extension specification that allows a queue to be tied to a "queue index".
- Loading branch information
Showing
1 changed file
with
214 additions
and
0 deletions.
There are no files selected for viewing
214 changes: 214 additions & 0 deletions
214
sycl/doc/extensions/proposed/sycl_ext_intel_queue_index.asciidoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
= sycl_ext_intel_queue_index | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2022-2022 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 6 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. Interfaces defined in this specification may not be implemented yet | ||
or may be in a preliminary state. The specification itself may also change in | ||
incompatible ways before it is finalized. *Shipping software products should | ||
not rely on APIs defined in this specification.* | ||
|
||
|
||
== Overview | ||
|
||
Backends such as Level Zero and OpenCL expose an "index" to a device's work | ||
submission queue, which allows the application to fine tune the way work is | ||
submitted to a device. This extension exposes that same concept to SYCL | ||
applications. | ||
|
||
Most SYCL applications should not need to use this extension because the SYCL | ||
implementation automatically selects an efficient way to submit work to a | ||
device, including automatic selection of a queue index when necessary. | ||
Therefore, this extension is aimed at advanced users who understand the device | ||
hardware and think they can outperform the default implementation by specifying | ||
an explicit queue index. | ||
|
||
Note that this extension can be supported on any backend, even if the backend | ||
has no notion of a "queue index". Backends that have no native support for a | ||
queue index can report that a device has only a single available queue index. | ||
Applications can then only request one possible queue index, and the backend | ||
can treat this as the default behavior (i.e. the backend can ignore the index). | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_INTEL_QUEUE_INDEX` to one of the values defined in the table | ||
below. Applications can test for the existence of this macro to determine if | ||
the implementation supports this feature, or applications can test the macro's | ||
value to determine which of the extension's features the implementation | ||
supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|Initial version of this extension. | ||
|=== | ||
|
||
=== New device information descriptor | ||
|
||
This extension adds the following new device information descriptor which | ||
allows the application to query the number of available queue indices for the | ||
device. | ||
|
||
``` | ||
namespace sycl::ext::intel::info::device { | ||
|
||
struct max_compute_queue_indices; | ||
|
||
} // namespace sycl::ext::intel::info::device | ||
``` | ||
|
||
The return type for this information descriptor is `int`, and the value is a | ||
positive integer telling the number queue indices that are available for the | ||
device. These indices are numbered sequentially starting at `0`. | ||
|
||
=== New queue property | ||
|
||
This extension adds the following new queue property which can be specified to | ||
the queue constructor via the `property_list` parameter. | ||
|
||
``` | ||
namespace sycl::ext::intel::property::queue { | ||
|
||
class compute_index { | ||
public: | ||
compute_index(int idx); | ||
int get_index(); | ||
}; | ||
|
||
} // namespace sycl::ext::intel::property::queue | ||
``` | ||
|
||
The `compute_index` property is a hint to the implementation which can affect | ||
work submission concurrency. When two queues for the same device have | ||
different queue indices, there is a greater chance that commands submitted to | ||
the two queues will be concurrently submitted to the device. | ||
|
||
It is an error to specify a queue index that is out of range for the queue's | ||
device. The `queue` constructor throws an `exception` with `errc::invalid` if | ||
the index is less than `0` or if the index is greater than or equal to the | ||
value returned by `max_compute_queue_indices` for the queue's device. | ||
|
||
The constructor and member functions of the `compute_index` property have the | ||
following semantics. | ||
|
||
[%header,cols="1,3"] | ||
|=== | ||
|Function | ||
|Description | ||
|
||
|`compute_index(int idx)` | ||
|Constructs a property with the given queue index. | ||
|
||
|`int get_index()` | ||
|Returns the queue index associated with the property. | ||
|=== | ||
|
||
|
||
== Example usage | ||
|
||
The following code snippet shows how to create a SYCL queue using a specific | ||
queue index. | ||
|
||
``` | ||
#include <sycl/sycl.hpp> | ||
|
||
using sycl; | ||
using sycl::ext::intel; | ||
|
||
void foo(device d) { | ||
int max_index = d.get_info<info::device::max_compute_queue_indices>(); | ||
int index = /* choose value between 0 and max_index-1 */; | ||
queue q{d, property::queue::compute_index{index}}; | ||
} | ||
``` | ||
|
||
|
||
== Behavior on Intel GPU devices | ||
|
||
:multi-CCS: https://github.com/intel/compute-runtime/blob/master/level_zero/doc/experimental_extensions/MULTI_CCS_MODES.md | ||
:sycl_ext_intel_cslice: https://github.com/intel/llvm/pull/7513 | ||
|
||
This non-normative section describes the behavior of the `compute_index` | ||
property for some specific Intel GPU devices when using {dpcpp}. These details | ||
are not part of the extension specification, and this behavior may not apply to | ||
other devices. | ||
|
||
On many Intel devices, there is just one available queue index, and there is | ||
therefore no advantage to using the `compute_index` property. However, this | ||
property can sometimes be useful when running on Data Center GPU Flex series | ||
devices (aka ATS-M) or Data Center GPU Max series devices (aka PVC). | ||
|
||
Some models of ATS-M support multiple queue indices with the semantics | ||
described in the sections above. When a single process submits kernels to | ||
different queue indices, there is a greater likelihood that the kernels will | ||
be submitted concurrently. | ||
|
||
PVC also supports multiple queue indices on each tile, but these queue indices | ||
have a different semantic. In order to expose multiple queue indices on PVC, | ||
the device driver must be configured in {multi-CCS}[multi-CCS] mode. In this | ||
mode, the PVC root device still has just one queue index, however each "tile" | ||
has multiple queue indices. Therefore, the application must first create | ||
sub-devices to access each tile, and then the application can construct a queue | ||
on these sub-devices using the `compute_index` property. | ||
|
||
The semantics of these PVC queue indices is different, though. On PVC, each | ||
queue index corresponds to a fixed subset of the execution units. Queues using | ||
different indices still have a greater likelihood of submitting kernels | ||
concurrently, but each kernel also runs on its own partition of the execution | ||
units. Therefore, the `compute_index` property is just an alternate way to | ||
run on a partition of the device, exactly the same as creating a "cslice" | ||
sub-device via the {sycl_ext_intel_cslice}[sycl_ext_intel_cslice] extension. | ||
|
||
In both the ATS-M case and the PVC case, constructing a SYCL queue with | ||
`compute_index` causes the runtime to submit kernels exclusively to that index | ||
on the underlying Level Zero or OpenCL driver. Without this property, the | ||
runtime is free to distribute kernels across the available queue indices. |