Command-buffer queue comptabaility #1142

EwanC · 2024-04-04T08:11:55Z

The current clCreateCommandBufferKHR API takes a list of queues on creation, and enforces that the properties of these queues must match the properties of the matching queues passed in clEnqueueCommandBufferKHR. However, this API design does not work well for applications that do not know at the time of command-buffer creation what the properties of the queue that the command-buffer will be executed on are.

In particular this is an issue for SYCL-Graph layering, where when the command-buffer is created during graph.finalize(), the SYCL-RT does not yet know what SYCL queue a user will submit the graph to. The graph is treated as a black box by the SYCL queue that is executed in-order or out-of-order with respect to the other SYCL queue commands depending on the type of queue.

There are two possible approaches to help with this in cl_khr_command_buffer that I've thought about so far:

Optional feature to relax queue compatibility constraint - Add an optional feature to relax the constraint that the queues passed to clEnqueueCommandBuffer must be compatible with those set on creation (or maybe a query to check for "compatible" means for that vendor). The SYCL-RT could already use the clRemapCommandBufferKHR() API with CL_COMMAND_BUFFER_PLATFORM_REMAP_QUEUES_KHR for this to created a remap of the command-buffer before each enqueue during graph submission, but we don't really want to incur the deep copy cost here and the cl_khr_command_buffer_multi_device extension also doesn't have a large amount of vendor coverage. However, a drawback of this is that the SYCL-RT would still need to create a placeholder OpenCL queue when the command-buffer is created and commands added, which has a construction overhead.
Remove the queue object from the command-buffer interface enqueue time. We could replace the command-queue parameters with device parameters (and a context parameter on creation) on command-queue creation/command adding/remap and then only introduce the queues at clEnqueueCommandBuffer time. We'd probably still want more command-buffer properties so that a user can say up-front what type of queue they want the command-buffer to run on if they want a more optimized implementation. This is quite an invasive change.

I haven't really thought all this through yet but the design point has come up recently so I think it's worth creating an issue to track the discussion.

The text was updated successfully, but these errors were encountered:

bashbaug · 2024-11-19T17:33:37Z

Need to consider interactions with PR #850.

EwanC · 2024-11-27T11:31:12Z

I've attempted to form my current thoughts into a new proposal based around idea 1. The cl_khr_command_buffer_multi_device layered extension complicates things, which I haven't fully thought through yet, but will do if folks think this is a good direction.

Command-buffer Append

Semantics of command-queue properties set on the queue passed when appending commands.

in-order/out-of-order property - Affects the dependencies of the command being created.
Profiling property - Ignored, as the extension does not yet expose profiling at the granularity of commands.
Extension defined properties - Ignored, unless interaction with cl_khr_command_buffer commands is defined by that extension.

Command-buffer Enqueue

Semantics of command-queue properties affecting clEnqueueCommandBufferKHR is in line with the regular command-queue property semantics for enqueuing work:

in-order/out-of-order property - Affects the dependencies on the execution of the whole command-buffer.
Profiling property - Affects whether the cl_event returned for the whole command-buffer execution can be profiled.

Changes to what a compatible queue means

SUPPORTED & REQUIRED queue properties from New Command-buffer query for supported queue properties #850
only affect the queues being enqueued to.
For a queue to be compatible with the queue set on creation recording, it must now only match the device & context of the original queue, but it may have different properties within the bounds of the REQUIRED and SUPPORTED queries.

~~## Command-buffer Creation~~

~~Changes to command-buffer creation~~

* Introduce a profiling flag to CL_COMMAND_BUFFER_FLAGS_KHR that enables whether a command-buffer can be profiled on a profiling enabled queue. At the time of finalizing a command-buffer extra work may need to be done in the backend make a command-buffer profilable.

Edit: added strike-through formatting to proposal component no longer required based on #1142 (comment)

EwanC · 2024-12-04T13:10:20Z

Introduce a profiling flag to CL_COMMAND_BUFFER_FLAGS_KHR that enables whether a command-buffer can be profiled on a profiling enabled queue. At the time of finalizing a command-buffer extra work may need to be done in the backend make a command-buffer profilable.

On the 11/04/24 teleconference it was discussed whether we need this part of the proposal, as it complicates spec language by needing to say whether it's invalid/ignored if you mismatch a profiling command-buffer with profiling queue. e.g. command-buffer without profiling to a profiling queue, or command-buffer with profiling to a non-profiling queue.

Thinking about this some more, I don't think we do need to specify upfront on creation if a command-buffer is profilable or not. It can be implemented by the runtime bookending the command-buffer submission with command submissions to the queue which will give the relevant CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END timestamps. So we could tend on
the side of dropping this part of the proposal, and only adding it back later if required.

Background info why I initially suggested this:

If you have a command-buffer with linear dependencies, the backend might finalize the command-buffer representation to something on hardware with no device synchronization primitives, so any synchronization must be done on the host side.
Host-side sync will therefore be done to synchronize any extra implementation created commands used to get profiling info.
But if the runtime knew when finalizing the command-buffer that it could be used for profiling, then it could include extra device synchronization primitives for synchronizing on device to any implementation created profiling commands.

Using L0 to illustrate (based loosely on what's described here):

Finalized command-buffer is an in-order L0 command-list, where no L0 event objects created/signaled from any command, because they are not required for command dependencies. Therefore the host-side fence synchronization fence returned from command-list execution would need to be used to sync on host to an extra command-list submission containing a zeCommandListAppendQueryKernelTimestamps command.
If we know when the command-buffer was finalized that it could be used for profiling, then we could add an L0 event to be signaled on the completion of the last command in the in-order command-list, and use that as a dependency of the zeCommandListAppendQueryKernelTimestamps command in the extra command-list submission.

bashbaug · 2024-12-12T23:08:39Z

Discussed in the December 10th teleconference. Ewan will create a PR that relaxes the queue compatibility requirements, as described above. It will build off of the changes in #850, but we will create it as a separate PR, just in case we need to go back to #850 for some reason.

As proposed in KhronosGroup#1142 the PR changes the semantics of the command-queues parameters used for command-buffer creation and enqueue. The queues used on command-buffer creation now only inform the device and dependencies of commands, rather than restricting the properties set on the queues used for command-buffer enqueue.

As proposed in KhronosGroup#1142 the PR changes the semantics of the command-queues parameters used for command-buffer creation and enqueue. The queues used on command-buffer creation now only inform the device and dependencies of commands, rather than restricting the properties set on the queues used for command-buffer enqueue. This is based ontop on the change in KhronosGroup#850 to add supported queue property semantics.

EwanC · 2024-12-16T09:58:10Z

Discussed in the December 10th teleconference. Ewan will create a PR that relaxes the queue compatibility requirements, as described above. It will build off of the changes in #850, but we will create it as a separate PR, just in case we need to go back to #850 for some reason.

Draft opened here #1292

Ideas for additional CTS tests:

Create a command-buffer with an in-order queue, enqueue on an out-of-order queue targeting same device.
Create a command-buffer using an out-of-order queue, which is enqueued to an in-order queue targeting the same device.
Create a command-buffer using a queue without the profiling property, which is enqueue to an queue with the profiling property, and event returned queried for profiling info.
Verify that CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR query returns at least CL_QUEUE_PROFILING_ENABLE .
Negative test that an error is thrown if user tries to enqueue a command-buffer to a queue with a property not in CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR.

EwanC added the cl_khr_command_buffer Relating to the command-buffer family of extension label Apr 4, 2024

bashbaug added the needs-cts-coverage The CTS needs to be extended label Dec 10, 2024

bashbaug mentioned this issue Dec 12, 2024

New Command-buffer query for supported queue properties #850

Draft

EwanC mentioned this issue Dec 16, 2024

Refactor command-buffer queue compatability #1292

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command-buffer queue comptabaility #1142

Command-buffer queue comptabaility #1142

EwanC commented Apr 4, 2024

bashbaug commented Nov 19, 2024

EwanC commented Nov 27, 2024 •

edited

Loading

EwanC commented Dec 4, 2024

bashbaug commented Dec 12, 2024

EwanC commented Dec 16, 2024 •

edited

Loading

Command-buffer queue comptabaility #1142

Command-buffer queue comptabaility #1142

Comments

EwanC commented Apr 4, 2024

bashbaug commented Nov 19, 2024

EwanC commented Nov 27, 2024 • edited Loading

Command-buffer Append

Command-buffer Enqueue

EwanC commented Dec 4, 2024

bashbaug commented Dec 12, 2024

EwanC commented Dec 16, 2024 • edited Loading

EwanC commented Nov 27, 2024 •

edited

Loading

EwanC commented Dec 16, 2024 •

edited

Loading