Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command-buffer queue comptabaility #1142

Open
EwanC opened this issue Apr 4, 2024 · 5 comments
Open

Command-buffer queue comptabaility #1142

EwanC opened this issue Apr 4, 2024 · 5 comments
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension needs-cts-coverage The CTS needs to be extended

Comments

@EwanC
Copy link
Contributor

EwanC commented Apr 4, 2024

The current clCreateCommandBufferKHR API takes a list of queues on creation, and enforces that the properties of these queues must match the properties of the matching queues passed in clEnqueueCommandBufferKHR. However, this API design does not work well for applications that do not know at the time of command-buffer creation what the properties of the queue that the command-buffer will be executed on are.

In particular this is an issue for SYCL-Graph layering, where when the command-buffer is created during graph.finalize(), the SYCL-RT does not yet know what SYCL queue a user will submit the graph to. The graph is treated as a black box by the SYCL queue that is executed in-order or out-of-order with respect to the other SYCL queue commands depending on the type of queue.

There are two possible approaches to help with this in cl_khr_command_buffer that I've thought about so far:

  1. Optional feature to relax queue compatibility constraint - Add an optional feature to relax the constraint that the queues passed to clEnqueueCommandBuffer must be compatible with those set on creation (or maybe a query to check for "compatible" means for that vendor). The SYCL-RT could already use the clRemapCommandBufferKHR() API with CL_COMMAND_BUFFER_PLATFORM_REMAP_QUEUES_KHR for this to created a remap of the command-buffer before each enqueue during graph submission, but we don't really want to incur the deep copy cost here and the cl_khr_command_buffer_multi_device extension also doesn't have a large amount of vendor coverage. However, a drawback of this is that the SYCL-RT would still need to create a placeholder OpenCL queue when the command-buffer is created and commands added, which has a construction overhead.

  2. Remove the queue object from the command-buffer interface enqueue time. We could replace the command-queue parameters with device parameters (and a context parameter on creation) on command-queue creation/command adding/remap and then only introduce the queues at clEnqueueCommandBuffer time. We'd probably still want more command-buffer properties so that a user can say up-front what type of queue they want the command-buffer to run on if they want a more optimized implementation. This is quite an invasive change.

I haven't really thought all this through yet but the design point has come up recently so I think it's worth creating an issue to track the discussion.

@EwanC EwanC added the cl_khr_command_buffer Relating to the command-buffer family of extension label Apr 4, 2024
@bashbaug
Copy link
Contributor

Need to consider interactions with PR #850.

@EwanC
Copy link
Contributor Author

EwanC commented Nov 27, 2024

I've attempted to form my current thoughts into a new proposal based around idea 1. The cl_khr_command_buffer_multi_device layered extension complicates things, which I haven't fully thought through yet, but will do if folks think this is a good direction.

Command-buffer Append

Semantics of command-queue properties set on the queue passed when appending commands.

  • in-order/out-of-order property - Affects the dependencies of the command being created.
  • Profiling property - Ignored, as the extension does not yet expose profiling at the granularity of commands.
  • Extension defined properties - Ignored, unless interaction with cl_khr_command_buffer commands is defined by that extension.

Command-buffer Enqueue

Semantics of command-queue properties affecting clEnqueueCommandBufferKHR is in line with the regular command-queue property semantics for enqueuing work:

  • in-order/out-of-order property - Affects the dependencies on the execution of the whole command-buffer.
  • Profiling property - Affects whether the cl_event returned for the whole command-buffer execution can be profiled.

Changes to what a compatible queue means

  • SUPPORTED & REQUIRED queue properties from New Command-buffer query for supported queue properties #850
    only affect the queues being enqueued to.
  • For a queue to be compatible with the queue set on creation recording, it must now only match the device & context of the original queue, but it may have different properties within the bounds of the REQUIRED and SUPPORTED queries.

## Command-buffer Creation

Changes to command-buffer creation

* Introduce a profiling flag to CL_COMMAND_BUFFER_FLAGS_KHR that enables whether a command-buffer can be profiled on a profiling enabled queue. At the time of finalizing a command-buffer extra work may need to be done in the backend make a command-buffer profilable.

Edit: added strike-through formatting to proposal component no longer required based on #1142 (comment)

@EwanC
Copy link
Contributor Author

EwanC commented Dec 4, 2024

Introduce a profiling flag to CL_COMMAND_BUFFER_FLAGS_KHR that enables whether a command-buffer can be profiled on a profiling enabled queue. At the time of finalizing a command-buffer extra work may need to be done in the backend make a command-buffer profilable.

On the 11/04/24 teleconference it was discussed whether we need this part of the proposal, as it complicates spec language by needing to say whether it's invalid/ignored if you mismatch a profiling command-buffer with profiling queue. e.g. command-buffer without profiling to a profiling queue, or command-buffer with profiling to a non-profiling queue.

Thinking about this some more, I don't think we do need to specify upfront on creation if a command-buffer is profilable or not. It can be implemented by the runtime bookending the command-buffer submission with command submissions to the queue which will give the relevant CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END timestamps. So we could tend on
the side of dropping this part of the proposal, and only adding it back later if required.


Background info why I initially suggested this:

  • If you have a command-buffer with linear dependencies, the backend might finalize the command-buffer representation to something on hardware with no device synchronization primitives, so any synchronization must be done on the host side.

  • Host-side sync will therefore be done to synchronize any extra implementation created commands used to get profiling info.

  • But if the runtime knew when finalizing the command-buffer that it could be used for profiling, then it could include extra device synchronization primitives for synchronizing on device to any implementation created profiling commands.

Using L0 to illustrate (based loosely on what's described here):

  • Finalized command-buffer is an in-order L0 command-list, where no L0 event objects created/signaled from any command, because they are not required for command dependencies. Therefore the host-side fence synchronization fence returned from command-list execution would need to be used to sync on host to an extra command-list submission containing a zeCommandListAppendQueryKernelTimestamps command.

  • If we know when the command-buffer was finalized that it could be used for profiling, then we could add an L0 event to be signaled on the completion of the last command in the in-order command-list, and use that as a dependency of the zeCommandListAppendQueryKernelTimestamps command in the extra command-list submission.

@bashbaug
Copy link
Contributor

Discussed in the December 10th teleconference. Ewan will create a PR that relaxes the queue compatibility requirements, as described above. It will build off of the changes in #850, but we will create it as a separate PR, just in case we need to go back to #850 for some reason.

EwanC added a commit to EwanC/OpenCL-Docs that referenced this issue Dec 13, 2024
As proposed in KhronosGroup#1142
the PR changes the semantics of the command-queues parameters used for
command-buffer creation and enqueue.

The queues used on command-buffer creation now only inform the
device and dependencies of commands, rather than restricting the
properties set on the queues used for command-buffer enqueue.
EwanC added a commit to EwanC/OpenCL-Docs that referenced this issue Dec 16, 2024
As proposed in KhronosGroup#1142
the PR changes the semantics of the command-queues parameters used for
command-buffer creation and enqueue.

The queues used on command-buffer creation now only inform the
device and dependencies of commands, rather than restricting the
properties set on the queues used for command-buffer enqueue.

This is based ontop on the change in KhronosGroup#850
to add supported queue property semantics.
@EwanC
Copy link
Contributor Author

EwanC commented Dec 16, 2024

Discussed in the December 10th teleconference. Ewan will create a PR that relaxes the queue compatibility requirements, as described above. It will build off of the changes in #850, but we will create it as a separate PR, just in case we need to go back to #850 for some reason.

Draft opened here #1292

Ideas for additional CTS tests:

  • Create a command-buffer with an in-order queue, enqueue on an out-of-order queue targeting same device.
  • Create a command-buffer using an out-of-order queue, which is enqueued to an in-order queue targeting the same device.
  • Create a command-buffer using a queue without the profiling property, which is enqueue to an queue with the profiling property, and event returned queried for profiling info.
  • Verify that CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR query returns at least CL_QUEUE_PROFILING_ENABLE .
  • Negative test that an error is thrown if user tries to enqueue a command-buffer to a queue with a property not in CL_DEVICE_COMMAND_BUFFER_SUPPORTED_QUEUE_PROPERTIES_KHR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension needs-cts-coverage The CTS needs to be extended
Projects
None yet
Development

No branches or pull requests

2 participants