[SYCL] Do not store last event for in-order queues #18277
I think I'm missing something. Is this so we can make sure that the second host task in your example depends only on specific previous commands? Why isn't it sufficient to just insert a barrier in the queue before we launch the host task, store the event associated with that barrier inside the host task implementation, and wait on it before the host task executes? Wouldn't that ensure that everything previously submitted to the queue was complete before the host task executed?
I second that, though the proper event would be a "marker" (in OpenCL terminology) instead of a barrier. For in-order queues they are the same, but for out-of-order (which I don't know if this applies to, but better be safe) the marker won't block other work from starting before it finishes, while a barrier would. For reference:
@steffenlarsen @Pennycook that's what we are doing right now, however, this is not enough - the problem is with all the kernels submitted after the host task. With the scenario I mentioned in the first comment what actually happens is this:
For any operation that comes after that, we still need to make sure they are synchronized with the kernel we just submitted. Since the kernel might not yet be enqueued to UR, we need to call depends_on() and rely on the scheduler. In my patch I implement a way to 'go back' to the eventless mode by checking if the LastEvent is completed. If it is, we don't need any calls to depends_on() anymore.
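The 'go back to eventless mode' idea can be sketched with a small stand-alone model. This is not the code from the patch: `Event`, `Submission`, and `InOrderQueueModel` are hypothetical names invented for illustration, and completion is a plain flag rather than a real SYCL or UR event. The model only assumes what the comment above describes: a host task forces event tracking, later commands depend on the last event, and tracking is dropped once the last event is observed complete.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a SYCL event; it only records completion.
struct Event {
    bool complete = false;
};
using EventPtr = std::shared_ptr<Event>;

// Result of one submission: its own event plus the events it had to depend on.
struct Submission {
    EventPtr event;
    std::vector<EventPtr> deps;
};

// Toy model of an in-order queue that stores a last event only while needed.
class InOrderQueueModel {
public:
    // A host task always switches the queue into event-tracking mode.
    Submission submit_host_task() {
        Submission s = make_submission();
        last_event_ = s.event;
        return s;
    }

    // A kernel records its event only if the queue is already tracking.
    Submission submit_kernel() {
        Submission s = make_submission();
        if (last_event_) last_event_ = s.event;
        return s;
    }

    bool tracking() const { return last_event_ != nullptr; }

private:
    Submission make_submission() {
        // If the last event has completed, fall back to eventless mode.
        if (last_event_ && last_event_->complete) last_event_.reset();
        Submission s;
        s.event = std::make_shared<Event>();
        // While tracking, the new command depends on the last event
        // (the model's analogue of calling depends_on()).
        if (last_event_) s.deps.push_back(last_event_);
        assert(s.deps.size() <= 1);  // in-order: at most one predecessor
        return s;
    }

    EventPtr last_event_;  // null means eventless mode
};
```

In this model, a kernel submitted before any host task gets no dependencies, a kernel submitted after an unfinished host task depends on it, and once the last recorded event is complete the next submission is dependency-free again.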
Instead of calling Or does that not work because those barriers (and the kernels that follow them) would actually need to be enqueued to UR?
Yes, we can only submit a barrier to UR, but since UR does not know anything about host tasks, the barrier would have no effect on them. The only way to synchronize with a host_task (as far as I'm aware) is to use the SYCL event. The SYCL scheduler can execute host tasks in any order (assuming their dependencies are completed), and there is no specific handling in the scheduler for host tasks originating from an in-order queue. I think we won't be able to solve this until we have a host task implementation in UR.
Ah, thank you for clarifying. I wonder (and maybe this would be a follow-up) but could we have host-task register its event as an "external event" like with
That's an interesting idea, however, I think handling graphs could be problematic. Right now, we have a separate Also, we were actually thinking about deprecating
For OpenCL, always store the last event to support queue_empty(); just don't use it for synchronization
I'm fine with either. The main reason I talked about external event here is that it clears itself after first use, which seems like something we could also do for
If we need to keep the "last event" for the sake of
unless Host Tasks are used.
Without Host Tasks, we can just rely on UR for ordering. Having no last event means that ext_oneapi_get_last_event() needs to submit a barrier to return an event to the user. Similarly, ext_oneapi_submit_barrier() now always submits a barrier, even for in-order queues.
Whenever Host Tasks are used we need to start recording all events. This is needed because of how kernel submission synchronizes with Host Tasks. Consider the following scenario:
q.host_task();
q.submit_kernel();
q.host_task();
The kernel won't even be submitted to UR until the first Host Task completes. To properly synchronize the second Host Task we need to keep the event describing kernel submission.
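The ordering hazard in this scenario can be illustrated with a toy dependency scheduler. This is a sketch under stated assumptions, not the SYCL runtime: `Command`, `run`, and `scenario` are hypothetical names, and the scheduler deliberately picks the most recently submitted ready command first, which is one legal order for a scheduler that has no in-order handling for host tasks. It also assumes that, without the kernel's event, the second host task would only be known to depend on the first one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical command record: a name plus indices of commands it waits for.
struct Command {
    std::string name;
    std::vector<int> deps;
    bool done = false;
};

// Runs commands in a dependency-respecting order, choosing the most recently
// submitted ready command first (adversarial, but allowed by the dependencies).
std::vector<std::string> run(std::vector<Command> cmds) {
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (int i = static_cast<int>(cmds.size()) - 1; i >= 0; --i) {
            if (cmds[i].done) continue;
            bool ready = true;
            for (int d : cmds[i].deps) {
                assert(d >= 0 && d < static_cast<int>(cmds.size()));
                if (!cmds[d].done) ready = false;
            }
            if (ready) {
                cmds[i].done = true;
                order.push_back(cmds[i].name);
                progressed = true;
                break;
            }
        }
    }
    return order;
}

// Builds host_task_1 -> kernel -> host_task_2 as submitted to an in-order queue.
// keep_kernel_event selects whether host_task_2 depends on the kernel's event
// or (assumption of this model) only on the first host task.
std::vector<Command> scenario(bool keep_kernel_event) {
    std::vector<Command> cmds;
    cmds.push_back({"host_task_1", {}});
    cmds.push_back({"kernel", {0}});  // held back until host_task_1 completes
    cmds.push_back({"host_task_2", {keep_kernel_event ? 1 : 0}});
    return cmds;
}
```

With `scenario(false)` the model can run the second host task before the kernel, while `scenario(true)` forces the submission order, which is why the kernel's event has to be recorded once host tasks are involved.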