fix queue-wide barriers with multiple active command queues#1585
Merged
kbenzie merged 1 commit intooneapi-src:mainfrom May 9, 2024
Merged
fix queue-wide barriers with multiple active command queues#1585kbenzie merged 1 commit intooneapi-src:mainfrom
kbenzie merged 1 commit intooneapi-src:mainfrom
Conversation
This is another attempt to fix an issue where the L0 Adapter crashes when urEnqueueEventsWaitWithBarrier in presence of multiple active command queues with batched cmdlists. The core issue is that the current queue implementation only allows for two command lists to be active open batches, one for copy and one for compute. If that assumption doesn't hold, the getAvailableCommandList function, when executed multiple times for a command queue of the same type, will override the active open command batch. So only the last retrieved command list can actually be batched. After this, when the code attempts to execute all the command lists it collected, with batching enabled. And this is where we hit an assert because the active open command list doesn't match what is being used. The proper fix here is to allow open command batches for each command queue. But that's a fairly risky change to do this late in the release cycle. My previous attempt at a fix simply disabled batching for queue-wide barriers (oneapi-src#1555). That introduced regressions in tests that assumed that batching happens. It might also been a performance regression. Instead, this patch fixes getAvailableCommandList when batching is enabled and specific command queue is required, and disables batching only for cases where the open batch cmdlist is different than the one we are executing.
Contributor
Author
nrspruit
approved these changes
May 7, 2024
Contributor
nrspruit
left a comment
There was a problem hiding this comment.
Looks good to me, ensuring that the queue is matching should address the current issue, thanks for the catch!
igchor
approved these changes
May 8, 2024
kbenzie
added a commit
to kbenzie/unified-runtime
that referenced
this pull request
May 9, 2024
fix queue-wide barriers with multiple active command queues
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is another attempt to fix an issue where the L0 Adapter crashes when urEnqueueEventsWaitWithBarrier in presence of multiple active command queues with batched cmdlists.
The core issue is that the current queue implementation only allows for two command lists to be active open batches, one for copy and one for compute. If that assumption doesn't hold, the getAvailableCommandList function, when executed multiple times for a command queue of the same type, will override the active open command batch. So only the last retrieved command list can actually be batched. After this, when the code attempts to execute all the command lists it collected, with batching enabled. And this is where we hit an assert because the active open command list doesn't match what is being used.
The proper fix here is to allow open command batches for each command queue. But that's a fairly risky change to do this late in the release cycle.
My previous attempt at a fix simply disabled batching for queue-wide barriers (#1555). That introduced regressions in tests that assumed that batching happens. It might also been a performance regression.
Instead, this patch fixes getAvailableCommandList when batching is enabled and specific command queue is required, and disables batching only for cases where the open batch cmdlist is different than the one we are executing.