fix queue-wide barriers with multiple active command queues #1585

Merged: kbenzie merged 1 commit into oneapi-src:main from pbalcer:queue-barrier-fix-v2 on May 9, 2024

Conversation

pbalcer (Contributor) commented May 7, 2024

This is another attempt to fix an issue where the L0 adapter crashes when urEnqueueEventsWaitWithBarrier is called in the presence of multiple active command queues with batched command lists.

The core issue is that the current queue implementation allows only two command lists to be open batches at a time: one for copy and one for compute. If that assumption doesn't hold, the getAvailableCommandList function, when called multiple times for command queues of the same type, overrides the active open command batch, so only the last retrieved command list can actually be batched. The code then attempts to execute all the command lists it collected with batching enabled, and this is where we hit an assert, because the active open command list doesn't match the one being executed.
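To make the failure mode concrete, here is a heavily simplified C++ model of the behavior described above. Everything in it (QueueType, CommandList, QueueModel, executeBatched) is a hypothetical stand-in, not the adapter's actual code. The single open-batch slot per queue type mirrors the compute/copy assumption, and executeBatched returns false where the real adapter would hit its assert.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified model of the pre-fix behavior; not adapter code.
enum class QueueType { Compute, Copy };

struct CommandList {
  int id;
  int commandQueue; // index of the L0 command queue this list targets
};

struct QueueModel {
  // One open-batch slot per queue type: at most one compute batch and one
  // copy batch can be open at a time.
  std::optional<CommandList> openCompute;
  std::optional<CommandList> openCopy;
  int nextId = 0;

  std::optional<CommandList> &slot(QueueType type) {
    return type == QueueType::Compute ? openCompute : openCopy;
  }

  // Pre-fix behavior as described: every retrieval for a queue type replaces
  // the open batch, even if it targets a different command queue.
  CommandList getAvailableCommandList(QueueType type, int commandQueue) {
    CommandList cl{nextId++, commandQueue};
    slot(type) = cl;
    return cl;
  }

  // Batched execution expects cl to be the open batch. Returns false where
  // the real adapter would hit its assert.
  bool executeBatched(QueueType type, const CommandList &cl) {
    const std::optional<CommandList> &open = slot(type);
    return open.has_value() && open->id == cl.id;
  }
};
```

With two active compute command queues, a queue-wide barrier retrieves one list per queue. Only the second retrieval remains the open batch, so executing the first list with batching is exactly the mismatch that trips the assert.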

The proper fix here is to allow an open command batch for each command queue, but that's a fairly risky change to make this late in the release cycle.
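A minimal sketch of what "an open command batch for each command queue" could look like, assuming command queues are identified by an integer index. PerQueueBatches and its methods are hypothetical and illustrative only; the adapter's real data structures are not shown in this PR.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch: track one open batch per command queue instead of
// one per queue type. Not adapter code.
struct CommandList {
  int id;
  int commandQueue; // index of the command queue this list targets
};

class PerQueueBatches {
public:
  // Reuse the open batch for this command queue if one exists; otherwise
  // open a new one. Batches for other command queues are left untouched.
  CommandList getAvailableCommandList(int commandQueue) {
    auto it = open_.find(commandQueue);
    if (it != open_.end())
      return it->second;
    CommandList cl{nextId_++, commandQueue};
    open_.emplace(commandQueue, cl);
    return cl;
  }

  // True if cl is the open batch of the command queue it targets.
  bool isOpenBatch(const CommandList &cl) const {
    auto it = open_.find(cl.commandQueue);
    return it != open_.end() && it->second.id == cl.id;
  }

private:
  std::map<int, CommandList> open_; // command queue index -> open batch
  int nextId_ = 0;
};
```

Because each command queue owns its own slot, retrieving a list for one queue can no longer displace another queue's open batch, which is why this design removes the mismatch entirely (at the cost of a larger change to the queue implementation).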

My previous attempt at a fix simply disabled batching for queue-wide barriers (#1555). That introduced regressions in tests that assumed batching happens, and it may also have been a performance regression.

Instead, this patch fixes getAvailableCommandList for the case where batching is enabled and a specific command queue is required, and disables batching only when the open batch command list is different from the one being executed.
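One plausible reading of this approach, as a hypothetical model rather than the actual patch: getAvailableCommandList reuses the single open batch only when it targets the required command queue and otherwise hands out a fresh list without overwriting the batch, and the execute path falls back to unbatched execution when a list is not the open batch. The names (CommandList, QueueModel, useBatching) are illustrative assumptions.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the approach in this patch; not adapter code.
struct CommandList {
  int id;
  int commandQueue; // index of the command queue this list targets
};

struct QueueModel {
  std::optional<CommandList> openBatch; // still a single open batch per type
  int nextId = 0;

  // A specific command queue is required: reuse the open batch only if it
  // targets that queue. Otherwise return a fresh list and leave the open
  // batch as it is instead of overwriting it.
  CommandList getAvailableCommandList(int commandQueue) {
    if (openBatch && openBatch->commandQueue == commandQueue)
      return *openBatch;
    CommandList cl{nextId++, commandQueue};
    if (!openBatch)
      openBatch = cl; // nothing open yet: this list becomes the open batch
    return cl;
  }

  // Keep batching only when cl is the open batch; any other list is
  // executed without batching rather than tripping the mismatch assert.
  bool useBatching(const CommandList &cl, bool batchingRequested) const {
    return batchingRequested && openBatch.has_value() &&
           openBatch->id == cl.id;
  }
};
```

In this model the common case (a single active command queue) keeps batching exactly as before, and only the extra lists collected for other command queues lose it, which is the narrower trade-off compared with disabling batching for every queue-wide barrier.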

@pbalcer pbalcer requested a review from a team as a code owner May 7, 2024 11:02
@github-actions github-actions bot added the level-zero L0 adapter specific issues label May 7, 2024
@pbalcer pbalcer requested a review from nrspruit May 7, 2024 11:02
pbalcer (Contributor Author) commented May 7, 2024

intel/llvm#13684

@pbalcer pbalcer added the v0.9.x Include in the v0.9.x release label May 7, 2024
nrspruit (Contributor) left a comment

Looks good to me, ensuring that the queue is matching should address the current issue, thanks for the catch!

@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label May 8, 2024
@kbenzie kbenzie merged commit dd212f3 into oneapi-src:main May 9, 2024
kbenzie added a commit to kbenzie/unified-runtime that referenced this pull request May 9, 2024
fix queue-wide barriers with multiple active command queues
@kbenzie kbenzie mentioned this pull request May 9, 2024

Labels

level-zero (L0 adapter specific issues), ready to merge (Added to PR's which are ready to merge), v0.9.x (Include in the v0.9.x release)


4 participants