[UR][L0 v2 adapter] on queueFinish() synchronize only used command lists#21224
[UR][L0 v2 adapter] on queueFinish() synchronize only used command lists#21224mateuszpn wants to merge 2 commits intointel:syclfrom
Conversation
|
|
||
| // Track which command lists have pending work to avoid unnecessary | ||
| // synchronization in queueFinish(). Each bit represents one command list. | ||
| std::atomic<uint32_t> usedCommandListsMask = 0; |
There was a problem hiding this comment.
It's probably not gonna happen, but in case the hardcoded number of command lists ever gets changed to something bigger than 32, could you add a comment at the static constexpr size_t numCommandLists = 4; definition that there's a mask that would also need to be modified.
| std::array<ur_event_handle_t, numCommandLists> barrierEvents; | ||
|
|
||
| uint32_t getNextCommandListId() { | ||
| uint32_t getNextCommandListId(bool markUsed = true) { |
There was a problem hiding this comment.
I looked at the uses of getNextCommandListId, and I don't see the point of using atomic variables, for the mask you are adding here, and commandListIndex. The parallelization we can achieve is minimal. And using a single queue from multiple threads is already not a frequent use.
I suggest changing this to regular variables under common lockable with commandListManagers. Creating a class with these three variables would make the most sense to me.
Avoid unnecessary overhead on queueFinish, synchronize only command lists that actually have been used