[UR] [L0 v2] Enable wait lists and signal events for command buffer in L0 adapter v2 #18456
base: sycl
Changes from all commits
f822fc6
ad5ae47
9024b45
3240fa4
759a806
18dbe12
5a2beb0
5b25be7
f64aaf5
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -141,6 +141,11 @@ uint64_t ur_event_handle_t_::getEventEndTimestamp() { | |
return profilingData.getEventEndTimestamp(); | ||
} | ||
|
||
void ur_event_handle_t_::markEventAsNotInUse() { isEventInUse = false; } | ||
void ur_event_handle_t_::markEventAsInUse() { isEventInUse = true; } | ||
|
||
bool ur_event_handle_t_::getIsEventInUse() const { return isEventInUse; } | ||
|
||
void ur_event_handle_t_::reset() { | ||
// consider make an abstraction for regular/counter based | ||
// events if there's more of this type of conditions | ||
|
@@ -232,6 +237,14 @@ ur_result_t urEventRelease(ur_event_handle_t hEvent) try { | |
ur_result_t urEventWait(uint32_t numEvents, | ||
const ur_event_handle_t *phEventWaitList) try { | ||
for (uint32_t i = 0; i < numEvents; ++i) { | ||
if (!phEventWaitList[i]->getIsEventInUse()) { | ||
// TODO: This is a workaround for the underlying inconsistency | ||
Review comment: Repeating a comment from the original PR: can't we manually signal the events to put them in a proper state?
Reply: No, it's not possible to signal or reset counter-based events from the host. They also need to have been used in a previous command append before they are usable.
||
// between normal and counter events in L0 driver | ||
// (the events that are not in use should be signaled by default, see | ||
// /test/conformance/exp_command_buffer/kernel_event_sync.cpp | ||
// KernelCommandEventSyncTest.SignalWaitBeforeEnqueue) | ||
continue; | ||
} | ||
ZE2UR_CALL(zeEventHostSynchronize, | ||
(phEventWaitList[i]->getZeEvent(), UINT64_MAX)); | ||
} | ||
|
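The workaround in `urEventWait` above can be illustrated with a minimal, self-contained sketch. `MockEvent` and `waitOnEvents` are hypothetical stand-ins, not the adapter's real types, and `hostSynchronize` only counts calls instead of invoking `zeEventHostSynchronize`. The sketch assumes, as the diff comment states, that an event that has not yet been used should be treated as already signaled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ur_event_handle_t_; not the adapter's real type.
struct MockEvent {
  bool inUse = false;    // mirrors the isEventInUse flag added in the diff
  int hostSyncCalls = 0; // counts simulated host synchronizations

  void markEventAsInUse() { inUse = true; }
  void markEventAsNotInUse() { inUse = false; }
  bool getIsEventInUse() const { return inUse; }

  // Placeholder for zeEventHostSynchronize(event, UINT64_MAX).
  void hostSynchronize() { ++hostSyncCalls; }
};

// Waits on every event that is in use and returns how many events were
// actually synchronized. Events not in use are skipped, which treats them
// as if they were already signaled.
inline std::size_t waitOnEvents(const std::vector<MockEvent *> &events) {
  std::size_t waited = 0;
  for (MockEvent *event : events) {
    assert(event != nullptr);
    if (!event->getIsEventInUse()) {
      continue; // never appended to a command, so there is nothing to wait on
    }
    event->hostSynchronize();
    ++waited;
  }
  return waited;
}
```

With one unused event and one event marked as in use, `waitOnEvents` synchronizes only the second and returns 1, so a wait issued before the first enqueue does not block on an event the driver would never signal.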
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -76,7 +76,7 @@ ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( | |
ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS, | ||
getZePriority(pProps ? pProps->flags : ur_queue_flags_t{}), | ||
getZeIndex(pProps)), | ||
eventFlagsFromQueueFlags(flags), this) {} | ||
eventFlagsFromQueueFlags(flags), this, true) {} | ||
|
||
ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( | ||
ur_context_handle_t hContext, ur_device_handle_t hDevice, | ||
|
@@ -93,7 +93,7 @@ ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( | |
} | ||
} | ||
}), | ||
eventFlagsFromQueueFlags(flags), this) {} | ||
eventFlagsFromQueueFlags(flags), this, true) {} | ||
|
||
ze_event_handle_t ur_queue_immediate_in_order_t::getSignalEvent( | ||
locked<ur_command_list_manager> &commandList, ur_event_handle_t *hUserEvent, | ||
|
@@ -605,7 +605,8 @@ ur_queue_immediate_in_order_t::enqueueUSMAdvise(const void *pMem, size_t size, | |
TRACK_SCOPE_LATENCY("ur_queue_immediate_in_order_t::enqueueUSMAdvise"); | ||
|
||
auto commandListLocked = commandListManager.lock(); | ||
UR_CALL(commandListLocked->appendUSMAdvise(pMem, size, advice, phEvent)); | ||
UR_CALL(commandListLocked->appendUSMAdvise(pMem, size, advice, 0, nullptr, | ||
phEvent)); | ||
return UR_RESULT_SUCCESS; | ||
} | ||
|
||
|
@@ -912,6 +913,7 @@ ur_result_t ur_queue_immediate_in_order_t::enqueueCommandBufferExp( | |
1, &commandBufferCommandList, phEvent, numEventsInWaitList, | ||
phEventWaitList, UR_COMMAND_ENQUEUE_COMMAND_BUFFER_EXP, executionEvent)); | ||
UR_CALL(hCommandBuffer->registerExecutionEventUnlocked(*phEvent)); | ||
hCommandBuffer->enableEvents(); | ||
Review comment: Having to iterate through all the events when enqueueing might have a performance cost the first time the command buffer is enqueued. My understanding is that this is supposed to be temporary and will be removed in the future? Can we add a TODO here that mentions that?
Reply: Yes, this fix is supposed to be temporary, lasting until the driver team takes care of it. Sure, I will add that TODO.
||
if (internalEvent != nullptr) { | ||
internalEvent->release(); | ||
} | ||
|
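The cost raised in the review thread on `enableEvents()` can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `SketchEvent`, `SketchCommandBuffer`, `addEvent`, and the `eventsEnabled` flag are hypothetical names, and the diff does not show whether the real `enableEvents()` caches its result. The sketch only shows one way the iteration over all events could be confined to the first enqueue.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical event type; mirrors only the in-use flag from the diff.
struct SketchEvent {
  bool inUse = false;
  void markEventAsInUse() { inUse = true; }
  bool getIsEventInUse() const { return inUse; }
};

// Hypothetical command buffer that owns its signal events.
class SketchCommandBuffer {
public:
  // Creates a new event; newly added events start as not in use.
  SketchEvent *addEvent() {
    events.push_back(std::make_unique<SketchEvent>());
    eventsEnabled = false; // a later enqueue must visit the new event
    return events.back().get();
  }

  // Marks every event as in use. Returns the number of events visited so
  // the cost of each call is observable; repeated calls visit nothing.
  std::size_t enableEvents() {
    if (eventsEnabled) {
      return 0; // assumption: caching flag, not shown in the diff
    }
    for (auto &event : events) {
      assert(event != nullptr);
      event->markEventAsInUse();
    }
    eventsEnabled = true;
    return events.size();
  }

private:
  std::vector<std::unique_ptr<SketchEvent>> events;
  bool eventsEnabled = false;
};
```

In this sketch the first `enableEvents()` call after events are added walks the whole list, and later calls return immediately, which matches the reviewer's expectation that the overhead falls on the first enqueue.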
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,9 +59,9 @@ static std::vector<uur::test_parameters_t> generateParameterizations() { | |
64, 8, 64); | ||
// Tests that a 4x16x2 region can be read from a 8x32x1 device buffer at | ||
// offset {1,2,0} to a 8x32x4 host buffer at offset {4,1,3}. | ||
PARAMETERIZATION(write_2d_3d, 256, 1024, (ur_rect_offset_t{1, 2, 0}), | ||
(ur_rect_offset_t{4, 1, 3}), (ur_rect_region_t{4, 16, 1}), 8, | ||
256, 8, 256); | ||
// PARAMETERIZATION(write_2d_3d, 256, 1024, (ur_rect_offset_t{1, 2, 0}), | ||
Review comment: Why is this test commented out? Is there an issue to track this?
Reply: This test started to fail recently and is part of a larger problem (#17187).
||
// (ur_rect_offset_t{4, 1, 3}), (ur_rect_region_t{4, 16, 1}), 8, | ||
// 256, 8, 256); | ||
// Tests that a 1x4x1 region can be read from a 8x16x4 device buffer at | ||
// offset {7,3,3} to a 2x8x1 host buffer at offset {1,3,0}. | ||
// PARAMETERIZATION(write_3d_2d, 512, 16, (ur_rect_offset_t{7, 3, 3}), | ||
|
Review comment: We should avoid doing that on each getWaitListView call, since this might impact performance. I'm wondering if it wouldn't be better to just always use regular events for command buffers... We wouldn't need that isInUse() workaround at all then.

Reply: The feature this PR is adding isn't exposed in SYCL-Graph right now, so always using regular events rather than counter-based events for command buffers would compromise the performance of applications for all SYCL-Graph usage today. Even once it is exposed, it will be a more niche use case, and limiting the more general performance for it seems a bad tradeoff.