[SYCL] Add mode where only last command in each batch yields a host-visible event (#5354)
smaslov-intel authored Jan 25, 2022
1 parent 159a516 commit c6b7b8e
Showing 3 changed files with 151 additions and 107 deletions.
2 changes: 1 addition & 1 deletion sycl/doc/EnvironmentVariables.md
@@ -145,7 +145,7 @@ variables in production code.</span>
| `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. The default is 0, meaning all events are host-visible. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If set to 1, the Level Zero plugin creates all events with device scope only and creates proxy host-visible events for them when their status is needed (wait/query) on the host. If set to 2, the Level Zero plugin creates all events with device scope and adds a proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events are host-visible. |
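
The three settings described in the table row above can be sketched as a small standalone mapping from the environment string to a mode. This is a minimal illustration, not the plugin's code: `EventsScopeMode`, `parseEventsScope`, `eventsScopeFromEnvironment`, and `eventsScopeExamples` are hypothetical names. The fallback behavior (unset, non-numeric, or out-of-range values select the default) follows from using `std::atoi` with a `switch`, as the commit does.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical standalone mirror of the three documented modes.
enum class EventsScopeMode {
  AllHostVisible,               // 0 (default): every event is host-visible
  OnDemandHostVisibleProxy,     // 1: proxy created on host wait/query
  LastCommandInBatchHostVisible // 2: one proxy per command-list submission
};

// Value is the raw environment string, or nullptr when the variable is unset.
inline EventsScopeMode parseEventsScope(const char *Value) {
  switch (Value ? std::atoi(Value) : 0) {
  case 1:
    return EventsScopeMode::OnDemandHostVisibleProxy;
  case 2:
    return EventsScopeMode::LastCommandInBatchHostVisible;
  default:
    return EventsScopeMode::AllHostVisible;
  }
}

// Reads the variable named in the documentation table.
inline EventsScopeMode eventsScopeFromEnvironment() {
  return parseEventsScope(
      std::getenv("SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS"));
}

// Usage: unset and unrecognized values fall back to the default mode.
inline void eventsScopeExamples() {
  assert(parseEventsScope(nullptr) == EventsScopeMode::AllHostVisible);
  assert(parseEventsScope("2") ==
         EventsScopeMode::LastCommandInBatchHostVisible);
}
```

Note that `std::atoi` returns 0 for text it cannot parse, so a value such as `abc` behaves like the default rather than raising an error.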

## Debugging variables for CUDA Plugin

231 changes: 136 additions & 95 deletions sycl/plugins/level_zero/pi_level_zero.cpp
@@ -32,6 +32,8 @@ extern "C" {
// Forward declarations.
static pi_result EventRelease(pi_event Event, pi_queue LockedQueue);
static pi_result QueueRelease(pi_queue Queue, pi_queue LockedQueue);
static pi_result EventCreate(pi_context Context, bool HostVisible,
pi_event *RetEvent);
}

namespace {
@@ -186,12 +188,31 @@ static void zePrint(const char *Format, ...) {
}
}

// Controls whether device-scope events are used.
static const bool ZeAllHostVisibleEvents = [] {
// Controls whether device-scope events are used, and how.
static const enum EventsScope {
// All events are created host-visible (the default mode).
AllHostVisible,
// All events are created with device scope. A proxy host-visible
// event is created only when the host waits on an event or queries
// its status, and it is set to signal after the original event signals.
OnDemandHostVisibleProxy,
// All events are created with device scope. When a batch of commands
// is submitted for execution, a final command is appended to the batch
// to signal host-visible completion of every command in that batch.
LastCommandInBatchHostVisible
} EventsScope = [] {
const auto DeviceEventsStr =
std::getenv("SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS");
bool result = (DeviceEventsStr ? (std::atoi(DeviceEventsStr) == 0) : true);
return result;

switch (DeviceEventsStr ? std::atoi(DeviceEventsStr) : 0) {
case 1:
return OnDemandHostVisibleProxy;
case 2:
return LastCommandInBatchHostVisible;
}
return AllHostVisible;
}();

// Maximum number of events that can be present in an event ZePool is captured
@@ -415,14 +436,11 @@ _pi_context::getFreeSlotInExistingOrNewPool(ze_event_pool_handle_t &Pool,
ze_event_pool_flag_t ZePoolFlag = {};
std::list<ze_event_pool_handle_t> *ZePoolCache;

if (ZeAllHostVisibleEvents) {
ZePoolFlag = ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
ZePoolCache = &ZeEventPoolCache;
} else if (HostVisible) {
if (HostVisible) {
ZePoolFlag = ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
ZePoolCache = &ZeHostVisibleEventPoolCache;
} else {
ZePoolCache = &ZeEventPoolCache;
ZePoolCache = &ZeDeviceScopeEventPoolCache;
}

// Remove full pool from the cache.
@@ -468,30 +486,24 @@ pi_result _pi_context::decrementUnreleasedEventsInPool(pi_event Event) {
return PI_SUCCESS;
}

std::list<ze_event_pool_handle_t> *ZePoolCache;
if (Event->IsHostVisible()) {
ZePoolCache = &ZeHostVisibleEventPoolCache;
} else {
ZePoolCache = &ZeDeviceScopeEventPoolCache;
}

// Put the empty pool to the cache of the pools.
std::lock_guard<std::mutex> Lock(ZeEventPoolCacheMutex);
if (NumEventsUnreleasedInEventPool[Event->ZeEventPool] == 0)
die("Invalid event release: event pool doesn't have unreleased events");
if (--NumEventsUnreleasedInEventPool[Event->ZeEventPool] == 0) {
if (ZeEventPoolCache.front() != Event->ZeEventPool) {
ZeEventPoolCache.push_back(Event->ZeEventPool);
if (ZePoolCache->front() != Event->ZeEventPool) {
ZePoolCache->push_back(Event->ZeEventPool);
}
NumEventsAvailableInEventPool[Event->ZeEventPool] = MaxNumEventsPerPool;
}

if (Event->ZeHostVisibleEventPool) {
if (NumEventsUnreleasedInEventPool[Event->ZeHostVisibleEventPool] == 0)
die("Invalid host visible event release: host visible event pool doesn't "
"have unreleased events");
if (--NumEventsUnreleasedInEventPool[Event->ZeHostVisibleEventPool] == 0) {
if (ZeHostVisibleEventPoolCache.front() !=
Event->ZeHostVisibleEventPool) {
ZeHostVisibleEventPoolCache.push_back(Event->ZeHostVisibleEventPool);
}
NumEventsAvailableInEventPool[Event->ZeHostVisibleEventPool] =
MaxNumEventsPerPool;
}
}
return PI_SUCCESS;
}
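
The pool bookkeeping in this hunk can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the plugin's implementation: `PoolHandle`, `PoolCaches`, `releaseEvent`, and the value of `MaxEventsPerPool` are hypothetical, the mutex is omitted, and an exception stands in for `die`. The sketch also guards against calling `front()` on an empty list, which is an addition of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <unordered_map>

using PoolHandle = int; // hypothetical stand-in for ze_event_pool_handle_t

constexpr std::size_t MaxEventsPerPool = 256; // assumed value for the sketch

struct PoolCaches {
  std::list<PoolHandle> HostVisible; // pools created host-visible
  std::list<PoolHandle> DeviceScope; // pools created with device scope
  std::unordered_map<PoolHandle, std::size_t> Unreleased;
  std::unordered_map<PoolHandle, std::size_t> Available;

  // Called when an event that lives in Pool is released.
  void releaseEvent(PoolHandle Pool, bool IsHostVisible) {
    // Pick the cache that matches the visibility of the event.
    std::list<PoolHandle> &Cache = IsHostVisible ? HostVisible : DeviceScope;

    auto It = Unreleased.find(Pool);
    if (It == Unreleased.end() || It->second == 0)
      throw std::logic_error("event pool has no unreleased events");

    // When the last event of a pool is released, the pool becomes
    // reusable: return it to its cache and reset its free-slot count.
    if (--It->second == 0) {
      if (Cache.empty() || Cache.front() != Pool)
        Cache.push_back(Pool);
      Available[Pool] = MaxEventsPerPool;
      assert(!Cache.empty()); // the emptied pool is now reachable again
    }
  }
};
```

Because every event now carries its own visibility, a single pool pointer per event is enough to decide which cache receives the emptied pool.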

@@ -788,12 +800,12 @@ pi_result _pi_context::finalize() {
// For example, event pool caches would be still alive.
{
std::lock_guard<std::mutex> Lock(ZeEventPoolCacheMutex);
for (auto &ZePool : ZeEventPoolCache)
for (auto &ZePool : ZeDeviceScopeEventPoolCache)
ZE_CALL(zeEventPoolDestroy, (ZePool));
for (auto &ZePool : ZeHostVisibleEventPoolCache)
ZE_CALL(zeEventPoolDestroy, (ZePool));

ZeEventPoolCache.clear();
ZeDeviceScopeEventPoolCache.clear();
ZeHostVisibleEventPoolCache.clear();
}

@@ -1321,6 +1333,39 @@ pi_result _pi_queue::executeCommandList(pi_command_list_ptr_t CommandList,
KernelsToBeSubmitted.clear();
}

// In this mode all inner-batch events have device visibility only,
// and we want the last command in the batch to signal a host-visible
// event that anybody waiting for any event in the batch will
// really be using.
//
if (EventsScope == LastCommandInBatchHostVisible) {
// Create a "proxy" host-visible event.
//
pi_event HostVisibleEvent;
PI_CALL(EventCreate(Context, true, &HostVisibleEvent));

// Update each command's event in the command-list to "see" this
// proxy event as a host-visible counterpart.
for (auto &Event : CommandList->second.EventList) {
Event->HostVisibleEvent = HostVisibleEvent;
PI_CALL(piEventRetain(HostVisibleEvent));
}

// Decrement the reference count by 1 so all the remaining references
// are from the other commands in this batch. This host-visible event
// will be destroyed after all events in the batch are gone.
PI_CALL(piEventRelease(HostVisibleEvent));
// Indicate no cleanup is needed for this PI event as it is special.
HostVisibleEvent->CleanedUp = true;

// Finally set to signal the host-visible event at the end of the
// command-list.
// TODO: see if we need a barrier here (or explicit wait for all events in
// the batch).
ZE_CALL(zeCommandListAppendSignalEvent,
(CommandList->first, HostVisibleEvent->ZeEvent));
}

// Close the command list and have it ready for dispatch.
ZE_CALL(zeCommandListClose, (CommandList->first));
// Offload command list to the GPU for asynchronous execution
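
The reference counting used for the per-batch proxy event in this hunk can be modeled without any Level Zero calls. The sketch below is an illustration only: `SketchEvent`, `retain`, `release`, `attachBatchProxy`, and `liveEvents` are hypothetical names, and the early return for an empty batch is an assumption of this example. It shows why the proxy lives exactly as long as at least one event in the batch still refers to it.

```cpp
#include <cassert>
#include <vector>

// Counts live SketchEvent objects so destruction can be observed.
inline int &liveEvents() {
  static int Count = 0;
  return Count;
}

// Hypothetical stand-in for a PI event with a manual reference count.
struct SketchEvent {
  int RefCount = 1;                        // the creator's reference
  SketchEvent *HostVisibleEvent = nullptr; // self for host-visible events
  SketchEvent() { ++liveEvents(); }
  ~SketchEvent() { --liveEvents(); }
};

inline void retain(SketchEvent *E) { ++E->RefCount; }

inline void release(SketchEvent *E) {
  assert(E->RefCount > 0);
  if (--E->RefCount > 0)
    return;
  // A destroyed event drops its reference to a distinct proxy.
  if (E->HostVisibleEvent && E->HostVisibleEvent != E)
    release(E->HostVisibleEvent);
  delete E;
}

// Gives every event in the batch one shared host-visible proxy.
inline SketchEvent *attachBatchProxy(const std::vector<SketchEvent *> &Batch) {
  if (Batch.empty())
    return nullptr; // assumption: nothing to attach to
  SketchEvent *Proxy = new SketchEvent;
  Proxy->HostVisibleEvent = Proxy; // a host-visible event points to itself
  for (SketchEvent *E : Batch) {
    E->HostVisibleEvent = Proxy;
    retain(Proxy); // one reference per event in the batch
  }
  // Drop the creator's reference so only the batch keeps the proxy alive.
  release(Proxy);
  return Proxy;
}
```

With a batch of two events the proxy ends up with a count of 2, and it is destroyed when the second of those events is released.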
@@ -1504,9 +1549,10 @@ pi_result _pi_ze_event_list_t::createAndRetainPiZeEventList(
auto ZeEvent = EventList[I]->ZeEvent;

// Poll the host-visible events.
auto ZeEventHostVisible = EventList[I]->getHostVisibleEvent();
if (FilterEventWaitList && ZeEventHostVisible) {
auto Res = ZE_CALL_NOCHECK(zeEventQueryStatus, (ZeEventHostVisible));
auto HostVisibleEvent = EventList[I]->HostVisibleEvent;
if (FilterEventWaitList && HostVisibleEvent) {
auto Res =
ZE_CALL_NOCHECK(zeEventQueryStatus, (HostVisibleEvent->ZeEvent));
if (Res == ZE_RESULT_SUCCESS) {
// Event has already completed, don't put it into the list
continue;
@@ -1792,8 +1838,11 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,
if (NumPlatforms)
*NumPlatforms = PiPlatformsCache->size();

zePrint("Using %s events\n",
ZeAllHostVisibleEvents ? "all host-visible" : "device-only");
zePrint("Using events scope: %s\n",
EventsScope == AllHostVisible ? "all host-visible"
: EventsScope == OnDemandHostVisibleProxy
? "on demand host-visible proxy"
: "only last command in a batch is host-visible");
return PI_SUCCESS;
}

@@ -4724,45 +4773,16 @@ pi_result piextKernelGetNativeHandle(pi_kernel Kernel,
//
// Events
//
ze_event_handle_t _pi_event::getHostVisibleEvent() const {
if (ZeAllHostVisibleEvents) {
return ZeEvent;
} else if (ZeHostVisibleEvent) {
return ZeHostVisibleEvent;
} else {
return nullptr;
}
}

pi_result
_pi_event::getOrCreateHostVisibleEvent(ze_event_handle_t &HostVisibleEvent) {
_pi_event::getOrCreateHostVisibleEvent(ze_event_handle_t &ZeHostVisibleEvent) {

if (ZeAllHostVisibleEvents) {
HostVisibleEvent = ZeEvent;
} else if (ZeHostVisibleEvent) {
HostVisibleEvent = ZeHostVisibleEvent;
} else {
size_t Index;
ze_event_pool_handle_t ZeEventPool = {};
if (auto Res =
Context->getFreeSlotInExistingOrNewPool(ZeEventPool, Index, true))
return Res;
if (!HostVisibleEvent) {
if (EventsScope != OnDemandHostVisibleProxy)
die("getOrCreateHostVisibleEvent: missing host-visible event");

// Create a "proxy" host-visible event.
//
// TODO: consider creating just single host-visible proxy event to
// represent multiple device-scope events. E.g. have a host-visible
// event at the end of each command-list to represent device-scope
// events from every command in that command-list.
//
ZeStruct<ze_event_desc_t> ZeEventDesc;
ZeEventDesc.signal = ZE_EVENT_SCOPE_FLAG_HOST;
ZeEventDesc.wait = 0;
ZeEventDesc.index = Index;

ZE_CALL(zeEventCreate, (ZeEventPool, &ZeEventDesc, &ZeHostVisibleEvent));
ZeHostVisibleEventPool = ZeEventPool;
HostVisibleEvent = ZeHostVisibleEvent;
// Create a "proxy" host-visible event on demand.
PI_CALL(EventCreate(Context, true, &HostVisibleEvent));
HostVisibleEvent->CleanedUp = true;

// Submit the command(s) signalling the proxy event to the queue.
// We have to first submit a wait for the device-only event for which this
@@ -4783,36 +4803,41 @@ _pi_event::getOrCreateHostVisibleEvent(ze_event_handle_t &HostVisibleEvent) {
ZE_CALL(zeCommandListAppendWaitOnEvents,
(CommandList->first, 1, &ZeEvent));
ZE_CALL(zeCommandListAppendSignalEvent,
(CommandList->first, ZeHostVisibleEvent));
(CommandList->first, HostVisibleEvent->ZeEvent));

if (auto Res = Queue->executeCommandList(CommandList, false, OkToBatch))
return Res;
}
}

ZeHostVisibleEvent = HostVisibleEvent->ZeEvent;
return PI_SUCCESS;
}

pi_result piEventCreate(pi_context Context, pi_event *RetEvent) {
static pi_result EventCreate(pi_context Context, bool HostVisible,
pi_event *RetEvent) {
size_t Index = 0;
ze_event_pool_handle_t ZeEventPool = {};
if (auto Res = Context->getFreeSlotInExistingOrNewPool(ZeEventPool, Index))
if (auto Res = Context->getFreeSlotInExistingOrNewPool(ZeEventPool, Index,
HostVisible))
return Res;

ze_event_handle_t ZeEvent;
ZeStruct<ze_event_desc_t> ZeEventDesc;
ZeEventDesc.index = Index;
ZeEventDesc.wait = 0;
//
// Set the scope to "device" for every event. This is sufficient for global
// device access and peer device access. If needed to be waited on the host
// we are doing special handling, see piEventsWait.
//
// TODO: see if "sub-device" (ZE_EVENT_SCOPE_FLAG_SUBDEVICE) can better be
// used in some circumstances.
//
if (ZeAllHostVisibleEvents) {

if (HostVisible) {
ZeEventDesc.signal = ZE_EVENT_SCOPE_FLAG_HOST;
} else {
//
// Set the scope to "device" for every event. This is sufficient for global
// device access and peer device access. If needed to be seen on the host
// we are doing special handling, see EventsScope options.
//
// TODO: see if "sub-device" (ZE_EVENT_SCOPE_FLAG_SUBDEVICE) can better be
// used in some circumstances.
//
ZeEventDesc.signal = 0;
}

@@ -4828,9 +4853,17 @@ pi_result piEventCreate(pi_context Context, pi_event *RetEvent) {
} catch (...) {
return PI_ERROR_UNKNOWN;
}

if (HostVisible)
(*RetEvent)->HostVisibleEvent = *RetEvent;

return PI_SUCCESS;
}

pi_result piEventCreate(pi_context Context, pi_event *RetEvent) {
return EventCreate(Context, EventsScope == AllHostVisible, RetEvent);
}

pi_result piEventGetInfo(pi_event Event, pi_event_info ParamName,
size_t ParamValueSize, void *ParamValue,
size_t *ParamValueSizeRet) {
@@ -4860,10 +4893,11 @@ pi_result piEventGetInfo(pi_event Event, pi_event_info ParamName,
// Make sure that we query a host-visible event only.
// If one wasn't yet created then don't create it here as well, and
// just conservatively return that event is not yet completed.
auto ZeHostVisibleEvent = Event->getHostVisibleEvent();
if (ZeHostVisibleEvent) {
auto HostVisibleEvent = Event->HostVisibleEvent;
if (HostVisibleEvent) {
ze_result_t ZeResult;
ZeResult = ZE_CALL_NOCHECK(zeEventQueryStatus, (ZeHostVisibleEvent));
ZeResult =
ZE_CALL_NOCHECK(zeEventQueryStatus, (HostVisibleEvent->ZeEvent));
if (ZeResult == ZE_RESULT_SUCCESS) {
return getInfo(ParamValueSize, ParamValue, ParamValueSizeRet,
pi_int32{CL_COMPLETE}); // Untie from OpenCL
@@ -5072,15 +5106,17 @@ pi_result piEventsWait(pi_uint32 NumEvents, const pi_event *EventList) {
if (NumEvents && !EventList) {
return PI_INVALID_EVENT;
}
// Make sure to add all host-visible "proxy" event signals if needed.
// This ensures that all signalling commands are submitted below and
// thus proxy events can be waited without a deadlock.
//
for (uint32_t I = 0; I < NumEvents; I++) {
ze_event_handle_t ZeHostVisibleEvent;
if (auto Res =
EventList[I]->getOrCreateHostVisibleEvent(ZeHostVisibleEvent))
return Res;
if (EventsScope == OnDemandHostVisibleProxy) {
// Make sure to add all host-visible "proxy" event signals if needed.
// This ensures that all signalling commands are submitted below and
// thus proxy events can be waited without a deadlock.
//
for (uint32_t I = 0; I < NumEvents; I++) {
ze_event_handle_t ZeHostVisibleEvent;
if (auto Res =
EventList[I]->getOrCreateHostVisibleEvent(ZeHostVisibleEvent))
return Res;
}
}
// Submit dependent open command lists for execution, if any
for (uint32_t I = 0; I < NumEvents; I++) {
@@ -5096,10 +5132,11 @@ pi_result piEventsWait(pi_uint32 NumEvents, const pi_event *EventList) {
}
}
for (uint32_t I = 0; I < NumEvents; I++) {
ze_event_handle_t ZeEvent = EventList[I]->getHostVisibleEvent();
if (!ZeEvent)
auto HostVisibleEvent = EventList[I]->HostVisibleEvent;
if (!HostVisibleEvent)
die("The host-visible proxy event missing");

ze_event_handle_t ZeEvent = HostVisibleEvent->ZeEvent;
zePrint("ZeEvent = %#lx\n", pi_cast<std::uintptr_t>(ZeEvent));
ZE_CALL(zeHostSynchronize, (ZeEvent));

@@ -5159,8 +5196,12 @@ static pi_result EventRelease(pi_event Event, pi_queue LockedQueue) {
if (Event->OwnZeEvent) {
ZE_CALL(zeEventDestroy, (Event->ZeEvent));
}
if (Event->ZeHostVisibleEvent) {
ZE_CALL(zeEventDestroy, (Event->ZeHostVisibleEvent));
// It is possible that the host-visible event was never created.
// If it was, check whether it is distinct from this event and,
// if so, release the reference to it.
if (Event->HostVisibleEvent && Event->HostVisibleEvent != Event) {
// Decrement ref-count of the host-visible proxy event.
PI_CALL(piEventRelease(Event->HostVisibleEvent));
}

auto Context = Event->Context;