[SYCL] [DOC] Prepare design-document for assert feature #3461
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
```
ze_result Result = zeEventQueryStatus(Event);
```

If kernel failed an assertion `zeEventQueryStatus` should return
I don't think this is possible to achieve in an asynchronous / non-blocking way in L0.
We don't have any communication between kernel and event, so we can't signal events with "assert happened" information.
If we use a global / program-wide assert buffer, each kernel will use the same "assert happened" flag. We do not have fine-grained control to determine which kernel, and which connected event, fired the assert.
Fences could be used. They allow synchronizing at the command-queue level rather than the kernel level: any kernel executed in the command queue that triggers an assert can then make the fence synchronization return an error: https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences
Is it still possible in OpenCL?
Can the OpenCL approach be reused in Level-Zero?
Could you, please, provide more details about using fences?
Fences are described in the L0 spec. They are similar to events, but directly connected to command queues: https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences
In OpenCL the submission model is different: each enqueue is independent, and a single kernel is submitted (queued) at a time. L0 operates on command lists that may contain multiple kernels. Once a command list is submitted to HW, we can't control when a kernel in the whole sequence is started or completed.
OpenCL handles kernels with printf in a blocking way: enqueueNDRangeKernel with printf becomes a blocking call, so we have fine control over when a specific kernel is completed. We can do the same for the assert() message: the output event will be created when the kernel has already finished. In L0 this is not possible, as we would have to synchronize the whole command list.
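To make the granularity problem concrete, here is a minimal host-only sketch. It is a model, not Level Zero code: `AssertBuffer`, `Kernel`, and `executeAndSynchronize` are hypothetical names invented for illustration. It assumes a single program-wide "assert happened" flag that the host can only inspect after the whole command list has completed, as with a fence-style synchronization.

```cpp
#include <atomic>
#include <functional>
#include <vector>

// Hypothetical model (not Level Zero API): one program-wide flag shared by
// every kernel, standing in for a global assert buffer.
struct AssertBuffer {
  std::atomic<int> Flag{0};
};

// A "kernel" here is just a host callable that may raise the shared flag.
using Kernel = std::function<void(AssertBuffer &)>;

// Models submitting a whole command list and then synchronizing once, at
// command-queue granularity. The host observes the flag only after every
// kernel has run, so it learns that *some* kernel asserted, not which one.
inline bool executeAndSynchronize(const std::vector<Kernel> &CommandList,
                                  AssertBuffer &Buffer) {
  for (const Kernel &K : CommandList)
    K(Buffer);
  return Buffer.Flag.load() != 0;
}
```

With three kernels in the list and only the second one raising the flag, the function returns `true` but carries no information that would let the host attach the failure to a specific kernel or event.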
sycl/doc/Assert.md
`sycl::event_error` exception. Otherwise, SYCL Runtime should trigger abort.
Even though multiple failures of the same or different assertions can happen in
multiple workitems, implementation is required to deliver only one. The
assertion failure message is printed to `stderr` by SYCL Runtime.
Does it always happen, or only when no `async_handler` is set?
I think it should always print the assertion message because:
- This would be consistent with the "safe" implementation (the one that depends on hardware support), which is defined to print the message even before notifying the host.
- This is also consistent with the way `assert` works on the host, which prints the assertion message even before raising `SIGABRT`.
So, even if the user set an `async_handler` in order to gracefully react to an `assert` failure, we still print something to `stderr`? What for? It is not as bad as if we printed to `stdout`, but it still seems a bit unnecessary.
I thought it was weird when I first read this spec also. But then I tried the following test:
```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>

void handle(int sig) {
  std::exit(0); // Exit silently
}

int main() {
  std::signal(SIGABRT, handle);
  assert(false);
}
```
The results:
```
$ clang -std=c++17 -pedantic -o t t.cpp
$ ./t
t: t.cpp:11: int main(): Assertion `false' failed.
$ echo $?
0
```
Despite the fact that I catch the SIGABRT and exit without printing anything, I still get a message printed to stderr.
Therefore, it seems like the behavior defined in this spec is consistent with the way `assert` works on the host.
That's kind of obvious due to the fact that `assert(expr)` on host is unwrapped into:
```cpp
if (!(expr)) {
  fprintf(stderr, ...);
  abort();
}
```
In device-code, `assert(expr)` unwraps to:
```cpp
if (!(expr)) {
  __devicelib_assert_fail(#expr, __FILE__, __LINE__, __PRETTY_FUNCTION__, global ID, local ID);
}
```
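The quoted design text requires that only one assertion failure be delivered even when many work-items fail. One way this could be done in a fallback path is a first-writer-wins record guarded by an atomic compare-and-swap. The sketch below is a host-side illustration only: `AssertRecord` and `recordAssertFailure` are hypothetical names, and the layout is an assumption, not the actual DPC++ data structure or the real `__devicelib_assert_fail` implementation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical record for a single assertion failure (assumed layout).
struct AssertRecord {
  std::atomic<int> State{0}; // 0 = empty, 1 = being written, 2 = ready
  const char *Expr = nullptr;
  const char *File = nullptr;
  int32_t Line = 0;
  uint64_t GlobalId = 0;
};

// Called by every failing "work-item". Only the first caller wins the
// compare-and-swap and fills the record; later callers leave it untouched.
// Returns true if this call recorded the failure.
inline bool recordAssertFailure(AssertRecord &R, const char *Expr,
                                const char *File, int32_t Line,
                                uint64_t GlobalId) {
  int Expected = 0;
  if (!R.State.compare_exchange_strong(Expected, 1))
    return false; // another failure has already been recorded
  R.Expr = Expr;
  R.File = File;
  R.Line = Line;
  R.GlobalId = GlobalId;
  // Publish the filled record so a reader that sees State == 2 also sees
  // the fields written above.
  R.State.store(2, std::memory_order_release);
  return true;
}
```

A host-side reader would check for `State == 2` after the kernel completes and print the single recorded message to `stderr`.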
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com> Co-authored-by: kbobrovs <konstantin.s.bobrovsky@intel.com>
@kbobrovs, a friendly ping
LGTM, just a few nits.
Co-authored-by: bader <alexey.bader@intel.com> Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
I'd suggest changing this paragraph in the extension specification now that we have the new aspect:
Maybe something like this:
Note that this also defines the terms "native support" and "fallback implementation", which you use later in the description of
Done.
LGTM
The failure in
Please disable the related tests and re-run the job.
Created PR to disable the test: intel/llvm-test-suite#303
performed only when assertion is enabled and Device-side Runtime doesn't provide
implementation of `__devicelib_assert_fail`.

In DPCPP headers one can see if assert is enabled with status of `NDEBUG` macro
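As a small illustration of the `NDEBUG` check mentioned in the quoted text, a header could turn the macro's status into a compile-time constant. This is a generic C++ sketch, not the code in the DPC++ headers, and `AssertEnabled` and `assertModeName` are hypothetical names.

```cpp
// Standard C++ rule: assert() is compiled out when NDEBUG is defined at the
// point where <cassert> is included. Mirror that status in a constant.
#ifdef NDEBUG
constexpr bool AssertEnabled = false;
#else
constexpr bool AssertEnabled = true;
#endif

// Hypothetical helper: a readable name for the current mode.
inline const char *assertModeName() {
  return AssertEnabled ? "enabled" : "disabled";
}
```

Code that sets up a fallback could then be guarded with `if constexpr (AssertEnabled)` so that it is skipped entirely in `-DNDEBUG` builds.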
We have had many user-reported issues since the functionality was merged (#3767), which seem to be caused by the fallback design.
@s-kanaev, is there a possibility to NOT enable/define/link `__devicelib_assert_fail` by default?
tagging @AlexeySachkov @gmlueck @kbobrovs
See the extension document for SYCL describing `assert` behaviour