Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] [DOC] Prepare design-document for assert feature #3461

Merged
merged 49 commits into from
May 31, 2021

Conversation

s-kanaev
Copy link
Contributor

@s-kanaev s-kanaev commented Apr 1, 2021

See extension document for SYCL describing assert behaviour

Sergey Kanaev added 2 commits March 31, 2021 17:07
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
@s-kanaev s-kanaev requested review from kbobrovs, pvchupin and a team as code owners April 1, 2021 08:24
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
@intel intel deleted a comment from gmlueck Apr 2, 2021
Sergey Kanaev added 3 commits April 5, 2021 16:16
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
ze_result Result = zeEventQueryStatus(Event);
```

If kernel failed an assertion `zeEventQueryStatus` should return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don;t think this is possible to achieve in asynchronous / non-blocking way in L0.

We dont have any communication between kernel and event - so we can;t signal events with "assert happened" information.

if we use global / program wide assert buffer - each kernel will be using the same assert happened flag - we do not have fine grain control to determine which kernel - and which connected event fired the assert.

Fences could be used - allowing to synchronize at cmdQueue level and not kernel - any kernel causing assert executed in cmd Queue can then make fence synchronize to return error:https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences

Copy link
Contributor Author

@s-kanaev s-kanaev Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still possible in OpenCL?
Can the OpenCL approach be reused in Level-Zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you, please, provide more details about using fences?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fences are decribed in L0 spec - they are similar to events, but directly connected to command queues: https://spec.oneapi.com/level-zero/latest/core/PROG.html#fences

In OpenCL the submission model is different - each enqueue is independent - single kernel is submitted ( queued) at a time. L0 operates on command lists that may contain multiple kernels - once cmd list is submitted to HW - we can;t control when a kernel in whole sequence is started completed.

OpenCL handles kernels with printf in a blocking way - enqueueNDRangeKErnel with printf makes this a blocking call - so we have fine control when specific kernel is completed - we can do the same for assert() message - output event will be created when the kernel has already finished. I L0 this is not possible - as we would have to synchronize whoel command list.

Sergey Kanaev added 2 commits April 6, 2021 17:31
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
`sycl::event_error` exception. Otherwise, SYCL Runtime should trigger abort.
Even though multiple failures of the same or different assertions can happen in
multiple workitems, implementation is required to deliver only one. The
assertion failure message is printed to `stderr` by SYCL Runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen always or only without async_handler set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should always print the assertion message because:

  • This would be consistent with the "safe" implementation (the one that depends on hardware support), which is defined to print the message even before notifying the host.

  • This is also consistent with the way assert works on the host, which prints the assertion message even before raising SIGABRT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, even if user set an async_handler in order to gracefully react to assert failure, we still print something to stderr? What for? It is not that bad as if we printed into stdout, but still seems unnecessary a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was weird when I first read this spec also. But then I tried the following test:

#include <cassert>
#include <csignal>
#include <cstdlib>

void handle(int sig) {
  std::exit(0);  // Exit silently
}

int main() {
  std::signal(SIGABRT, handle);
  assert(false);
}

The results:

$ clang -std=c++17 -pedantic -o t t.cpp
$ ./t
t: t.cpp:11: int main(): Assertion `false' failed.
$ echo $?
0

Despite the fact that I catch the SIGABRT and exit without printing anything, I still get a message printed to stderr.

Therefore, it seems like the behavior defined in this spec is consistent with the way assert works on the host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's kind of obvious due to the fact that assert(expr) on host is unwrapped into

if (!(expr)) {
  fprintf(stderr, ...);
  abort();
}

In device-code, assert(expr) unwraps to:

if (!(expr)) {
  __devicelib_assert_fail(#expr, __FILE__, __LINE__, __PRETTY_FUNCTION__, global ID, local ID);
}

Sergey Kanaev added 8 commits April 8, 2021 16:30
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Sergey Kanaev added 2 commits May 19, 2021 16:35
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Co-authored-by: kbobrovs <konstantin.s.bobrovsky@intel.com>
@s-kanaev s-kanaev requested a review from kbobrovs May 20, 2021 09:03
@bader bader requested a review from AlexeySachkov May 24, 2021 09:12
@s-kanaev
Copy link
Contributor Author

@kbobrovs , a friendly ping

kbobrovs
kbobrovs previously approved these changes May 25, 2021
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits.

Sergey Kanaev and others added 2 commits May 27, 2021 13:23
Co-authored-by: bader <alexey.bader@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
bader
bader previously approved these changes May 27, 2021
@gmlueck
Copy link
Contributor

gmlueck commented May 27, 2021

I'd suggest changing this paragraph in the extension specification now that we have the new aspect:

It is unspecified whether a failing assert() returns to its caller before the kernel terminates. If a failing call returns, the device code may need to continue execution without deadlocking for the assertion message to be printed or for std::abort() to be called.

Maybe something like this:

Some devices implement assert() natively while others use a fallback implementation, and the two implementations provide different guarantees. The native implementation is most similar to the way assert() works on the host. If an assertion fails in the native implementation, the assertion message is immediately printed to stderr and the program terminates by calling std::abort(). If an assertion fails with the fallback implementation, the failing assert() returns back to its caller and the device code must continue executing (without deadlocking) until the kernel completes. The implementation prints the assertion message to stderr and terminates with std::abort() only after the kernel completes execution. An application can determine which of the two mechanisms a device uses by testing the device aspect aspect::ext_oneapi_native_assert.

Note that this also defines the terms "native support" and "fallback implementation", which you use later in the description of ext_oneapi_native_assert.

kbobrovs
kbobrovs previously approved these changes May 27, 2021
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
@s-kanaev s-kanaev dismissed stale reviews from kbobrovs and bader via fbca768 May 27, 2021 17:48
@s-kanaev
Copy link
Contributor Author

I'd suggest changing this paragraph in the extension specification now that we have the new aspect:

Done.

@s-kanaev s-kanaev requested review from bader and kbobrovs May 27, 2021 17:49
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@s-kanaev
Copy link
Contributor Author

The failure in Jenkins/Precommit doesn't relate to the changes here. Reported the failure.

@bader
Copy link
Contributor

bader commented May 31, 2021

The failure in Jenkins/Precommit doesn't relate to the changes here. Reported the failure.

Please, disable related tests and re-run the job.

@s-kanaev
Copy link
Contributor Author

Created PR to disable the test: intel/llvm-test-suite#303

@bader bader merged commit 69fc6dc into intel:sycl May 31, 2021
performed only when assertion is enabled and Device-side Runtime doesn't provide
implementation of `__devicelib_assert_fail`.

In DPCPP headers one can see if assert is enabled with status of `NDEBUG` macro
Copy link

@olegmaslovatintel olegmaslovatintel Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a many user reported issues after functionality is merged #3767 which seems caused by fall back design.
@s-kanaev is there possibility to NOT enable/define/link `__devicelib_assert_fail by default?

tagging @AlexeySachkov @gmlueck @kbobrovs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Missing documentation for the code, compiler or runtime features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.