[SYCL] Detect ze call leaking in E2E tests #19710
base: sycl
Conversation
sycl/test-e2e/format.py (outdated)
```python
def check_leak(output):
    keyword_found = False
    for line in output.splitlines():
        if keyword_found and "LEAK" in line:
```
Just FYI, we already have a lit var %{l0_leak_check} which a bunch of the tests still use (instead of UR_L0_LEAKS_DEBUG). It gets replaced by the env var in the final invocation, but I'm not sure whether this Python script will detect that.
I think the l0_leak_check directive is no longer needed and could probably be replaced by the actual env var. It's in hundreds of tests right now.
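One way for the script to stay agnostic to which spelling a test uses would be to inspect the command line after lit substitution, where %{l0_leak_check} has already been expanded to the env var. A minimal sketch with a hypothetical helper name (not the PR's actual code):

```python
# Illustrative only: after lit substitution, both "%{l0_leak_check}" and an
# explicit "env UR_L0_LEAKS_DEBUG=1" end up naming the environment variable,
# so checking the expanded command covers both spellings.
def uses_leak_check(substituted_cmd: str) -> bool:
    return "UR_L0_LEAKS_DEBUG" in substituted_cmd
```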
I think the tests that set the env var (either directly or via %{l0_leak_check}) already check for the absence of "LEAK" (--implicit-check-not=LEAK), so we don't have to check it again here. As far as I understand, this is only needed when the user decides to run all the tests with leak checking (e.g. by passing --param ur_l0_leaks_debug=1 to lit).
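A minimal sketch of how that gating could look, assuming a hypothetical helper and the param spelling above (the real format.py may wire this differently):

```python
# Sketch only: enable the extra leak scan just for whole-suite runs launched with
# --param ur_l0_leaks_debug=1; individually opted-in tests already fail through
# FileCheck's --implicit-check-not=LEAK.
def leak_check_enabled(lit_config) -> bool:
    return lit_config.params.get("ur_l0_leaks_debug", "0") not in ("", "0", "False")
```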
Pretty sure the failed test is not caused by this PR. @cperkinsintel, is it a known flaky issue?
I don't know if it's a known issue, but it seems unrelated to the PR.
@intel/llvm-reviewers-runtime, could you please take a look at the PR?
Why does "unification" cause that?
The leak checking is now happening in the loader, during loader teardown, and so there is no place for us to throw the exception from anymore.
Because it's C and not C++? Can it
Mostly because it's done in the library destructor, which means we don't have an entry point from which we could return an error (we had urAdapterTeardown when leak checking was done in UR). We could abort(), but leaks are not really a critical failure, so I think just parsing the output in tests is a better option.
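In other words, the failure has to be synthesized on the test-runner side. A minimal sketch of that idea, with an assumed helper name inside the lit test format (the function name and the exact shape of the leak report are assumptions, not the PR's implementation):

```python
import lit.Test

# Sketch: turn leak diagnostics printed by the L0 validation layer during loader
# teardown into a lit failure, since the library destructor cannot return an error.
def result_from_output(output: str) -> lit.Test.ResultCode:
    if any("LEAK" in line for line in output.splitlines()):
        return lit.Test.FAIL
    return lit.Test.PASS
```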
Can we have one more env variable control to request that
@nrspruit What do you think about calling abort? Do you think there is any other way to report the leaks?
So, the layers in the loader are meant to catch errors gracefully; by convention we avoid any and all aborts within the L0 loader and drivers unless it is an unrecoverable error. In this case, even if you changed the validation layer, you would not get that update in the CI until you have a driver with that loader. The CI does not use a different loader than what is in your driver, so you would not see that change for a couple of months unless one changed only the loader. However, I recommend adding the "abort" handling for a leak only into llvm-lit, which already scrapes the logs to determine whether a test passes. Because that is already done, stdout/stderr in llvm-lit will never be redirected, otherwise many tests would fail, so I see that as a non-issue; no one is redirecting stdout/stderr in llvm-lit testing unless they wanted all the tests to fail, given the logs are already being read.
Where is it proven that this patch works as intended?
For example, I ran USM cases with their memory release commented out.
With this patch:
Without this patch:
Can you please do the same but also modify its
or something similar, to verify that any pipes inside the test don't interfere with this approach?
Good suggestion! I found an invalid case while validating and pushed a commit to fix that scenario.
Please paste the logs (-a) of that run.
@aelovikov-intel, please check the output below. The problem has been fixed.
Why "PASS"? I asked for a modified case (with a leak) that uses pipe (both stdout and stderr) to see if your approach works for those tests. The logs (-a) should clearly show a fail with a leak in such scenario. |
Done. It fails with the expected errors with the new method:
LGTM.
After #19328, UR_L0_LEAKS_DEBUG stopped throwing exceptions when leaks are detected, so LIT can't report failures. Add leak checking in format.py to keep "--param ur_l0_leaks_debug=1" working as before.