Skip to content

Conversation

@EwanC
Copy link
Contributor

@EwanC EwanC commented Jul 9, 2024

Coverity has reported that there is the possibility of the command-buffer destructor throwing an exception from its calls
to CleanupCompletedEvent.

This is from an underlying throw in GetQueue when a dynamic_cast doesn't behave as expected. I initially tried
changing this to an assert but the V2 Level Zero adapter is triggering this case while in it's developmental state, so an assert brings down the whole test suite rather than just being reported as a fail (the UR loader catches exceptions and returns them as error codes).

Therefore this PR addresses the Coverity report by moving the calls to CleanupCompletedEvent() outside of the command-buffer destructor. This allows us to cleanup prior to calling the destructor, without the risk of throwing an exception.

DPC++ PR intel/llvm#14503

@EwanC EwanC mentioned this pull request Jul 9, 2024
@github-actions github-actions bot added level-zero L0 adapter specific issues command-buffer Command Buffer feature addition/changes/specification labels Jul 9, 2024
@EwanC EwanC force-pushed the ewan/cmd_buf_resource_cleanup branch 3 times, most recently from dec3f85 to b621c75 Compare July 10, 2024 13:32
@EwanC EwanC marked this pull request as ready for review July 11, 2024 08:51
@EwanC EwanC requested a review from a team as a code owner July 11, 2024 08:51
@pbalcer
Copy link
Contributor

pbalcer commented Jul 12, 2024

E2E L0 failed, but it doesn't seem related:

Failed Tests (1):
  SYCL :: ESIMD/regression/minmax.cpp

Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but this is quite a lot of changes to handle a fairly trivial problem.

@igchor it may make sense to create a nothrow variant of the getQueue method that returns something like an std::optional if succesful. Thoughts?

@EwanC
Copy link
Contributor Author

EwanC commented Jul 19, 2024

@igchor it may make sense to create a nothrow variant of the getQueue method that returns something like an std::optional if succesful. Thoughts?

ping @igchor, I could try make this change instead if you think it's a better way to go

@EwanC EwanC force-pushed the ewan/cmd_buf_resource_cleanup branch from 647aa83 to ce20a98 Compare July 22, 2024 14:00
@igchor
Copy link
Contributor

igchor commented Jul 22, 2024

@igchor it may make sense to create a nothrow variant of the getQueue method that returns something like an std::optional if succesful. Thoughts?

ping @igchor, I could try make this change instead if you think it's a better way to go

Sorry, I missed the notification.

I think it would be best to just move all the logic from the destructor to a function like you did but not using UR_CALL_STORE_ERR inside - leaving all the logic as is. Also, the function can just return void.

If an exception from GetQueue is thrown, (.e.g from urDeviceRelease) the rest of the cleanup functions won't be called anyway because the exception will abort the cleanupCommandBufferResources function and propagate up. To avoid that, you would have to wrap each function in a try/catch but I don't think that's a good idea (you might end up with some inconsistent state - it's better to just have a leak). Also, as far as I understand this should not happen during execution anyway, it is just something coverity thinks is possible, right?

@igchor
Copy link
Contributor

igchor commented Jul 22, 2024

As for a nothrow variant of GetQueue - you would still need to handle nullopt and propagate the error somehow, I'm not sure if it would be easier.

Coverity has reported that there is the possibility of the
command-buffer destructor throwing an exception from its calls
to `CleanupCompletedEvent`.

This is from an underlying `throw` in
[GetQueue](https://github.com/oneapi-src/unified-runtime/blob/main/source/adapters/level_zero/queue.hpp#L890)
when a `dynamic_cast` doesn't behave as expected. I initially tried
changing this to an assert but the V2 Level Zero adapter is triggering
this case while in it's developmental state, so an assert brings down the
whole test suite rather than just being reported as a fail (the UR
loader catches exceptions and returns them as error codes).

Therefore this PR addresses the issue by moving the calls to
`CleanupCompletedEvent()` outside of the command-buffer destructor. This allows
us us to report an error from command-buffer release by doing cleanup prior to
calling the destructor, without the risk of throwing an exception.
@EwanC EwanC force-pushed the ewan/cmd_buf_resource_cleanup branch from ce20a98 to a3d84be Compare July 22, 2024 16:32
@EwanC
Copy link
Contributor Author

EwanC commented Jul 22, 2024

Thanks for your reply @igchor, I've updated the PR with your suggestion.

It is indeed only the possibility of an exception being thrown in a C++ object destructor that Coverity has detected. Avoiding leaks and returning a reliable error when things go wrong when freeing resources is over an above that, so can be considered out of scope for the purposes of this PR.

@EwanC EwanC added the ready to merge Added to PR's which are ready to merge label Jul 23, 2024
@EwanC EwanC added the v0.10.x Include in the v0.10.x release label Jul 29, 2024
@omarahmed1111 omarahmed1111 merged commit de53693 into oneapi-src:main Jul 29, 2024
EwanC pushed a commit to reble/llvm that referenced this pull request Jul 29, 2024
Updates L0 adapter to UR PR oneapi-src/unified-runtime#1840
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

command-buffer Command Buffer feature addition/changes/specification level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge v0.10.x Include in the v0.10.x release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants