Skip to content

Conversation

@AllanZyne
Copy link
Contributor

@AllanZyne AllanZyne commented Jul 22, 2024

Adapters are released earlier than loader by SYCL Runtime, so we hold them in interceptor to prevent crash

@AllanZyne AllanZyne requested a review from a team as a code owner July 22, 2024 05:36
@github-actions github-actions bot added loader Loader related feature/bug sanitizer Sanitizer layer issues/changes/specification labels Jul 22, 2024
@AllanZyne AllanZyne changed the title [DeviceSanitizer] Fix intercepter destruction order [DeviceSanitizer] Fix interceptor destruction order Jul 22, 2024
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit us a couple of times now. Can you please add some tests in UR to verify that the sanitizer at least initializes, parses options and destroys correctly.
Our CI infra now has PVC systems (the label is "L0_E2E"), so you could even add tests for the core functionality, but for now it would be good to cover just the general init/teardown flow.

if (result == UR_RESULT_SUCCESS) {
const uint32_t NumAdapters = pNumAdapters ? *pNumAdapters : NumEntries;
for (uint32_t i = 0; i < NumAdapters; ++i) {
UR_CALL(getContext()->interceptor->holdAdapter(phAdapters[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If urAdapterGet() get called multiple times, then we would have duplicated handles.

Copy link
Contributor Author

@AllanZyne AllanZyne Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we needn't hold the handle of adapter, it seems ur loader has already hold this.
I'll investigate this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every time you call urAdapterGet, an internal reference count is incremented and you need to decrement it using urAdapterRelease.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@AllanZyne
Copy link
Contributor Author

This bit us a couple of times now. Can you please add some tests in UR to verify that the sanitizer at least initializes, parses options and destroys correctly. Our CI infra now has PVC systems (the label is "L0_E2E"), so you could even add tests for the core functionality, but for now it would be good to cover just the general init/teardown flow.

Sorry for the delay! I will add some tests in this PR.

@AllanZyne
Copy link
Contributor Author

AllanZyne commented Aug 2, 2024

unified-runtime/source/loader/ur_lib.hpp

    const std::vector<LayerData> layers = {
        {ur_validation_layer::getContext(),
         ur_validation_layer::context_t::forceDelete},
#if UR_ENABLE_TRACING
        {ur_tracing_layer::getContext(),
         ur_tracing_layer::context_t::forceDelete},
#endif
#if UR_ENABLE_SANITIZER
        {ur_sanitizer_layer::getContext(),
         ur_sanitizer_layer::context_t::forceDelete},
#endif

Hi @pbalcer, is ur_validation_layer's context released before ur_sanitizer_layer?

I had found where this logic is.
The layers' releasing order needs to be reversed.

@AllanZyne AllanZyne marked this pull request as draft August 2, 2024 08:44
@pbalcer
Copy link
Contributor

pbalcer commented Aug 2, 2024

~~

I had found where this logic is. The layers' releasing order needs to be reversed.

Makes sense, please feel free to do that change.

@AllanZyne AllanZyne marked this pull request as ready for review August 6, 2024 06:55
@AllanZyne
Copy link
Contributor Author

@pbalcer I think this fix is also needed for our next release, can you help to add a label for it?

@pbalcer pbalcer added the v0.10.x Include in the v0.10.x release label Aug 6, 2024
@AllanZyne
Copy link
Contributor Author

LLVM test: intel/llvm#14963

@pbalcer
Copy link
Contributor

pbalcer commented Aug 6, 2024

@AllanZyne at this point things will need to be cherry-picked into the release branch, https://github.com/oneapi-src/unified-runtime/tree/v0.10.x. No need to do that now though. Just make sure all the PRs that you want included are tagged with v0.10.x.

@AllanZyne
Copy link
Contributor Author

@AllanZyne at this point things will need to be cherry-picked into the release branch, https://github.com/oneapi-src/unified-runtime/tree/v0.10.x. No need to do that now though. Just make sure all the PRs that you want included are tagged with v0.10.x.

Ok, thanks for your information.

I've checked CI tests in UR and LLVM, it seems they're unrelated to this PR.

Copy link
Contributor

@yingcong-wu yingcong-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a commenting problem.

Copy link
Contributor

@zhaomaosu zhaomaosu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@AllanZyne
Copy link
Contributor Author

Hi @oneapi-src/unified-runtime-maintain, since this PR is an essential bugfix for next release, can you help to review and merge this PR ASAP?
Thanks very much!

Copy link
Contributor

@aarongreig aarongreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending comments

@aarongreig aarongreig added the ready to merge Added to PR's which are ready to merge label Aug 15, 2024
@omarahmed1111 omarahmed1111 merged commit 1a1839b into oneapi-src:main Aug 16, 2024
@AllanZyne
Copy link
Contributor Author

Hi @oneapi-src/unified-runtime-maintain, should I cherry pick this PR by myself?

@aarongreig
Copy link
Contributor

Hi @oneapi-src/unified-runtime-maintain, should I cherry pick this PR by myself?

The bump for this was included in intel/llvm#15101 which is already on our list to cherry-pick

kbenzie pushed a commit that referenced this pull request Aug 20, 2024
[DeviceSanitizer] Fix interceptor destruction order
@kbenzie kbenzie mentioned this pull request Aug 20, 2024
53 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

loader Loader related feature/bug ready to merge Added to PR's which are ready to merge sanitizer Sanitizer layer issues/changes/specification v0.10.x Include in the v0.10.x release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants