[Matter.framework] When the device is entering suspended state and at… #35973
… the same time _ensureSubscriptionForExistingDelegates is called with an error from getting a session, it crashes with _os_unfair_lock_recursive_abort, since it is called on the same thread and the error-handling code also tries to get a lock
PR #35973: Size comparison from 0a2e58d to 491bcb7 Full report (88 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
if (self.suspended) {
    MTR_LOG_ERROR("%@ suspended: can't get session for node %016llX-%016llx (%llu)", self, self.compressedFabricID.unsignedLongLongValue, nodeID, nodeID);
    // TODO: Can we do a better error here?
    completion(nullptr, chip::NullOptional, [MTRError errorForCHIPErrorCode:CHIP_ERROR_INCORRECT_STATE], nil);
    dispatch_async(_chipWorkQueue, ^{
So the thing is... asyncGetCommissionerOnMatterQueue can also sync-call completion, right?
We need to either fix whoever is calling directlyGetSessionForNode to handle the fact that they might get a sync-call in whatever state they are in to the error handler, or change the API documentation for getSessionForNode and the implementations of both that and directlyGetSessionForNode to guarantee async callbacks.
We used to not be able to use the Matter queue there, because without a running controller we potentially had no running queue. But now that we never shut the queue down (right?) maybe we can consistently deliver the errors on that queue....
Problem
While attempting to reproduce a specific memory leak with Darwin-framework-tool, I encountered a crash caused by a recursive lock issue.
In essence, this occurs because, when the tool is suspended, directlyGetSessionForNode returns synchronously. However, the code that triggered this behavior already held the lock, and the code managing subscription errors also attempts to acquire the same lock.
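To make the re-entrancy concrete, here is a minimal, platform-independent C++ sketch of the pattern. It is not the Matter.framework code: SerialQueue, NonRecursiveLock, getSessionSync, getSessionAsync and runScenario are all hypothetical names made up for illustration, and the "lock" only records ownership so that re-entry can be observed instead of aborting the process the way os_unfair_lock does. It contrasts delivering the error synchronously with deferring it to a queue.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a serial work queue: tasks are stored and only
// run later, when drain() is called, never inline from async().
class SerialQueue {
public:
    void async(std::function<void()> task) { tasks_.push(std::move(task)); }
    void drain() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop();
            task();
        }
    }
private:
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical stand-in for a non-recursive lock. It records re-entry
// instead of aborting, so the problem can be observed safely.
struct NonRecursiveLock {
    bool held = false;
    bool reentered = false;
    void lock() {
        if (held) { reentered = true; } // a real unfair lock would abort here
        held = true;
    }
    void unlock() { held = false; }
};

using Completion = std::function<void(const std::string & error)>;

// Synchronous delivery: when suspended, the error handler runs before this
// function returns, i.e. while the caller may still hold its lock.
void getSessionSync(bool suspended, const Completion & completion) {
    completion(suspended ? "incorrect state" : "");
}

// Asynchronous delivery: the completion is always deferred to the queue.
void getSessionAsync(SerialQueue & queue, bool suspended, Completion completion) {
    queue.async([suspended, completion] {
        completion(suspended ? "incorrect state" : "");
    });
}

// Returns true if the error handler tried to take the lock while the caller
// was still holding it.
bool runScenario(bool deliverAsync) {
    SerialQueue queue;
    NonRecursiveLock lock;
    Completion onResult = [&lock](const std::string & error) {
        if (error.empty()) { return; }
        lock.lock(); // error handling needs the same lock
        lock.unlock();
    };

    lock.lock(); // the caller holds the lock while asking for a session
    if (deliverAsync) {
        getSessionAsync(queue, /* suspended = */ true, onResult);
    } else {
        getSessionSync(/* suspended = */ true, onResult);
    }
    lock.unlock();

    queue.drain(); // deferred completions run only after the lock is released
    return lock.reentered;
}
```

In this sketch runScenario(false) reports re-entry, because the error handler runs while the caller still holds the lock, and runScenario(true) does not, because the completion only runs once the caller has released it.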
When this happens, the thread states are as follows: