Skip to content

Commit

Permalink
Fix ThreadSanitizer failure in controller factory.
Browse files Browse the repository at this point in the history
The failure looks like this:

  WARNING: ThreadSanitizer: race on NSMutableArray (pid=11619)
    Read-only access of NSMutableArray at 0x7b0c0005f5b0 by thread T3:
      #0 -[__NSArrayM countByEnumeratingWithState:objects:count:] <null>:2 (CoreFoundation:x86_64+0x4a338)
      #1 -[MTRDeviceControllerFactory(InternalMethods) operationalInstanceAdded:] MTRDeviceControllerFactory.mm:855 (Matter:x86_64+0x1fd2a)
      #2 MTROperationalBrowser::OnBrowse(_DNSServiceRef_t*, unsigned int, unsigned int, int, char const*, char const*, char const*, void*) MTROperationalBrowser.mm:100 (Matter:x86_64+0x20ee63c)
      #3 handle_browse_response <null>:2 (libsystem_dnssd.dylib:x86_64+0x3733)
      #4 _dispatch_client_callout <null>:2 (libdispatch.dylib:x86_64+0x3316)

    Previous modifying access of NSMutableArray at 0x7b0c0005f5b0 by main thread:
      #0 -[__NSArrayM addObject:] <null>:2 (CoreFoundation:x86_64+0x2457a)
      #1 -[MTRDeviceControllerFactory createController] MTRDeviceControllerFactory.mm:719 (Matter:x86_64+0x1cee3)
      #2 -[MTRDeviceControllerFactory createControllerOnExistingFabric:error:] MTRDeviceControllerFactory.mm:534 (Matter:x86_64+0x19792)

The basic problem is that we are in the middle of adding an object to
_controllers on the API consumer thread when on the Matter thread we get our
browse notification.

The changes here don't aim to lock around all access to _controllers, but just
to make sure that our mutations of it can't race with the access on the Matter
thread.  More coarse locking would need to be done very carefully, given the
amount of dispath_sync to the Matter thread we have going on.
  • Loading branch information
bzbarsky-apple committed May 25, 2023
1 parent 9f43988 commit e8b4fe8
Showing 1 changed file with 28 additions and 1 deletion.
29 changes: 28 additions & 1 deletion src/darwin/Framework/CHIP/MTRDeviceControllerFactory.mm
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@
#import "MTRPersistentStorageDelegateBridge.h"
#import "NSDataSpanConversion.h"

#import <os/lock.h>

#include <controller/CHIPDeviceControllerFactory.h>
#include <credentials/CHIPCert.h>
#include <credentials/FabricTable.h>
Expand Down Expand Up @@ -88,6 +90,20 @@ @interface MTRDeviceControllerFactory ()
@property () chip::Credentials::DeviceAttestationVerifier * deviceAttestationVerifier;
@property (readonly) BOOL advertiseOperational;
@property (nonatomic, readonly) Credentials::IgnoreCertificateValidityPeriodPolicy * certificateValidityPolicy;
// Lock used to serialize access to the "controllers" array, since it needs to
// be touched from both whatever queue is starting controllers and from the
// Matter queue. The way this lock is used assumes that:
//
// 1) The only mutating accesses to the controllers array happen from outside
// the Matter queue (which is a good assumption, because those functions do
// sync dispatch to the Matter queue).
// 2) It's our API consumer's responsibility to serialize access to us from
// outside.
//
// This means that we only take the lock around mutations of the array and
// accesses to the array that are from code running on the Matter queue.

@property (nonatomic, readonly) os_unfair_lock controllersLock;

- (BOOL)findMatchingFabric:(FabricTable &)fabricTable
params:(MTRDeviceControllerStartupParams *)params
Expand Down Expand Up @@ -123,6 +139,7 @@ - (instancetype)init
_running = NO;
_chipWorkQueue = DeviceLayer::PlatformMgrImpl().GetWorkQueue();
_controllerFactory = &DeviceControllerFactory::GetInstance();
_controllersLock = OS_UNFAIR_LOCK_INIT;

_sessionKeystore = new chip::Crypto::RawKeySessionKeystore();
if ([self checkForInitError:(_sessionKeystore != nullptr) logMsg:kErrorSessionKeystoreInit]) {
Expand Down Expand Up @@ -716,7 +733,9 @@ - (MTRDeviceController * _Nullable)createController

// Add the controller to _controllers now, so if we fail partway through its
// startup we will still do the right cleanups.
os_unfair_lock_lock(&_controllersLock);
[_controllers addObject:controller];
os_unfair_lock_unlock(&_controllersLock);

return controller;
}
Expand Down Expand Up @@ -808,7 +827,9 @@ - (void)controllerShuttingDown:(MTRDeviceController *)controller
});
}

os_unfair_lock_lock(&_controllersLock);
[_controllers removeObject:controller];
os_unfair_lock_unlock(&_controllersLock);

if ([_controllers count] == 0) {
dispatch_sync(_chipWorkQueue, ^{
Expand Down Expand Up @@ -852,7 +873,13 @@ - (nullable MTRDeviceController *)runningControllerForFabricIndex:(chip::FabricI

- (void)operationalInstanceAdded:(chip::PeerId &)operationalID
{
for (MTRDeviceController * controller in _controllers) {
assertChipStackLockedByCurrentThread();

os_unfair_lock_lock(&_controllersLock);
NSArray<MTRDeviceController *> * controllersCopy = [_controllers copy];
os_unfair_lock_unlock(&_controllersLock);

for (MTRDeviceController * controller in controllersCopy) {
auto * compressedFabricId = controller.compressedFabricID;
if (compressedFabricId != nil && compressedFabricId.unsignedLongLongValue == operationalID.GetCompressedFabricId()) {
ChipLogProgress(Controller, "Notifying controller at fabric index %u about new operational node 0x" ChipLogFormatX64,
Expand Down

0 comments on commit e8b4fe8

Please sign in to comment.