Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when accessory reboots during pairing #21106

Closed
kpark-apple opened this issue Jul 22, 2022 · 2 comments · Fixed by #21121
Closed

Crash when accessory reboots during pairing #21106

kpark-apple opened this issue Jul 22, 2022 · 2 comments · Fixed by #21121

Comments

@kpark-apple
Copy link
Contributor

Problem

Controller crashes when accessory was rebooted during pairing.

Steps to repro:

  1. Start pairing accessory.
  2. Reboot the accessory before pairing is complete.

Configuration:
SHA 5d8599d

Backtrace is as follows:

Thread 3 Crashed [↩](https://crashtracer.ios.apple.com/report/show?app=18343&build=20A315&signature=78607745&users=all#log_sections)::   Dispatch queue: com.zigbee.chip.framework.controller.workqueue
0   CHIP                          	       0x10422b110 chip::Controller::DeviceCommissioner::DisarmDone() + 68 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/controller/CHIPDeviceController.cpp:1420)
1   CHIP                          	       0x10422b0c0 chip::Controller::DeviceCommissioner::OnDisarmFailsafeFailure(void*, chip::ChipError) + 92 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/controller/CHIPDeviceController.cpp:1414)
2   CHIP                          	       0x10422b0c0 chip::Controller::DeviceCommissioner::OnDisarmFailsafeFailure(void*, chip::ChipError) + 92 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/controller/CHIPDeviceController.cpp:1414)
3   CHIP                          	       0x10406b814 std::__1::__function::__func<chip::ChipError chip::Controller::ClusterBase::InvokeCommand<chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type>(chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type const&, void*, void (*)(void*, chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type::ResponseType const&), void (*)(void*, chip::ChipError), chip::Optional<unsigned short> const&)::'lambda'(chip::ChipError), std::__1::allocator<chip::ChipError chip::Controller::ClusterBase::InvokeCommand<chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type>(chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type const&, void*, void (*)(void*, chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafe::Type::ResponseType const&), void (*)(void*, chip::ChipError), chip::Optional<unsigned short> const&)::'lambda'(chip::ChipError)>, void (chip::ChipError)>::operator()(chip::ChipError&&) + 44 (/Library/Caches/com.apple.xbs/Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/controller/CHIPCluster.h:87)
4   CHIP                          	       0x10406b440 chip::Controller::TypedCommandCallback<chip::app::Clusters::GeneralCommissioning::Commands::ArmFailSafeResponse::DecodableType>::OnError(chip::app::CommandSender const*, chip::ChipError) + 80 (/Volumes/BuildRootMonorailCobaltnitrate/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.0.Internal.sdk/usr/include/c++/v1/__functional/function.h:505)
5   CHIP                          	       0x1041e2068 chip::app::CommandSender::OnResponseTimeout(chip::Messaging::ExchangeContext*) + 232 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/app/CommandSender.cpp:211)
6   CHIP                          	       0x1042de644 chip::Messaging::ExchangeContext::NotifyResponseTimeout() + 116 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/messaging/ExchangeContext.cpp:429)
7   CHIP                          	       0x1042de698 chip::Messaging::ExchangeContext::HandleResponseTimeout(chip::System::Layer*, void*) + 60 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/messaging/ExchangeContext.cpp:417)
8   CHIP                          	       0x1042fb4d8 chip::System::TimerData::Callback::Invoke() const + 40 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/system/SystemTimer.h:61)
9   CHIP                          	       0x1042fad1c chip::System::TimerPool<chip::System::TimerList::Node>::Invoke(chip::System::TimerList::Node*) + 88 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/system/SystemTimer.h:224)
10  CHIP                          	       0x1042f992c chip::System::LayerImplSelect::HandleTimerComplete(chip::System::TimerList::Node*) + 60 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/system/SystemLayerImplSelect.cpp:421)
11  CHIP                          	       0x1042f98e4 ___ZN4chip6System15LayerImplSelect10StartTimerENSt3__16chrono8durationIjNS2_5ratioILl1ELl1000EEEEEPFvPNS0_5LayerEPvESA__block_invoke + 80 (/Library/Caches/com.apple.xbs/Binaries/CHIPFramework/install/TempContent/Objects/CHIP.build/CHIP.build/out/../../../../../../../../Sources/344a74a1-8ecf-4022-8fba-6c80e26d1af6/CHIPFramework-59/connectedhomeip/src/system/SystemLayerImplSelect.cpp:148)
12  libdispatch.dylib             	       0x1a37ef0f4 _dispatch_client_callout + 20 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/object.m:560)
13  libdispatch.dylib             	       0x1a37f2584 _dispatch_continuation_pop + 504 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/inline_internal.h:2631)
14  libdispatch.dylib             	       0x1a3805b04 _dispatch_source_invoke + 1588 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/source.c:596)
15  libdispatch.dylib             	       0x1a37f6684 _dispatch_lane_serial_drain + 376 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/inline_internal.h:0)
16  libdispatch.dylib             	       0x1a37f72f8 _dispatch_lane_invoke + 384 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/queue.c:3939)
17  libdispatch.dylib             	       0x1a3801ebc _dispatch_workloop_worker_thread + 652 (/Library/Caches/com.apple.xbs/Sources/libdispatch/src/queue.c:6766)
18  libsystem_pthread.dylib       	       0x21dca0df8 _pthread_wqthread + 288 (/Library/Caches/com.apple.xbs/Sources/libpthread/src/pthread.c:2618)
19  libsystem_pthread.dylib       	       0x21dca0b98 start_wqthread + 8
@bzbarsky-apple
Copy link
Contributor

Steps to reproduce, which work on both that SHA and one from yesterday (94f8974):

  1. Compile chip-all-clusters-app.
  2. Modify DeviceCommissioner::PerformCommissioningStep by inserting sleep(5) right after case CommissioningStage::kSendNOC:.
  3. Compile chip-tool.
  4. Run chip-all-clusters-app, which should enter pairing mode.
  5. Run chip-tool interactive start
  6. In chip-tool run pairing code 17 749701123365521327694 --timeout 200
  7. When it pauses on the sleep(5), kill chip-all-clusters-app.
  8. Wait for a while as things time out and crash.

The crash happens because we have the following sequence of events:

  1. The AddNOC fails, triggering a call to CleanupCommissioning. Since we did not get far enough, this sets mDeviceBeingCommissioned, schedules a failsafe disarm, and waits for that to happen before notifying that commissioning failed.
  2. While we are waiting for that ArmFailSafe to fail (which it will, since we have no more session), the 40-second kSessionEstablishmentTimeout timer fires. It looks like we don't disarm this timer until we succeed at AddNOC....
  3. The timer expiring calls OnSessionEstablishmentTimeout, which does StopPairing(mDeviceBeingCommissioned->GetDeviceId());
  4. StopPairing calls ReleaseCommissioneeDevice on the device it finds by id.
  5. ReleaseCommissioneeDevice deletes the object, and clears the mDeviceInPASEEstablishment pointer if that points the given object, but does not clear the mDeviceBeingCommissioned (which is wrong).
  6. Eventually the ArmFailSafe times out, and we get:
    2022-07-22 12:27:19.898863-0400 chip-tool[45140:34138623] ==45140==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d00002f880 at pc 0x0001016cb92e bp 0x700001ea2150 sp 0x700001ea2148
    2022-07-22 12:27:19.898873-0400 chip-tool[45140:34138623] READ of size 8 at 0x61d00002f880 thread T8
    2022-07-22 12:27:19.898894-0400 chip-tool[45140:34138623]     #0 0x1016cb92d in chip::Controller::DeviceCommissioner::DisarmDone() CHIPDeviceController.cpp:1528
    2022-07-22 12:27:19.898904-0400 chip-tool[45140:34138623]     #1 0x1016cb6fb in chip::Controller::DeviceCommissioner::OnDisarmFailsafeFailure(void*, chip::ChipError) CHIPDeviceController.cpp:1522
    

So a few things:

  • ReleaseCommissioneeDevice needs to clear mDeviceBeingCommissioned. That converts this crash from a use-after-free to a null-deref, but it's still a crash.
  • This whole setup with the kSessionEstablishmentTimeout is not good, as kSessionEstablishmentTimeout is redundant #14650 notes. It's forcing commissioning to take no more than 40 seconds from "PASE establishment starts" to "AddNOC succeeds". This might not be great depending on the speed of the device, the speed at which a CSR can be converted into certs (e.g. if that involves contacting off-premises signing infrastructure), etc. We should just remove this timer.
  • The StopPairing API is a bit weird. Calling it will clear out various state, but things may still be relying on that state.

@bzbarsky-apple
Copy link
Contributor

Filed #21120 on some other issues I discovered while debugging this.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jul 22, 2022
Several changes here:

1. Fix ReleaseCommissioneeDevice so it does not leave mDeviceBeingCommissioned
   dangling.
2. Ensure that we have null-checks for mDeviceBeingCommissioned before all
   uses, in case it got nulled out by StopPairing or the like while some async
   operation was in flight.  This changes:
   * DeviceCommissioner::DisarmDone
   * DeviceCommissioner::OnDeviceAttestationInformationVerification
3. Remove kSessionEstablishmentTimeout so we don't have a random hardcoded
   timeout partway through commissioning.

Fixes project-chip#21106
Fixes project-chip#14650
woody-apple pushed a commit that referenced this issue Jul 25, 2022
Several changes here:

1. Fix ReleaseCommissioneeDevice so it does not leave mDeviceBeingCommissioned
   dangling.
2. Ensure that we have null-checks for mDeviceBeingCommissioned before all
   uses, in case it got nulled out by StopPairing or the like while some async
   operation was in flight.  This changes:
   * DeviceCommissioner::DisarmDone
   * DeviceCommissioner::OnDeviceAttestationInformationVerification
3. Remove kSessionEstablishmentTimeout so we don't have a random hardcoded
   timeout partway through commissioning.

Fixes #21106
Fixes #14650
github-actions bot pushed a commit that referenced this issue Jul 25, 2022
Several changes here:

1. Fix ReleaseCommissioneeDevice so it does not leave mDeviceBeingCommissioned
   dangling.
2. Ensure that we have null-checks for mDeviceBeingCommissioned before all
   uses, in case it got nulled out by StopPairing or the like while some async
   operation was in flight.  This changes:
   * DeviceCommissioner::DisarmDone
   * DeviceCommissioner::OnDeviceAttestationInformationVerification
3. Remove kSessionEstablishmentTimeout so we don't have a random hardcoded
   timeout partway through commissioning.

Fixes #21106
Fixes #14650
woody-apple added a commit that referenced this issue Jul 25, 2022
…21167)

Several changes here:

1. Fix ReleaseCommissioneeDevice so it does not leave mDeviceBeingCommissioned
   dangling.
2. Ensure that we have null-checks for mDeviceBeingCommissioned before all
   uses, in case it got nulled out by StopPairing or the like while some async
   operation was in flight.  This changes:
   * DeviceCommissioner::DisarmDone
   * DeviceCommissioner::OnDeviceAttestationInformationVerification
3. Remove kSessionEstablishmentTimeout so we don't have a random hardcoded
   timeout partway through commissioning.

Fixes #21106
Fixes #14650

Co-authored-by: Boris Zbarsky <bzbarsky@apple.com>
isiu-apple pushed a commit to isiu-apple/connectedhomeip that referenced this issue Sep 16, 2022
…chip#21121)

Several changes here:

1. Fix ReleaseCommissioneeDevice so it does not leave mDeviceBeingCommissioned
   dangling.
2. Ensure that we have null-checks for mDeviceBeingCommissioned before all
   uses, in case it got nulled out by StopPairing or the like while some async
   operation was in flight.  This changes:
   * DeviceCommissioner::DisarmDone
   * DeviceCommissioner::OnDeviceAttestationInformationVerification
3. Remove kSessionEstablishmentTimeout so we don't have a random hardcoded
   timeout partway through commissioning.

Fixes project-chip#21106
Fixes project-chip#14650
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants