[BUG] [Linux - Simulated] Stack teardown is crashing after test is completed. #24705

Closed
krypton36 opened this issue Jan 27, 2023 · 16 comments · Fixed by #24714

Comments

@krypton36
Contributor

Reproduction steps

  1. Run chip-app for the test case below.
  2. Run the simulated test case TC-DESC-2-2 steps using chip-tool.
  3. Observe that the last step succeeds but chip-app still crashes.

Bug prevalence

Every time

GitHub hash of the SDK that was being used

3f360d7

Platform

raspi

Platform Version(s)

No response

Anything else?

It appears that after the simulated device test steps complete, the crash occurs during teardown of the stack. This is an issue because the exit code reports a failure even though the test succeeded. The TH team consistently sees this when running the simulated device.

Logs:
CNET_TH1.txt
UI_Test_Run_2023_1_27_19_3_54.txt

@raju-apple
Contributor

@bzbarsky-apple can you take a look when you get a chance, please?

@sholagi

sholagi commented Jan 27, 2023

I am noticing similar behavior with chip-app1 in my local setup.
I ran the test cases below directly, without running them via TH.

chip-app1 seems to fail/crash after the last test step. The last test step completes successfully, but chip-app1 still crashes at the very end.

TC_DESC_2_2
TC_PSCFG_3_1
TC_RH_3_1
TC_OCC_2_4

@bzbarsky-apple
Contributor

From the CNET_TH1.txt log:

[1673250410.021895][1099740:1099740] CHIP:SPT: VerifyOrDie failure at ../../examples/placeholder/linux/third_party/connectedhomeip/src/lib/support/Pool.h:337: Allocated() == 0

This typically happens if you quit the application without actually shutting down the Matter stack first. Which is what I see here: there is no shutdown before that assertion failure.

I suggest running under a debugger so you can get a stack to whatever made the exit() call here...
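
To illustrate the failure mode described above, here is a minimal sketch of a pool that treats outstanding allocations at destruction time as fatal. `TrackedPool` and `SketchStack` are hypothetical names invented for this example; they are not the SDK's `Pool.h` types, and the real check is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a pool that counts live objects.
class TrackedPool
{
public:
    ~TrackedPool()
    {
        // Same idea as the "Allocated() == 0" check in the log: anything
        // still allocated at destruction time is treated as fatal.
        if (mAllocated != 0)
        {
            std::fprintf(stderr, "leak check failed: %zu still allocated\n", mAllocated);
            std::abort();
        }
    }

    void Acquire() { ++mAllocated; }

    void Release()
    {
        assert(mAllocated > 0); // releasing more than was acquired is a bug
        --mAllocated;
    }

    std::size_t Allocated() const { return mAllocated; }

private:
    std::size_t mAllocated = 0;
};

// Hypothetical stand-in for a stack that holds pool entries until Shutdown().
class SketchStack
{
public:
    explicit SketchStack(TrackedPool & pool) : mPool(pool) {}

    void Init()
    {
        mPool.Acquire(); // e.g. a session
        mPool.Acquire(); // e.g. an exchange
        mRunning = true;
    }

    void Shutdown()
    {
        if (!mRunning)
            return;
        mPool.Release();
        mPool.Release();
        mRunning = false;
    }

private:
    TrackedPool & mPool;
    bool mRunning = false;
};
```

If a pool like this has static storage duration and the process calls exit() while the stack still holds entries, the pool's destructor runs during exit and aborts, even though the test itself passed.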

@bzbarsky-apple
Contributor

Also, I could probably do the debugging myself, but then I need actual steps to reproduce. I have no idea what chip-app1 is, so either a pointer to documentation or step-by-step "run these commands in this order" directions would be useful.

@sholagi

sholagi commented Jan 28, 2023

@bzbarsky-apple
These are the steps I follow to reproduce the issue. In this example the test case Test_TC_DESC_2_2_Simulated has 5 test steps. After the last test step to read parts-list succeeds, chip-app1 fails/crashes.

  1. On Raspi, run chip-app1 using the command ./chip-app1 --command Test_TC_DESC_2_2_Simulated --secured-device-port 5540 --trace_file "/logs/trace_log_2023-01-27_17.21.19_0x6eb576d97ba1b2c8_SIMULATED_APP_TEST_Test_TC_DESC_2_2_Simulated.log" --trace_decode 1
  2. On your laptop terminal, run the below commands:
    2.1) rm -rf /tmp/chip*;./chip-tool pairing code 305414945 MT:-24J0AFN00KA0648G00
    2.2) ./chip-tool descriptor read device-type-list 305414945 0
    2.3) ./chip-tool descriptor read server-list 305414945 0
    2.4) ./chip-tool descriptor read client-list 305414945 0
    2.5) ./chip-tool descriptor read parts-list 305414945 0

Observe that chip-app1 fails with the error CHIP:SPT: VerifyOrDie failure at ../../examples/placeholder/linux/third_party/connectedhomeip/src/lib/support/Pool.h:337: Allocated() == 0

Additional Info:
MT:-24J0AFN00KA0648G00 is the pairing code, 305414945 is the node ID, and the final argument is the endpoint ID in the above examples.

@raju-apple
Contributor

Also, docker pull connectedhomeip/chip-cert-bins:3f360d7565b88d271a1097505e1ddaf8f8a3dc40 pulls the cert bins Docker tag that was used.

@bzbarsky-apple
Contributor

@sholagi First step: what is chip-app1 and how does one get it? I guess ./scripts/examples/gn_build_test_example.sh app1?

@sholagi

sholagi commented Jan 28, 2023

@bzbarsky-apple you can use the docker pull method suggested by Raju to get chip-app1.

@bzbarsky-apple
Contributor

@sholagi That really does not work well on an M1 Mac. ;)

@sholagi

sholagi commented Jan 28, 2023

@bzbarsky-apple Is there any way I can share the binary? Do you have suggestions? Maybe Slack or Box?

@bzbarsky-apple
Contributor

bzbarsky-apple commented Jan 28, 2023

OK, so when I do the above steps, on this step:

 2.2) ./chip-tool descriptor read device-type-list 305414945 1

the server claims:

[1674873207071] [96045:61954523] [TOO] AttributePath does not match

and the server then calls SetCommandExitStatus(CHIP_ERROR_INTERNAL) from TestCommand::CheckAttributePath, which then does:

        chip::DeviceLayer::PlatformMgr().StopEventLoopTask();
        exit(CHIP_NO_ERROR == status ? EXIT_SUCCESS : EXIT_FAILURE);

which is very much an unclean shutdown that is not shutting down the Matter stack... But that's not what's going on in the logs above. Trying to figure out why this thing is stopping at the very first read.

@bzbarsky-apple
Contributor

Ah, the test expects device-type-list to be read on EP0, not EP1.
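
As a rough illustration of why reading on EP1 trips the "AttributePath does not match" check, the sketch below compares an expected path with a received one. `AttributePath`, `PathMatches`, and `RunExample` are hypothetical names for this example only and are not the SDK's `TestCommand::CheckAttributePath`; the cluster and attribute IDs are the Descriptor cluster (0x001D) and its DeviceTypeList attribute (0x0000).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified attribute path: endpoint, cluster, attribute.
struct AttributePath
{
    std::uint16_t endpoint;
    std::uint32_t cluster;
    std::uint32_t attribute;
};

constexpr std::uint32_t kDescriptorCluster       = 0x001D;
constexpr std::uint32_t kDeviceTypeListAttribute = 0x0000;

// A read only satisfies the test step when all three fields match.
bool PathMatches(const AttributePath & expected, const AttributePath & received)
{
    return expected.endpoint == received.endpoint && expected.cluster == received.cluster &&
        expected.attribute == received.attribute;
}

void RunExample()
{
    const AttributePath expected{ 0, kDescriptorCluster, kDeviceTypeListAttribute };

    // Same cluster and attribute, but read on endpoint 1: rejected.
    assert(!PathMatches(expected, AttributePath{ 1, kDescriptorCluster, kDeviceTypeListAttribute }));

    // Read on endpoint 0: accepted.
    assert(PathMatches(expected, AttributePath{ 0, kDescriptorCluster, kDeviceTypeListAttribute }));
}
```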

@sholagi

sholagi commented Jan 28, 2023

Yes you are right. My bad. Please change the endpoints to 0 in all the steps.

@bzbarsky-apple
Contributor

OK, same deal in the end. Now we are doing AsyncExit, which does SetCommandExitStatus(CHIP_NO_ERROR), which is exiting without properly shutting down the stack.
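
The general shape of a clean exit path can be sketched as follows: shut the stack down first, and only then turn the command status into a process exit code. This is an illustrative sketch under assumed names (`DemoStack`, `FinishCommand`), not the actual change in #24714.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stack that records its teardown steps in call order.
class DemoStack
{
public:
    void StopEventLoop() { mSteps.push_back("stop-event-loop"); }

    void Shutdown()
    {
        mSteps.push_back("shutdown");
        mShutDown = true;
    }

    bool IsShutDown() const { return mShutDown; }
    const std::vector<std::string> & Steps() const { return mSteps; }

private:
    std::vector<std::string> mSteps;
    bool mShutDown = false;
};

// Tears the stack down, then maps the command result to a process exit code.
// The caller passes the result to exit() or returns it from main().
int FinishCommand(DemoStack & stack, bool commandSucceeded)
{
    stack.StopEventLoop();
    stack.Shutdown(); // the step the unclean exit path skips
    assert(stack.IsShutDown());
    return commandSucceeded ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Returning the exit code instead of calling exit() inside the helper keeps the teardown order testable and leaves the final exit decision to the caller.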

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jan 28, 2023
Otherwise we end up crashing on asserts about leaks.

Fixes project-chip#24705
@bzbarsky-apple
Contributor

#24714 should fix this.

@raju-apple
Contributor

Thanks 🙏🏼 @bzbarsky-apple. Hopefully we can fast-track this and pick up the TOT on Monday for the TH build.

woody-apple pushed a commit that referenced this issue Jan 28, 2023
…24714)

Otherwise we end up crashing on asserts about leaks.

Fixes #24705
kkasperczyk-no pushed a commit to kkasperczyk-no/sdk-connectedhomeip that referenced this issue Mar 15, 2023
lecndav pushed a commit to lecndav/connectedhomeip that referenced this issue Mar 22, 2023