[SVE] [Android] Crash on Android13 #22440

Closed
Allen-Guof opened this issue Sep 7, 2022 · 18 comments

@Allen-Guof

Allen-Guof commented Sep 7, 2022

Reproduction steps

I have an ESP32-C3 device connected to my Android 13 phone. After the app has been running for a while, it crashes, but I can't see a detailed stack trace. The log is as follows:

2022-09-07 10:33:45.374 11967-11979/com.yeelight.yeelight_fluid E/DL: Chip stack locking error at '../../src/transport/Session.h:190'. Code is unsafe/racy
2022-09-07 10:33:45.374 11967-11979/com.yeelight.yeelight_fluid E/-: chipDie chipDie chipDie
    
    --------- beginning of crash
2022-09-07 10:33:45.375 11967-11979/com.yeelight.yeelight_fluid A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 11979 (FinalizerDaemon), pid 11967 (.yeelight_fluid)

Bug prevalence

It crashes every time, after the app has been running for a while.

GitHub hash of the SDK that was being used

f046787

Platform

android

Platform Version(s)

Android 13

Type

Platform Issue

Anything else?

No response

@Allen-Guof
Author

Same error on Android 12

@bzbarsky-apple
Contributor

The actual issue is the "Chip stack locking error" bit: that triggers a crash. What's really needed here is a stack trace to figure out what code is removing session holders on the wrong thread.

@andy31415
Contributor

The stack locking error means that the code is calling CHIP APIs from a thread that is not supposed to do that. It may be an Android binding problem, but a stack trace is really needed to debug it.
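
As a rough illustration of the kind of check behind a "locked by current thread" assertion, here is a minimal standalone sketch. `ToyStackLock`, `CheckResult`, and `RunCheckOnTwoThreads` are hypothetical names invented for this example and are not the SDK's implementation. The sketch only shows why the same check passes on the thread that took the lock and fails on any other thread, such as a finalizer thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the SDK's stack lock; NOT the real CHIP implementation.
class ToyStackLock
{
public:
    void Lock()
    {
        mMutex.lock();
        mOwner.store(std::this_thread::get_id());
    }

    void Unlock()
    {
        mOwner.store(std::thread::id());
        mMutex.unlock();
    }

    // True only when the calling thread is the one currently holding the lock.
    bool IsLockedByCurrentThread() const { return mOwner.load() == std::this_thread::get_id(); }

private:
    std::mutex mMutex;
    std::atomic<std::thread::id> mOwner{ std::thread::id() };
};

struct CheckResult
{
    bool onLockingThread;
    bool onOtherThread;
};

// Takes the lock on the calling thread, then runs the same check on that
// thread and on a second thread (standing in for FinalizerDaemon).
CheckResult RunCheckOnTwoThreads()
{
    ToyStackLock lock;
    CheckResult result{ false, true };

    lock.Lock();
    result.onLockingThread = lock.IsLockedByCurrentThread();
    std::thread other([&lock, &result] { result.onOtherThread = lock.IsLockedByCurrentThread(); });
    other.join();
    lock.Unlock();

    assert(!lock.IsLockedByCurrentThread()); // nobody holds it after Unlock()
    return result;
}
```

In this toy model the second thread sees the check fail even though the lock is held, because it is held by someone else. A real assertion would abort at that point instead of returning `false`.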

@andy31415
Contributor

Same as #21708, maybe? That one has a stack trace.

@Allen-Guof
Author

I saw more logs in our Flutter project. I hope this helps with the issue. @andy31415 @bzbarsky-apple

I/.yeelight_fluid(14946): Explicit concurrent copying GC freed 42949(1675KB) AllocSpace objects, 3(124KB) LOS objects, 83% free, 4759KB/28MB, paused 65us,27us total 51.164ms
D/DMG     (14946): MoveToState ReadClient[0xb400007ac3530d50]: Moving to [      Idle]
E/DL      (14946): Chip stack locking error at '../../src/transport/Session.h:190'. Code is unsafe/racy
E/-       (14946): chipDie chipDie chipDie
F/libc    (14946): Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 14965 (FinalizerDaemon), pid 14946 (.yeelight_fluid)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/coral/coral:13/TP1A.220624.014/8819323:user/release-keys'
Revision: 'MP1.0'
ABI: 'arm64'
Timestamp: 2022-09-26 14:07:40.511265991+0800
Process uptime: 398s
Cmdline: com.yeelight.yeelight_fluid
pid: 14946, tid: 14965, name: FinalizerDaemon  >>> com.yeelight.yeelight_fluid <<<
uid: 10497
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
    x0  0000000000000000  x1  0000000000003a75  x2  0000000000000006  x3  00000079804921d0
    x4  0000000000000010  x5  0000000000000010  x6  0000000000000010  x7  7f7f7f7f7f7f7f7f
    x8  00000000000000f0  x9  0000007ca0703a00  x10 0000000000000001  x11 0000007ca0741ce4
    x12 00000079804909c0  x13 0000000000000018  x14 0000007980491bf8  x15 000000000e63186e
    x16 0000007ca07a6d60  x17 0000007ca0783b70  x18 000000797f684000  x19 0000000000003a62
    x20 0000000000003a75  x21 00000000ffffffff  x22 0000007977816476  x23 0000007977ba6a61
    x24 00000079f1c00880  x25 0000007980492578  x26 0000007980492590  x27 0000007980492578
    x28 0000007980492470  x29 0000007980492250
    lr  0000007ca0733868  sp  00000079804921b0  pc  0000007ca0733894  pst 0000000000000000
backtrace:
      #00 pc 0000000000051894  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
      #01 pc 00000000000fb080  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chipAbort+8)
      #02 pc 00000000010f8538  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chipDie+28)
      #03 pc 00000000010f84d8  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::Platform::Internal::AssertChipStackLockedByCurrentThread(char const*, int)+60)
      #04 pc 000000000110a47c  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::Transport::Session::RemoveHolder(chip::SessionHolder&)+48)
      #05 pc 0000000001109d58  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::SessionHolder::Release()+72)
      #06 pc 0000000001109ce0  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::SessionHolder::~SessionHolder()+48)
      #07 pc 0000000000114be4  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::app::ReadPrepareParams::~ReadPrepareParams()+40)
      #08 pc 0000000001099814  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::app::ReadClient::~ReadClient()+180)
      #09 pc 00000000000f7958  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (void chip::Platform::Delete<chip::app::ReadClient>(chip::app::ReadClient*)+56)
      #10 pc 00000000000f78f4  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::Controller::ReportCallback::~ReportCallback()+196)
      #11 pc 00000000000f79f0  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (chip::Controller::ReportCallback::~ReportCallback()+36)
      #12 pc 00000000000f6838  /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk (Java_chip_devicecontroller_ReportCallbackJni_deleteCallback+116)
      #13 pc 0000000000440354  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+148) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #14 pc 000000000020a910  /apex/com.android.art/lib64/libart.so (nterp_helper+5648) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #15 pc 00000000002b8476  [anon:dalvik-classes.dex extracted in memory from /data/app/~~NjKnruDNt8DW7fya4V26lA==/com.yeelight.yeelight_fluid-0Z8O0mJaY9uUivOIfmlggQ==/base.apk] (chip.devicecontroller.ReportCallbackJni.finalize+22)
      #16 pc 000000000020a254  /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #17 pc 000000000002a736  /apex/com.android.art/javalib/core-libart.jar (java.lang.Daemons$FinalizerDaemon.doFinalize+22)
      #18 pc 000000000020a254  /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #19 pc 000000000002a838  /apex/com.android.art/javalib/core-libart.jar (java.lang.Daemons$FinalizerDaemon.runInternal+180)
      #20 pc 000000000020a254  /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #21 pc 000000000002a4ee  /apex/com.android.art/javalib/core-libart.jar (java.lang.Daemons$Daemon.run+50)
      #22 pc 000000000020b074  /apex/com.android.art/lib64/libart.so (nterp_helper+7540) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #23 pc 00000000000f6748  /apex/com.android.art/javalib/core-oj.jar (java.lang.Thread.run+8)
      #24 pc 000000000043696c  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+556) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #25 pc 0000000000468738  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+156) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #26 pc 0000000000468420  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeVirtualOrInterfaceWithJValues<art::ArtMethod*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, art::ArtMethod*, jvalue const*)+388) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #27 pc 0000000000617f0c  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+1668) (BuildId: 97fdb979efb7d2b596fa4fceabaad95b)
      #28 pc 00000000000b62b8  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
      #29 pc 0000000000052fb8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)

@Allen-Guof
Author

I edited the C++ source code. The issue seems to be solved. Is this the correct way?

ReportCallback::~ReportCallback()
{
    chip::DeviceLayer::StackLock stack;
    
    JNIEnv * env = JniReferences::GetInstance().GetEnvForCurrentThread();
    VerifyOrReturn(env != nullptr, ChipLogError(Controller, "Could not get JNIEnv for current thread"));
    if (mSubscriptionEstablishedCallbackRef != nullptr)
    {
        env->DeleteGlobalRef(mSubscriptionEstablishedCallbackRef);
    }
    env->DeleteGlobalRef(mReportCallbackRef);
    if (mReadClient != nullptr)
    {
        Platform::Delete(mReadClient);
    }
}

@Allen-Guof
Author

@andy31415 Hi, any idea?

@DanijelBojcic

We are seeing the same issue. Is there any progress on this?

@Allen-Guof
Author

@DanijelBojcic There is no progress. When I add the line chip::DeviceLayer::StackLock stack, an ANR sometimes happens on some phones (such as ones running Android 13). No one has responded yet on how to deal with this issue.
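
Taking a lock inside a finalizer means the finalizer thread has to wait for whoever currently holds it. A general alternative, shown here only as a pattern and not as what the SDK does or as the confirmed cause of the ANR, is to queue the deletion for the thread that owns the stack. All names in this sketch (`ToyStackQueue`, `ToyReadClient`, `DeleteLater`, `DeferredDeleteDemo`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical work queue drained by the thread that owns the stack.
// None of these names come from the CHIP SDK.
class ToyStackQueue
{
public:
    // Callable from any thread: it only enqueues and never waits for the stack.
    void Post(std::function<void()> work)
    {
        std::lock_guard<std::mutex> guard(mMutex);
        mPending.push_back(std::move(work));
    }

    // Called by the owning thread from its loop. Runs everything queued so far
    // and returns how many items ran.
    std::size_t Drain()
    {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> guard(mMutex);
            batch.swap(mPending);
        }
        for (auto & work : batch)
        {
            work();
        }
        return batch.size();
    }

private:
    std::mutex mMutex;
    std::vector<std::function<void()>> mPending;
};

// Hypothetical object that must be destroyed on the stack's thread.
struct ToyReadClient
{
    explicit ToyReadClient(bool * destroyedFlag) : destroyed(destroyedFlag) {}
    ~ToyReadClient() { *destroyed = true; }
    bool * destroyed;
};

// What a finalizer-side hook could do: hand the pointer over instead of
// deleting it on the finalizer thread.
void DeleteLater(ToyStackQueue & queue, ToyReadClient * client)
{
    queue.Post([client] { delete client; });
}

// Single-threaded walk-through: the object survives the "finalizer" call and
// is destroyed only when the owner drains its queue.
bool DeferredDeleteDemo()
{
    ToyStackQueue queue;
    bool destroyed = false;

    DeleteLater(queue, new ToyReadClient(&destroyed));
    const bool aliveAfterPost = !destroyed;

    const std::size_t ran = queue.Drain();
    assert(ran == 1);
    return aliveAfterPost && destroyed;
}
```

The trade-off in this pattern is that destruction becomes asynchronous: the caller returns before the object is gone, so anything the queued work touches has to stay valid until the queue is drained.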

@DanijelBojcic

We have not tested it in production yet, but I can confirm that our team has not seen the issue since I added that line.

@stale

stale bot commented Aug 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Aug 11, 2023

@Allen-Guof
Author

Any update?

stale bot removed the stale label Aug 17, 2023

@MrLiuYunPing

mSubscriptionEstablishedCallbackRef

Hope this will be helpful, thanks!

@MrLiuYunPing

Same crash! The log:
12-19 14:49:35.458 2387 2429 F libc : Fatal signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xbfd915f900000001 in tid 2429 (FinalizerDaemon), pid 2387 (uShiKeJi.Matter)
12-19 14:49:35.511 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.511 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.511 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.527 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.527 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.527 2387 2433 E BLASTBufferQueue: SurfaceView[com.ShuShiKeJi.Matter/com.shuShiKeJi.flutter_matter_home.MainActivity]#1 Faking releaseBufferCallback from transactionCompleteCallback
12-19 14:49:35.996 5013 5013 F DEBUG : Softversion: PD2072_A_6.9.0
12-19 14:49:35.996 5013 5013 F DEBUG : Time: 2023-12-19 14:49:35
12-19 14:49:35.996 5013 5013 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-19 14:49:35.996 5013 5013 F DEBUG : Build fingerprint: 'vivo/PD2072/PD2072:13/TP1A.220624.014/compiler08141930:user/release-keys'
12-19 14:49:35.996 5013 5013 F DEBUG : Revision: '0'
12-19 14:49:35.997 5013 5013 F DEBUG : ABI: 'arm64'
12-19 14:49:35.997 5013 5013 F DEBUG : Timestamp: 2023-12-19 14:49:35.616480867+0800
12-19 14:49:35.997 5013 5013 F DEBUG : Process uptime: 201s
12-19 14:49:35.997 5013 5013 F DEBUG : Cmdline: com.ShuShiKeJi.Matter
12-19 14:49:35.997 5013 5013 F DEBUG : pid: 2387, tid: 2429, name: FinalizerDaemon >>> com.ShuShiKeJi.Matter <<<
12-19 14:49:35.997 5013 5013 F DEBUG : uid: 10391
12-19 14:49:35.997 5013 5013 F DEBUG : signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xbfd915f900000001
12-19 14:49:35.997 5013 5013 F DEBUG : x0 b4000075f405a490 x1 00000074abe437e8 x2 b4000075f405a490 x3 00000074a5058260
12-19 14:49:35.997 5013 5013 F DEBUG : x4 0000000000000004 x5 0000007533fe3640 x6 0000627000006a3e x7 00008604000054a8
12-19 14:49:35.997 5013 5013 F DEBUG : x8 bfd915f900000001 x9 000000748043c268 x10 0000000000000000 x11 0000000000000000
12-19 14:49:35.997 5013 5013 F DEBUG : x12 0000000000000004 x13 0000000000000003 x14 00000074abe437e0 x15 0000000000000000
12-19 14:49:35.997 5013 5013 F DEBUG : x16 0000007480421a90 x17 00000077d2ff0830 x18 00000074a8a26000 x19 b4000076a400dd80
12-19 14:49:35.997 5013 5013 F DEBUG : x20 0000000000000000 x21 b4000076a400de40 x22 b4000075f4002e30 x23 0000000000000000
12-19 14:49:35.997 5013 5013 F DEBUG : x24 00000074fcfc2810 x25 00000074abe43808 x26 0000000018380004 x27 0000000000000014
12-19 14:49:35.997 5013 5013 F DEBUG : x28 00000074abe43840 x29 00000074abe436e0
12-19 14:49:35.997 5013 5013 F DEBUG : lr 000000747eb1eab4 sp 00000074abe436d0 pc bfd915f900000001 pst 0000000060001000
12-19 14:49:35.997 5013 5013 F DEBUG : backtrace:
12-19 14:49:35.997 5013 5013 F DEBUG : #00 pc bfd915f900000001
12-19 14:49:35.997 5013 5013 F DEBUG : #1 pc 0000000000d14ab0 /data/app/~~D5I0et2PRM46SGdN3Y6hEg==/com.ShuShiKeJi.Matter-oHNFEWLpDJlAk-xsvvWJbw==/base.apk
12-19 14:49:35.997 5013 5013 F DEBUG : #2 pc 0000000000d14a6c /data/app/~~D5I0et2PRM46SGdN3Y6hEg==/com.ShuShiKeJi.Matter-oHNFEWLpDJlAk-xsvvWJbw==/base.apk (Java_chip_devicecontroller_ReportCallbackJni_deleteCallback+96)
12-19 14:49:35.997 5013 5013 F DEBUG : #3 pc 00000000000502d4 /data/app/~~D5I0et2PRM46SGdN3Y6hEg==/com.ShuShiKeJi.Matter-oHNFEWLpDJlAk-xsvvWJbw==/oat/arm64/base.odex (art_jni_trampoline+116)
12-19 14:49:35.997 5013 5013 F DEBUG : #4 pc 000000000020a910 /apex/com.android.art/lib64/libart.so (nterp_helper+5648) (BuildId: 5904ade38b34a628ab2f645cadebea19)
12-19 14:49:35.997 5013 5013 F DEBUG : #5 pc 0000000000318016 /data/app/~~D5I0et2PRM46SGdN3Y6hEg==/com.ShuShiKeJi.Matter-oHNFEWLpDJlAk-xsvvWJbw==/base.apk (chip.devicecontroller.ReportCallbackJni.finalize+22)
12-19 14:49:35.997 5013 5013 F DEBUG : #6 pc 0000000000046558 /system/framework/arm64/boot-core-libart.oat (java.lang.Daemons$FinalizerDaemon.doFinalize+104) (BuildId: a8be4e17d00cda2ebf1d788b540851eb682f4b55)
12-19 14:49:35.997 5013 5013 F DEBUG : #7 pc 00000000000467f8 /system/framework/arm64/boot-core-libart.oat

@andy31415
Contributor

The change in #22440 (comment) seems correct (acquire the stack lock before doing the deletion). It can probably have a smaller scope, and we should also check whether that delete could happen while the lock is already held, to avoid a deadlock.

Since you are able to reproduce this issue, could you create a fix PR?
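
To make the two suggestions concrete (a narrower lock scope, and not re-taking a lock the caller already holds), here is a minimal standalone sketch. It uses a hypothetical `ToyStackLock` rather than the SDK's lock, and `ScopedLockIfNeeded`, `ToyCallback`, and `DestroyCallback` are invented for illustration. It is not the fix that landed in the repository.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the SDK's stack lock; NOT the real CHIP type.
class ToyStackLock
{
public:
    void Lock()
    {
        mMutex.lock();
        mOwner.store(std::this_thread::get_id());
    }
    void Unlock()
    {
        mOwner.store(std::thread::id());
        mMutex.unlock();
    }
    bool IsLockedByCurrentThread() const { return mOwner.load() == std::this_thread::get_id(); }

private:
    std::mutex mMutex;
    std::atomic<std::thread::id> mOwner{ std::thread::id() };
};

// Takes the lock only if the calling thread does not already hold it, so a
// destructor reached from code that is already inside the lock does not
// deadlock on a non-recursive mutex.
class ScopedLockIfNeeded
{
public:
    explicit ScopedLockIfNeeded(ToyStackLock & lock) : mLock(lock), mTookLock(!lock.IsLockedByCurrentThread())
    {
        if (mTookLock)
        {
            mLock.Lock();
        }
    }
    ~ScopedLockIfNeeded()
    {
        if (mTookLock)
        {
            mLock.Unlock();
        }
    }
    ScopedLockIfNeeded(const ScopedLockIfNeeded &)             = delete;
    ScopedLockIfNeeded & operator=(const ScopedLockIfNeeded &) = delete;

private:
    ToyStackLock & mLock;
    const bool mTookLock;
};

// Hypothetical callback object; `deletedUnderLock` records what the guarded
// section observed.
class ToyCallback
{
public:
    ToyCallback(ToyStackLock & lock, bool & deletedUnderLock) :
        mLock(lock), mDeletedUnderLock(deletedUnderLock), mClient(new int(0))
    {}

    ~ToyCallback()
    {
        // Cleanup that does not touch the stack would stay out here, unlocked.
        {
            ScopedLockIfNeeded guard(mLock);
            mDeletedUnderLock = mLock.IsLockedByCurrentThread();
            delete mClient; // stands in for deleting the read client
        }
    }

private:
    ToyStackLock & mLock;
    bool & mDeletedUnderLock;
    int * mClient;
};

// Destroys a callback with the lock either free or already held by the caller.
// Returns true if the delete ran under the lock and the caller's lock state
// was left unchanged.
bool DestroyCallback(bool callerHoldsLock)
{
    ToyStackLock lock;
    bool deletedUnderLock = false;

    if (callerHoldsLock)
    {
        lock.Lock();
    }
    {
        ToyCallback callback(lock, deletedUnderLock);
    }
    const bool stateUnchanged = (lock.IsLockedByCurrentThread() == callerHoldsLock);
    if (callerHoldsLock)
    {
        lock.Unlock();
    }

    assert(!lock.IsLockedByCurrentThread());
    return deletedUnderLock && stateUnchanged;
}
```

In this sketch both `DestroyCallback(false)` and `DestroyCallback(true)` complete: in the first case the guard takes and releases the lock itself, and in the second it leaves the caller's lock alone.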

@yunhanw-google
Contributor

yunhanw-google commented Jan 11, 2024

I think we have fixed it in master via #29710 and #25625. Could you retest using the latest master? @MrLiuYunPing Thanks.

yunhanw-google self-assigned this Jan 11, 2024

@yunhanw-google
Contributor

@MrLiuYunPing You can also reach me (yunhanw) on Slack for further discussion. Thanks.

@yunhanw-google
Contributor

I have retried it locally several times and cannot reproduce it with the latest code, so I am closing this now. Feel free to create a new ticket if the crash continues, @MrLiuYunPing. Thanks.
