Crash in libjpeg.so when BitmapFactory.decodeStream called #1058

Closed
rauk123 opened this issue Aug 6, 2019 · 31 comments

Comments

@rauk123

rauk123 commented Aug 6, 2019

We need to support 64-bit CPUs in order to publish on Google Play. After adding 64-bit support, we found a strange crash in libjpeg.so on some devices (arm64-v8a).

Some logs are below:

#00 pc 0000000000004a00 /system/lib64/libjpeg.so [arm64-v8a]
#01 pc 00000000000189d4 /system/lib64/libjpeg.so [arm64-v8a]
#02 pc 000000000001fef4 /system/lib64/libjpeg.so [arm64-v8a]
#03 pc 0000000000017188 /system/lib64/libjpeg.so (jpeg_read_scanlines+152) [arm64-v8a]
#04 pc 00000000002484e4 /system/lib64/libskia.so (SkJPEGImageDecoder::onDecode(SkStream*, SkBitmap*, SkImageDecoder::Mode)+4532) [arm64-v8a]
#05 pc 000000000023d9b4 /system/lib64/libskia.so (SkImageDecoder::decode(SkStream*, SkBitmap*, SkColorType, SkImageDecoder::Mode)+152) [arm64-v8a]
#06 pc 00000000000c7b48 /system/lib64/libandroid_runtime.so (android::NativeInputEventSender::NativeInputEventSender(_JNIEnv*, _jobject*, android::sp<android::InputChannel> const&, android::sp<android::MessageQueue> const&)+136) [arm64-v8a]
#07 pc 00000000000c857c /system/lib64/libandroid_runtime.so (android::NativeInputEventSender::receiveFinishedSignals(_JNIEnv*)+48) [arm64-v8a]
#08 pc 0000000000f647b8 /system/framework/arm64/boot.oat (oatdata+16136120) [arm64-v8a]
java:
android.graphics.BitmapFactory.decodeStreamInternal(BitmapFactory.java:677)
android.graphics.BitmapFactory.decodeStream(BitmapFactory.java:653)
com.c.a.utils.be.a(Encore:820)

Any ideas? Thanks a lot.

@alexcohn

alexcohn commented Aug 7, 2019

Does your app include libjpeg? Is it your obfuscated Java code com.c.a.utils.be.a that calls BitmapFactory.decodeStream?

@rauk123
Author

rauk123 commented Aug 8, 2019

> Does your app include libjpeg? Is it your obfuscated Java code com.c.a.utils.be.a that calls BitmapFactory.decodeStream?

com.c.a.utils.be.a is the code that calls BitmapFactory.decodeStream. The app doesn't include libjpeg.so, but some Android systems use the system libjpeg.so to decode bitmaps, and some use libqc-opt.so. I also found some crashes in libqc-opt.so:

#00 pc 0000000000004b6c /system/vendor/lib64/libqc-opt.so [arm64-v8a]
#01 pc 00000000000187e0 /system/vendor/lib64/libqc-opt.so [arm64-v8a]
#02 pc 000000000001fe30 /system/vendor/lib64/libqc-opt.so [arm64-v8a]
#03 pc 0000000000017008 /system/vendor/lib64/libqc-opt.so (jReadScanlines+152) [arm64-v8a]
#04 pc 000000000037712c /system/lib64/libskia.so (SkJPEGImageDecoder::onDecode(SkStream*, SkBitmap*, SkImageDecoder::Mode)+1084) [arm64-v8a]
#05 pc 000000000023ac48 /system/lib64/libskia.so (SkImageDecoder::decode(SkStream*, SkBitmap*, SkColorType, SkImageDecoder::Mode)+80) [arm64-v8a]
#06 pc 00000000000c59f0 /system/lib64/libandroid_runtime.so (android::android_view_InputDevice_create(_JNIEnv*, android::InputDeviceInfo const&)+568) [arm64-v8a]
#07 pc 00000000000c613c /system/lib64/libandroid_runtime.so [arm64-v8a]
#08 pc 0000000002827288 /data/dalvik-cache/arm64/system@framework@boot.oat (oatexec+6062728) [arm64-v8a]
java:
android.graphics.BitmapFactory.decodeStream(BitmapFactory.java:630)
android.graphics.BitmapFactory.decodeResourceStream(BitmapFactory.java:456)
android.graphics.BitmapFactory.decodeResource(BitmapFactory.java:479)
com.c.a.utils.be.a(Encore:825)

Unfortunately, we can't reproduce this problem. It happened only on some devices.
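
When a crash like this cannot be reproduced locally, it can help to tally which library and symbol appear in the native frames across many reports. Below is a minimal Java sketch of a frame parser, assuming the frame layout seen in the logs in this thread (`#NN pc <address> <path> (<symbol>+<offset>)`). The names `BacktraceFrames`, `Frame`, and `parse` are hypothetical, and real crash-reporting tools may use a different layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls the library and symbol out of one native backtrace line. */
final class BacktraceFrames {
    // Assumed frame layout, copied from the logs in this thread:
    //   #03 pc 0000000000017188 /system/lib64/libjpeg.so (jpeg_read_scanlines+152) [arm64-v8a]
    // The "(symbol+offset)" part is optional because some frames are not symbolized.
    private static final Pattern FRAME = Pattern.compile(
            "#(\\d+)\\s+pc\\s+([0-9a-fA-F]+)\\s+(\\S+)(?:\\s+\\((.*)\\+(\\d+)\\))?");

    static final class Frame {
        final int index;      // frame number, 0 is the crashing frame
        final long pc;        // address as printed in the log (hexadecimal)
        final String library; // path of the mapped file
        final String symbol;  // null when the frame has no symbol

        Frame(int index, long pc, String library, String symbol) {
            this.index = index;
            this.pc = pc;
            this.library = library;
            this.symbol = symbol;
        }
    }

    /** Returns the parsed frame, or empty if the line is not a backtrace frame. */
    static Optional<Frame> parse(String line) {
        Matcher m = FRAME.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Frame(
                Integer.parseInt(m.group(1)),
                Long.parseUnsignedLong(m.group(2), 16),
                m.group(3),
                m.group(4)));
    }

    public static void main(String[] args) {
        String line = "#03 pc 0000000000017188 /system/lib64/libjpeg.so "
                + "(jpeg_read_scanlines+152) [arm64-v8a]";
        parse(line).ifPresent(f -> System.out.println(f.library + " " + f.symbol));
    }
}
```

Grouping the parsed frames by `library` would show at a glance whether reports cluster in libjpeg.so, libqc-opt.so, or somewhere else.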

@alexcohn

alexcohn commented Aug 8, 2019

You did not answer the specific question: Does this stack start with your obfuscated Java code? Does your app contain some C++ components, too?

It could be very helpful for the community if you share the names of the offending devices. There may be a bug in some vendor-specific implementations of the 64-bit libjpeg.so. Note that on Qualcomm devices, libqc-opt.so is simply a wrapper for the same libjpeg.

If the crash reproduces often on the same device, you can try installing a 32-bit version of your app on that device. Do the crashes still happen?

@rauk123
Author

rauk123 commented Aug 8, 2019

Thanks.

Yes, this stack starts with my obfuscated Java code. And my app also contains some C++ components.

Before we added 64-bit support, we never received this bug report. But now we have to support 64-bit in order to publish on Google Play, and since then many reports of this bug have appeared. So I am sure that it will not crash with a 32-bit version of the app.

It happened mostly on these devices:

a
b
c

@alexcohn

alexcohn commented Aug 8, 2019

Ouch, this hurts. Don't hesitate to write to OPPO and to HTC and let them fix their bugs. I would also suggest that you escalate this issue with your local Google representative. With the new 64-bit regulations, you will definitely not be the only one hit.

As for immediate practical steps… You use a split build, don't you? Then you can set (in build.gradle) minSdkVersion for the 64-bit flavor to 26 or even higher. This will exclude the older devices where libjpeg or other 64-bit system libraries may still have glitches.
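
To illustrate why a higher minSdkVersion on the 64-bit flavor would keep older devices on the 32-bit build, here is a simplified Java model of how a store might pick one APK out of several. It assumes the rule described for Google Play multiple-APK publishing (a device receives the compatible APK with the highest version code). The class names, version codes, and the 32-bit minSdkVersion of 16 are made up for illustration, and this is not Play's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Simplified, hypothetical model of multiple-APK selection. */
final class ApkSelectionModel {
    static final class Apk {
        final String abi;
        final int minSdk;
        final int versionCode;

        Apk(String abi, int minSdk, int versionCode) {
            this.abi = abi;
            this.minSdk = minSdk;
            this.versionCode = versionCode;
        }
    }

    /** Returns the compatible APK with the highest version code, if any. */
    static Optional<Apk> select(List<Apk> apks, int deviceSdk, List<String> deviceAbis) {
        Apk best = null;
        for (Apk apk : apks) {
            boolean compatible = deviceSdk >= apk.minSdk && deviceAbis.contains(apk.abi);
            if (compatible && (best == null || apk.versionCode > best.versionCode)) {
                best = apk;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Assumed setup: the 64-bit APK has the higher version code and minSdk 26.
        List<Apk> apks = Arrays.asList(
                new Apk("armeabi-v7a", 16, 100),
                new Apk("arm64-v8a", 26, 200));
        // An arm64 device can also run 32-bit ARM code.
        List<String> arm64Device = Arrays.asList("arm64-v8a", "armeabi-v7a");

        System.out.println("API 25: " + select(apks, 25, arm64Device).get().abi);
        System.out.println("API 28: " + select(apks, 28, arm64Device).get().abi);
    }
}
```

Under this model, an arm64 device on API 25 falls back to the armeabi-v7a APK, while the same hardware on API 28 receives the arm64-v8a APK.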

@rauk123
Author

rauk123 commented Aug 8, 2019

Yes, I agree that it looks like a bug in some OPPO devices.
Thanks for your suggestions. I'll try them.

@srikanthsunkari

Hi @alexcohn, I am facing a similar issue, but the Oppo Realme 3 Pro (RMX1851, Android 9) is only one of the affected devices. They come from many brands, and the Mi A1 (tissot_sprout, Android 9) is the most affected device. It could be specific to Android 9 or to the latest NDK libraries, since it only happens on Android 9.
Also, it doesn't come from our code.

@alexcohn

alexcohn commented Aug 11, 2019

@srikanthsunkari generally speaking, you can blacklist specific devices for a version on Play Store. Unfortunately, this may become hard to manage if you want to keep the 32-bit flavor of your app available for users with RMX1851 and other problematic devices.

@srikanthsunkari

@alexcohn thank you for your reply. May I know what RMX1851 is?

@alexcohn

alexcohn commented Aug 12, 2019 via email

@srikanthsunkari

@alexcohn, one of our libraries uses the NDK, but this issue does not come from that library or from our code in the stack trace. It comes entirely from an NDK library.
Can you please explain why this scenario can occur?

@alexcohn

@srikanthsunkari as I wrote above, this seems to be a bug in the OPPO system: the way they built the 64-bit libjpeg.so (a system library) has some fault.

@DanAlbert
Member

Thanks a bunch for the help as always, @alexcohn

I sent this thread along to the PM tracking 64-bit transition issues. Given that this appears to be a device-specific issue, I'm not sure there's much we can do on our end short of documenting the advice @alexcohn gave in #1058 (comment)

@DanAlbert
Member

> set (in build.gradle) minSdkVersion for the 64-bit flavor to 26 or even higher.

@alexcohn: Where'd you get 26 from? Is that a guess based on those devices all being 25 or lower (I haven't checked all of them), or is there some indication that this was fixed in that release?

We clearly have a missing CTS test somewhere, I'm just not sure what that test is without knowing how to repro.

@alexcohn

@DanAlbert the devices listed in the original table are all 25 and lower. Then came @srikanthsunkari with Realme 3 Pro from the same manufacturer, but with API 28, so my suggestion happens to have been too optimistic.

This is really a 64-bit transition issue. Note that it may be extremely hard for indie developers to handle such rare crashes on a rather exotic phone. I hope that with Google's resources, you can find what really happens in 64-bit Jpeg decoder on these devices. There is a slim chance that the affected apps have some very specific Jpeg resources that trigger the crash, but more likely these libs were built with some wrong combination of flags and tools. For example, it could be that OPPO chose to use libjpeg while AOSP has switched to libjpeg-turbo some while ago.

I believe that for an individual developer who is haunted by such a crash there is no real harm in setting minSdkVersion to 30, effectively opting out of the forced transition to 64-bit for the time being. Methinks such a workaround should not be considered illegitimate on Play Store.

On the other hand, even before fixing the bug in the Jpeg decoder, it is in the power of Play Store to

  1. track down such crashes across all apps
  2. blacklist the devices that demonstrate such behavior so that they get a 32-bit bundle if available

@DanAlbert
Member

> Then came @srikanthsunkari with Realme 3 Pro from the same manufacturer, but with API 28, so my suggestion happens to have been too optimistic.

It definitely looks to me like there's more than one bug being discussed here. Could be that your suggestion is correct for working around a subset of them.

@DanAlbert
Member

Found https://stackoverflow.com/q/57412918/632035, which is probably the other crash, and yeah, that's yet another different stack trace.

It is always possible that this is actually just a case of memory corruption in the app that is causing a segfault later on. ASan might help if that is the case. Don't have enough information to know if that's the problem or if this actually is a device bug.

@srikanthsunkari

srikanthsunkari commented Aug 13, 2019

Hi @DanAlbert @alexcohn, I managed to pull a crash report here from my device. The crash happens randomly. Can you conclude anything from this report?
Thanks.

@alexcohn

@srikanthsunkari It seems that the crash happens while the surface is not ready:

18:35:34.323 18685-18685/com.test.surfaceview E/Target BindViewHolder: called
18:35:34.800 18685-18898/com.test.surfaceview A/libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x40 in tid 18898 (hwuiTask1), pid 18685 (teststaging)

Compare this to:

18:35:26.732 18685-18685/com.test.surfaceview E/Target BindViewHolder: called
18:35:27.107 18685-18685/com.test.surfaceview I/ExoPlayerImpl: Init 2.2.0 [tissot_sprout, Mi A1, Xiaomi, 28]
18:35:27.122 795-795/? D/SurfaceFlinger: duplicate layer name: changing SurfaceView - com.test.surfaceview/com.demo.test.ui.activities.MainActivity to SurfaceView - com.test.surfaceview/com.demo.test.ui.activities.MainActivity#2
18:35:27.124 795-795/? D/SurfaceFlinger: duplicate layer name: changing Background for -SurfaceView - com.test.surfaceview/com.demo.test.ui.activities.MainActivity to Background for -SurfaceView - com.test.surfaceview/com.demo.test.ui.activities.MainActivity#1
18:35:27.139 18685-20296/com.test.surfaceview I/OMXClient: IOmx service obtained

Also, the name _ZZN7androidL46android_view_RenderNode_requestPositionUpdatesEP7_JNIEnvP8_jobjectlS3_EN26SurfaceViewPositionUpdater21doUpdatePositionAsyncEliiii suggests that a SurfaceView position update failed here, which may not be directly caused by the switch to the 64-bit architecture.

@srikanthsunkari

Thank you @alexcohn. Does this mean the issue may be due to my code changes? If so, since it is not reproducible, do you have any suggestions?

@alexcohn

alexcohn commented Aug 19, 2019

This could be some delicate time race, and it may be beyond your control. How often is this crash reproduced? How is it distributed across devices, OS versions, whatever? Crashlytics has a lot of helpful hints for you.

Actually, when you look at the full statistics, you can usually see that there are some crashes that have no rational explanation. If they affect less than 1 in 10000 active users, you can probably just ignore them (unless this 1 user happens to be your boss).
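
As a rough illustration of that rule of thumb, the sketch below compares the share of affected users against a 1-in-10000 threshold. It is plain Java with hypothetical names (`CrashTriage`, `isNegligible`) and made-up numbers. The real counts would come from your crash-reporting dashboard.

```java
/** Hypothetical triage helper for the "fewer than 1 in 10000 active users" rule of thumb. */
final class CrashTriage {
    // Threshold taken from the rule of thumb in this thread, not from any official guideline.
    static final long ACTIVE_USERS_PER_AFFECTED_USER = 10_000L;

    /** Returns true when strictly fewer than 1 in 10000 active users are affected. */
    static boolean isNegligible(long affectedUsers, long activeUsers) {
        if (affectedUsers < 0 || activeUsers <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and activeUsers > 0");
        }
        // Integer form of affected / active < 1 / 10000, which avoids floating-point rounding.
        return Math.multiplyExact(affectedUsers, ACTIVE_USERS_PER_AFFECTED_USER) < activeUsers;
    }

    public static void main(String[] args) {
        // Made-up example: 3 affected users out of 50000 active users.
        System.out.println(isNegligible(3, 50_000));
        // Made-up example: 12 affected users out of 50000 active users.
        System.out.println(isNegligible(12, 50_000));
    }
}
```

Running the same check per device model or per OS version, rather than over the whole user base, is one way to spot a crash that is rare overall but concentrated on a few devices.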

@DanAlbert
Member

DanAlbert commented Aug 20, 2019

Just to be sure: the switch to 64-bit is the only variable that changed when this crash started appearing? It wasn't the same release that also did something like switch to app bundles/APK splits, updated your NDK, switched compilers, or some other change? I'm guessing not, but someone asked so I wanted to clarify.

@srikanthsunkari

> This could be some delicate time race, and it may be beyond your control. How often is this crash reproduced? How is it distributed across devices, OS versions, whatever? Crashlytics has a lot of helpful hints for you.

After many test cases, we found that the crash is always reproducible once we increase the number of SurfaceView instances in the layout above 6 and leave the SurfaceView tags without a visibility attribute.

@srikanthsunkari

> Just to be sure: the switch to 64-bit is the only variable that changed when this crash started appearing? It wasn't the same release that also did something like switch to app bundles/APK splits, updated your NDK, switched compilers, or some other change? I'm guessing not, but someone asked so I wanted to clarify.

Yes, we made the change to support the 64-bit flavor, but no splits. We have only seen this crash since that APK, and it happens only on Android 9.0 and above.
We also tested this on a Samsung M20 (Android 9.0), where we observed a more reasonable error message:

2019-08-13 17:26:32.364 4136-4136/com.test.surfaceview E/SurfaceView: A process tried to create too many surfaces
    android.view.WindowManager$TooManySurfacesException
        at android.view.SurfaceControl.nativeCreate(Native Method)
        at android.view.SurfaceControl.<init>(SurfaceControl.java:704)
        at android.view.SurfaceControl.<init>(SurfaceControl.java:74)
        at android.view.SurfaceControl$Builder.build(SurfaceControl.java:456)
        at android.view.SurfaceView$SurfaceControlWithBackground.<init>(SurfaceView.java:1474)
        at android.view.SurfaceView.updateSurface(SurfaceView.java:651)
        at android.view.SurfaceView$2.onPreDraw(SurfaceView.java:188)
        at android.view.ViewTreeObserver.dispatchOnPreDraw(ViewTreeObserver.java:991)
        at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:3069)
        at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1942)
        at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:8595)
        at android.view.Choreographer$CallbackRecord.run(Choreographer.java:988)
        at android.view.Choreographer.doCallbacks(Choreographer.java:765)
        at android.view.Choreographer.doFrame(Choreographer.java:700)
        at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:967)
        at android.os.Handler.handleCallback(Handler.java:873)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:214)
        at android.app.ActivityThread.main(ActivityThread.java:7156)
        at java.lang.reflect.Method.invoke(Native Method)
        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:494)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:975)

@alexcohn

@DanAlbert note that there are at least 3 distinct cases discussed in this thread

@alexcohn

@srikanthsunkari it's not clear: on devices with OS version <9, you can open more surfaces than on Android 9, can't you? Also, on Android 9, you can open more surfaces when you force 32-bit bundle?

@srikanthsunkari

> on devices with OS version <9, you can open more surfaces than on Android 9, can't you?

This does not happen on devices below Android 9 (32-bit).

> Also, on Android 9, you can open more surfaces when you force 32-bit bundle?

On Android 9, the 32-bit bundle APK also produces this crash.

@DanAlbert
Member

Given the error message it doesn't sound like the crash you're seeing is a bug, but just a limit you've exceeded.
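
If the failure really is a per-process surface limit, one possible mitigation is to cap how many surfaces the app holds at once. Below is a minimal plain-Java sketch of such a cap. The class `SurfaceBudget` is hypothetical, and the limit of 6 is only taken from the report in this thread that more than 6 SurfaceView instances triggered the crash. It is an assumption, not a documented platform value. In an app, `tryAcquire()` would gate creating or showing a SurfaceView and `release()` would be called when that view is removed.

```java
/** Hypothetical counter that caps how many surfaces are in use at the same time. */
final class SurfaceBudget {
    private final int maxSurfaces;
    private int inUse;

    SurfaceBudget(int maxSurfaces) {
        if (maxSurfaces <= 0) {
            throw new IllegalArgumentException("maxSurfaces must be positive");
        }
        this.maxSurfaces = maxSurfaces;
    }

    /** Reserves one slot and returns true, or returns false when the budget is exhausted. */
    synchronized boolean tryAcquire() {
        if (inUse >= maxSurfaces) {
            return false;
        }
        inUse++;
        return true;
    }

    /** Frees one previously reserved slot. */
    synchronized void release() {
        if (inUse == 0) {
            throw new IllegalStateException("release() without a matching tryAcquire()");
        }
        inUse--;
    }

    synchronized int inUse() {
        return inUse;
    }

    public static void main(String[] args) {
        // Assumed limit of 6, based only on the observation reported in this thread.
        SurfaceBudget budget = new SurfaceBudget(6);
        for (int i = 1; i <= 7; i++) {
            System.out.println("surface " + i + " allowed: " + budget.tryAcquire());
        }
    }
}
```

With a budget of 6, the first six requests are allowed and the seventh is refused until an earlier surface is released.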

@srikanthsunkari

@DanAlbert could you confirm that this limit comes from the vendor's implementation and not from the NDK? If it comes from the NDK, it should be handled and reported with a readable message by the Android framework, not by the NDK.

@DanAlbert
Member

I can guarantee it's not from the NDK. Your stack trace is Java.

@DanAlbert
Member

It doesn't seem like there's an NDK bug here, so closing. If that's wrong, lmk and we can reopen, but it sounds more like device bugs.
