Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes in oboe::AudioStreamAAudio::read(void*, int, long) [AudioStreamAAudio.cpp:490] #1484

Closed
taemincho opened this issue Jan 24, 2022 · 14 comments · Fixed by #1497
Closed
Assignees
Labels
bug P1 high priority
Milestone

Comments

@taemincho
Copy link

Android version(s): Android 11, and 12
Android device(s): Pixel 6, Pixel 6 Pro, Redmi Note 10, and TECNO POVA 2
Oboe version: 1.6.1
App name used for testing:
(Please try to reproduce the issue using the OboeTester or an Oboe sample.)

Recently, we got a noticeable number of crashes in oboe::AudioStreamAAudio::read(void*, int, long) [AudioStreamAAudio.cpp:490]
The crashes have been rare before but remarkably increased recently in the latest phones and thus 99% in Android 11 and 12.

The firebase shows that

  • about 50% of the crashes happened in Pixel6 and Pixel6Pro (Android 12)
  • and 23% happened in Xiaomi Redmi Note 10 (Android 11)
  • and 12% happened in TECNO POVA 2 (Android 11)

Another interesting pattern was that only TECNO POVA 2 crashed with SIGSEGV while all other phones crashed with SIGABRT as shown below:

Pixel6, Pixel6Pro, and Xiaomi Redmi Note 10

Crashed: Thread :  SIGABRT  0x0000000000000000
#00 pc 0x89b0c libc.so 
#01 pc 0x89adc libc.so 
#02 pc 0x3774c libaaudio_internal.so 
#03 pc 0x33438 libaaudio_internal.so 
#04 pc 0x32788 libaaudio_internal.so 
#05 pc 0x351490 libbandlab_audio_engine.so (oboe::AudioStreamAAudio::read(void*, int, long) [AudioStreamAAudio.cpp:490])
#06 pc 0x2b62cc libbandlab_audio_engine.so (AudioInputDevice_Oboe::process(int) [AudioDevice_Oboe.cpp:348])

TECNO POVA 2

Crashed: Thread :  SIGSEGV  0x00000072f077c580
#00 pc 0x4a408 libc.so 
#01 pc 0x8afc libsoundtouch.so 
#02 pc 0xbe74 libsoundtouch.so 
#03 pc 0x5aaf8 libaudioclient.so 
#04 pc 0x280d8 libaaudio_internal.so 
#05 pc 0x351490 libbandlab_audio_engine.so (oboe::AudioStreamAAudio::read(void*, int, long) [AudioStreamAAudio.cpp:490])
#06 pc 0x2b62cc libbandlab_audio_engine.so (AudioInputDevice_Oboe::process(int) [AudioDevice_Oboe.cpp:348])
@taemincho taemincho added the bug label Jan 24, 2022
@philburk philburk added the P1 high priority label Jan 26, 2022
@philburk
Copy link
Collaborator

@taemincho - Thanks for reporting this. A crash like this generally occurs if one thread is closing a stream while another thread is reading from the stream.

In Oboe there is an error callback when a headset is disconnected. That spawns a thread that stops and closes the stream. This could collide with another thread reading the stream that is being closed.. We will try to reproduce this scenario.

Do you know if this happens when users unplug or plug in a headset?

The three devices are all very different. So I don't think it is a HAL bug that is common to all three phones. It seems more likely that they have similar timing when there is a disconnect and that leads to a race condition happening more often.

We need to try to reproduce this scenario in our lab.
Are you doing the reads from a callback of an output stream? Full-duplex mode?
If so then maybe the input stream is being closed while the output callback is still running.

Please send us more details on how the different streams and threads are interacting.

@philburk
Copy link
Collaborator

I notice that read() and write() do not hold mLock.

AAudioStream *stream = mAAudioStream.load();

Maybe they should. But that could create new deadlocks.

@robertwu1
Copy link
Collaborator

I noticed that we added mStreamLock around a year ago to protect against this case (ff689ea).

@philburk Do you still remember the context of the PR. Should we also add mStreamLock to read() and write()?

@philburk
Copy link
Collaborator

@RobertWu - the PR #1200 was to fix #1180
This seems related.

Should we also add mStreamLock to read() and write()?

Yes, perhaps. Please write an Oboe unit test that writes (or reads) a BIG buffer that will take a second or more to complete. Then from another thread try closing the stream. it will probably crash.

Then try the locks. We have to make sure we are not creating new issues with these locks.

@philburk philburk added this to the V1.7.0 milestone Feb 18, 2022
@philburk
Copy link
Collaborator

Pixel 6, Pixel 6 Pro, Redmi Note 10, and TECNO POVA 2

The fact that it is more often on those phones suggests it might also be related to an audioserver crash.

@robertwu1
Copy link
Collaborator

robertwu1 commented Mar 1, 2022

I wrote the unit test and the change in mStreamLock read and writes. See https://github.com/google/oboe/tree/robertwu/testStreamStopWrite

It seems like there is no noticeable differences between not having and having locks. I don't notice any crashes nor significant changes in the buffer size. Is there a programatic way of checking for crashes?

@robertwu1
Copy link
Collaborator

Same thing happens for Android R and T. I don't see any issues with my unittest.

@philburk
Copy link
Collaborator

philburk commented Mar 2, 2022

Please create a PR for your test. it looks good but I have some suggestions that I want to tie to the code.

@philburk
Copy link
Collaborator

philburk commented Mar 2, 2022

@taemincho - Is the read() being called from the output stream callback? Or is it being called from another thread?

Here is a possible sequence of events:

The outputStream callback is called.
it starts to read the input stream

Then an error callback occurs because of a disconnect on the output stream.
That spawns a thread that tries to stop and close the output stream.
That thread should wait until the callback thread has completed.
But we have seen instances where it does not wait and the callback is still running.
That could cause a crash.

Another scenario is if BandLab is defining an onErrorBeforeClose() method and that method closes the input stream.
The close might happen while the callback is reading the input stream.
If so then the input stream closing should be moved to onErrorAfterClose().

@robertwu1
Copy link
Collaborator

Phil and I talked offline and it seems like I've made a mistake in my test and tested for stop() instead of close(). I've opened a PR that solves these issues for AAudio in Oboe.

@taemincho
Copy link
Author

Sorry for the late response,

@philburk Yes, we call read() from the output callback thread, and we don't use onErrorBeforeClose().

In addition, actually, we couldn't find strong evidence that the headphone connection was causing the problem.
We saw a few cases with routing notifications before the crashes but didn't look like directly related to the crashes.
(Can Bluetooth cause the problem? among the few, half was Bluetooth devices)

@philburk
Copy link
Collaborator

philburk commented Mar 3, 2022

Can Bluetooth cause the problem?

A Bluetooth connection can cause a routing change and a DISCONNECT the same as a wired headset.

@taemincho - Are you by any chance closing the streams because of a Broadcast Device change event?

Is it possible that you are closing the input stream before the output stream is getting closed?
If so then the output stream would still be reading the input stream.
Always close the input stream last.

@taemincho
Copy link
Author

taemincho commented Mar 4, 2022

@philburk
Broadcast Device change events are handled on the client (Android) side and when they have been detected the client calls stop() to the native.

We always close the output stream first then close the input stream.
However, when we open the streams we open the output first and the input next,
then start the input first then output next.

We also check oboe::StreamState for the input stream and call read() only when the state is oboe::StreamState::Starting or oboe::StreamState::Started. Should we avoid oboe::StreamState::Starting ?

@philburk
Copy link
Collaborator

philburk commented Mar 5, 2022

@taemincho wrote:

when we open the streams we open the output first and the input next,

That's good. Then you can set the sample rate for input to match the output sample rate.

then start the input first then output next.

Good. That way the output will not have errors when it reads the input.

It does seem like the callback is still running when the close() happens. We are continuing to investigate this.

When closing an Oboe stream, Oboe stops the stream in case it is not stopped,
then sleeps for 10 msec,
then closes the stream.
I plan to improve that code: #1500

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug P1 high priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants