-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible race on device disconnect, callback after close #953
Comments
This sounds like #820. Probably a "use after free".
|
After doing more tests, I've encountered another, probably related, crash: https://gist.github.com/gavv/b4222d9600d6ac3dab95a37e3b26c96d Previously, I was disabling Bluetooth or powering off BT headset, my app was receiving device removal event, and was trying to close oboe streams immediately, and crashed. This time I did a bit different test. I was stopping/starting BT SCO and oboe streams repeatedly, and after one of the restarts headset disappeared (likely because "AudioFlinger server died" from logs). It happens from time to time, I don't know why, but I'm trying to make the app robust to this situation. So Bluetooth was enabled, BT SCO was enabled, headset was enabled, but it disappeared from device list temporary and my app received device removal event. This time, instead of trying to close oboe streams immediately, my app waited for 3 seconds until oboe streams would report failure. However that didn't happen and after 3 seconds it started closing the streams. It stopped and closed both streams, released shared pointers, and destroyed my AudioStreamCallback implementation. Then the app disabled BT SCO. Then onErrorBeforeClose callback was invoked on a destroyed object, and it crashed. Maybe the callback was triggered by disabling BT SCO, or maybe it was triggered but earlier call to onAudioDeviceUpdate() during close, I'm not sure (see logs). |
The reason why I think the second crash is related is that in both crashes there are onAudioDeviceUpdate() calls, and sanitizer complains when we're still inside that function, before my code is even called. |
I've tried to delay deletion of my AudioStreamCallback implementation and oboe stream. Now, after closing oboe stream, I don't delete them immediately, but instead schedule deletion after 5 minutes. With this approach, sanitizer doesn't complain for my own code any more, but it still sometimes complains for errors in AudioStreamLegacy.cpp. A bit later after the complaint, I'm getting yet another crash, maybe caused by memory corruption.
I think it confirms that free-after-use or similar issue happens with some lower-level object, not with my callback and not with oboe stream. Maybe onAudioDeviceUpdate() is not properly synchronized? BTW note that I got all these crashes during quite intensive "stress testing" - I randomly and repeatedly stopped and started my stream, Bluetooth on phone, and the headset (by putting AirPods into the box and getting them back). All these crashes are rare enough. Also note that every time I stop the stream, I also stop BT SCO, and then I start it again before starting the stream. Maybe it's not optimal, but for now I keep this code because probably it helps to reproduce the crashes (I don't know). |
Are you saying that onErrorBeforeClose callback was invoked by a stream that had already been closed! |
@gavv - I am trying to reproduce this error on P using a BT headset. Do you think reproducing the crash requires opening both INPUT and OUTPUT streams.
Were you using AudioManager.registerAudioDeviceCallback() for that? |
@gavv - When are you calling startBluetoothSCO() in relation to opening the stream? How are you choosing the SCO device? Are you letting Oboe/AAudio select the device or do you look for a SCO device in the AudioDeviceInfo array? |
It seems yes, at least in the last (third) crash.
So far I tested it only on one phone (Android 9, Xiaomi Redmi Note 7) on which I'm doing initial development of the app. If it helps, I can also test it on Android 8 if it helps. For now I don't have access to other devices.
I'm always using two streams, but I'm not sure if it's important. I can test it with one stream if it helps.
Yes.
I call startBluetoothSco, wait SCO_AUDIO_STATE_CONNECTED event, then create and start oboe streams. When I receive device removal event, I tried several approaches:
I get IDs from AudioDeviceInfo and pass them to oboe. You can see those IDs in logs above. |
Thanks for looking into this! I'll be offline (mostly) a few days. I can provide more info or prepare a reproducer on next week. |
Thanks! Those details are very helpful. I am now seeing a crash that may be related. Also please, what are you using for sampleRate, channelCount and inputPreset? |
Note that setBluetoothScoOn() "should only be used by applications that replace the platform-wide management of audio settings or the main telephony application." Is that true for your app? https://developer.android.com/reference/android/media/AudioManager#setBluetoothScoOn(boolean) |
I have been doing a lot of testing on this and found that if a legacy AAudio stream is not stopped then a data callback, and possibly an error callback, can occur after the stream has been closed and deleted. This is more likely to occur when a routing change is occurring. If an app is closing and restarting streams from two threads, one Java and one triggered by an error callback, then a race condition in the application could cause this error. Consider the following sequence: JNI Thread , Error Callback
The stream is running when the close is called, causing the crash. To avoid this problem, the app can make sure the stop and close occur together by putting them all under a single lock:
The app may wish to do all stream management under a single thread. The Java code and the error callback can send messages to that thread so that everything is coherent. I will provide an example of this. #960 In Oboe, we can force a stop from the close() method under a lock. |
Sorry for delay, I'm back.
I'm using device-native parameters:
Oops, setBluetoothScoOn() is unneeded here. I removed it and nothing changed. I only called |
Hmm. Here is how I was closing streams in, for example, third test above:
All manipulations with oboe streams, except data callback, are performed under a mutex. The mutex protects both streams. I always stop and close both streams and only then open two new streams. Also, it seems that the crash happens before creating and opening new streams. The stream is stopped and closed, then the error callback is called on closed stream, and I didn't try to create a new one yet. |
@philburk I've prepared a minimal app reproducing the problem. Although it's rather small, my customer is more comfortable with not making it fully public, so I've pushed it to a private repo on github and sent you an invite. Hope this is OK. The app has two buttons, start and stop. Start button selects device IDs, starts BT SCO, and creates two oboe streams. Stop button closes and stops oboe streams and stops BT SCO. The app never destroys oboe streams. This memory leak is intentional. It allows us to print a log message when callback is called after close, instead of crashing. The app still crashes, but not in our code. Also, the app monitors device manager events and automatically invokes start when a BT SCO device appears and stop when it disappears. Actually, to reproduce the problem, you don't even need to press start and stop buttons manually. They are needed just for convenience. I can reproduce two variations of the problem using this example app:
This github gist provides the logs for both issues: https://gist.github.com/gavv/5eda6ae760c2eb979c34efb0e64eb272 For callback-after-close issue, search for these lines:
Here is how to reproduce the problem:
Eventually, the app will crash after step 3 and you'll see a crash and possibly also callback-after-close message in logs. When I keep the call to To build and deploy the app, you can just run The app comes with clang UB sanitizer enabled. I'm using the debug version of the APK. |
Thank you! Being able to reproduce the problem is very important. I have found the invitation and am looking at the repo. I really appreciate this. |
Thanks gavv. I am able to reproduce the crash consistently with your test program when running P. I could only reproduce this when using Bluetooth I then tried calling usleep(N*1000) before the stream is closed. So it is a pretty tight race. It seems a short delay right before the close is enough to prevent the race. Five msec works OK but ten msec should be a safer value. |
Thanks! So, presumably, this is indeed a race in AAudio? And we don't know whether it's fixed or just not reproducing on master, right? Should we report it? I'm not familiar with the source code, but after quick look, I didn't find a place where AudioStreamLegacy waits until onDeviceUpdate is exited and unregistered, before closing the stream. |
Right. That is the important question. I am looking at old bugs and change logs for this.
It is unregistered in AudioStreamRecord::release_l(). That is only for the error callback. |
See PR #970 |
Related crash on Android 10 (on same device after update): https://gist.github.com/gavv/b0f21f0137271534e072603a1630ce02 Also in onAudioDeviceUpdate(), this time with sanitizers disabled. Happened after switching BT off/on. Patch from #970 was NOT applied. |
Thanks again for this report. We have identified several sources of late callbacks. One is that AudioTrack only joins it threads in its destructor. We have addressed these problems in R and S. |
Is this something app is expected to do prior to calling close()? That is, prior to calling close(), are we expected to call stop() and sleep for a few milliseconds? We thought oboe was doing this already. |
Great, thanks!!
As far as I understand, this is already addressed by oboe in #967. @philburk Right? |
In Oboe 1.5.0, in the close() method we issue a stop and also pause for a few msec. |
b/161914201 has been fixed with three CLs in Android R. |
Android version(s): 9
Android device(s): Xiaomi Redmi Note 7
Oboe version: 1.4.2
Hi,
First of all, sorry if I'm reporting this to a wrong tracker. I'm still not 100% sure if this is Oboe bug, AAudio bug, or bug in my own code.
My app uses Oboe with BT SCO headset (AirPods) and creates two oboe streams (input and output). It also subscribes AudioManager events (from Java SDK), and when device disappears, it calls
stream->stop()
, thenstream->close()
, and then destroysstream
; first for one stream, and then for another.(I'm using shared stream with data and error callbacks).
At the same time, concurrently with the second stream close call, onAudioDeviceUpdate() callback is invoked somewhere in background. Then close() returns and I destroy the stream. And then AudioStreamLegacy segfaults in background. Before the segfault, clang UB sanitizer reports integrity check failure.
It seems to me that close() doesn't wait until onAudioDeviceUpdate() is finished, or onAudioDeviceUpdate() doesn't check that the stream is already closed, or it fails if the stream is being closed during its invocation, or it invokes something (error callback?) that is not being waited by close(), etc.
Here are the logs:
Relevant lines are:
And here is the backtrace:
My own code looks sane to me: I just call close() and expect that it will wait everything that should be waited, and when it returns, I destroy the stream.
If my code is correct, I'm not sure whether it's related to Oboe or AAudio. Is there a chance that Oboe close() implementation is incomplete and doesn't wait for something?
Currently the reproducer is a rather large closed-source app. If my expectations to close() are correct and you think I'm using it right, and this may be related to Oboe, I can try to create a stand-alone reproducer. Please let me know if this makes sense.
The text was updated successfully, but these errors were encountered: