-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent ANR during SDK initialization #709
Prevent ANR during SDK initialization #709
Conversation
FYI we are working on getting the CLA signed. |
6f1e90f
to
52018e3
Compare
@breedx-splk CLA signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ansman for taking this on, I think this is going to be really helpful...and there are bound to be a few other areas that are problematic at startup, so I appreciate you both shining some light on this and for going the distance in submitting this PR.
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/OpenTelemetryRumBuilder.java
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/BufferDelegatingLogExporter.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the time to contribute 🙏 I like the overall idea. I left some questions and suggestions, so please have a look when you get a chance.
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/OpenTelemetryRumBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/OpenTelemetryRumBuilder.java
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/BufferedDelegatingExporter.kt
Outdated
Show resolved
Hide resolved
Description When initializing the OpenTelemetry Android SDK with disk buffering enabled, we discovered that synchronous disk space checks were causing ANRs in production. These checks occur during the creation of disk buffering exporters, specifically in `DiskManager.getMaxFolderSize()`, which makes blocking IPC calls through `StorageManager.getAllocatableBytes()` on the main thread. The issue manifests in the following ANR stacktrace: ``` android.os.BinderProxy.transact (BinderProxy.java:662) android.os.storage.IStorageManager$Stub$Proxy.getAllocatableBytes (IStorageManager.java:2837) android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2414) android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2404) io.opentelemetry.android.internal.services.CacheStorage.getAvailableSpace (CacheStorage.java:66) io.opentelemetry.android.internal.services.CacheStorage.ensureCacheSpaceAvailable (CacheStorage.java:50) io.opentelemetry.android.internal.features.persistence.DiskManager.getMaxFolderSize (DiskManager.kt:58) io.opentelemetry.android.OpenTelemetryRumBuilder.createStorageConfiguration (OpenTelemetryRumBuilder.java:338) io.opentelemetry.android.OpenTelemetryRumBuilder.build (OpenTelemetryRumBuilder.java:286) ``` Our Solution To fix this we moved initialization to run on a background executor and buffer the data in memory until it completes. The process works like this: 1. Initialize the SDK with `BufferDelegatingExporter` instances that can immediately accept telemetry data. 2. Move exporter initialization off the main thread. 3. Once async initialization completes, flush buffered signals to initialized exporters and delegate all future signals. The primary goal of this solution is to be unobtrusive and prevent ANRs caused by initialization of disk exporters, while preventing signals from being dropped. Testing We have added unit tests to cover the buffering, delevation, and RUM building. We've also verified this with both disk enabled and disk disabled.
ff505fc
to
5c4f0e5
Compare
I've updated the PR with the feedback and everything except the logging has been addressed. In addition the returned result will not be success any longer but rather resolve when the delegate is set. |
core/src/main/java/io/opentelemetry/android/export/DelegatingExporter.kt
Show resolved
Hide resolved
core/src/main/java/io/opentelemetry/android/export/DelegatingExporter.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ansman for sticking this out, I like how it's come together! Would still like to see the log warning emitted, and I'll wait to let @LikeTheSalad do the merge with his approval, but I think things are looking solid now. Thanks for the contribution.
core/src/main/java/io/opentelemetry/android/export/DelegatingExporter.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution and patience, @ansman! 🙏 I'm looking forward to getting your PR merged, though it looks like there are some formatting issues that are making the build fail, if you could please update this PR after running ./gradlew spotlessApply
locally, I think that should do. I'm happy to proceed with the merge once the GH build passes :)
Description
When initializing the OpenTelemetry Android SDK with disk buffering enabled, we discovered that synchronous disk space checks were causing ANRs in production. These checks occur during the creation of disk buffering exporters, specifically in
DiskManager.getMaxFolderSize()
, which makes blocking IPC calls throughStorageManager.getAllocatableBytes()
on the main thread. The issue manifests in the following ANR stacktrace:Solution
To fix this we moved initialization to run on a background executor and buffer the data in memory until it completes.
The process works like this:
BufferDelegatingExporter
instances that can immediately accept telemetry data.The primary goal of this solution is to be unobtrusive and prevent ANRs caused by initialization of disk exporters, while preventing signals from being dropped.
Testing
We have added unit tests to cover the buffering, delevation, and RUM building. We've also verified this with both disk enabled and disk disabled.