No implementation found for method addError on channel datadog_sdk_flutter.rum #596

Closed
androidmitry opened this issue Apr 23, 2024 · 38 comments
Labels
awaiting response Waiting for response from the reporter of the issue crash Crashes caused by the SDK

Comments

@androidmitry

androidmitry commented Apr 23, 2024

Stack trace

Fatal Exception: io.flutter.plugins.firebase.crashlytics.FlutterError: MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
at MethodChannel._invokeMethod(platform_channel.dart:332)
at ._willHandleError(helpers.dart:14)

Reproduction steps

Add the datadog_flutter_plugin package, release to App Store

Volume

0,0021 (1-2 users per day)

Affected SDK versions

2.4.0

Does the crash manifest in the latest SDK version?

Yes

Flutter Version

3.19.5

Setup Type

Flutter Application

Device Information

OS - Android
per version:
Android 12 - 88%
Android 10 - 7%
Android 14 - 3%
Android 13 - 2%

per device:
Samsung - 93%
Oneplus - 7%

Other relevant information

Device states: background 60%

androidmitry added the crash (Crashes caused by the SDK) label Apr 23, 2024
@fuzzybinary
Member

Hi @androidmitry ,

Thanks for the report, I'll look into this as soon as I can.

Can you give me any more information about a possible reproduction? Have you been able to reproduce locally at all? Is there anything strange about your setup that might be disconnecting the MethodChannel from our plugin? We tend to wrap every call we make to try to avoid crashes, so I'm very concerned that this is causing a crash.

@androidmitry
Author

Hi @fuzzybinary, unfortunately that's all the information I have so far. I wasn't able to reproduce it. We had some custom platform code, but it was removed. What's interesting is that the number of reports is decreasing. I will update the issue if the crash goes away.

@fuzzybinary
Member

@androidmitry Yeah if you can keep me posted I would appreciate it.

I've seen issues in the past where the method channel can get disconnected from the plugin, but I've fixed those, and most threw errors in the native layer, not Dart.

@nirmal0707

This issue is also occurring in version 2.1.0, and we have encountered the MissingPluginException from the Android channels datadog_sdk_flutter.rum and datadog_sdk_flutter.logs in the production release. Due to consecutive RUM events, the error count is excessively high. Below are some error messages we've received:

  1. MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
  2. MissingPluginException(No implementation found for method createLogger on channel datadog_sdk_flutter.logs)
  3. MissingPluginException(No implementation found for method stopView on channel datadog_sdk_flutter.rum)

@fuzzybinary
Member

Hi @nirmal0707,

I'm actively investigating this, but I haven't had much luck reproducing it. Do you happen to have any steps to reproduce, or anything you can tell me about your app before / after you started seeing the errors?

@nirmal0707

nirmal0707 commented May 10, 2024

Hi @fuzzybinary ,

This issue was not reproducible but began occurring when we migrated our codebase to Flutter 3.16.4, three months ago. Previously, we were using version 1.5.1, and the Flutter upgrade required us to move the package version to 2.1.0, resulting in this issue arising for some users in production.

@fuzzybinary
Member

Alright, thanks @nirmal0707. That may help me track down the issue.

@fuzzybinary
Member

Hi folks -- a few questions for everyone to see if I can try to diagnose this:

  • Is anyone using background tasks or foreground services, or the flutter_background_service?
  • Are you using push notifications or a push notification service like firebase_cloud_messaging? Do these errors tend to spike immediately after a push notification is sent out?
  • Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?
  • Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?
  • Do the MissingPluginException errors correlate with any other errors around the same time?

Sorry this is taking so long, but I am having a really hard time reproducing, even when forcing certain error states. Comparing with Crashlytics, we perform the registration and de-registration of our method channels the same way they do, so I'm not sure how or why they'd catch the errors and we don't.

@fuzzybinary
Member

Another question as I continue to investigate -- Does anyone have any customizations of their FlutterActivity? Overriding onCreate, configureFlutterEngine, onDestroy or any other methods?

@androidmitry
Author

For us crash reports started coming when we upgraded flutter from 3.16.9 to 3.19.5

Is anyone using background tasks or foreground services

We have foreground service but we don't use flutter_background_service package. Also according to breadcrumb events attached to crash it usually happens in foreground.

Are you using push notifications or a push notification service like firebase_cloud_messaging ? Do these errors tend to spike immediately after a push notification is sent out ?

Yes. No.

Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?

No

Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?

Yes

Do the MissingPluginException errors correlate with any other errors around the same time?

Checked several users and no other issues were reported around same time

Does anyone have any customizations of their FlutterActivity?

We do, I will double check them.

@feinstein

feinstein commented May 20, 2024

My error message is a bit different: MissingPluginException(No implementation found for method reportLongTask on channel datadog_sdk_flutter.rum)

These are my Sentry logs:

image

Then a bunch of:

image

And then:

image

Maybe you are not handling the destroyed lifecycle correctly? Or another plugin is interfering?

@fuzzybinary
Member

Hi @feinstein, thanks for the additional information. All of the MissingPluginException issues are related, regardless of the method channel named and the method recorded, so any additional info is helpful.

The FlutterJNI error is interesting, that wouldn't be us so I'm very curious what might cause that, and curious if they're related.

We actually don't handle activity lifecycle at all, instead relying on Flutter's onAttachedToEngine and onDetachedFromEngine, which is what makes this error so frustrating, as those should be triggered properly when Flutter itself starts and stops.

Have you been able to reproduce locally at all?

@feinstein

AFAIK Flutter JNI is the Java interop for connecting the C++ Flutter engine to the Android app.

Maybe Flutter is not triggering the engine's life cycle correctly to your lib.

@androidmitry
Author

We were not able to reproduce it locally. We made some tiny changes to our FlutterActivity, I will report if it helped.

@btrautmann

@fuzzybinary just noting that we are still experiencing the issue mentioned in #552 (which I believe is the same issue being tracked here) despite removing the native cruft I referred to in my last comment on that issue. IIRC I am able to reproduce this in our application fairly consistently. If I have a sec today I'll play around and see if I can reproduce. According to another engineer on my team we're seeing ~249k instances of this issue per week. We've had to filter these issues out of our crash reporting to avoid going beyond our contracted threshold 🙃
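
The kind of filtering mentioned above could be sketched as a small predicate over the error text. This is only an illustration in plain Dart, assuming the message format quoted in this thread; `MissingPluginInfo`, `parseMissingPlugin`, and `isDatadogMissingPlugin` are hypothetical names, not part of Sentry, Crashlytics, or the Datadog SDK, and wiring the predicate into a crash reporter is left out.

```dart
/// Method and channel named in a missing-plugin message (hypothetical type).
class MissingPluginInfo {
  final String method;
  final String channel;
  const MissingPluginInfo(this.method, this.channel);
}

// Matches the message format quoted in this thread, for example:
// "MissingPluginException(No implementation found for method addError
// on channel datadog_sdk_flutter.rum)"
final RegExp _missingPlugin = RegExp(
  r'No implementation found for method (\S+) on channel ([^\s)]+)',
);

/// Returns the method and channel named in [message], or null when the text
/// does not look like a missing-plugin message.
MissingPluginInfo? parseMissingPlugin(String message) {
  final match = _missingPlugin.firstMatch(message);
  if (match == null) return null;
  return MissingPluginInfo(match.group(1)!, match.group(2)!);
}

/// True when [error] looks like a missing-plugin error on a Datadog channel.
/// Assumption: Datadog channels share the 'datadog_sdk_flutter.' prefix,
/// as in the messages reported above.
bool isDatadogMissingPlugin(Object error) {
  final info = parseMissingPlugin(error.toString());
  return info != null && info.channel.startsWith('datadog_sdk_flutter.');
}

void main() {
  const message = 'MissingPluginException(No implementation found for method '
      'addError on channel datadog_sdk_flutter.rum)';
  final info = parseMissingPlugin(message)!;
  print('${info.method} on ${info.channel}');
  print(isDatadogMissingPlugin(message));
  print(isDatadogMissingPlugin('SocketException: connection refused'));
}
```

Filtering like this only hides the symptom in the crash reporter; it does not reconnect the channel or recover the lost RUM and log calls.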

@fuzzybinary
Member

STR would be ridiculously helpful. If I can reproduce I can likely get it fixed and out with the next version ASAP.

@maks-ucs

@fuzzybinary Just chiming in again on @nirmal0707's behalf: looking at our Sentry error logs, we also see a large number of lifecycle events being reported in quick succession in the error events for this:

image

And the above screenshot is only about a quarter of the pause/resume breadcrumb events in that particular Sentry error event.

Not sure if that's relevant, but perhaps this rapid set of lifecycle events causes some sort of race condition in the Datadog plugin's setup code?

@feinstein

feinstein commented May 22, 2024

This looks weird, so many transitions in under 1 second.

What makes me rule out a Flutter error is that only the DD plugin is raising this exception... but on second thought, few packages would trigger a method channel call when the app is being destroyed.

@fuzzybinary
Member

Another question from research:

Is anyone suffering from this error still using runZonedGuarded over PlatformDispatcher.instance.onError? (If you are using Datadog.runApp we do not use runZonedGuarded)

I'm looking for commonalities here, since I cannot reproduce with any example I have, but all of my examples use PlatformDispatcher.

@androidmitry
Author

We use runZonedGuarded. Is it deprecated? We set PlatformDispatcher.instance.onError as well.

@fuzzybinary
Member

PlatformDispatcher.instance.onError is preferred and the two do essentially the same thing.

I'm going to do more research but I'm curious if the new zone creation is occasionally bypassed by backgrounding / foregrounding.
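
For readers unfamiliar with the zone-based approach, here is a minimal sketch of how runZonedGuarded (from dart:async) routes an uncaught asynchronous error to a handler while the program keeps running. It is plain Dart, so PlatformDispatcher (which lives in dart:ui and needs Flutter) is not shown; `collectUncaughtErrors` is a hypothetical helper, and the failing Future is only a stand-in for a fire-and-forget platform call.

```dart
import 'dart:async';

/// Runs [body] inside a guarded zone and returns the uncaught errors that
/// reached the zone's error handler. Hypothetical helper for illustration.
Future<List<Object>> collectUncaughtErrors(void Function() body) async {
  final errors = <Object>[];
  runZonedGuarded(body, (error, stack) => errors.add(error));
  // Yield to the event loop so pending async errors reach the handler.
  await Future<void>.delayed(Duration.zero);
  return errors;
}

Future<void> main() async {
  final errors = await collectUncaughtErrors(() {
    // Stand-in for an un-awaited platform call that fails.
    Future<void>.error(
      StateError('No implementation found for method addError'),
    );
  });
  print('captured ${errors.length} error(s); still running');
}
```

The error is delivered to the zone in which the failing Future was created, which is why it matters where the zone boundary sits relative to the code making the calls.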

@fuzzybinary
Member

Tests on my side related to runZonedGuarded don't duplicate the issue unfortunately.

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

@maks-ucs

Thanks for your continued efforts on this @fuzzybinary ! 👍

For our app we are not using multiple Flutter engines and we do use runZonedGuarded, though it seems that's likely not the source of the issue from your last comment.

@feinstein

feinstein commented May 23, 2024

I am also using runZonedGuarded, I initialize Sentry, then DataDog. Here's how I initialize it:

Future<void> setupDatadog() async {
  final configuration = DatadogConfiguration(
    clientToken: 'mytoken1234',
    env: appFlavor ?? 'no-flavour',
    site: DatadogSite.us5,
    nativeCrashReportEnabled: true,
    loggingConfiguration: DatadogLoggingConfiguration(),
    rumConfiguration: DatadogRumConfiguration(
      applicationId: 'my-app-id-1234',
    ),
  );

  final originalOnError = FlutterError.onError;
  FlutterError.onError = (details) {
    DatadogSdk.instance.rum?.handleFlutterError(details);
    originalOnError?.call(details); // This allows me to not override other listeners, like Sentry.
  };
  final platformOriginalOnError = PlatformDispatcher.instance.onError;
  PlatformDispatcher.instance.onError = (e, st) {
    DatadogSdk.instance.rum?.addErrorInfo(
      e.toString(),
      RumErrorSource.source,
      stackTrace: st,
    );
    return platformOriginalOnError?.call(e, st) ?? false;
  };

  await DatadogSdk.instance.initialize(configuration, TrackingConsent.granted);
  DatadogSdk.instance.updateConfigurationInfo(LateConfigurationProperty.trackErrors, true);
}

That function is called inside a runZonedGuarded, after await SentryFlutter.init and WidgetsFlutterBinding.ensureInitialized();.

@fuzzybinary
Member

Hi folks - we still cannot reproduce this issue unfortunately. My guess is that this is some sort of race condition on the platform channel during backgrounding, where we are attempting to send view or log events while the app is backgrounding on Android.

However, I will say we do know that even though Sentry / Crashlytics report this as a “Fatal” error, it does not result in the application terminating, and is silent to the user. I verified this by essentially “force disconnecting” the method channel during testing and seeing what the response is from Flutter. This means that users are not seeing a degraded app experience because of this issue.

This doesn’t mean we don’t take the issue seriously, and if anyone can provide us with reproduction steps that would be incredibly helpful.

@feinstein

Maybe contact the flutter team and ask them what might be causing this?

@fuzzybinary
Member

fuzzybinary commented Jun 4, 2024

I've gone through some of the less formal channels (Discord, for example), but I may raise a github issue and see if it gets more attention.

@androidmitry
Author

All previous changes I made didn't help. We are planning a flutter sdk upgrade. I will post here if it helps.

@btrautmann

@fuzzybinary I've been unable to give this attention due to some other pressing work, but I wanted to respond to:

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue (flutter/flutter#103483) if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

A coworker of mine was toying around with this and was able to confirm that there's a case where a user taps a deep link and in doing so a new Flutter engine gets created (I think because of some code we have on the native side, I doubt that this is default Flutter behavior). As a result, our main function is called again which calls the code that would initialize Datadog twice. My hunch (without really looking at the code on either side, I'm just leaving this comment between tasks) is that the move to a singleton on your end made this bug which was already occurring more obvious (because of all the errors we're seeing).

Obviously we have more triaging and likely some fixes to put in on our end, but I did want to (cautiously) confirm your hypothesis that the 2 engine thing may be one cause of the issue folks are seeing.

@fuzzybinary
Member

Thanks @btrautmann, that's really good information to have. I'm not sure if all of these issues are related to multiple Flutter engines, but I feel like it's possible there are situations I don't know about that could legitimately create a second Flutter engine.

I'll have to think about how we can support that situation, but knowing that, I can create a fake situation that artificially creates multiple engines and test that my solution works.

I'll try to get a solution for you in the next few weeks.

@btrautmann

@fuzzybinary a couple interesting things to note here that I hope help:

Probable cause seems to be the singleton migration: both #596 (comment) and my issue #552 mention version 2.1.0 as the first version containing this issue. IIRC that was when the singleton was introduced.

but I feel like it's possible there are situations I don't know about that could legitimately create a second Flutter engine.

The default launchMode for Android Flutter applications is singleTop. The docs here do mention several scenarios in which a new instance of the MainActivity would be created. Ours would, I think, fall under the scenario of an Intent being created in a new task (that therefore does not contain an instance of our Activity), and I imagine there are a bunch of use cases where this could happen to other apps:

Screenshot 2024-06-14 at 10 21 30

IIRC, each Flutter Engine by default is scoped to the MainActivity so if you have 2 instances of that Activity you'd have 2 flutter engines. Sans a singleton, this in theory should work fine but a singleton within the same process (without a workaround to avoid double-initialization) would I think mean running into this issue.

As a disclaimer, this is all theoretical and it's been a while since I've really been in the weeds on any Android code so please take what I'm saying with a hefty grain of salt.

@feinstein

Just to add a bit more information, my app has no custom native code from us, but we do have deeplinks, so you might be onto something there...

@fuzzybinary
Member

I do agree the singleton migration was likely the cause, though it solved other issues related to engine initialization / shutdown.

I have a potential fix in mind that I'm going to look into implementing this week, but I'll likely release it as a preview release to get some feedback before mainlining the changes. I'll post to this thread when the preview version is available.

fuzzybinary added a commit that referenced this issue Jun 24, 2024
Under certain circumstances, Flutter can create two FlutterEngines which each have their own method channels. If this happens, we could end up in a situation where RUM and Logs were not properly attached to a MethodChannel, resulting in error messages and lost calls.

We're removing the `DatadogRumPlugin` and `DatadogLogsPlugin` singletons and replacing them with instances to avoid this. Both plugins will attach to existing Datadog Logs and RUM instances during initialization if they have already been initialized.

However, this breaks our current mapper implementation. Since Datadog expects the mapper during initialization, and holds it during the life of the application, we have to move the mappers to a companion object of the `*Plugin` classes to avoid issues with multiple or disconnected `MethodChannels`.

This potentially also helps us with multiple isolate tracking, but is just the first step of that.

refs: #596 #580 RUM-491 RUM-4438
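
One way the failure described in this commit message could arise is sketched below as a toy model in plain Dart. It is not the SDK's code (the real plugins live on the native side): `FakeEngine`, `RumPlugin`, and `secondEngineSurvives` are hypothetical names, and the attach/detach ordering is an assumption chosen to show how a single shared plugin object can tear down the handler of an engine that is still alive.

```dart
typedef MethodHandler = String Function(String method);

const rumChannel = 'datadog_sdk_flutter.rum';

/// Toy stand-in for a FlutterEngine: each engine owns its channel handlers.
class FakeEngine {
  final Map<String, MethodHandler> _handlers = {};

  void setHandler(String channel, MethodHandler? handler) {
    if (handler == null) {
      _handlers.remove(channel);
    } else {
      _handlers[channel] = handler;
    }
  }

  String invoke(String channel, String method) {
    final handler = _handlers[channel];
    if (handler == null) {
      throw StateError(
          'No implementation found for method $method on channel $channel');
    }
    return handler(method);
  }
}

/// Toy plugin that remembers only the engine it attached to most recently.
class RumPlugin {
  FakeEngine? _engine;

  void attach(FakeEngine engine) {
    _engine = engine;
    engine.setHandler(rumChannel, (method) => 'handled $method');
  }

  void detach() {
    _engine?.setHandler(rumChannel, null);
    _engine = null;
  }
}

/// Simulates: engine A attaches, engine B attaches, then engine A detaches.
/// Returns true if engine B can still reach its handler afterwards.
bool secondEngineSurvives({required bool sharedPlugin}) {
  final engineA = FakeEngine();
  final engineB = FakeEngine();
  final pluginA = RumPlugin();
  // A shared plugin models the singleton; otherwise each engine gets its own.
  final pluginB = sharedPlugin ? pluginA : RumPlugin();

  pluginA.attach(engineA);
  pluginB.attach(engineB); // With a shared plugin this overwrites _engine.
  pluginA.detach(); // Engine A goes away, e.g. its Activity is destroyed.

  try {
    engineB.invoke(rumChannel, 'addError');
    return true;
  } on StateError {
    return false;
  }
}

void main() {
  print('shared singleton: ${secondEngineSurvives(sharedPlugin: true)}');
  print('instance per engine: ${secondEngineSurvives(sharedPlugin: false)}');
}
```

In this model the shared plugin case returns false, because detaching on behalf of engine A clears the handler that was last registered on engine B, while the per-engine case returns true.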
@fuzzybinary
Member

The newest version (2.6.0) no longer uses singletons to manage channel connections. Can some folks upgrade and see if this solves the issue?

fuzzybinary added the awaiting response (Waiting for response from the reporter of the issue) label Jul 1, 2024
@btrautmann

Thanks @fuzzybinary. I put up a PR today to bump to 2.6.0 and will report back based on our Sentry numbers :) It should be fairly obvious whether this resolved the issue.

@androidmitry
Author

@fuzzybinary I no longer see the issue a week after releasing an app with the new SDK. Thank you!

@fuzzybinary
Member

Excellent! Thanks so much for letting me know. I'm going to close this but if anyone sees this issue again please reach out.

@btrautmann

Confirming that we're seeing the same, a decrease in instances of this issue. Thanks for the work on this @fuzzybinary!

Screenshot 2024-09-09 at 09 59 36
