
Many OutOfMemory reports happen with plenty of memory #2399

Closed
eric opened this issue Nov 17, 2022 · 10 comments

@eric

eric commented Nov 17, 2022

Platform

tvOS

Installed

CocoaPods

Version

7.31.1

Steps to Reproduce

  1. Wait for an OutOfMemory alert
  2. Observe the breadcrumbs that report memory usage

Expected Result

I expect that OutOfMemory alerts are only for OutOfMemory issues.

Actual Result

We report used and available memory for all breadcrumb entries:

  extras[@"memory.used"] = [NSString stringWithFormat:@"%0.1fMB", LTCurrentMemoryUsed() / 1024.0 / 1024.0];
  long memoryFree = LTCurrentMemoryFree();
  if (memoryFree > 0) {
    extras[@"memory.free"] = [NSString stringWithFormat:@"%0.1fMB", memoryFree / 1024.0 / 1024.0];
  }

and when I look at recent OutOfMemory crashes, I see results like this:

{
memory.free: 1865.9MB, 
memory.used: 232.1MB
}

or

{
memory.free: 2069.9MB, 
memory.used: 28.1MB
}

right before the exception.

It is possible that I have some run-away memory leak that happens right at that moment, but it seems suspicious that the free memory is so high.

@philipphofmann
Member

That's why we added breadcrumbs for OOMs with #2347. Finally, we can better understand what happens before OOMs.

What exactly do you mean by

Observe the breadcrumbs that report memory usage

and by

I expect that OutOfMemory alerts are only for OutOfMemory issues.

It is possible that I have some run-away memory leak that happens right at that moment, but it seems suspicious that the free memory is so high.

It doesn't have to be a memory leak. If your application tries to allocate plenty of memory, we also detect it as an OOM. I think the last breadcrumb is a bit confusing.

We actually don't know when the OOM happened. It could be 17:43:02 or a couple of seconds after that. Maybe we should remove the exception breadcrumb, as it could raise false expectations.
[Screenshot: Screen Shot 2022-11-18 at 08 59 51]

Instead of setting the timestamp (#2394), I think we should drop it, @kevinrenskers.
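
To make that clearer: OOM detection is a heuristic that runs on the next app start. Roughly — this is a paraphrase of the documented behavior, not the SDK's actual code, and AppRunState is a made-up name for the state the SDK persists between launches — it looks like this:

// Rough paraphrase of the documented heuristic; not the SDK's actual code.
// "AppRunState" is a hypothetical container for state persisted between launches.
static BOOL PreviousTerminationLooksLikeOOM(AppRunState *previous) {
  if (previous.terminatedNormally)  return NO; // the app exited or was quit by the user
  if (previous.crashReportFound)    return NO; // a real crash was captured instead
  if (!previous.wasInForeground)    return NO; // background terminations are expected
  if (previous.debuggerWasAttached) return NO; // the debugger kills apps all the time
  if (previous.appVersionChanged)   return NO; // app update, not a watchdog kill
  if (previous.osVersionChanged)    return NO; // OS update, not a watchdog kill
  return YES; // nothing else explains the termination, so an OOM event is reported
}

Because this only runs on the next launch, there is no reliable timestamp for the termination itself, which is why the exception breadcrumb above can be misleading.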

@philipphofmann
Member

Can you think of anything your app does that could lead to a sudden termination by the watchdog, @eric?

@eric
Author

eric commented Nov 18, 2022

Not really. We've never fully understood why we have so many of these "OutOfMemory" reports from Sentry, and we haven't known how to go about diagnosing them further.

@philipphofmann
Member

I see. @eric, could you please reply to my questions above?

@eric
Author

eric commented Nov 18, 2022

What exactly do you mean by

Observe the breadcrumbs that report memory usage

Our in-app logging also creates breadcrumb entries, and when we submit those entries we attach memory usage as additional data:

- (void)log:(NSString *)message {
  SentryBreadcrumb *crumb = [[SentryBreadcrumb alloc] initWithLevel:kSentryLevelInfo category:@"log"];
  crumb.message = message;

  NSMutableDictionary *data = [NSMutableDictionary dictionary];
  data[@"memory.used"] = [NSString stringWithFormat:@"%0.1fMB", LTCurrentMemoryUsed() / 1024.0 / 1024.0];
  long memoryFree = LTCurrentMemoryFree();
  if (memoryFree > 0) {
    data[@"memory.free"] = [NSString stringWithFormat:@"%0.1fMB", memoryFree / 1024.0 / 1024.0];
  }
  [crumb setData:data];

  dispatch_async(self.logQueue, ^{
    [SentrySDK addBreadcrumb:crumb];
  });
}

This allows me to look at the breadcrumbs that happened just before the crash and get a feeling for how much memory the process was using at the time of each breadcrumb, including the one right before the crash.

[Screenshot: CleanShot 2022-11-18 at 00 56 16@2x]
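
For completeness: LTCurrentMemoryUsed() and LTCurrentMemoryFree() are our own helpers, not part of the Sentry SDK. A minimal sketch of how such values can be obtained — assuming task_info()'s phys_footprint for the used figure and os_proc_available_memory() for the remaining per-process budget; our real helpers may differ — would be:

#import <mach/mach.h>
#import <os/proc.h>

// Sketch only: phys_footprint is the figure the per-process memory limit is based on.
static uint64_t LTCurrentMemoryUsed(void) {
  task_vm_info_data_t info;
  mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
  if (task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&info, &count) != KERN_SUCCESS) {
    return 0;
  }
  return info.phys_footprint; // bytes
}

// Sketch only: how much more the process may allocate before hitting its limit.
static long LTCurrentMemoryFree(void) {
  if (@available(iOS 13.0, tvOS 13.0, *)) {
    return (long)os_proc_available_memory(); // bytes
  }
  return -1; // callers treat <= 0 as "unknown"
}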

and by

I expect that OutOfMemory alerts are only for OutOfMemory issues.

I think I meant to say "event" instead of "alert". Right now it seems that OutOfMemory really means "Unusual Application Termination That We Didn't Get Any Signals About That Could Be Memory Related But It Could Be Something Else".

It would be nice if the exceptions labeled OutOfMemory used additional signals to judge the likelihood that the issue was actually caused by running out of memory (for example, how much memory was used right before the app crashed), instead of just being a catch-all.

@philipphofmann
Member

"Unusual Application Termination That We Didn't Get Any Signals About That Could Be Memory Related But It Could Be Something Else"

Yes indeed. Do you think we should be clearer about that in the docs? https://docs.sentry.io/platforms/apple/configuration/out-of-memory/

It would be nice if the exceptions labeled OutOfMemory used additional signals to judge the likelihood that the issue was actually caused by running out of memory (for example, how much memory was used right before the app crashed), instead of just being a catch-all.

That's a great idea. We already got plenty of feedback from various customers that OOM doesn't really work for them. You can also disable it with options.enableOutOfMemoryTracking = false. On the other hand, we also got positive feedback.
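
For reference, turning it off is a one-liner during SDK setup (Objective-C shown; the DSN below is a placeholder):

#import <Sentry/Sentry.h>

[SentrySDK startWithConfigureOptions:^(SentryOptions *options) {
  options.dsn = @"https://examplePublicKey@o0.ingest.sentry.io/0"; // placeholder DSN
  // Turns off only the OOM heuristic; crash and error reporting stay enabled.
  options.enableOutOfMemoryTracking = NO;
}];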

We are currently working on a MetricKit integration, which, I know, isn't available on tvOS. I'm trying to determine whether the MXDiagnosticPayload also reports OOMs; if it does, I would like to replace our OOM logic with MetricKit. I'm not sure yet what we would do for tvOS, but maybe it's better to remove the feature there if it isn't working reliably.
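
To make the MetricKit idea concrete, here is the kind of throwaway probe I mean — not our integration, just a sketch against the public MXMetricManagerSubscriber API (iOS 13+; the exit metrics need iOS 14+, and none of it is available on tvOS):

#import <MetricKit/MetricKit.h>

@interface ExitMetricsProbe : NSObject <MXMetricManagerSubscriber>
@end

@implementation ExitMetricsProbe

- (instancetype)init {
  if (self = [super init]) {
    [[MXMetricManager sharedManager] addSubscriber:self];
  }
  return self;
}

// Diagnostic payloads: crashes, hangs, CPU and disk-write exceptions.
- (void)didReceiveDiagnosticPayloads:(NSArray<MXDiagnosticPayload *> *)payloads {
  for (MXDiagnosticPayload *payload in payloads) {
    NSLog(@"crash diagnostics: %lu", (unsigned long)payload.crashDiagnostics.count);
  }
}

// Metric payloads: applicationExitMetrics counts exits by reason, including memory-limit exits.
- (void)didReceivePayloads:(NSArray<MXMetricPayload *> *)payloads {
  for (MXMetricPayload *payload in payloads) {
    MXForegroundExitData *exits = payload.applicationExitMetrics.foregroundExitData;
    NSLog(@"foreground memory-limit exits: %lu",
          (unsigned long)exits.cumulativeMemoryResourceLimitExitCount);
  }
}

@end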

@eric
Author

eric commented Nov 22, 2022

Yes indeed. Do you think we should be clearer about that in the docs?

No, I think the error should be renamed to something more generic. What is the generic description of what these things are? "OS Forced Termination"? "OS Termination"? "Forced Termination"?

That's a great idea. We already got plenty of feedback from various customers that OOM doesn't really work for them. You can also disable it with options.enableOutOfMemoryTracking = false. On the other hand, we also got positive feedback.

I don't love the idea of disabling it, because these appear to be valid crashes, just not for that reason. Disabling it would make those crashes invisible to the metrics.

I'm unsure yet, what we would do for tvOS, but maybe it's better to remove the feature there if it's not working reliably.

I think regularly recording memory usage locally on disk, and using that to determine the likelihood that a crash was due to a memory issue, would be a good step in the right direction. If that isn't something you're willing to do, it would at least be nice to have hooks that let us control the name and description of the report when these crashes are detected; then I could have a 1s timer that writes the current memory stats and use that to inform what gets reported.
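
To be concrete, something like this on our side would already be enough if the resulting report were controllable — just a sketch, reusing the LT* helpers from above and an arbitrary file location:

// Sketch: every second, persist the current memory stats so the next launch
// can judge whether the previous termination plausibly ran out of memory.
NSURL *statsURL = [[NSFileManager.defaultManager URLsForDirectory:NSCachesDirectory
                                                        inDomains:NSUserDomainMask].firstObject
    URLByAppendingPathComponent:@"last-memory-stats.json"];

dispatch_source_t timer = dispatch_source_create(
    DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_global_queue(QOS_CLASS_UTILITY, 0));
dispatch_source_set_timer(timer, DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC, NSEC_PER_SEC / 10);
dispatch_source_set_event_handler(timer, ^{
  NSDictionary *stats = @{
    @"timestamp" : @([NSDate date].timeIntervalSince1970),
    @"memory.used" : @(LTCurrentMemoryUsed()),
    @"memory.free" : @(LTCurrentMemoryFree()),
  };
  NSData *json = [NSJSONSerialization dataWithJSONObject:stats options:0 error:nil];
  [json writeToURL:statsURL atomically:YES];
});
dispatch_resume(timer);
// Keep a strong reference to `timer` (e.g. in a property); otherwise ARC releases and cancels it.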

@philipphofmann
Member

I think you are referring to watchdog terminations. For these, we can't create a crash report because the OS kills the process, and we can't access the OS crash report either. I agree: instead of calling all of these OOM crashes, we could discuss renaming them to "OS Forced Termination", "OS Termination", "Forced Termination", "Watchdog Termination", or something similar.

I think regularly recording memory usage locally on disk and using that to determine the likelihood that a crash was due to a memory issue would be a good step in the right direction.

We could periodically collect more context for such crashes, indeed.

As 8.0.0 is around the corner, we will discuss whether we should rename the OOM errors to what you proposed.

philipphofmann added this to the 8.0.0 milestone on Nov 24, 2022
@philipphofmann
Member

We agreed that WatchdogTermination is a better name than OOM. After this change, we need to update the docs to explain which types of errors a WatchdogTermination can cover.

@kahest
Member

kahest commented Dec 7, 2022

We will rename OOM reports to Watchdog terminations to better reflect the possible root causes, as described above.
