RUMM-610 Hotfix memory usage when doing intensive logging #185

ncreated · 2020-07-16T10:13:56Z

What and why?

🧪 As reported in #178, the data upload memory leaks fixed in #181 are not solving out-of-memory issue on very intensive logging. This PR adds necessary optimisation to keep the allocations graph flat.

How?

I added this benchmarking code in Example app to stress-test logging:

internal class SendLogsFixtureViewController: UIViewController {
   var objcLogger = DDLogger.builder().build()
   let queue1 = DispatchQueue(label: "intensive-logging-1")
   let queue2 = DispatchQueue(label: "intensive-logging-2")
   let queue3 = DispatchQueue(label: "intensive-logging-3")

   override func viewDidLoad() {
      super.viewDidLoad()

      keepLogging(delay: 0.01, on: queue1)
      keepLogging(delay: 0.02, on: queue2)
      keepLogging(delay: 0.03, on: queue3)
   }

   func keepLogging(delay: TimeInterval, on queue: DispatchQueue) {
      queue.asyncAfter(deadline: .now() + delay) { [weak self] in
         self?.objcLogger.info(createBigMessage())
         self?.keepLogging(delay: 0.01, on: queue)
      }
   }
}

private func createBigMessage() -> String {
    let seed = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
    return (0..<1_024).map({ _ in seed }).joined(separator: "")
}

And noticed crazy high number of system allocations coming from try purgeFilesDirectoryIfNeeded() in FilesOrchestrator. This method is iterating through the list of files in data directory, doing a lot of OS-internal _FileCache and _NSFastEnumerationEnumerator allocations (out of our control).

Above benchmark results with this allocations graph, leading to out-of-memory crash when iOS process exceeds 1.8GB limit:

To mitigate this impact, I tuned the FilesOrchestrator to call try purgeFilesDirectoryIfNeeded() only when necessary - if it knows that a new file will be created (vs. each time when asking for a writable file). This keeps the memory graph flat for the same benchmark:

IMHO, by doing this, we give the OS enough time to reuse _FileCache things in a memory-performant way.

Review checklist

~~- [ ] Feature or bugfix MUST have appropriate tests (unit, integration)~~

Make sure each commit and the PR mention the Issue number or JIRA reference

buranmert

when you wait idle for a while after peaking at ~1.5Gb memory, it goes down to normal, correct?

and does that mean that now if we log intensively, SDK may exceed max file size limit with one single file?

ncreated · 2020-07-16T16:42:54Z

@buranmert

when you wait idle for a while after peaking at ~1.5Gb memory, it goes down to normal, correct?

Good question. I did a test - if we stress the _FileCache only a bit, it goes back to normal - however, I'd expect a gradual slope, but instead it makes a significant dump:

When we stress it too much (~>1GB) it results with random crashes and:

2020-07-16 18:32:02.729981+0200 Example[57917:40426048] [ServicesDaemonManager] interruptionHandler is called. -[FontServicesDaemonManager connection]_block_invoke
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug

With the optimisation applied in this PR, all remains functional even a long while after excessive logging ends.

and does that mean that now if we log intensively, SDK may exceed max file size limit with one single file?

SDK starts a new file if the recently used one exceeds 4MB, so it's fine 👌.

buranmert

good 👍 there is no memory leak as it goes back to normal after a while: i was afraid there was a deeper problem within filesystem 😅

hyling · 2020-07-17T10:46:58Z

@ncreated If it's not too much trouble would you mind describing how you tracked it down to try purgeFilesDirectoryIfNeeded() 🙏 ? I'm trying to get better at debugging these types of issues. Which Instruments views did you use? Did you just use the Allocations instrument? How did you connect the code in the background thread to your own code. Instruments doesn't show who started the thread. 🤔

ncreated · 2020-07-17T11:34:06Z

@hyling I used Allocations instrument. When running the benchmark snippet, you can see that _FileCache is one of the most popular and consuming allocation on the list. When going into details it gives the stack trace. For most of this allocations it is try purgeFilesDirectoryIfNeeded():

…en-doing-intensive-logging # Conflicts: # DatadogSDK.podspec # DatadogSDKObjc.podspec # Sources/Datadog/Datadog.swift

hyling · 2020-07-17T15:11:45Z

@hyling I used Allocations instrument. When running the benchmark snippet, you can see that _FileCache is one of the most popular and consuming allocation on the list. When going into details it gives the stack trace. For most of this allocations it is try purgeFilesDirectoryIfNeeded():

@ncreated Ah thanks for the explanation and the screenshot, they help me realize that debugging symbols for datadog didn't get loaded into Instruments. After I fixed that the Allocations instrument was much more helpful. 😄

ncreated self-assigned this Jul 16, 2020

ncreated marked this pull request as ready for review July 16, 2020 10:14

ncreated requested a review from a team as a code owner July 16, 2020 10:14

ncreated mentioned this pull request Jul 16, 2020

Extremely high memory usage resulting in process termination #178

Closed

nachoBonafonte approved these changes Jul 16, 2020

View reviewed changes

buranmert reviewed Jul 16, 2020

View reviewed changes

RUMM-610 Optimize memory usage when doing intensive logging

499b27d

ncreated force-pushed the ncreated/RUMM-610-optimize-memory-usage-when-doing-intensive-logging branch from ddf2620 to 499b27d Compare July 16, 2020 16:18

buranmert approved these changes Jul 16, 2020

View reviewed changes

ncreated added 2 commits July 17, 2020 14:44

RUMM-610 Bump the hotfix version

122bc42

Merge branch 'master' into ncreated/RUMM-610-optimize-memory-usage-wh…

b887984

…en-doing-intensive-logging # Conflicts: # DatadogSDK.podspec # DatadogSDKObjc.podspec # Sources/Datadog/Datadog.swift

ncreated merged commit 969c2d4 into master Jul 17, 2020

ncreated deleted the ncreated/RUMM-610-optimize-memory-usage-when-doing-intensive-logging branch July 17, 2020 15:25

ncreated mentioned this pull request Nov 10, 2023

RUM-1837 Update logic to send N batches sequentially in each cycle #1531

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RUMM-610 Hotfix memory usage when doing intensive logging #185

RUMM-610 Hotfix memory usage when doing intensive logging #185

ncreated commented Jul 16, 2020

buranmert left a comment

ncreated commented Jul 16, 2020

buranmert left a comment

hyling commented Jul 17, 2020

ncreated commented Jul 17, 2020

hyling commented Jul 17, 2020

RUMM-610 Hotfix memory usage when doing intensive logging #185

RUMM-610 Hotfix memory usage when doing intensive logging #185

Conversation

ncreated commented Jul 16, 2020

What and why?

How?

Review checklist

buranmert left a comment

Choose a reason for hiding this comment

ncreated commented Jul 16, 2020

buranmert left a comment

Choose a reason for hiding this comment

hyling commented Jul 17, 2020

ncreated commented Jul 17, 2020

hyling commented Jul 17, 2020