
fix: Fix adblock memory leak #877

Merged
4 commits merged into newrelic:main on Feb 8, 2024

Conversation

Joe-Degler
Contributor

@Joe-Degler Joe-Degler commented Feb 6, 2024

Fixed memory leaks caused by request resolve failure (e.g. due to an adblocker). Thanks to community member @Joe-Degler for contributing this fix.

Overview

On our website ( playbook.com ), some users were noticing severe performance degradation when uploading big files. Memory usage grew, and cancelling the upload did not resolve the issue, eventually causing the upload or the website itself to crash.

After investigation, we were able to pin the issue down to the New Relic Browser Agent. However, the issue only occurred when an adblocker (in my case: uBlock Origin) was enabled.

We noticed that two URLs specifically were being blocked: https://js-agent.newrelic.com/nr-spa-1.245.0.min.js and https://bam.nr-data.net/.

(Note: the issue also occurred with the newest version of the Browser Agent.)

The issue itself is not actually related to uploading - uploading only made the symptoms much more obvious because of the size of the chunked ArrayBuffers. Even when using the website normally, heap memory slowly creeps upward.

The main problem is the buffer / backlog of the ee (event emitter?) holding on to every raw event: when loading additional assets or submitting data fails, the backlog never gets flushed. As a result it grows indefinitely, slowly consuming the user's memory. Uploads make this particularly noticeable because the buffer (in this case, I think it was the spa buffer specifically) holds on to every ArrayBuffer reference.
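For illustration, here's a minimal sketch of the buffering pattern as I understand it (the backlog and bufferGroup names mirror the agent's code, but the surrounding structure is simplified and not the actual implementation):

```js
// Simplified illustration of the leak (not the agent's actual code): every
// emitted event is pushed into a per-group array on a shared backlog object,
// and nothing ever removes it if the harvester or feature code never loads.
const backlog = {}

function buffer (group, event) {
  const bufferGroup = backlog[group] || (backlog[group] = [])
  bufferGroup.push(event) // keeps a reference to the raw event payload
}

// Large payloads (e.g. chunked ArrayBuffers from an upload) stay reachable
// through backlog forever, so the heap grows with every buffered event.
buffer('spa', { body: new ArrayBuffer(8 * 1024 * 1024) })
```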

We were able to work around the issue with some monkeypatching - it would be great to get these issues fixed upstream, instead.

We found 3 specific places that resulted in this problem:


Problem #1: send() method of harvest.js is not using the appropriate event listener for failing requests

Link to Change

The harvester attempts to submit data to https://bam.nr-data.net/.

The load event is not fired if the XHR is aborted client-side. As a result, the callback is never fired, cbCallback is never called (even with a 400+ status), and in turn the abort here is never called.

Replacing this event with the loadend event seems appropriate. It fires even on an error, but returns a "failed" XHR object with status code 0.

I've adjusted the sent boolean to return false if the request has not been sent (e.g. it never left the client). I'm not 100% sure about the implications here, and this change is not essential to the fix itself.

With this change, the callback is now successfully invoked for any successful, failed, or errored request.
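For reference, the change boils down to something like the following sketch (submitData and cbFinished are illustrative names, not the actual identifiers in harvest.js):

```js
// Sketch: swap the 'load' listener for 'loadend' so the callback also runs
// when the request is blocked or errors out client-side (status === 0).
function submitData (url, body, cbFinished) {
  const xhr = new XMLHttpRequest()
  // 'loadend' fires after load, error, and abort alike, so cbFinished is
  // guaranteed to run even when an adblocker kills the request.
  xhr.addEventListener('loadend', function () {
    // status 0 means the request never left the client (blocked/aborted)
    cbFinished({ status: this.status, sent: this.status !== 0 })
  })
  xhr.open('POST', url)
  xhr.send(body)
}
```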


Problem #2: The abort() function of the contextual-ee.js replaces the backlog object

Link to change

When abort() is called, it replaces globalInstance.backlog with a fresh {} object. However, all "sub-features" (or emitters, I guess?) hold a direct reference to the original ee.backlog object. Because they reference this object directly, two things happen:

  • Even when the main emitter is aborted, force was true in some circumstances, causing the emitter to keep trying to emit.
  • Because each sub-emitter uses a direct reference to backlog (stored at initialization), it keeps accessing the "old" object (already discarded by the main emitter) indefinitely. The feature / group still exists on that object, so it passes the if (bufferGroup) check and keeps pushing events to the buffer. Since references to it still exist, the old object obviously can never be garbage collected either.

Instead of replacing the main emitter's backlog with an empty object, iterate over its keys and delete all the existing groups. This clears the buffer, as expected.
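Conceptually, the abort change amounts to something like this (a sketch, assuming backlog is a plain object keyed by feature group):

```js
// Sketch: clear the shared backlog in place instead of doing
// `globalInstance.backlog = {}`, so every sub-emitter that captured a
// reference sees the emptied object and stops re-buffering into a zombie.
function clearBacklog (backlog) {
  Object.keys(backlog).forEach(function (group) {
    delete backlog[group] // drops the buffered events so they can be GC'd
  })
}
```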


Problem #3: Failure to import additional code does not drain buffers and does not unset handlers

Link to change

There are two distinct places I could find that initiate an additional import of code from https://js-agent.newrelic.com/ to load additional features. In both of them, if the import fails, the corresponding buffers are not drained adequately and events keep being emitted into them.
The first case is in api.js, which, if it fails, only raises a warning that Downloading runtime APIs failed. No effort is made to drain and de-register the api buffer/feature.

The second case is in instrument-base.js, which does start a drain in the exception handler for each feature that failed, but the drain immediately short-circuits because the feature is already registered in the agent's registry.

I went with a "crude" fix here, still aligning with what I think is the spirit of the architecture, in the form of a force parameter that forces the method to drain the group immediately.
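The idea behind the force parameter is roughly the following (a simplified sketch; drainGroup and registry are placeholder names, not the agent's actual drain implementation):

```js
// Sketch: drain normally short-circuits when the feature is already in the
// registry; `force` skips that check so the import-failure path can still
// flush the group and release the buffered events.
function drainGroup (agent, group, force = false) {
  if (!force && agent.registry[group]) return // normal path: drained elsewhere
  const events = (agent.ee.backlog && agent.ee.backlog[group]) || []
  events.length = 0                           // release buffered event references
  if (agent.ee.backlog) delete agent.ee.backlog[group]
}
```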


All three of these fixes were necessary to banish the memory leak and make things work properly. They're separated neatly in their own commits.

Related Issue(s)

I found this unresolved issue from a year ago.

Testing

Replicating this is surprisingly simple. Install uBlock Origin, or alternatively block both URLs locally in your environment.
If you're testing locally, uBlock Origin will only block the egress traffic to https://bam.nr-data.net/, but will still allow loading more data through localhost:63342/build/nr-spa.min.js. Depending on what you're testing, you may or may not have to block the import URL.

Create a simple index.html file similar to the GitHub instructions, importing nr-loader-spa.min.js.

Evaluate newrelic.ee.backlog - you'll observe that it's filled with data. Depending on the activities you do on the website, you'll also see that the backlog keeps increasing.
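To make the growth easy to see, you can dump per-group counts in the devtools console (this assumes each backlog group maps to an array of buffered events, as described above):

```js
// Run in the devtools console while using the page; on an unpatched agent
// with the agent's URLs blocked, the per-group counts keep climbing.
Object.entries(newrelic.ee.backlog || {}).forEach(([group, events]) => {
  console.log(group, Array.isArray(events) ? events.length : events)
})
```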

If you've only blocked the data egress sink:

When applying the commit that fixes the harvest callback:
You'll notice that newrelic.ee.aborted == true and that newrelic.ee.backlog == {}, but if you check out newrelic.initializedAgents[<foobar>].features[<barfoo>].ee.backlog, you'll see it's still referencing the old buffer.

Additionally, applying the commit that fixes abort:
You'll notice that the ee backlogs of the initializedAgents' features are now also empty, because the original newrelic.ee.backlog was emptied in place rather than replaced with a new object.

If you've also blocked importing additional code from nr-spa.min.js:
Load the page and observe how newrelic.ee.backlog fills with data that may keep growing and isn't flushed.

Apply the commit that fixes the drain on import error.

Observe how newrelic.ee.backlog is now empty, and does not grow anymore.

@CLAassistant

CLAassistant commented Feb 6, 2024

CLA assistant check
All committers have signed the CLA.

@patrickhousley
Contributor

Thank you so much @Joe-Degler! I am taking a look at this now.

@patrickhousley patrickhousley changed the title Fix adblock memory leak fix: Fix adblock memory leak Feb 7, 2024

codecov bot commented Feb 7, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (0a3a980) 78.67% compared to head (de5ed00) 78.65%.
Report is 3 commits behind head on main.

Files                                             Patch %   Lines
src/loaders/api/api.js                            0.00%     3 Missing ⚠️
src/features/page_view_event/aggregate/index.js   0.00%     0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #877      +/-   ##
==========================================
- Coverage   78.67%   78.65%   -0.03%     
==========================================
  Files         145      145              
  Lines        6519     6522       +3     
  Branches     1267     1267              
==========================================
+ Hits         5129     5130       +1     
- Misses       1181     1183       +2     
  Partials      209      209              
Flag         Coverage Δ
unit-tests   55.48% <63.63%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@patrickhousley
Contributor

All unit and jil tests for all browsers are passing locally. One test failed in wdio but I think it's just flakiness.

[Safari iOS #16-28] Running: Safari on iOS
[Safari iOS #16-28] Session ID: 303df308a0a74ba59381d108bd8d0a83
[Safari iOS #16-28]
[Safari iOS #16-28] » /tests/specs/session-replay/ingest.e2e.js
[Safari iOS #16-28] Session Replay Ingest Behavior
[Safari iOS #16-28]    ✖ Should empty event buffer when sending (3 retries)
[Safari iOS #16-28]    ✓ Should stop recording if 429 response (1 retries)
[Safari iOS #16-28]
[Safari iOS #16-28] 1 passing (3m 22.9s)
[Safari iOS #16-28] 1 failing
[Safari iOS #16-28]
[Safari iOS #16-28] 1) Session Replay Ingest Behavior Should empty event buffer when sending
[Safari iOS #16-28] expect(received).toBeGreaterThan(expected)

Expected: > 0
Received:   0
[Safari iOS #16-28] Error: expect(received).toBeGreaterThan(expected)
[Safari iOS #16-28]
[Safari iOS #16-28] Expected: > 0
[Safari iOS #16-28] Received:   0
[Safari iOS #16-28]     at Context.<anonymous> (/Users/phousley/code/newrelic-browser-agent/tests/specs/session-replay/ingest.e2e.js:18:43)
[Safari iOS #16-28]     at processTicksAndRejections (node:internal/process/task_queues:95:5)
[Safari iOS #16-28]

Spec Files:	 1075 passed, 3 retries, 1 failed, 24 skipped, 1100 total (100% completed) in 01:01:47

I will run through the manual tests and work on building out e2e tests for the three specific issues tomorrow.

@Joe-Degler
Contributor Author

No problem! Do you need me to do anything else?

@cwli24 cwli24 merged commit 695415b into newrelic:main Feb 8, 2024
11 of 14 checks passed