fix: Fix adblock memory leak #877
Conversation
Thank you so much @Joe-Degler, I am taking a look at this now.
Codecov Report

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #877      +/-   ##
==========================================
- Coverage   78.67%   78.65%   -0.03%
==========================================
  Files         145      145
  Lines        6519     6522       +3
  Branches     1267     1267
==========================================
+ Hits         5129     5130       +1
- Misses       1181     1183       +2
  Partials      209      209

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
All unit and jil tests for all browsers are passing locally. One test failed in wdio but I think it's just flakiness.
I will run through the manual tests and work on building out e2e tests for the three specific issues tomorrow.
No problem! Do you need me to do anything else?
Fixed memory leaks caused by request resolve failure (e.g. due to an adblocker). Thanks to community member @Joe-Degler for contributing this fix.
Overview
On our website (playbook.com), some users were noticing severe performance degradation when uploading big files. Memory usage grew, and cancelling the upload did not resolve the issue, eventually causing a crash of the upload or the website itself.
After investigation, we were able to pin down the issue to the New Relic Browser Agent. However, the issue only occurred when adblock (in my case: uBlock Origin) was enabled.
We noticed that two URLs specifically were being blocked: `https://js-agent.newrelic.com/nr-spa-1.245.0.min.js` and `https://bam.nr-data.net/`. (Note: the issue also occurred with the newest version of the Browser Agent.)
The issue itself is not actually related to uploading - this only made the symptoms much more obvious, due to the size of chunked ArrayBuffers. Even when using the website normally, the memory heap size slowly creeps upward.
The main problem is the `buffer`/`backlog` of the ee (event emitter?) holding on to every raw event; if loading additional assets or submitting the data fails, it never gets flushed. This results in the backlog growing indefinitely, slowly consuming the user's memory. Uploading makes this particularly noticeable because the buffer (in this case, I think it was `spa` specifically) holds on to every ArrayBuffer reference.

We were able to work around the issue with some monkeypatching - it would be great to get these issues fixed upstream instead.
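Roughly, the failure mode can be pictured with the following sketch (hypothetical names, not the agent's actual code): events are pushed into a per-group backlog until a drain happens, so if the drain never fires, every buffered payload, including large ArrayBuffers, stays reachable.

```js
// Minimal sketch of the leak pattern, assuming a simplified event emitter.
// Names like `backlog` mirror the description above; this is not the agent's code.
const ee = {
  backlog: {},
  buffer(group, event) {
    // Every raw event is retained until the group is drained.
    (this.backlog[group] = this.backlog[group] || []).push(event)
  },
  drain(group) {
    const events = this.backlog[group] || []
    delete this.backlog[group]
    return events
  }
}

// If the harvest/drain step never runs (e.g. the request is blocked),
// these references -- including large ArrayBuffers -- are never released.
ee.buffer('spa', { payload: new ArrayBuffer(1024 * 1024) })
ee.buffer('spa', { payload: new ArrayBuffer(1024 * 1024) })
console.log(Object.keys(ee.backlog)) // ['spa'] keeps growing if drain() is never called
```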
We found 3 specific places that resulted in this problem:
Problem #1: `send()` method of `harvest.js` is not using the appropriate event listener for failing requests

Link to Change
The harvester attempts to submit data to `https://bam.nr-data.net/`. The `load` event is not fired if the XHR is aborted client-side. This causes the callback to never fire - in turn never calling `cbCallback` (even with a 400+ status), which in turn never causes the abort here to be called.

Replacing this event with the `loadend` event seems appropriate. It fires even on an error, but returns a "failed" XHR object with status code 0.

I've adjusted the `sent` boolean to return `false` if the request has not been sent (e.g. it didn't leave the client). I'm not 100% sure of the implications here, and this change is irrelevant to the fix itself.

With this change, the callback is now successfully invoked for any successful, failed, or errored request.
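To illustrate the difference, here is a minimal sketch of an XHR send path using `loadend`; the function shape is illustrative and not the agent's actual `harvest.js` code.

```js
// Minimal sketch: 'load' only fires for completed responses, so an
// adblocker-aborted request would never invoke the callback.
// 'loadend' fires after success, error, and abort alike.
function send(url, body, cbCallback) {
  const xhr = new XMLHttpRequest()
  xhr.open('POST', url, true)

  xhr.addEventListener('loadend', function () {
    // A blocked/aborted request surfaces here with status 0.
    const sent = this.status > 0
    cbCallback({ status: this.status, sent })
  })

  xhr.send(body)
}

// Usage: the callback now runs even when the request never leaves the client.
send('https://bam.nr-data.net/', '[]', (result) => {
  if (!result.sent) {
    // e.g. trigger the abort/cleanup path described above
    console.log('harvest request blocked, status', result.status)
  }
})
```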
Problem #2: The `abort()` function of `contextual-ee.js` replaces the backlog object

Link to change
When `abort()` is called, it replaces `globalInstance.backlog` with a fresh new `{}` object. However, all "sub-features" (or emitters, I guess?) hold a direct reference to the original `ee.backlog` object. As they reference this object directly, two things happen:

1. `force` was true, causing the emitter to continue trying to emit.
2. The emitter's `backlog` reference (which has been stored on initialization) still points to the "old" object (already discarded by the main emitter), so it accesses that object indefinitely. On this object, the feature/group still exists, so it passes the `if (bufferGroup)` check and continues pushing events to the buffer. This also causes the object to never be garbage collected, as references to it still exist, obviously.

Instead of replacing the backlog with a fresh empty object, iterate over its keys and delete all the existing groups (see the sketch below). This clears the buffer, as expected.
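A minimal sketch of the in-place clearing approach, assuming a simplified `globalInstance`; this is not the agent's actual `contextual-ee.js` code.

```js
// Minimal sketch of the abort() fix.
function abort(globalInstance) {
  globalInstance.aborted = true

  // Replacing the object would orphan it for every emitter that captured
  // a direct reference at initialization:
  //   globalInstance.backlog = {}   // sub-features keep pushing to the old object
  //
  // Deleting the groups in place empties the same object everyone references.
  Object.keys(globalInstance.backlog).forEach((group) => {
    delete globalInstance.backlog[group]
  })
}
```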
Problem #3: Failure to import additional code does not drain buffers and does not unset handlers
Link to change
There are two distinct places I could find that initiate an additional import of code from `https://js-agent.newrelic.com/` to load additional features. In both of these, if the import fails, the corresponding buffers are not drained adequately, and features keep emitting into the buffer.

The first case is in `api.js`, which, if the import fails, only raises a warning that `Downloading runtime APIs failed`. No effort is made to drain and de-register the `api` buffer/feature.

The second case is in `instrument-base.js`, which does start a drain in the exception handler for each feature that failed, but the drain immediately short-circuits because the feature is already registered in the registry for the agent.

I went with a "crude" fix here, still aligning with what I think is the spirit of the architecture, in the form of a `force` parameter, which forces the method to drain the group immediately (see the sketch below).

All three of these fixes were necessary to banish the memory leak and make things work properly. They're separated neatly into their own commits.
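To illustrate the `force`-drain idea, here is a minimal sketch assuming a simplified feature registry; it is not the agent's actual `instrument-base.js` or drain code.

```js
// Minimal sketch of a `force` option on drain().
const registry = new Set()
const backlog = { spa: [/* buffered events */] }

function drain(group, { force = false } = {}) {
  // Normally a group that is already registered short-circuits here,
  // which is exactly what prevented the failed-import path from flushing.
  if (registry.has(group) && !force) return []

  registry.add(group)
  const events = backlog[group] || []
  delete backlog[group]
  // ...process or discard the buffered events...
  return events
}

// In the failed-import exception handler, force the flush so the buffer
// does not keep growing:
try {
  throw new Error('Downloading runtime APIs failed')
} catch (err) {
  console.warn(err.message)
  drain('spa', { force: true })
}
```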
Related Issue(s)
I found this unresolved issue from a year ago.
Testing
Replicating this is surprisingly simple. Install uBlock Origin, or, alternatively, block both URLs locally in your environment.

If you're testing locally, uBlock Origin will only block the egress traffic to `https://bam.nr-data.net/`, but it will allow loading more data through `localhost:63342/build/nr-spa.min.js`. Depending on what you're testing, you may or may not have to block the import URL.

Create a simple index.html file, similar to the GitHub instructions, importing `nr-loader-spa.min.js`.
Evaluate `newrelic.ee.backlog` - you'll observe that it's filled with data. Depending on the activities you do on the website, you'll also see that the backlog keeps increasing.

If you've only blocked the data egress sink:
When applying the commit that fixes the harvest callback, you'll notice that `newrelic.ee.aborted == true` and that `newrelic.ee.backlog == {}`, but if you check `newrelic.initializedAgents[<foobar>].features[<barfoo>].ee.backlog`, you'll see it's still referencing the old buffer.

When additionally applying the commit that fixes abort, you'll notice that the ee backlogs of the initializedAgents' features are now also empty, as the original `newrelic.ee.backlog` got emptied in place rather than replaced with an empty object.

If you've also blocked importing additional code from `nr-spa.min.js`:
Load the page and observe how `newrelic.ee.backlog` is filled with data that may keep growing and isn't flushed.

Apply the commit that fixes the drain on import error.

Observe how `newrelic.ee.backlog` is now empty and does not grow anymore.
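As a small aid while testing, a console snippet like the following can be used to watch the backlog size over time. It assumes the `newrelic.ee.backlog` layout described above, with each group mapping to an array of buffered events; adjust as needed.

```js
// Rough browser-console helper: log the per-group backlog sizes every 5 seconds.
setInterval(() => {
  const backlog = (window.newrelic && window.newrelic.ee && window.newrelic.ee.backlog) || {}
  const counts = Object.fromEntries(
    Object.entries(backlog).map(([group, events]) => [group, events.length])
  )
  console.log('backlog sizes:', counts)
}, 5000)
```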