[Performance] RegExp uses undue amount of memory on Chromium-based browsers #3193
Comments
First step: the benchmark was revised to fix what I saw as flaws in it:
Just going to throw this out there, though it may be more work than you're willing to do and/or use too much memory in and of itself: maybe using one of these regex-trie libraries to generate a single regex for hostname matching would work? I've only used a similar type of library in Perl, so I'm not sure how well either performs.
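To make the regex-trie idea concrete, here is a minimal sketch of how such a library might collapse a list of hostnames into one regex with shared prefixes. The function names `trieRegexSource` and `escapeChar` are made up for this illustration; this is not the code of any specific library, and it makes no claim about memory use or speed.

```javascript
// Hypothetical helper: escape a single character for use in a regex source.
const escapeChar = ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build a character trie from `words`, then serialize it into a regex
// source in which common prefixes are written only once.
function trieRegexSource(words) {
    const END = Symbol('end'); // marks "a word ends at this node"
    const root = new Map();
    for (const word of words) {
        let node = root;
        for (const ch of word) {
            if (!node.has(ch)) { node.set(ch, new Map()); }
            node = node.get(ch);
        }
        node.set(END, true);
    }
    // Turn a node into a regex fragment matching every suffix below it.
    const serialize = node => {
        const branches = [];
        for (const [ch, child] of node) {
            if (ch === END) { continue; }
            branches.push(escapeChar(ch) + serialize(child));
        }
        if (branches.length === 0) { return ''; }
        // If a word also ends here, whatever follows is optional.
        const optional = node.has(END);
        if (branches.length === 1 && !optional) { return branches[0]; }
        return '(?:' + branches.join('|') + ')' + (optional ? '?' : '');
    };
    return serialize(root);
}

const source = trieRegexSource([
    'projectfreetv.at',
    'projectfreetv.sc',
    'projectfreetv.us',
]);
// Anchor the result so that only whole hostnames match.
const reHostnames = new RegExp('^(?:' + source + ')$');
console.log(source); // projectfreetv\.(?:at|sc|us)
```

The result is still a RegExp instance, so whether it would reduce the memory footprint discussed in this issue would have to be measured.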
All of the regexps seen in the screenshot in the opening comment are from FilterOriginHitSet objects, whose purpose is to implement the [...]

So by the look of the heap snapshots, HNTrie requires 10% of the memory required by RegExp, for the same underlying [...]

I added HNTrie to the benchmark. Creation is slower when using HNTrie, but this is not a reason to avoid it:
On my side, with Chromium, results are improved for small sets. For medium and large sets, there is a small performance decrease relative to using regexps. The difference is not enough to worry about given the gain in memory efficiency, and keep in mind that regexps may incur other costs not measured by the benchmark. For example, there is no memory churning with [...]

With Firefox (and Firefox for Android), there is a performance improvement in all cases: Firefox is good at optimizing JavaScript code that works with TypedArray.
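For readers unfamiliar with the approach, the following is a rough sketch of a hostname trie. It is not uBO's HNTrie, which judging by the comment above works on TypedArray data; this version uses plain Map objects for readability. The class name `HostnameTrie` and the rule that subdomains of a stored hostname also match are assumptions made for this example.

```javascript
// Illustrative only: a Map-based trie over hostname labels, walked from
// the rightmost label ("com") toward the left. Not uBO's HNTrie.
const END = Symbol('end'); // marks "a stored hostname ends at this node"

class HostnameTrie {
    constructor(hostnames = []) {
        this.root = new Map();
        for (const hn of hostnames) { this.add(hn); }
    }

    add(hostname) {
        let node = this.root;
        for (const label of hostname.split('.').reverse()) {
            if (!node.has(label)) { node.set(label, new Map()); }
            node = node.get(label);
        }
        node.set(END, true);
    }

    // True if `hostname` equals a stored hostname or is a subdomain of one
    // (assumed matching rule for this sketch).
    matches(hostname) {
        let node = this.root;
        for (const label of hostname.split('.').reverse()) {
            node = node.get(label);
            if (node === undefined) { return false; }
            if (node.has(END)) { return true; }
        }
        return false;
    }
}

const trie = new HostnameTrie(['candyreader.com', 'likesblog.com']);
console.log(trie.matches('www.likesblog.com')); // true
console.log(trie.matches('notlikesblog.com'));  // false
```

Because lookups compare whole labels from the right, a hostname can only match at a label boundary, which is the same constraint a hostname regexp has to encode explicitly.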
In commit bacf502, I refactored how hostnames specified in the domain= option of a network static filter are implemented.

As a result of the set-vs-regexp.html benchmark, I decided to use regexps to quickly look up whether a hostname is part of the set of hostnames specified in a domain= option.

However, as revealed by the "Take heap snapshot" memory tool in Chromium, the amount of memory used by RegExp instances on Chromium-based browsers is quite surprising. RegExp instances are lazily allocated internally in Chromium, meaning that memory is allocated only when the method exec() is called on a RegExp instance.

However, as shown in the following screenshot, a lot of filters with the domain= option end up having their regexp executed earlier than expected. The heap snapshot was taken after launching uBO and visiting only the links on the front page of https://news.ycombinator.com/news:

The top RegExp by memory use comes from the filter
$script,third-party,domain=123videos.tv|171gifs.com|1proxy.de|...
in EasyList. Such a filter will always end up being executed because it applies to any network request of type script. The number of distinct hostnames in the domain= option of that specific filter is 732.

As seen in the screenshot, even with a minimalist browsing session, all these RegExp instances add up to a good amount of memory. Pretty much all of these memory-expensive RegExps are related to the domain= option in network static filtering.

Even a small EasyList filter such as
|https://$script,third-party,xmlhttprequest,domain=candyreader.com|likesblog.com|projectfreetv.at|projectfreetv.sc|projectfreetv.us|projectwatchseries.com|shupebrothers.com|watchseriesonline.info
(which also always ends up being executed) will have a memory footprint of 6,880 bytes to represent just the eight distinct hostnames specified in its 144-character long domain= option.

As shown in the benchmark, RegExp is reportedly considerably faster than Set when it comes to looking up whether a specific hostname is part of the set or not.

This issue is to document and address this domain=-related RegExp memory issue.
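To illustrate the two lookup strategies being compared, here is a minimal sketch. It is not uBO's actual code: the helper names `toRegex` and `setHas`, and the rule that a subdomain of a listed hostname also counts as a hit, are assumptions made for this example.

```javascript
// Hypothetical helpers for illustration; not uBO's actual code.

// Escape regex metacharacters in a hostname (mainly the dots).
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build one RegExp matching any listed hostname or any of its subdomains.
function toRegex(hostnames) {
    const alternatives = hostnames.map(escapeRegex).join('|');
    return new RegExp('(?:^|\\.)(?:' + alternatives + ')$');
}

// Set-based equivalent: test the hostname, then each parent domain in turn.
function setHas(set, hostname) {
    let hn = hostname;
    for (;;) {
        if (set.has(hn)) { return true; }
        const pos = hn.indexOf('.');
        if (pos === -1) { return false; }
        hn = hn.slice(pos + 1);
    }
}

// The value of a domain= option is a '|'-separated list of hostnames.
const hostnames = 'candyreader.com|likesblog.com|projectfreetv.at'.split('|');
const re = toRegex(hostnames);
const set = new Set(hostnames);

console.log(re.test('www.likesblog.com'), setHas(set, 'www.likesblog.com')); // true true
console.log(re.test('notlikesblog.com'), setHas(set, 'notlikesblog.com'));   // false false
```

The regexp answers in a single call, while the Set version may need one lookup per label of the hostname being tested. The trade-off raised in this issue is the memory each RegExp instance holds on Chromium-based browsers once it has been executed.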