Leverage whotracks.me's huge dataset of URLs for benchmark purposes
As seen at:
  https://whotracks.me/blog/adblockers_performance_study.html

The requests.json.gz file can be downloaded from:
  https://cdn.cliqz.com/adblocking/requests_top500.json.gz

Copy the file into ./tmp/requests.json.gz

If the file is present when you build uBO using `make-[target].sh` from
the shell, the resulting package will contain `./assets/requests.json`,
which will be looked up by the method below to launch a benchmark
session.

From uBO's dev console, launch the benchmark:
  µBlock.staticNetFilteringEngine.benchmark();

The usual browser dev tools can be used to obtain useful profiling
data, i.e. start the profiler, call the benchmark method from the
console, then stop the profiler when it completes.

Keep in mind that the measurements in the blog post above were obtained
with ONLY EasyList. The CPU reportedly used was:
  https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-6600U+%40+2.60GHz&id=2608

Rename ./tmp/requests.json.gz to something else if you no longer want
./assets/requests.json in the build.
gorhill committed Feb 15, 2019
1 parent e589e28 commit 5733439
Showing 2 changed files with 72 additions and 9 deletions.
75 changes: 67 additions & 8 deletions src/js/static-net-filtering.js
@@ -58,6 +58,7 @@ const typeNameToTypeValue = {
     'object': 3 << 4,
     'object_subrequest': 3 << 4,
     'script': 4 << 4,
+    'fetch': 5 << 4,
     'xmlhttprequest': 5 << 4,
     'sub_frame': 6 << 4,
     'font': 7 << 4,
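As an aside, the one-line addition gives the new `fetch` request type the same numeric bucket as `xmlhttprequest`. A standalone sketch of that mapping (plain JS, not wired into uBO):

```javascript
// Standalone copy of the mapping above: each type name maps to a small
// integer shifted left by 4 bits; 'fetch' is deliberately aliased to
// the same bucket as 'xmlhttprequest' (both 5 << 4 === 80).
const typeNameToTypeValue = {
    'object': 3 << 4,
    'object_subrequest': 3 << 4,
    'script': 4 << 4,
    'fetch': 5 << 4,
    'xmlhttprequest': 5 << 4,
    'sub_frame': 6 << 4,
    'font': 7 << 4,
};

console.log(typeNameToTypeValue['fetch'] === typeNameToTypeValue['xmlhttprequest']); // true
```

Since both keys hold the same value, filters bucketed under one type evaluate against the other as well.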
@@ -2792,14 +2793,72 @@ FilterContainer.prototype.getFilterCount = function() {

/******************************************************************************/

-FilterContainer.prototype.benchmark = function(contexts) {
-    const t0 = performance.now();
-    const results = [];
-    for ( const context of contexts ) {
-        results.push(this.matchString(context));
-    }
-    const t1 = performance.now();
-    return { t0, t1, duration: t1 - t0, results };
+// The requests.json.gz file can be downloaded from:
+//   https://cdn.cliqz.com/adblocking/requests_top500.json.gz
+//
+// Which is linked from:
+//   https://whotracks.me/blog/adblockers_performance_study.html
+//
+// Copy the file into ./tmp/requests.json.gz
+//
+// If the file is present when you build uBO using `make-[target].sh` from
+// the shell, the resulting package will contain `./assets/requests.json`,
+// which will be looked up by the method below to launch a benchmark session.
+//
+// From uBO's dev console, launch the benchmark:
+//   µBlock.staticNetFilteringEngine.benchmark();
+//
+// The usual browser dev tools can be used to obtain useful profiling
+// data, i.e. start the profiler, call the benchmark method from the
+// console, then stop the profiler when it completes.
+//
+// Keep in mind that the measurements in the blog post above were obtained
+// with ONLY EasyList. The CPU reportedly used was:
+//   https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-6600U+%40+2.60GHz&id=2608
+//
+// Rename ./tmp/requests.json.gz to something else if you no longer want
+// ./assets/requests.json in the build.
+
+FilterContainer.prototype.benchmark = function() {
+    new Promise(resolve => {
+        const url = vAPI.getURL('/assets/requests.json');
+        µb.assets.fetchText(url, details => {
+            if ( details.error !== undefined ) {
+                console.info(`Not found: ${url}`);
+                resolve();
+                return;
+            }
+            const requests = [];
+            const lineIter = new µb.LineIterator(details.content);
+            while ( lineIter.eot() === false ) {
+                let request;
+                try {
+                    request = JSON.parse(lineIter.next());
+                } catch(ex) {
+                }
+                if ( request instanceof Object === false ) { continue; }
+                requests.push(request);
+            }
+            resolve(requests);
+        });
+    }).then(requests => {
+        if ( Array.isArray(requests) === false || requests.length === 0 ) {
+            console.info('No requests found to benchmark');
+            return;
+        }
+        const fctxt = µb.filteringContext;
+        const t0 = self.performance.now();
+        for ( const request of requests ) {
+            fctxt.url = request.url;
+            fctxt.setDocOriginFromURL(request.frameUrl);
+            fctxt.setType(request.cpt);
+            void this.matchString(fctxt);
+        }
+        const t1 = self.performance.now();
+        const dur = t1 - t0;
+        console.info(`Evaluated ${requests.length} requests in ${dur.toFixed(0)} ms`);
+        console.info(`\tAverage: ${(dur / requests.length).toFixed(3)} ms per request`);
+    });
+};
 
 /******************************************************************************/
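For readers without the uBO internals at hand, the request-loading loop can be approximated with plain `String.prototype.split` in place of the internal `µb.LineIterator`. A hypothetical standalone sketch (the field names `url`, `frameUrl` and `cpt` are the ones the benchmark reads):

```javascript
// Hypothetical standalone version of the request-loading loop above.
// Malformed lines are skipped, mirroring the try/catch in the method:
// JSON.parse either throws or yields a non-object for bad input.
function parseRequests(content) {
    const requests = [];
    for ( const line of content.split('\n') ) {
        let request;
        try {
            request = JSON.parse(line);
        } catch(ex) {
        }
        if ( request instanceof Object === false ) { continue; }
        requests.push(request);
    }
    return requests;
}

// Sample input in the same JSON-lines shape the benchmark expects.
const sample = [
    '{"url":"https://example.com/ad.js","frameUrl":"https://example.com/","cpt":"script"}',
    'not valid JSON',
    '{"url":"https://example.com/x","frameUrl":"https://example.com/","cpt":"xmlhttprequest"}',
].join('\n');
```

With the sample above, `parseRequests(sample)` keeps the two valid entries and silently drops the malformed line.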
6 changes: 5 additions & 1 deletion tools/make-assets.sh
@@ -14,7 +14,11 @@ fi

 rm -rf $DES
 mkdir $DES
 cp ./assets/assets.json $DES/
 
+if [ -f ./tmp/requests.json.gz ]; then
+    gunzip -c ./tmp/requests.json.gz > $DES/requests.json
+fi
+
 mkdir $DES/thirdparties
 cp -R ../uAssets/thirdparties/easylist-downloads.adblockplus.org $DES/thirdparties/
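The conditional unpacking step can be exercised locally without a real build. A minimal dry-run, using throwaway paths of my own choosing rather than the real build directories:

```shell
# Hypothetical dry-run of the conditional gunzip step above,
# with a fabricated one-line dataset standing in for the download.
DES=./tmp/demo-assets
mkdir -p ./tmp "$DES"
printf '%s\n' '{"url":"https://example.com/a.js","frameUrl":"https://example.com/","cpt":"script"}' \
    | gzip > ./tmp/requests.json.gz
# Same guard as in make-assets.sh: only unpack if the file exists.
if [ -f ./tmp/requests.json.gz ]; then
    gunzip -c ./tmp/requests.json.gz > "$DES/requests.json"
fi
```

After running this, `$DES/requests.json` holds the decompressed JSON line; deleting or renaming `./tmp/requests.json.gz` makes the guard skip the step, which matches the "rename to disable" note in the commit message.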

4 comments on commit 5733439

@uBlock-user (Contributor)

The report claims Ghostery performs better in the benchmarks for the most part. Is it true?

@gorhill (Owner, Author) commented on 5733439, Feb 15, 2019

I don't know, I didn't check myself -- but there is no reason to believe it's not true.

In general, if you can convert string-based processing into typed-array processing, you essentially convert string processing into number crunching, which should typically be faster. They claim "Filters are stored in a very compact form, in typed arrays", so that would explain the better performance.

On my side this is what I am trying to do piecemeal (HNTrie, the new PSL), but converting all of the static filtering engine to be typed array-based is more of a long-term effort. In any case, I think that the ability to benchmark outside of browsing web pages will allow me to identify places to improve (marginally really, there is a point where little can be done). I currently get an average of ~20µs per request on my side -- I consider this pretty good, knowing it's mostly JS code after all.
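The string-to-typed-array idea can be illustrated outside uBO. A hypothetical sketch (names and data are mine, not from the codebase): encode a hostname once, and all later matching is integer comparison over bytes rather than string work.

```javascript
// Illustrative only, not uBO code: encode a string once into a
// Uint8Array, so that all later comparisons are integer operations
// over the bytes instead of string operations.
const encoder = new TextEncoder();

// One-time encoding of the value to match against.
const needle = encoder.encode('ads.example.com');

// Byte-wise equality: pure number crunching over typed arrays.
function bytesEqual(a, b) {
    if ( a.length !== b.length ) { return false; }
    for ( let i = 0; i < a.length; i++ ) {
        if ( a[i] !== b[i] ) { return false; }
    }
    return true;
}

console.log(bytesEqual(needle, encoder.encode('ads.example.com'))); // true
```

uBO's HNTrie mentioned above applies the same principle to whole sets of hostnames; this sketch only shows the single-comparison case.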

What surprises me more is the result concerning ABP, so I will try to harness the same benchmark method above inside ABP's extension code to see what's going on. I just downloaded and launched the latest ABP dev build from the Chrome store and there seems to be a severe regression in memory usage: whereas uBO sits at less than 35 MB with only EasyList, ABP was over 100 MB.

@gorhill (Owner, Author)

Hmm.. I finally hooked the benchmark to ABP's in-extension filtering engine and the results do not match what the blog post is reporting. I get an average of 45µs with ABP -- top=ABP, bottom=uBO:

[screenshot: profiler results, top=ABP, bottom=uBO]

@gorhill (Owner, Author)
