Skip to content

Commit

Permalink
perf(linter): apply small file optimization, up to 30% faster (#6600)
Browse files Browse the repository at this point in the history
Theory: iterating over the rules three times has slightly worse cache locality, because the prior iterations have pushed `rule` out of the cache by the time we iterate over it again. By iterating over each rule only once, we improve cache performance (hopefully). We also don't need to collect rules to a Vec, so it saves some CPU/memory there too.

In practice: the behavior here actually depends on the number of AST nodes that are in the program. If the number of nodes is large, then it's better to iterate over the nodes only once and iterate the rules multiple times. But if the number of nodes is small, then it's better to iterate over nodes multiple times and only iterate over the rules once. See this comment for more context: #6600 (comment), as well as the comment inside the PR: https://github.com/oxc-project/oxc/pull/6600/files#diff-207225884c5e031ffd802bb99e4fbacbd8364b1343a1cec5485bf50f29186300R131-R143.

In practice, this can make linting a file 1-45% faster, depending on the size of the file, number of AST nodes, number of files, CPU cache size, etc. To accommodate large and small files better, we have an explicit threshold of 200,000 AST nodes, which is an arbitrary number picked based on some benchmarks on my laptop. For large files, the linter behavior doesn't change. For small files, we switch to iterating over nodes in the inner loop and iterating over rules once in the outer loop.
  • Loading branch information
camchenry committed Oct 22, 2024
1 parent 619d06f commit 8387bac
Showing 1 changed file with 60 additions and 16 deletions.
76 changes: 60 additions & 16 deletions crates/oxc_linter/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -121,30 +121,74 @@ impl Linter {
.rules
.iter()
.filter(|rule| rule.should_run(&ctx_host))
.map(|rule| (rule, Rc::clone(&ctx_host).spawn(rule)))
.collect::<Vec<_>>();

for (rule, ctx) in &rules {
rule.run_once(ctx);
}
.map(|rule| (rule, Rc::clone(&ctx_host).spawn(rule)));

let semantic = ctx_host.semantic();
for symbol in semantic.symbols().symbol_ids() {

let should_run_on_jest_node =
self.options.plugins.has_test() && ctx_host.frameworks().is_test();

// IMPORTANT: We have two branches here for performance reasons:
//
// 1) Branch where we iterate over each node, then each rule
// 2) Branch where we iterate over each rule, then each node
//
// When the number of nodes is relatively small, most of them can fit
// in the cache and we can save iterating over the rules multiple times.
// But for large files, the number of nodes can be so large that it
// starts to not fit into the cache and pushes out other data, like the rules.
// So we end up thrashing the cache with each rule iteration. In this case,
// it's better to put rules in the inner loop, as the rules data is smaller
// and is more likely to fit in the cache.
//
// The threshold here is chosen to balance between performance improvement
// from not iterating over rules multiple times, but also ensuring that we
// don't thrash the cache too much. Feel free to tweak based on benchmarking.
//
// See https://github.com/oxc-project/oxc/pull/6600 for more context.
if semantic.stats().nodes > 200_000 {
// Collect rules into a Vec so that we can iterate over the rules multiple times
let rules = rules.collect::<Vec<_>>();

for (rule, ctx) in &rules {
rule.run_on_symbol(symbol, ctx);
rule.run_once(ctx);
}
}

for node in semantic.nodes() {
for (rule, ctx) in &rules {
rule.run(node, ctx);
for symbol in semantic.symbols().symbol_ids() {
for (rule, ctx) in &rules {
rule.run_on_symbol(symbol, ctx);
}
}
}

if ctx_host.frameworks().is_test() && self.options.plugins.has_test() {
for jest_node in iter_possible_jest_call_node(semantic) {
for node in semantic.nodes() {
for (rule, ctx) in &rules {
rule.run_on_jest_node(&jest_node, ctx);
rule.run(node, ctx);
}
}

if should_run_on_jest_node {
for jest_node in iter_possible_jest_call_node(semantic) {
for (rule, ctx) in &rules {
rule.run_on_jest_node(&jest_node, ctx);
}
}
}
} else {
for (rule, ref ctx) in rules {
rule.run_once(ctx);

for symbol in semantic.symbols().symbol_ids() {
rule.run_on_symbol(symbol, ctx);
}

for node in semantic.nodes() {
rule.run(node, ctx);
}

if should_run_on_jest_node {
for jest_node in iter_possible_jest_call_node(semantic) {
rule.run_on_jest_node(&jest_node, ctx);
}
}
}
}
Expand Down

0 comments on commit 8387bac

Please sign in to comment.