perf: RuleTable::any_enabled
#10971
CodSpeed Performance Report: Merging #10971 will improve performance by 20.14%.
let mut i = 0;

while i < rules.len() {
    any |= self.contains(rules[i]);
I intentionally use a bitwise OR here to avoid any branching.
I'd bet it would compile down to the same code if you used an `if`. :-) But I actually find this pretty readable as it is personally.
Actually, it does not.
The iter version has a jump
mov rax, qword ptr [rsp - 112]
movzx edx, byte ptr [rsp - 116]
mov cl, 1
bt rax, rdx
jb .LBB0_2
movzx ecx, byte ptr [rsp - 114]
bt rax, rcx
setb cl
The loop version does not (but it requires more instructions)
mov byte ptr [rsp - 117], cl
lea rax, [rsp - 117]
movzx ecx, byte ptr [rsp - 114]
mov edx, 1
shl edx, cl
movzx ecx, byte ptr [rsp - 116]
bts edx, ecx
test dword ptr [rsp - 112], edx
setne byte ptr [rsp - 117]
Yeah, I just meant `if self.contains(rules[i]) { any = true }`.
It makes sense that the `iter` version has an extra jump because of the short circuit.
Ah yeah, that probably compiles down to the same code.
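For readers following the thread, here is a minimal sketch of the three variants being compared. It assumes a simplified, hypothetical `Table` type backed by a single 64-bit word and rule indexes below 64, not the actual `RuleTable`. All three return the same result; only the iterator version exits early, which is where the extra jump comes from.

```rust
// Hypothetical, simplified stand-in for the rule table: a single 64-bit
// word where bit `n` set means "rule n is enabled". Assumes rule < 64.
#[derive(Clone, Copy)]
struct Table(u64);

impl Table {
    fn contains(self, rule: u8) -> bool {
        self.0 & (1u64 << rule) != 0
    }

    // Variant from the diff: accumulate with a bitwise OR, no early exit.
    fn any_or(self, rules: &[u8]) -> bool {
        let mut any = false;
        let mut i = 0;
        while i < rules.len() {
            any |= self.contains(rules[i]);
            i += 1;
        }
        any
    }

    // Variant suggested in review: conditional assignment, still no early exit.
    fn any_if(self, rules: &[u8]) -> bool {
        let mut any = false;
        let mut i = 0;
        while i < rules.len() {
            if self.contains(rules[i]) {
                any = true;
            }
            i += 1;
        }
        any
    }

    // Iterator variant: stops at the first enabled rule (short circuit).
    fn any_iter(self, rules: &[u8]) -> bool {
        rules.iter().any(|&r| self.contains(r))
    }
}
```

Whether the first two variants compile to identical machine code depends on the compiler version and optimization level, so it is worth checking the generated assembly rather than assuming it.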
let mut any = false;
let mut i = 0;

while i < rules.len() {
It looks like this entire function could just be written as `rules.iter().any(|r| self.contains(r))`? Did you try that and it was slower? (I believe it also has the benefit of short circuiting, which may or may not help depending on the typical length of `rules`.)
The typical length is like 1-8 rules (where 8 is rare). The downside is that the function can't be `const` anymore ;).
But the performance is about the same. So let's use `any` as it is easier to understand.
Edit: CodSpeed disagrees. The shift version is slightly faster (23% speedup instead of 20%).
Ah, I missed the `const` requirement.
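To illustrate the `const` point, here is a small sketch with a hypothetical single-word `Table` type (illustrative names, not the actual `RuleTable` API; rule indexes are assumed to be below 64). A `while` loop with indexing is allowed in a `const fn`, so the result can be computed at compile time. Calling `Iterator::any` with a closure is not allowed in a `const fn` on stable Rust at the time of writing.

```rust
// Hypothetical single-word table; names are illustrative only.
// Bit `n` set means "rule n is enabled". Assumes rule < 64.
#[derive(Clone, Copy)]
pub struct Table(u64);

impl Table {
    pub const fn contains(self, rule: u8) -> bool {
        self.0 & (1u64 << (rule as u32)) != 0
    }

    // `while`, slice indexing, and `|=` on `bool` are all usable in `const fn`.
    pub const fn any_enabled(self, rules: &[u8]) -> bool {
        let mut any = false;
        let mut i = 0;
        while i < rules.len() {
            any |= self.contains(rules[i]);
            i += 1;
        }
        any
    }
}

// Both values are evaluated at compile time because `any_enabled` is `const`.
pub const TABLE: Table = Table(0b0100);
pub const HAS_RULE: bool = TABLE.any_enabled(&[0, 2]);
```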
e94a8ee to bac6d21
Woo!
Summary
This PR improves the performance of `RuleTable::any_enabled`, which is called frequently in expression checking to determine if a specific set of rules is enabled.

The old implementation used `enabled.intersects(RuleSet::from_iter(rules))` to test if the enabled set and the tested rules overlap.

This worked fine when we had few rules but is now becoming a performance bottleneck when bumping `RuleSet` from 13 to 14 usizes, because each call zero-initializes a 14 * 8 = 112 byte array on the stack, sets the rule indexes, and then computes whether the sets overlap.

The new implementation avoids constructing a `RuleSet` using `from_iter`, based on the assumption that we mainly query `any_enabled` with a few rules. This avoids writing 112 bytes on each call.

This should make the `any_enabled` check independent of the size of the `RuleSet`.

Test Plan
cargo test
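
The difference between the old and new approaches described in the Summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `RuleSet` here is a hypothetical stand-in backed by 14 `u64` words, rules are plain `u16` indexes below 14 * 64 = 896, and the method names `any_enabled_old` and `any_enabled_new` are invented for the comparison.

```rust
// Hypothetical stand-in for a rule set backed by 14 words
// (14 * 8 = 112 bytes with 64-bit words). Assumes rule < 896.
const WORDS: usize = 14;

#[derive(Clone, Copy)]
struct RuleSet([u64; WORDS]);

impl RuleSet {
    const fn empty() -> Self {
        RuleSet([0; WORDS])
    }

    fn insert(&mut self, rule: u16) {
        self.0[rule as usize / 64] |= 1u64 << (rule % 64);
    }

    fn contains(&self, rule: u16) -> bool {
        self.0[rule as usize / 64] & (1u64 << (rule % 64)) != 0
    }

    fn intersects(&self, other: &RuleSet) -> bool {
        self.0.iter().zip(other.0.iter()).any(|(a, b)| a & b != 0)
    }

    // Old approach: zero-initialize a full temporary set (112 bytes),
    // set the queried bits, then compare all 14 words.
    fn any_enabled_old(&self, rules: &[u16]) -> bool {
        let mut tmp = RuleSet::empty();
        for &r in rules {
            tmp.insert(r);
        }
        self.intersects(&tmp)
    }

    // New approach: test each queried rule directly, so the cost scales
    // with `rules.len()` rather than with the size of the set.
    fn any_enabled_new(&self, rules: &[u16]) -> bool {
        let mut any = false;
        let mut i = 0;
        while i < rules.len() {
            any |= self.contains(rules[i]);
            i += 1;
        }
        any
    }
}
```

With only a handful of rules per query, the second form touches one word per rule and never materializes the temporary array.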