Skip to content

Commit 90d0aa2

Browse files
committed
perf: Avoid re-canonicalizing the entire IntervalSet on push
Canonicalize is taking up a significant amount due to a regex with a huge amount of character ranges (generated by [lalrpop](https://github.com/lalrpop/lalrpop)'s lexer expanding multiple `\w` in a token). While this could perhaps be fixed in lalrpop I did notice the TODO in the code and after addressing this so we automatically union and compress on each push instead of re-canonicalizing on every push and that fixed the performance problem. I did see the earlier attempt at this rust-lang#1051 and it seems like that was reverted and regression tests were added so I hope that and the existing tests are enough (I don't have a clear idea on what tests might be missing).
1 parent 5ea3eb1 commit 90d0aa2

File tree

1 file changed

+38
-9
lines changed

1 file changed

+38
-9
lines changed

regex-syntax/src/hir/interval.rs

Lines changed: 38 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -80,15 +80,44 @@ impl<I: Interval> IntervalSet<I> {
8080
}
8181

8282
/// Add a new interval to this set.
83-
pub fn push(&mut self, interval: I) {
84-
// TODO: This could be faster. e.g., Push the interval such that
85-
// it preserves canonicalization.
86-
self.ranges.push(interval);
87-
self.canonicalize();
88-
// We don't know whether the new interval added here is considered
89-
// case folded, so we conservatively assume that the entire set is
90-
// no longer case folded if it was previously.
91-
self.folded = false;
83+
pub fn push(&mut self, mut interval: I) {
84+
match self.ranges.binary_search(&interval) {
85+
// Interval is an exact duplicate of one that exists
86+
Ok(_) => {}
87+
Err(i) => {
88+
// The search finds us the first index where the previous interval start is less
89+
// than the new interval start. Since the existing intervals are non-overlapping
90+
// we only need to try to union this single preceding interval
91+
let mut start = i;
92+
if let Some(before_i) = i.checked_sub(1) {
93+
let before = &self.ranges[before_i];
94+
if let Some(union) = before.union(&interval) {
95+
interval = union;
96+
start = before_i;
97+
}
98+
}
99+
// `interval` may overlap with any number of intervals following the insertion
100+
// point so we need to try to union with all of them until we reach the first
101+
// non-overlapping interval
102+
let mut end = i;
103+
for after_i in i..self.ranges.len() {
104+
let after = &self.ranges[after_i];
105+
if let Some(union) = interval.union(after) {
106+
interval = union;
107+
end = after_i + 1;
108+
} else {
109+
break;
110+
}
111+
}
112+
113+
self.ranges.splice(start..end, core::iter::once(interval));
114+
115+
// We don't know whether the new interval added here is considered
116+
// case folded, so we conservatively assume that the entire set is
117+
// no longer case folded if it was previously.
118+
self.folded = false;
119+
}
120+
}
92121
}
93122

94123
/// Return an iterator over all intervals in this set.

0 commit comments

Comments
 (0)