NonBacktracking Regex optimizations #102655
Conversation
@dotnet-policy-service agree
(Resolved review thread: .../System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/StateFlags.cs)
Forgot the zero-width non-joiner, so this is not correct; it would otherwise almost double the performance. Making the conditional nullability checks cheaper did help by around 20%, though. One thing I didn't figure out: where should tests for SymbolicRegexMatcher go?
IIRC essentially all the tests for the nonbacktracking engine are also run against the other engines, and involve passing the pattern and text in (i.e., they aren't testing the internals directly). Is that what you're looking to do? BTW, is any of this work potentially relevant to the other engines? (I don't own regex, just an observer here.)
@danmoseley Perhaps Compiled might benefit from the same alphabet compression and character-kind lookup. A source-generated variant of the DFA would be very impressive, though; it'd run circles around all the other engines.
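For readers unfamiliar with the technique, here is a rough sketch of what alphabet compression means (all names here are illustrative, not the actual runtime internals): characters the pattern can never tell apart share one equivalence-class id, so the transition table is indexed by `states × classCount` instead of `states × 65536`.

```csharp
// Illustrative sketch of alphabet compression. For a pattern like
// "[a-z]+[0-9]" there are only three classes:
// 0 = everything else, 1 = [a-z], 2 = [0-9].
class AlphabetCompressionSketch
{
    static byte[] BuildClassMap()
    {
        byte[] classOfChar = new byte[char.MaxValue + 1];
        for (char c = 'a'; c <= 'z'; c++) classOfChar[c] = 1;
        for (char c = '0'; c <= '9'; c++) classOfChar[c] = 2;
        return classOfChar;
    }

    // The transition table shrinks to states * classCount entries,
    // and a step is one class lookup plus one table load.
    static int NextState(int[] transitions, int classCount, byte[] classOfChar, int state, char c)
        => transitions[state * classCount + classOfChar[c]];
}
```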
(quoting runtime/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/AttRegexTests.cs, lines 47–48 at e8e553b)
Running thousands of regex instances with no GC in between, while still maintaining low memory constraints in the WASM tests, is not easily solved; maybe some of the high-memory annotations like … could be applied.
Now the engine has less per-match overhead as well. The ending Z anchor (`\Z`) is a major pain, preventing lots of optimizations. I added a slight indirection to ASCII-only cases to save memory (128 B instead of 65536 B), but the … Other than that, the results look pretty good.
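A sketch of the ASCII-only indirection being described, as I understand it (illustrative, not the actual field layout): keep a 128-entry table for ASCII input and fall back to a slower classifier for the rare non-ASCII character, instead of materializing a full 65536-entry table.

```csharp
class AsciiLookupSketch
{
    private readonly byte[] _asciiClasses = new byte[128]; // 128 B instead of 65536 B

    private byte GetCharClass(char c)
        => c < 128 ? _asciiClasses[c] : LookupNonAscii(c);

    // Hypothetical slow path for non-ASCII, e.g. a BDD or a binary
    // search over character ranges.
    private byte LookupNonAscii(char c) => 0;
}
```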
Thanks, @ieviev! I will take a look soon.
There are still some things to do here.
Sorry for the delay in getting to this. I much appreciate your submitting it.
A couple of comments/questions...
- In our previous conversations, you mentioned you thought inner vectorization would be possible, where we could vectorize within a match rather than just finding the next starting place. I don't see that in this PR. Is that possible? Seems like that'd be an opportunity for some significant wins.
- I understand from your comments that these changes brought significant wins on some patterns, in particular those involving non-ASCII, which is great. I'm a bit concerned, though, that when running this on our own perf test suite, I'm seeing regressions in various places. You can find the tests in https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Text.RegularExpressions. Some of the more concerning ones were `\p{Sm}` and `.{2,4}(Tom…`, which regressed throughput by ~50%, and `Holmes.{0,25}(...).{0,25}Holmes...`, which regressed throughput by ~25%. Thoughts?
Thanks!
(Resolved review threads: src/libraries/System.Text.RegularExpressions/src/System.Text.RegularExpressions.csproj; .../Symbolic/MatchingState.cs ×4; .../Symbolic/SymbolicRegexMatcher.cs ×2; tests/FunctionalTests/NonBacktrackingTests.cs; tests/FunctionalTests/System.Text.RegularExpressions.Tests.csproj; tests/UnitTests/SymbolicRegexTests.cs)
Yes, this is possible. Any pattern that contains `.*` could be a lot faster with longer matches. It'd be best to start with inner vectorization without anchors; the presence of anchors makes it more complicated and expensive, but I still think it's possible with anchors as well when followed with an anchor context lookup. It also needs a bit of testing to see where the line between match-time speedup and unwanted compile/construction-time overhead is.
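To make the idea concrete, a minimal sketch of "vectorizing within a match" for the anchor-free case (my illustration, not code from this PR; `SearchValues` and `IndexOfAny` are the real .NET 8+ APIs, the surrounding names are hypothetical): when the current DFA state loops back to itself on every character except a few "exit" characters, as in a `.*` segment, the matcher can skip ahead with one vectorized search instead of stepping char by char.

```csharp
using System;
using System.Buffers;

static class InnerVectorizationSketch
{
    static int SkipSelfLoop(ReadOnlySpan<char> input, int pos, SearchValues<char> exitChars)
    {
        int i = input.Slice(pos).IndexOfAny(exitChars);
        return i < 0 ? input.Length : pos + i; // DFA state is unchanged up to the returned index
    }
}
```

For example, for a `.*` without RegexOptions.Singleline the exit set would be just `SearchValues.Create("\n")`.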
I'll definitely profile these as well. There is some overhead from the way edge cases are currently handled. `\p{Sm}` in particular could be made to skip the reversal part entirely, along with other fixed-length patterns. I'll follow up about this once I've done some more testing.
Now I've had a better look at the dotnet/performance benchmarks as well. Turns out I was wrong about the `\p{Sm}` benchmark. Try running the performance tests again; I think it should be faster across the board now.

I thought about inner vectorization as well, but it doesn't seem like there are many benchmarks involving `.*` or other star loops, and I think creating a separate implementation of RegexFindOptimizations for the automata engine would be far more beneficial. Derivatives have a very concise and useful way to infer prefixes that works for any pattern, so this would replace a lot of custom logic:

```csharp
// ... prefix start; looping back to this or to any already-visited state is redundant
SymbolicRegexNode currentState = /* .. */;
var mergedPrefix = solver.Empty;
foreach (var minterm in minterms)
{
    var derivative = createDerivative(currentState, minterm);
    if (isRedundant(derivative))
        continue;

    // O(1) bitwise merge of the transitions relevant to the prefix
    mergedPrefix = solver.Or(mergedPrefix, minterm);
}
// .. continue the breadth-first search for longer prefixes
return mergedPrefix;
```

There are many optimizations that currently go undetected/unused which could enable AVX in more cases. The AVX scanning and the transitions could be combined as well, e.g. when optimizing for a longer prefix. I wonder if the algorithm choices in the match could be replaced with something as simple as:

```csharp
while (position != end)
{
    // check for success
    // get next state
    position++;
}
```

An optimized DFA will give you a very consistent worst-case throughput of somewhere around 500 MB/s on any pattern, which also accounts for UTF-16 losses on UTF-8 input (this is why the English benchmarks are slower). AVX on benchmarks is a bit of a gamble: it sometimes helps a lot and sometimes does not, and high match-count overhead can sometimes dominate a benchmark as well. There are massive gains from AVX in some cases, but overall I'm not exactly sure where the line is, as it depends on the input text as well. The worst-case performance should be very good on both ASCII and Unicode by now; what kinds of prefix optimizations are worth doing depends on the benchmarks used, or on https://github.com/dotnet/runtime-assets/blob/main/src/System.Text.RegularExpressions.TestData/Regex_RealWorldPatterns.json.

I'll refactor and clean up the PR and comments for a bit now.
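(To make the derivative-based prefix idea above concrete, an illustrative example of my own, not from the PR: for a pattern like `(Tom|Tim)`, the first level of the breadth-first search merges all non-redundant transitions into the single set {'T'}, and the second level into {'o', 'i'}. Those per-position minterm sets can then drive a vectorized scan for candidate start positions, with no pattern-shape-specific logic.)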
Ah, I left an exception in to check whether any pattern has over 255 minterms as well; a pattern from RegexPcreTests did reach it, so a fallback mode is definitely necessary. An int array would be a 256K allocation but a couple of times faster than a BDD, so there's a decision there as well. Would pooling the array be an option?
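If pooling is on the table, `ArrayPool<int>.Shared` is the obvious candidate; a minimal sketch, assuming the map really is one int per char (which is where the ~256K figure comes from) and that it's only needed transiently, e.g. during construction:

```csharp
using System;
using System.Buffers;

// Rent the char -> minterm-id map instead of allocating 65536 ints (~256 KB).
int[] map = ArrayPool<int>.Shared.Rent(65536);
try
{
    // Rent can return a larger, uncleared array, so initialize before use.
    Array.Clear(map, 0, 65536);
    // ... populate the map and build the matcher tables from it ...
}
finally
{
    ArrayPool<int>.Shared.Return(map);
}
```

If the matcher instead holds the map for its whole lifetime, renting wouldn't help, since there's no point at which the array could safely be returned.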
(force-pushed from b259937 to 6bb517a)
Simplification, style consistency, dead code deletion, some bounds-check removal, etc.
(force-pushed from 6bb517a to 0a0f409)
I pushed a commit with a last round of changes.
Otherwise, LGTM.
Thanks!
I've got another round of cleanup/simplification (some of the code in this PR, some pre-existing) that I'm working on and will submit as a separate PR.
I will be happy to provide comments or feedback if needed.
@stephentoub @veanes
Here's the initial list of changes I'd make to the nonbacktracking regex engine. I haven't yet documented much, because I made some design choices that I need to discuss.
Rebar baseline before changes:
After changes:
Update, after more optimization:
Essentially, I fixed the things that hurt performance the most; there are many wins of 300% or more already.
Some fundamental design changes:
There's low-hanging fruit in the hot loop as well. I removed the IsDeadend flag entirely, since there is just one dead-end state, and I stored its state ID instead. The nullability and skipping checks got dedicated short-circuit lookups, as having a nullable position at all (a valid match) is usually rare; something like `beq` or `bgt` could be used as well. Some of the if/else branches in the hot loop should be removed and resolved somewhere above.
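A sketch of the kind of short-circuit lookup described (hypothetical names, not the actual matcher fields): one precomputed flag per state saying whether the state can be nullable in any context at all, so the common not-nullable case costs a single array load and branch.

```csharp
class NullabilityShortCircuitSketch
{
    private readonly bool[] _canBeNullable; // precomputed once per state

    public NullabilityShortCircuitSketch(bool[] canBeNullable) => _canBeNullable = canBeNullable;

    public bool IsNullableAt(int stateId, uint context)
        => _canBeNullable[stateId] && IsNullableFor(stateId, context);

    // Hypothetical slow path: the full context-dependent nullability check.
    private bool IsNullableFor(int stateId, uint context) => false;
}
```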
There's a lot of performance still on the table when compared to resharp; dedicated DFA prefilter optimizations could help a lot as well.
Let me know what you think of the changes. I think the nonbacktracking engine could be made a lot faster than Compiled in most scenarios, and the symbolic-automata benefits should be used a lot more aggressively.