Separate capturing from normal states in NonBacktracking #65340
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Details
This should address issue #65289 where a long literal string pattern caused an out-of-memory error. The capturing support added in PR #65129 effectively doubled memory usage by adding a second capturing-enabled transition array that grew in lockstep with the original one. This PR splits capturing states into a separately indexed set of states, which allows the two transition arrays to grow separately.
@stephentoub Is there a way to run the
Yes there is. I can trigger that, just let me check real quick what the right one is.
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Cool, thank you Jose!
Looks like the outerloop tests failed on the issue fixed by #65333. I didn't see out of memory errors. I'll rebase onto main and re-run the outerloop tests though.
This reduces memory usage.
c5499ca to 0872730
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
{
int newsize = _statearray.Length + 1024;
Array.Resize(ref _statearray, newsize);
Array.Resize(ref _delta, newsize << _mintermsCount);
I was going to suggest we use ref locals to avoid the duplication between the if/else branches, but the fact that _delta and _capturingDelta are of different types makes that more challenging. I guess the duplication is probably better than alternatives.
Yes, I tried to write it with ref locals before I realized the types don't match.
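The duplication discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the PR's actual code: `State`, `GrowthSketch`, `_mintermBits`, and the tuple element type of `CapturingDelta` are hypothetical stand-ins; the thread only establishes that the two transition arrays have different types and that each grows in steps of 1024 states.

```csharp
using System;

// Hypothetical stand-in for DfaMatchingState<TElement>.
sealed class State { }

sealed class GrowthSketch
{
    private const int GrowBy = 1024;   // growth step taken from the diff
    private readonly int _mintermBits; // assumed: shift amount used to size the transition arrays

    public State[] StateArray = new State[GrowBy];
    public State[] Delta;
    public State[] CapturingStateArray = new State[GrowBy];
    // Assumed element type; the thread only says it differs from the type of _delta.
    public (State Target, int[] Effects)[] CapturingDelta;

    public GrowthSketch(int mintermBits)
    {
        _mintermBits = mintermBits;
        Delta = new State[GrowBy << mintermBits];
        CapturingDelta = new (State Target, int[] Effects)[GrowBy << mintermBits];
    }

    // Grows only the arrays for the requested kind of state.
    // The two branches cannot share ref locals because Delta and
    // CapturingDelta have different element types.
    public void Grow(bool capturing)
    {
        if (capturing)
        {
            int newSize = CapturingStateArray.Length + GrowBy;
            Array.Resize(ref CapturingStateArray, newSize);
            Array.Resize(ref CapturingDelta, newSize << _mintermBits);
        }
        else
        {
            int newSize = StateArray.Length + GrowBy;
            Array.Resize(ref StateArray, newSize);
            Array.Resize(ref Delta, newSize << _mintermBits);
        }
    }
}
```

In this sketch, calling `Grow(false)` enlarges only the non-capturing arrays and leaves the capturing ones at their initial size, which is the independent growth the PR description refers to.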
@@ -43,6 +43,9 @@ internal sealed class SymbolicRegexBuilder<TElement> where TElement : notnull
// states that have been created
internal HashSet<DfaMatchingState<TElement>> _stateCache = new();

// capturing states that have been created
internal HashSet<DfaMatchingState<TElement>> _capturingStateCache = new();
Nit: it'd be nice if these were readonly; we can clean that up subsequently, though.
@@ -64,6 +67,7 @@ internal sealed class SymbolicRegexBuilder<TElement> where TElement : notnull
/// </summary>
internal DfaMatchingState<TElement>[]? _statearray;
internal DfaMatchingState<TElement>[]? _delta;
internal DfaMatchingState<TElement>[]? _capturingStatearray; |
Nit: it'd be nice if these were _stateArray and _capturingStateArray, but we can clean those up subsequently
I'm trying to interpret the outerloop failure logs. The possibly relevant failures seem to be linux release and debug, which were killed for OOM, but don't specify during which test. Windows release did fail during Osx debug failed on the Anyone have any further insight?
PatternsDataSet_ConstructRegexForAll_SourceGenerated shouldn't have anything to do with your changes, since that doesn't involve NonBacktracking. That test is quite intensive, though, effectively taking over the machine doing lots and lots of compilation. My guess is that it was running concurrently with StressTestDeepNestingOfLoops, which does use NonBacktracking, and the two of them in combination, with your changes, pushed it over the edge to the point where the whole suite timed out. Same in another run for Match_VaryingLengthStrings_Huge.
These tests were failing before this PR as well, right? If we believe this PR is necessary but not sufficient, we could merge it and then move on to solving the next part of the puzzle.
Indeed, this is necessary, but perhaps not sufficient (if that OOM was still from NonBacktracking). And the deep nesting of loops tests still seem to require more attention. I'll merge.
This should address issue #65289 where a long literal string pattern caused an out-of-memory error. The capturing support added in PR #65129 effectively doubled memory usage by adding a second capturing-enabled transition array that grew in lockstep with the original one. This PR splits capturing states into a separately indexed set of states, which allows the two transition arrays to grow separately.
Edit: that branch name should say nonbacktracking.