Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Captures support for NonBacktracking (#65129)
* Initial version of captures in NonBacktracking
* Working version of captures in NonBacktracking
* State/transition priorities for NonBacktracking
* Various fixes for capturing in NonBacktracking

  Re-enable replacement patterns. Fix eager derivative. Fix capture numbering to work with sparsely numbered groups.
* Enable many subcapture tests for NonBacktracking

  Also take any subcaptures out of RegexExperiment intersection and negation tests, as capture semantics with these are not correct yet.
* Use new eager derivative even without subcaptures

  This ensures the correct length matches always.
* Enable more subcapture tests for NonBacktracking
* Performance work for NonBacktracking captures
* Bug fixes and comments
* Fix for changes removing exclusive_end
* Rename back to _lower
* Fix beginning handling for captures
* Resurrect deleted test
* Remove debugging test
* Provide effects semantics for extended combinators

  Also the unordered Or, which gets the semantics that all alternatives are visited.
* Apply suggestions from code review to SparseIntMap

  Co-authored-by: Stephen Toub <stoub@microsoft.com>
* Apply suggestions from code review

  Cleanup and volatile write

  Co-authored-by: Stephen Toub <stoub@microsoft.com>
* Comments, fixes and cleanup
* Disable tests for extended features

  Conjunction and complement are broken in the new capturing support.
* Apply suggestions from code review

  Fixes for FindEndPositionCapturing

  Co-authored-by: Stephen Toub <stoub@microsoft.com>
* Switch all phases to use eager derivative

  This allows avoiding capture tracking in third phase if there are no subcaptures. As a side effect of this change all derivatives produce OrderedOr nodes, which for now effectively disables the subsumption optimization.
* Flatten ordered or and add subsumption

  Previously the loop subsumption optimization only worked in SymbolicRegexSet, but that is getting phased out with the order maintaining derivative. This reimplements a version of that for ordered ors.

  Also do actual canonicalization of ordered ors as we should.
* Avoid some copies and overhead in capturing mode
* Improved comment and a better assert
* Move some per thread state into the runner

  This avoids some repeated allocations in the capturing mode.
* Fix typo in SparseIntMap
* Various cleanup
* Fix a comment
* Avoid copying captures when doing quick match

Co-authored-by: Stephen Toub <stoub@microsoft.com>
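The message refers to a `SparseIntMap`, but that type is not shown on this page. Purely as a point of reference, here is a minimal sketch of the classic sparse-set layout (a dense array of live entries plus a sparse index array) extended to carry a value per integer key. The class name `SparseIntMapSketch<T>` and all of its members are hypothetical; the commit's type may be organized differently. The property that makes this layout attractive for reusable per-runner state is that `Clear` is O(1): it only resets the count, and stale slots in the sparse array are rejected by cross-checking the dense array.

```csharp
using System.Collections.Generic;

/// <summary>
/// Hypothetical sparse integer map sketch (not the SparseIntMap type from the commit).
/// Keys must lie in [0, universeSize). Add, lookup and Clear are O(1).
/// </summary>
public sealed class SparseIntMapSketch<T>
{
    // _sparse[key] is the position of key's entry in _dense; it may hold stale data.
    private readonly int[] _sparse;
    // _dense[0.._count) holds the live entries in insertion order.
    private readonly KeyValuePair<int, T>[] _dense;
    private int _count;

    public SparseIntMapSketch(int universeSize)
    {
        _sparse = new int[universeSize];
        _dense = new KeyValuePair<int, T>[universeSize];
    }

    public int Count => _count;

    // O(1): forgetting the count is enough, because TryGetIndex cross-checks _dense.
    public void Clear() => _count = 0;

    // A key is present only if its recorded slot is live and points back at the key.
    private bool TryGetIndex(int key, out int index)
    {
        index = _sparse[key];
        return index < _count && _dense[index].Key == key;
    }

    // Adds (key, value) if key is absent; returns false and leaves the map unchanged otherwise.
    public bool Add(int key, T value)
    {
        if (TryGetIndex(key, out _))
        {
            return false;
        }
        _sparse[key] = _count;
        _dense[_count++] = new KeyValuePair<int, T>(key, value);
        return true;
    }

    // Overwrites the value of an existing key in place, or appends a new entry.
    public void Set(int key, T value)
    {
        if (TryGetIndex(key, out int index))
        {
            _dense[index] = new KeyValuePair<int, T>(key, value);
        }
        else
        {
            Add(key, value);
        }
    }

    public bool TryGetValue(int key, out T value)
    {
        if (TryGetIndex(key, out int index))
        {
            value = _dense[index].Value;
            return true;
        }
        value = default!;
        return false;
    }

    // Enumerates live entries in insertion order, in O(Count) time.
    public IEnumerable<KeyValuePair<int, T>> Entries()
    {
        for (int i = 0; i < _count; i++)
        {
            yield return _dense[i];
        }
    }
}
```

Both arrays are allocated once for the whole key universe, so a map like this can be cleared and refilled for every input position without further allocation; iteration cost depends only on the number of live entries.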
1 parent bda8c91, commit d78094e

Showing 26 changed files with 1,731 additions and 580 deletions.
...m.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/DerivativeEffect.cs (33 changes: 33 additions, 0 deletions)
@@ -0,0 +1,33 @@

```csharp
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

namespace System.Text.RegularExpressions.Symbolic
{
    /// <summary>
    /// Describes effects to record capture start and end points.
    /// </summary>
    /// <remarks>
    /// These are applied into registers (arrays of positions for all capture starts and ends) and amount to assignments
    /// of the current input position. Effects are generated and associated with transitions in effect-aware versions
    /// of MkDerivative in SymbolicRegexNode.
    /// </remarks>
    internal struct DerivativeEffect
    {
        public enum EffectKind
        {
            /// <summary>Effect to assign the current input position to an index in the capture starts array.</summary>
            CaptureStart,
            /// <summary>Effect to assign the current input position to an index in the capture ends array.</summary>
            CaptureEnd,
        };

        public EffectKind Kind;
        public int CaptureNumber;

        public DerivativeEffect(EffectKind kind, int captureNumber)
        {
            Kind = kind;
            CaptureNumber = captureNumber;
        }
    }
}
```
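The remarks describe an effect as an assignment of the current input position into one of two register arrays, one for capture starts and one for capture ends. A minimal sketch of that application step follows. It repeats the struct (in the global namespace) so that it compiles on its own; `EffectApplier.Apply` is a hypothetical helper, since this file does not show how the runner actually consumes effects.

```csharp
using System.Collections.Generic;

// Copy of the struct from the diff, so that this block compiles on its own.
internal struct DerivativeEffect
{
    public enum EffectKind
    {
        CaptureStart,
        CaptureEnd,
    }

    public EffectKind Kind;
    public int CaptureNumber;

    public DerivativeEffect(EffectKind kind, int captureNumber)
    {
        Kind = kind;
        CaptureNumber = captureNumber;
    }
}

// Hypothetical helper, not part of the commit: shows how a runner might apply effects.
internal static class EffectApplier
{
    // Every effect assigns the current input position into one register slot.
    public static void Apply(IEnumerable<DerivativeEffect> effects, int[] captureStarts, int[] captureEnds, int position)
    {
        foreach (DerivativeEffect effect in effects)
        {
            switch (effect.Kind)
            {
                case DerivativeEffect.EffectKind.CaptureStart:
                    captureStarts[effect.CaptureNumber] = position;
                    break;
                case DerivativeEffect.EffectKind.CaptureEnd:
                    captureEnds[effect.CaptureNumber] = position;
                    break;
            }
        }
    }
}
```

For instance, for a pattern like `(a)(b)` matched against `"ab"`, applying the hand-written effect lists [start 0, start 1] at position 0, [end 1, start 2] at position 1 and [end 2, end 0] at position 2 leaves group 0 spanning [0, 2), group 1 spanning [0, 1) and group 2 spanning [1, 2). These lists are illustrative only and are not claimed to be exactly what the effect-aware MkDerivative emits.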