NonBacktracking Regex optimizations #102655

Merged 63 commits on Jul 11, 2024
34eba54
Regex automata optimizations
ieviev May 24, 2024
49607f4
off by one err
ieviev May 24, 2024
5ac29f3
wip reversal optimizations
ieviev May 26, 2024
e440dec
removing unnecessary overhead
ieviev May 26, 2024
627fd90
handle final position correctly
ieviev May 26, 2024
7ae6440
edge case workarounds, tests should be ok again
ieviev May 27, 2024
383f3e5
optimizing lookup initialization
ieviev May 27, 2024
5a2636c
more dfa overhead removed
ieviev May 28, 2024
57e5b8d
removed potential rewrite
ieviev May 28, 2024
4d275db
low memory variant
ieviev May 28, 2024
c35ed7e
some kind of compromise between speed and memory
ieviev May 28, 2024
868e02d
cheaper nullability checks
ieviev May 29, 2024
14afd18
nullability encoding
ieviev May 29, 2024
5f5ab55
nullability cached as bytes
ieviev May 29, 2024
dd121de
reverting some changes
ieviev May 30, 2024
723c5b6
testing nfa fallback
ieviev Jun 5, 2024
6bf4095
refactoring, work in progress
ieviev Jun 17, 2024
b10e600
refactoring to struct interfaces
ieviev Jun 18, 2024
d68bd3c
refactoring optimizations
ieviev Jun 18, 2024
153dfc3
fallback mode and bugfix
ieviev Jun 18, 2024
4aebe3e
reenable warnings
ieviev Jun 18, 2024
1e6f55c
anchor edge case
ieviev Jun 19, 2024
c6ad3ac
anchor edge cases
ieviev Jun 19, 2024
e10b43f
Apply suggestions from code review
ieviev Jun 19, 2024
f581755
Apply suggestions from code review
ieviev Jun 27, 2024
01a9684
rebased branch and some cleanup
ieviev Jun 27, 2024
341ce27
cleanup, removing unused features
ieviev Jun 27, 2024
1a28c69
cleanup
ieviev Jun 27, 2024
9bba84f
timeout limit changes
ieviev Jun 29, 2024
a957781
lookup allocation threshold and timeout limits
ieviev Jun 30, 2024
7e86855
char mapping
ieviev Jun 30, 2024
99b5717
empty array mapping
ieviev Jun 30, 2024
47c6b04
adding timeout check to create-derivative
ieviev Jun 30, 2024
22d23fa
some cleanup
ieviev Jun 30, 2024
761f897
comments and cleanup
ieviev Jun 30, 2024
53924eb
cleanup and comments
ieviev Jun 30, 2024
e66d3d3
reflecting new limits in tests
ieviev Jul 1, 2024
65c0b8b
rerunning tests
ieviev Jul 1, 2024
de085b4
retesting DFA timeout
ieviev Jul 1, 2024
5ef3b32
more precise regex memory limit for DFA mode
ieviev Jul 2, 2024
281446f
reverting change
ieviev Jul 2, 2024
8f78046
reverting reversal refactor
ieviev Jul 3, 2024
7157520
Apply suggestions from code review
ieviev Jul 3, 2024
931552d
variable naming
ieviev Jul 3, 2024
cc493f1
test for over 255 minterms
ieviev Jul 3, 2024
a0d2390
adding net directive around test
ieviev Jul 3, 2024
0691c58
all engines in minterms test
ieviev Jul 3, 2024
8ceb207
Apply suggestions from code review
ieviev Jul 3, 2024
379519b
Apply suggestions from code review
ieviev Jul 3, 2024
57c8f6d
simplifying code
ieviev Jul 3, 2024
2e57d42
state flag values down
ieviev Jul 3, 2024
60b1352
mintermclassifier changes
ieviev Jul 3, 2024
2900aad
reversal
ieviev Jul 4, 2024
764ded8
getstateflags
ieviev Jul 4, 2024
81d0dca
formatting
ieviev Jul 4, 2024
38f28b9
removing unused interface
ieviev Jul 4, 2024
cce1188
local function typo
ieviev Jul 4, 2024
8b946da
temporarily removing minterms test
ieviev Jul 5, 2024
d3430b3
re-adding minterms test
ieviev Jul 6, 2024
388c256
reenabling test for all engines
ieviev Jul 8, 2024
2704641
test bugfix
ieviev Jul 8, 2024
0abaabe
expected matches change
ieviev Jul 8, 2024
0a0f409
Review and clean up some code
stephentoub Jul 10, 2024
simplifying code
ieviev authored and stephentoub committed Jul 10, 2024

commit 57c8f6d41385de279752413743d3824da79a155b
@@ -54,13 +54,6 @@ public MintermClassifier(BDD[] minterms)
             {
                 _maxChar = Math.Max(_maxChar, (int)BDDRangeConverter.ToRanges(minterms[mintermId])[^1].Item2);
             }
-            // there is an opportunity to gain around 5% performance for allocating the
-            // full 64K, past a certain threshold where maxChar is already large.
-            // TODO: what should this threshold be?
-            if (_maxChar > 32_000)
-            {
-                _maxChar = ushort.MaxValue;
-            }
 
             // It's incredibly rare for a regex to use more than a hundred or two minterms,
             // but we need a fallback just in case.
@@ -125,14 +118,6 @@ public int GetMintermID(int c)
         [MethodImpl(MethodImplOptions.AggressiveInlining)]
         public int[]? IntLookup() => _intLookup;
 
-        /// <summary>
-        /// Whether the full 64K char lookup is allocated.
-        /// This accelerates the minterm mapping by removing an if-else case,
-        /// and is only considered for the common &lt;= 255 minterms case
-        /// </summary>
-        [MethodImpl(MethodImplOptions.AggressiveInlining)]
-        public bool IsFullLookup() => _lookup is not null && _lookup.Length == ushort.MaxValue + 1;
-
         /// <summary>
         /// Maximum ordinal character for a non-0 minterm, used to conserve memory
         /// </summary>
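
The comments removed in this commit describe a trade-off: a lookup table truncated at the largest character with a non-0 minterm saves memory but needs a range test on every character, while a full 64K table drops that test at the cost of a larger allocation. The sketch below illustrates the two shapes. It is not the code from the PR: the `ranges` data and the helpers `BuildLookup`, `GetMintermTruncated`, and `GetMintermFull` are hypothetical names invented for the example, and the real `MintermClassifier` derives its ranges from BDDs rather than from a hard-coded array.

```csharp
using System;

// Hypothetical minterm ranges (first char, last char, minterm id); not taken from the PR.
(char Lo, char Hi, byte Id)[] ranges = { ('0', '9', 1), ('a', 'z', 2) };

// Truncated table: sized to the largest character that maps to a non-0 minterm.
int maxChar = 0;
foreach (var r in ranges)
{
    maxChar = Math.Max(maxChar, r.Hi);
}
byte[] truncated = BuildLookup(ranges, maxChar + 1);

// Full table: one entry per UTF-16 code unit, i.e. 64K bytes.
byte[] full = BuildLookup(ranges, ushort.MaxValue + 1);

Console.WriteLine($"truncated: {truncated.Length} bytes, full: {full.Length} bytes");
Console.WriteLine(GetMintermTruncated(truncated, 'q')); // 'q' is in a-z, so minterm 2
Console.WriteLine(GetMintermFull(full, 'q'));           // same answer from the full table

// Fills a byte table where every character in a range maps to that range's minterm id.
static byte[] BuildLookup((char Lo, char Hi, byte Id)[] ranges, int length)
{
    var lookup = new byte[length];
    foreach (var (lo, hi, id) in ranges)
    {
        for (int c = lo; c <= hi; c++)
        {
            lookup[c] = id;
        }
    }
    return lookup;
}

// Truncated variant: characters past the end of the table fall into minterm 0.
static int GetMintermTruncated(byte[] lookup, char c) =>
    c < lookup.Length ? lookup[c] : 0;

// Full variant: every char is a valid index, so no explicit range test is written.
static int GetMintermFull(byte[] lookup, char c) => lookup[c];
```

With these sample ranges the truncated table has 123 entries against 65,536 for the full one. The removed comment put the gain from the full table at around 5% and left the switch-over threshold as an open TODO; this commit drops that special case, so the table is always sized from `_maxChar`.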