lock protect nullability cache of symbolic regex node #60942
Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
d681d61 to 6e4e4d8
@veanes, I tried running this with our perf tests in dotnet/performance, just with NonBacktracking subbed in for the options. There are a few notable regressions. Is that expected?
I would not expect any noticeable regressions, because I expected the change (locking) not to really affect the hot path. If I understand correctly, 22% slower for example for [...]
OK, they do indeed use [...]
An alternative would be to use a nested array nullability[][] of size 5x5 in each node.
[...] is thread-safe without any locks. I believe that would be the more efficient solution: it is uniform for all cases and avoids locking, if I'm right about thread safety above.
Also, I forgot to add above, the nullability array would be [...]
The nullability array could also be flat; it would then need size 64 (3 bits per kind), and the lookup would directly use the context (that is exactly ( [...]
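To make the flat-array idea concrete, here is a minimal sketch, not the code from this PR. The class name `FlatNullabilityNode`, the `Context` packing order (next kind in the high bits), and the three byte states are all assumptions chosen for illustration. It relies on two properties: single-byte reads and writes are atomic in .NET, and the nullability computation is assumed to be deterministic, so two racing threads can at worst compute and store the same value twice.

```csharp
using System;

// Hypothetical stand-in for a regex node; member names are illustrative only.
public sealed class FlatNullabilityNode
{
    private const int KindBits = 3;                        // 3 bits per character kind
    public const int ContextLimit = 1 << (2 * KindBits);   // 64 possible contexts

    // Three-state cache entries: 0 means "not computed yet".
    private const byte Undefined = 0, False = 1, True = 2;

    private readonly byte[] _nullability = new byte[ContextLimit];
    private readonly Func<uint, bool> _compute;            // assumed deterministic

    public FlatNullabilityNode(Func<uint, bool> compute) => _compute = compute;

    // Assumed packing: previous kind in the low 3 bits, next kind in the high 3 bits.
    public static uint Context(uint prevKind, uint nextKind) =>
        (nextKind << KindBits) | prevKind;

    public bool IsNullableFor(uint context)
    {
        byte state = _nullability[context];                // atomic single-byte read
        if (state == Undefined)
        {
            // No lock: a concurrent caller may repeat this work,
            // but it stores the identical value.
            state = _compute(context) ? True : False;
            _nullability[context] = state;                 // atomic single-byte write
        }
        return state == True;
    }
}
```

The context value indexes the array directly, so a cache hit costs one array read and one comparison, with no hashing and no lock acquisition.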
6e4e4d8 to 4564b04
@stephentoub, I updated the fix according to my last comment, so I'm asking for a re-review. It would be interesting to know the performance comparison after this change, assuming it is still thread-safe, which it should be, as it replaces the earlier nullability dictionary with an array and gets rid of locking. I'm using [...]
4564b04 to 3052ed0
b93264b to 34b29ff
@stephentoub: I took care of the comments and also simplified the initial test in [...]
Thanks!
94e6f8d to 0c1aa20
e8d37ae to 970a4a8
Added a lock to protect SymbolicRegexNode._nullabilityCache, which stores the conditional nullability of a node for a given context, to make it thread-safe. This computation is not the common case, as it only applies when the node (regex) starts with an anchor and can potentially be nullable (accept the empty string). The initial thought was to special-case nullability for context 0 using a field, but this is already covered in the most common cases, where the regex is neither nullable nor can be nullable, which is checked beforehand. The cache could potentially be moved to the builder as a shared cache to avoid per-node caches, but that would increase the probability of thread contention at the builder level.
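For readers unfamiliar with the pattern, the lock-based approach described here can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the class `LockedNullabilityNode`, the `_cacheLock` field, and the `compute` delegate are hypothetical names, and the real node computes nullability recursively over its children rather than through a delegate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a regex node with a per-node, lock-protected cache.
public sealed class LockedNullabilityNode
{
    private readonly object _cacheLock = new object();
    private readonly Dictionary<uint, bool> _nullabilityCache = new Dictionary<uint, bool>();
    private readonly Func<uint, bool> _compute;   // placeholder for the real computation

    public LockedNullabilityNode(Func<uint, bool> compute) => _compute = compute;

    public bool IsNullableFor(uint context)
    {
        // Dictionary<TKey, TValue> is not safe for concurrent reads and writes,
        // so both the lookup and the insertion happen under the lock.
        lock (_cacheLock)
        {
            if (!_nullabilityCache.TryGetValue(context, out bool result))
            {
                result = _compute(context);
                _nullabilityCache[context] = result;
            }
            return result;
        }
    }
}
```

Because every call takes the lock, even on a cache hit, a sketch like this adds cost whenever the cached path is exercised frequently. Keeping the lock per node limits contention to threads that query the same node.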