
Regex evaluation bug - discrepancy between compiled and non-compiled regex #97455

Closed
lahma opened this issue Jan 24, 2024 · 7 comments · Fixed by #97463

Comments

@lahma

lahma commented Jan 24, 2024

Description

Running the same Regex in compiled and non-compiled mode produces different results.

Reproduction Steps

using System.Text.RegularExpressions;
using Xunit;

namespace BugReport;

public class RegexTests
{
    [Fact]
    public void CompiledRegexShouldProduceSameResultAsNonCompiled()
    {
        const string Pattern = @"(.*?)a(?!(a+)b\2c)\2(.*)";
        var nonCompiled = new Regex(Pattern, RegexOptions.None);
        var compiled = new Regex(Pattern, RegexOptions.Compiled);

        const string Input = "baaabaac";
        const int GroupNumber = 2;

        Assert.Equal(nonCompiled.Match(Input).Groups[GroupNumber].Value, compiled.Match(Input).Groups[GroupNumber].Value);
    }
}

Expected behavior

The group captures are equal; in this case the value should be empty.

Actual behavior

Xunit.Sdk.EqualException
Assert.Equal() Failure: Strings differ
Expected: ""
Actual:   "a"
           ↑ (pos 0)
   at CompiledRegexShouldProduceSameResultAsNonCompiled() in Test.cs:line 18
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Regression?

No response

Known Workarounds

No response

Configuration

Running latest .NET, Windows 11.

❯ dotnet --version
8.0.101

Other information

No response

ghost added the untriaged label (New issue has not been triaged by the area owner) Jan 24, 2024
@ghost

ghost commented Jan 24, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.


stephentoub added this to the 9.0.0 milestone Jan 24, 2024
ghost removed the untriaged label (New issue has not been triaged by the area owner) Jan 24, 2024
@stephentoub
Member

Thanks for the helpful repro.

Here's a slightly simpler one based on yours:

using System.Text.RegularExpressions;

const string Pattern = @"(?!(b)b)\1";
const string Input = "ba";

var nonCompiled = new Regex(Pattern, RegexOptions.None);
var compiled = new Regex(Pattern, RegexOptions.Compiled);

Console.WriteLine(nonCompiled.Match(Input).Success);
Console.WriteLine(compiled.Match(Input).Success);

The issue appears to be in how compiled (and source generated) regexes are handling capture groups inside of negative lookarounds. They're not uncapturing when exiting the construct, so whereas the backreference doesn't end up matching in the interpreter (because there's no capture to match), it does end up matching in the compiled regex because the capture is still there and matches.
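
The uncapturing described above can also be observed on the group itself rather than through a backreference. The following is a minimal sketch, not code from this thread: the pattern `(?!(b)b).` is a hypothetical variation of the simplified repro in which `\1` is replaced by `.`, so that the overall match succeeds and group 1 can be inspected afterwards. It assumes the interpreter (`RegexOptions.None`) is the reference behavior.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variation of the simplified repro: the backreference is replaced
// by '.', so the match succeeds and group 1 can be inspected directly.
const string Pattern = @"(?!(b)b).";
const string Input = "ba";

foreach (RegexOptions options in new[] { RegexOptions.None, RegexOptions.Compiled })
{
    Match match = new Regex(Pattern, options).Match(Input);

    // Group 1 is only ever set inside the negative lookahead, so it should
    // report Success == false once the lookahead has been exited.
    Console.WriteLine($"{options}: match={match.Success}, group1 captured={match.Groups[1].Success}");
}
```

With the interpreter, group 1 should be reported as not captured. If the compiled line reports it as captured on a given runtime, that would be consistent with the capture leaking out of the negative lookahead as described above.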

@stephentoub
Member

stephentoub commented Jan 24, 2024

@lahma, given that a backreference to a capture inside a negative lookahead from outside that lookahead will never match, can you speak to the pattern you were using that encountered this? Was it just a test, or was the pattern actually trying to do something useful with that backreference?

ghost added the in-pr label (There is an active PR which will close this issue when it is merged) Jan 24, 2024
@lahma
Author

lahma commented Jan 24, 2024

Well, it's hard to say whether we can call this a real-world scenario, as we are talking about my arch enemy after all. I encountered this when running the ECMAScript test suite and testing generated Regex instances in compiled mode while trying to optimize Jint. Here's the actual test case.

@stephentoub
Member

Thanks. That's what I figured it was.

@lahma
Author

lahma commented Jan 25, 2024

Adding a note that when run under net462 and net6.0 the compiled and non-compiled versions work the same way, both net7.0 and net8.0 have regressed.

@stephentoub
Member

> Adding a note that when run under net462 and net6.0 the compiled and non-compiled versions work the same way, both net7.0 and net8.0 have regressed.

Yes, we rewrote the compiler in .NET 7.

ghost removed the in-pr label (There is an active PR which will close this issue when it is merged) Jan 25, 2024
github-actions bot locked and limited conversation to collaborators Feb 25, 2024