Skip to content

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Oct 11, 2025

Fix for GeneratedRegex fixer preserving multiline patterns

This PR addresses the issue where the GeneratedRegex code fixer converts multiline verbatim string literals into single-line strings with escape sequences, losing readability.

Problem

When a regex pattern contains actual newlines (from verbatim string literals or string concatenation), the fixer converts them to escape sequences like \r\n, making patterns with RegexOptions.IgnorePatternWhitespace unreadable.

Solution

  • Analyze the codebase and understand the issue
  • Modify UpgradeToGeneratedRegexCodeFixer.cs to detect newlines in pattern strings
  • Update logic to preserve verbatim string format when pattern contains newlines
  • Add comprehensive tests for various scenarios:
    • Verbatim string with actual newlines
    • String concatenation with newlines
  • Build and test the changes
  • Verify all existing tests still pass (29,289 tests total, all passing)
  • Verify no build warnings or errors
  • Address review feedback: Use IndexOfAny for better performance

Changes Made

  1. Added ShouldUseVerbatimString helper method that checks for backslashes, newlines (\n), or carriage returns (\r) using IndexOfAny for optimal performance
  2. Updated pattern handling in GetNode method to use verbatim string syntax when any of these characters are present
  3. Added two new tests to validate the fix works for both direct verbatim strings and string concatenation:
    • MultilineVerbatimStringPreservedByFixer
    • MultilineStringConcatenationPreservedByFixer

Test Results

✅ All 29,289 existing tests pass
✅ All 110 analyzer tests pass
✅ New tests specifically validate multiline pattern preservation
✅ No build warnings or errors
✅ Works for both direct verbatim strings and string concatenation scenarios

Impact

This fix greatly improves readability for regex patterns with RegexOptions.IgnorePatternWhitespace, particularly for complex patterns in projects like MSBuild that use string fragment composition. The fix preserves the original formatting intent from the source code.

Original prompt

This section details on the original issue you should resolve

<issue_title>GeneratedRegex fixer forces pattern onto a single line</issue_title>
<issue_description>
Consider

   private static Regex r = new Regex(@"a
             b
             c", RegexOptions.IgnorePatternWhitespace);

run the fixer, now I have

    [GeneratedRegex("a\r\n             b\r\n             c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

The semantics are the same, but the readability is gone. I would expect

    [GeneratedRegex(@"a
         b
         c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.</issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details
Consider
 ```c#
   private static Regex r = new Regex(@"a
             b
             c", RegexOptions.IgnorePatternWhitespace);

run the fixer, now I have

    [GeneratedRegex("a\r\n             b\r\n             c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

The semantics are the same, but the readability is gone. I would expect

    [GeneratedRegex(@"a
         b
         c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();

I see #69616 which implies that whitespace is preserved but not comments. I don't see whitespace preserved, or at least not in the original form.

Author: danmoseley
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -
@danmoseley This comes up eg., in this MSBuild example ```c# const string itemMetadataSpecification = @"%\(\s*; (?(?[A-Za-z_][A-Za-z_0-9\-]*)\s*\.\s*)? (?[A-Za-z_][A-Za-z_0-9\-]*@ \s*\)"; private Regex s_itemMetadataPattern = new(itemMetadataSpecification, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture); ``` nice and pretty, I run the fixer and it produces this ugly thing ```c# [GeneratedRegex("%\\(\\s*;\r\n (?(?[A-Za-z_][A-Za-z_0-9\\-]*)\\s*\\.\\s*)?\r\n (?[A-Za-z_][A-Za-z_0-9\\-]*@\r\n \\s*\\)", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace)] private static partial Regex MyRegex(); ``` as an aside, by running the fixer I'm implicitly okaying it inlining the compound strings, which I think is fine (and inevitable) @danmoseley Built latest bits and found it's already fixed by #78172 @danmoseley Actually, it's only partially fixed. Consider ```c# private const string foo = "bar"; private static Regex r1 = new Regex(@"a " + foo + @" b c", RegexOptions.IgnorePatternWhitespace);
private static Regex r2 = new Regex(@"a        bar
                                        b
                                        c", RegexOptions.IgnorePatternWhitespace);
Both should produce identical results. However, in the first case, I lose the visible whitespace --
```c#
    [GeneratedRegex("a        bar\r\n                                            b\r\n                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex();
   
   [GeneratedRegex(@"a        bar
                                            b
                                            c", RegexOptions.IgnorePatternWhitespace)]
    private static partial Regex MyRegex1();

Unfortunately many of the dotnet/msbuild regexes are built up by compounding reused string fragments, which in some cases are compounded other ones, and use IgnorePatternWhitespace. This means after running the generator, they need to be fixed by hand.</comment_new>

Fixes #79891


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Fix GeneratedRegex fixer to preserve pattern formatting Fix GeneratedRegex fixer to preserve multiline verbatim string patterns Oct 11, 2025
@Copilot Copilot AI requested a review from stephentoub October 11, 2025 16:43
Copilot finished work on behalf of stephentoub October 11, 2025 16:43
{
// Use verbatim string syntax if the string contains backslashes or newlines
// to preserve readability, especially for patterns with RegexOptions.IgnorePatternWhitespace
return str.Contains('\\') || str.Contains('\n') || str.Contains('\r');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, can this use IndexOfAny instead, so that there's a single call rather than three?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use IndexOfAny in commit 5dd7b7c

@stephentoub stephentoub marked this pull request as ready for review October 11, 2025 22:50
@Copilot Copilot AI review requested due to automatic review settings October 11, 2025 22:50
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the GeneratedRegex code fixer to preserve multiline verbatim string patterns instead of converting them to single-line strings with escape sequences. The fix addresses readability issues for regex patterns that use RegexOptions.IgnorePatternWhitespace, particularly in complex scenarios with string concatenation.

  • Added logic to detect patterns containing newlines or carriage returns in addition to backslashes
  • Enhanced pattern handling to preserve verbatim string syntax for better readability
  • Added comprehensive test coverage for both direct multiline patterns and string concatenation scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/libraries/System.Text.RegularExpressions/gen/UpgradeToGeneratedRegexCodeFixer.cs Modified the code fixer to detect newlines/carriage returns and preserve verbatim string syntax
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs Added two comprehensive tests to verify multiline verbatim string preservation

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeneratedRegex fixer forces pattern onto a single line

2 participants