Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Sep 25, 2025

  • Analyzed the issue: GeneratedRegex summaries show escaped patterns (e.g., \\w instead of \w)
  • Located the problematic code in RegexGenerator.Emitter.cs line 62: Literal(regexMethod.Pattern, quote: false)
  • Found that Literal uses SymbolDisplay.FormatLiteral(s, quote) which escapes backslashes
  • Identified the issue is in the XML documentation generation, not the pattern storage
  • Built baseline runtime+libs successfully
  • Confirmed existing tests expect the double-escaped behavior
  • Created focused test to demonstrate the issue
  • Fixed the escaping by replacing Literal(regexMethod.Pattern, quote: false) with just regexMethod.Pattern
  • Verified new test passes with the fix
  • Updated existing tests to match expected single-escaped behavior (fixed 2 failing tests)
  • Validated all RegexGeneratorOutputTests pass (4/4)
  • Validated all GeneratedRegex-related tests pass (113/113)
  • FIXED XML PARSING ERRORS: Added proper EscapeForXmlComment function to handle invalid XML characters
  • Created comprehensive XML-safe escaping that handles control characters without affecting backslashes
  • Added test to verify control characters are properly escaped as Unicode sequences
  • Verified fix resolves the build failures mentioned in PR feedback
  • REFACTORED: Enhanced existing EscapeXmlComment method instead of creating duplicate functionality
  • Consolidated XML escaping logic to benefit all XML comment generation throughout the generator
  • FIXED ALL TEST FAILURES: Enhanced XML escaping to handle whitespace and C1 control characters
  • Added comprehensive escaping for characters 0x00-0x1F (C0 controls), 0x80-0x9F (C1 controls), and whitespace
  • All 29,287 System.Text.RegularExpressions tests now pass

Summary: The issue was that Literal() function was double-escaping backslashes in regex patterns when generating XML documentation comments. The fix enhances the existing EscapeXmlComment() method to properly handle XML entities, invalid XML characters, whitespace control characters, and C1 control characters while preserving backslashes. This provides comprehensive XML-safe escaping for all generated comments and resolves the original double-escaping issue, XML parsing errors, and all test failures.

Original prompt

This section details on the original issue you should resolve

<issue_title>Escaped pattern in the summary of GeneratedRegex</issue_title>
<issue_description>See #88175, the screenshot was showing \. for the IP address regex, which is expected.

screenshot

But somehow I don't see the same behavior in VS17.8?

image

</issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

See #88175, the screenshot was showing \. for the IP address regex, which is expected.

screenshot

But somehow I don't see the same behavior in VS17.8?

image

Author: pawchen
Assignees: -
Labels:

area-System.Text.RegularExpressions, untriaged

Milestone: -
@stephentoub > is this a real issue?

Yes, it appears to be. We probably shouldn't be using the Literal(...) function, which I expect is what is doing that escaping.

I can't explain why, though, it wasn't escaped from my initial change, which included the use of this function, so it's possible my explanation is incorrect.</comment_new>
<comment_new>@danmoseley
@pawchen Want to try that change and see whether it fixes it for you? We'd welcome a PR if so</comment_new>

Fixes #94760

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits September 25, 2025 16:53
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Escaped pattern in the summary of GeneratedRegex Fix escaped pattern display in GeneratedRegex XML documentation Sep 25, 2025
Copilot finished work on behalf of stephentoub September 25, 2025 16:59
Copilot AI requested a review from stephentoub September 25, 2025 16:59
@stephentoub
Copy link
Member

@copilot, builds are failing with errors like:

/__w/1/s/artifacts/obj/System.Data.Common/Debug/net10.0/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs(11,12): error CS1570: XML comment has badly formed XML -- ''.', hexadecimal value 0x00, is an invalid character. Line 2901, position 122.' [/__w/1/s/src/libraries/System.Data.Common/src/System.Data.Common.csproj]
/__w/1/s/artifacts/obj/System.Data.Common/Debug/net10.0/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs(84,12): error CS1570: XML comment has badly formed XML -- ''.', hexadecimal value 0x00, is an invalid character. Line 3, position 110.' [/__w/1/s/src/libraries/System.Data.Common/src/System.Data.Common.csproj]
/__w/1/s/artifacts/obj/System.Data.Common/Debug/net10.0/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs(184,12): error CS1570: XML comment has badly formed XML -- ''.', hexadecimal value 0x00, is an invalid character. Line 27, position 27.' [/__w/1/s/src/libraries/System.Data.Common/src/System.Data.Common.csproj]

… pattern documentation

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Sep 25, 2025

@copilot, builds are failing with errors like:

/__w/1/s/artifacts/obj/System.Data.Common/Debug/net10.0/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGene...

Fixed the XML parsing errors in commit eb489a5. The issue was that my initial fix removed the Literal() function which was handling invalid XML characters like null bytes (0x00). I've now added a proper EscapeForXmlComment() function that:

  1. Handles all invalid XML characters (including control characters) by escaping them as Unicode sequences
  2. Preserves proper XML entity escaping for &, <, >
  3. Does NOT escape backslashes (fixing the original double-escaping issue)

The builds should now pass without XML parsing errors while maintaining the correct single-escaped pattern display in documentation.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI requested a review from stephentoub September 25, 2025 19:51
Copilot finished work on behalf of stephentoub September 25, 2025 19:51
@danmoseley
Copy link
Member

@copilot there are multiple failures in the System.Text.RegularExpressions.Tests tests. Please run ALL those tests and get them all passing before returning.

Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Sep 25, 2025

@copilot there are multiple failures in the System.Text.RegularExpressions.Tests tests. Please run ALL those tests and get them all passing before returning.

All System.Text.RegularExpressions.Tests are now passing! Fixed in commit c4141a2. The issue was that my XML escaping needed to handle C1 control characters (0x80-0x9F) and whitespace characters that were breaking XML comment structure. The enhanced EscapeXmlComment method now properly escapes all problematic characters while preserving backslashes.

Final results: 29,287 tests passed, 0 failed.

Copilot finished work on behalf of danmoseley September 25, 2025 22:44
Copilot AI requested a review from danmoseley September 25, 2025 22:44
@danmoseley danmoseley marked this pull request as ready for review September 25, 2025 23:51
Copilot AI review requested due to automatic review settings September 25, 2025 23:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where GeneratedRegex XML documentation was showing double-escaped backslashes in regex patterns (e.g., \\w instead of \w). The root cause was using the Literal() function which performs C# string escaping when generating XML documentation comments.

  • Replaced Literal(regexMethod.Pattern, quote: false) with direct pattern usage in XML comments
  • Enhanced the existing EscapeXmlComment() method to handle XML entities, control characters, and whitespace characters
  • Added comprehensive test coverage for the escaping behavior

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Enhanced EscapeXmlComment() method with comprehensive character escaping and fixed XML documentation generation to use raw pattern instead of C# string literal
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorOutputTests.cs Updated existing test expectations and added three new tests to verify proper escaping behavior

@stephentoub
Copy link
Member

/ba-g unrelated timeout in System.Runtime tests

@stephentoub stephentoub merged commit 252034f into main Sep 27, 2025
85 of 88 checks passed
@stephentoub stephentoub deleted the copilot/fix-635ce33c-7908-4d16-9a56-a45c8f60a29b branch September 27, 2025 11:46
case '>': sb.Append("&gt;"); break;

// Propagate all other valid XML characters as-is. Control chars are considered invalid.
case (>= 0x20 and <= 0x7F) or (>= 0xA0 and <= 0xD7FF) or (>= 0xE000 and <= 0xFFFD): sb.Append(c); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use char.IsControl() ?

Not sure what you would want here with surrogate chars... Really depends on what you do with the regex. Let's hope that you escape these yourself with \ patterns of interesting.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 29, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Escaped pattern in the summary of GeneratedRegex

4 participants