@Copilot Copilot AI commented Oct 11, 2025

Fix: Make the regex parser ignore vertical tabs in patterns when RegexOptions.IgnorePatternWhitespace is set

Plan:

  • Investigate the issue and reproduce the bug
  • Build baseline (clr+libs) successfully
  • Confirm the bug: vertical tab (\v, 0xB) is not ignored with IgnorePatternWhitespace
  • Fix the Category array in RegexParser.cs by changing position 0xB from 0 to W
  • Update s_metachars to include vertical tab character
  • Add test case to existing test file (Regex.Match.Tests.cs)
  • Build and test the fix
  • Verify all tests pass
  • Guard test for .NET Framework platform

Changes Made:

  1. RegexParser.cs (line 1932): Changed Category[0xB] from 0 to W to mark vertical tab as whitespace
  2. RegexParser.cs (line 1942): Added \v to s_metachars SearchValues
  3. Regex.Match.Tests.cs (lines 584-588): Added a test case for vertical tab with the IgnorePatternWhitespace option, guarded with if (!PlatformDetection.IsNetFramework) to prevent failures on .NET Framework
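
As an illustration of change 1, here is a minimal sketch of how a per-character category table can flag pattern whitespace. It is not the actual RegexParser.cs code: the class name CategoryTableSketch, the table contents, and the helper IsPatternWhitespace are hypothetical, and only the marker name W and the index 0xB come from the description above.

```csharp
using System;

public static class CategoryTableSketch
{
    // Marker for "whitespace", named W as in the PR description.
    private const byte W = 1;

    // Hypothetical, simplified ASCII table: every entry is 0 except the
    // whitespace characters. Index 0xB ('\v') is the entry this PR changes.
    private static readonly byte[] s_category = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[128];
        foreach (char c in "\t\n\v\f\r ")
        {
            table[c] = W;
        }
        return table;
    }

    // Characters outside the table (>= 128) are treated as non-whitespace here;
    // this is an assumption of the sketch, not a claim about the real parser.
    public static bool IsPatternWhitespace(char ch) =>
        ch < s_category.Length && s_category[ch] == W;
}
```

With the 0xB entry left at 0, IsPatternWhitespace('\v') would return false, which mirrors the behavior reported in the issue.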

Test Results:

FunctionalTests: 29,291 tests passed, 0 failures
UnitTests: 1,005 tests passed, 0 failures
New test case: Verified passing across all regex engines (Interpreter, Compiled, NonBacktracking, SourceGenerated) on .NET Core
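
A rough sketch of how a platform-guarded test case can be produced. The real test uses PlatformDetection.IsNetFramework from the dotnet/runtime test utilities; the IsNetFramework property below is a stand-in based on RuntimeInformation.FrameworkDescription, and the class name GuardedCaseSketch and the shape of the yielded tuples are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;

public static class GuardedCaseSketch
{
    // Stand-in for PlatformDetection.IsNetFramework (assumption of this sketch).
    public static bool IsNetFramework =>
        RuntimeInformation.FrameworkDescription.StartsWith(
            ".NET Framework", StringComparison.OrdinalIgnoreCase);

    // Yields (pattern, input, options) cases that are expected to match.
    public static IEnumerable<(string Pattern, string Input, RegexOptions Options)> Cases()
    {
        // A space in the pattern is ignored on every platform.
        yield return ("a b", "ab", RegexOptions.IgnorePatternWhitespace);

        // The vertical tab case is only yielded where the fix applies.
        if (!IsNetFramework)
        {
            yield return ("a\vb", "ab", RegexOptions.IgnorePatternWhitespace);
        }
    }
}
```

Guarding the case at data-generation time means it never appears on .NET Framework, rather than being reported there as a failure.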

Technical Details:

  • Position 0xB in the Category array corresponds to the vertical tab character ('\v')
  • The fix aligns with .NET's char.IsWhiteSpace('\v'), which returns true
  • PCRE2 and other regex engines also treat vertical tab as whitespace in ignore pattern whitespace mode
  • The vertical tab character is now properly ignored when using RegexOptions.IgnorePatternWhitespace on .NET Core
  • Test is guarded to not run on .NET Framework where the fix is not applicable
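
The consistency argument above can be checked with a small sketch. The class and method names are hypothetical. The first two checks use existing .NET APIs and return true; the result of the third depends on whether the runtime includes this fix, so no expected value is stated for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerticalTabChecks
{
    // char.IsWhiteSpace classifies vertical tab (U+000B) as whitespace.
    public static bool CharApiTreatsAsWhitespace() => char.IsWhiteSpace('\v');

    // The \s character class matches a vertical tab in the input.
    public static bool RegexClassMatches() => Regex.IsMatch("\v", @"\s");

    // The case from the issue: a literal vertical tab inside the pattern.
    // Runtimes without the fix are reported in the issue to return false here.
    public static bool IgnoredInPattern() =>
        Regex.IsMatch("ab", "a\vb", RegexOptions.IgnorePatternWhitespace);
}
```
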
Original prompt

This section details the original issue you should resolve

<issue_title>Regex parser doesn't ignore vertical tabs in pattern when using RegexOptions.IgnorePatternWhitespace</issue_title>
<issue_description>While porting PCRE2 tests suite, one of the failing tests is:

bool isMatch = Regex.IsMatch("ab", "a\vb", RegexOptions.IgnorePatternWhitespace);
Assert.True(isMatch);

This should probably return true, matching other engines like PCRE: the \v character should be ignored as whitespace, leaving the pattern as just ab.

cc: @stephentoub </issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

Author: joperezr
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -
@stephentoub This does seem like a bug; `char.IsWhiteSpace('\v')` is true, as is `Regex.IsMatch("\v", @"\s")`. Presumably it'd be as simple as changing the `Category[0xB]` value in the following table to be 'X': https://github.com/dotnet/runtime/blob/aac729ff906a31f327823587748687c0308a4043/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs#L2060-L2062 We might want to dig a tad deeper, as it feels a little deliberate that the cell was left as 0 rather than X.

Fixes #73206



Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Fix regex parser to ignore vertical tabs with IgnorePatternWhiteSpace" to "Fix RegexParser to ignore vertical tabs with RegexOptions.IgnorePatternWhitespace" Oct 11, 2025
@Copilot Copilot AI requested a review from stephentoub October 11, 2025 16:55
Copilot finished work on behalf of stephentoub October 11, 2025 16:55
@stephentoub stephentoub marked this pull request as ready for review October 11, 2025 18:00
@Copilot Copilot AI review requested due to automatic review settings October 11, 2025 18:00
@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes a bug in the regex parser where vertical tab (\v, character 0xB) was not being treated as whitespace when using RegexOptions.IgnorePatternWhitespace, making it inconsistent with other .NET APIs and regex engines like PCRE2.

  • Updated the ASCII character category table to mark vertical tab as whitespace
  • Added vertical tab to the metacharacters SearchValues for proper escaping
  • Added test case to verify vertical tab is properly ignored with IgnorePatternWhitespace option

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File: Description
RegexParser.cs: Updated character category table to treat vertical tab as whitespace and added \v to metacharacters
Regex.Match.Tests.cs: Added test case verifying vertical tab is ignored with IgnorePatternWhitespace option

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI requested a review from stephentoub October 11, 2025 18:39
Copilot finished work on behalf of stephentoub October 11, 2025 18:39
@stephentoub stephentoub enabled auto-merge (squash) October 11, 2025 18:42
@stephentoub stephentoub requested a review from joperezr October 11, 2025 18:42