Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #133 regex dot in character class #135

Merged
merged 5 commits into from
Dec 19, 2020

Conversation

krsriq
Copy link

@krsriq krsriq commented Dec 11, 2020

What is the reason for this PR?

regexify produces a random character instead of a literal dot when called within a character class (e.g. [.]).

Author's checklist

Summary of changes

Fixes the issue, adds test to verify.

Review checklist

  • All checks have passed
  • Changes are approved by maintainer

Copy link

@IonBazan IonBazan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I tested with random seed and complex regex dataset and it works perfectly!

@@ -337,6 +337,7 @@ public function regexifyDataProvider()
['[a-z]{2,3}', 'brackets quantifiers on character class range'],
['(a|b){2,3}', 'brackets quantifiers on alternation'],
['\.\*\?\+', 'escaped characters'],
['[.]', 'literal dot'],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth to add a test asserting that calling regexify('[.]') will always return "."?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Daniel Schmelz <daniel@schmelz.org>
@pimjansen
Copy link

But a literal dot in regex always need an escape in the input? With this change we actually limit ourself in normal regex use with a catchall dot itself?

@krsriq krsriq changed the title Fix #133 regex dot in square brackets Fix #133 regex dot in character class Dec 12, 2020
Signed-off-by: Daniel Schmelz <daniel@schmelz.org>
@krsriq
Copy link
Author

krsriq commented Dec 12, 2020

But a literal dot in regex always need an escape in the input? With this change we actually limit ourself in normal regex use with a catchall dot itself?

In a bracketed character class a dot does not have to be escaped to be interpreted as a literal dot: https://perldoc.perl.org/perlrecharclass#Bracketed-Character-Classes

The problem here was, that the character class replacement logic did not take that into account. When called with [.] the character class character selector returns .

// All [ABC] become B (or A or C)
$regex = preg_replace_callback('/\[([^\]]+)\]/', function ($matches) {
return Base::randomElement(str_split($matches[1]));
}, $regex);

which is then erroneously replaced with a random ascii character here

$regex = preg_replace_callback('/(?<!\\\)\./', 'static::randomAscii', $regex);

When the character class character selector returns \. instead of . (as is done with this PR), the dot is not replaced by a random ascii character. The additional escape \ is then removed here:

$regex = str_replace('\\', '', $regex);

with the expected end result of a literal dot.

To verify that calling regexify with a normal dot works as expected, I've added ['.', 'catch-all dot'] to the testRegexifySupportedRegexSyntax dataProvider.

Btw thinking about this I realized that we probably have other related issues here, for example when regexify is called with /[*]/..

@pimjansen
Copy link

@krsriq shall we merge this one and raise a new issue for the other issue?

This way we already close this bug right away

@krsriq
Copy link
Author

krsriq commented Dec 19, 2020

@pimjansen - yes, good idea!

@pimjansen pimjansen merged commit 31c3326 into FakerPHP:main Dec 19, 2020
@krsriq
Copy link
Author

krsriq commented Dec 19, 2020

Thanks!

@krsriq krsriq deleted the fix_regex_dot_in_braces branch December 19, 2020 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regexify dot issue
3 participants