Skip to content

Ruby/Python: regex parser: group sequences of 'normal' characters #8166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 28, 2022

Conversation

aibaars
Copy link
Contributor

@aibaars aibaars commented Feb 22, 2022

The Javascript library groups sequences of normal characters, it would be good to make the behaviour of the regex libraries behave as similarly to each other as possible.

@aibaars aibaars force-pushed the regex-char-sequence-1 branch 2 times, most recently from 053a7d6 to 11c1633 Compare February 22, 2022 15:03
@aibaars aibaars force-pushed the regex-char-sequence-1 branch from 11c1633 to 69ed121 Compare February 22, 2022 15:15
@aibaars aibaars marked this pull request as ready for review February 23, 2022 12:27
@aibaars aibaars requested review from a team as code owners February 23, 2022 12:27
@aibaars
Copy link
Contributor Author

aibaars commented Feb 23, 2022

@yoff Could you have a look at this?

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this, thanks for doing it!
I believe that what you are doing, is changing character from being either a simpleCharacter or an escapedCharacter to now being either a maximal simpleCharacter run or an escapedCharacter. I have therefor suggested to rename some of the predicates.

I also wonder if the computation of maximal runs is standard or if a more efficient one might exist, but if the performance looks fine, then we can punt this question.

| ax{01,3} | first | 0 | 1 |
| ax{01,3} | last | 1 | 2 |
| ax{01,3} | last | 1 | 8 |
| ax{01,3} | last | 7 | 8 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and others like it) looks like an existing bug. I think it is due to this line in simpleCharacter not considering qualifiers longer than one character (such as the multiples notation {n, m}).

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Otherwise this looks good to me. 👍

Co-authored-by: yoff <lerchedahl@gmail.com>
@aibaars aibaars force-pushed the regex-char-sequence-1 branch 2 times, most recently from 4fd8b1a to ca0e26f Compare February 25, 2022 16:36
@aibaars aibaars force-pushed the regex-char-sequence-1 branch from ca0e26f to 0c23f58 Compare February 25, 2022 17:43
@yoff
Copy link
Contributor

yoff commented Feb 28, 2022

I see from the failed tests that something is off with using simple chars instead of normal chars. I would like to understand what that is, but it does not have to be during this PR.

@aibaars
Copy link
Contributor Author

aibaars commented Feb 28, 2022

I see from the failed tests that something is off with using simple chars instead of normal chars. I would like to understand what that is, but it does not have to be during this PR.

I think simpleCharacter does not exclude characters that are part of escape sequences.

@aibaars
Copy link
Contributor Author

aibaars commented Feb 28, 2022

@yoff @tausbn I think I addressed your comments.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, is the DCA run from earlier still valid?

@yoff
Copy link
Contributor

yoff commented Feb 28, 2022

I think simpleCharacter does not exclude characters that are part of escape sequences.
Ah, yes, that might be it :-)

@aibaars
Copy link
Contributor Author

aibaars commented Feb 28, 2022

LGTM, is the DCA run from earlier still valid?

Yes, the force-push only fixed a test case.

@aibaars aibaars merged commit 5ce6b84 into github:main Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants