Unicode characters support in regex #233

Merged
eliotwrobson merged 4 commits into caleb531:develop from kunchtler:unicode-regex
Sep 23, 2024

kunchtler

Linked to #232.

Changed the order in which tokens are registered in the regex lexer so that the rule recognizing letters is processed last, and broadened that rule to match any non-whitespace character (\S in Python's re library).

Added a test to check for the support of non-ascii characters.
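
To illustrate why registration order matters, here is a minimal sketch of a first-match-wins lexer. It is not the library's actual lexer: `TOKEN_RULES`, `tokenize`, and the token names are hypothetical and use only the standard `re` module. The point is that a catch-all `\S` rule has to come after the operator rules, otherwise it would also consume `(`, `|`, and `*`.

```python
import re

# Hypothetical token rules, tried in order; the first rule that matches wins.
# These names are illustrative and are not the library's real lexer classes.
TOKEN_RULES = [
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("UNION", re.compile(r"\|")),
    ("STAR", re.compile(r"\*")),
    # Catch-all registered last: any single non-whitespace character,
    # which includes non-ASCII letters.
    ("SYMBOL", re.compile(r"\S")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using first-match-wins ordering."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Whitespace separates tokens and is never a symbol.
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens


print(tokenize("(λ|ß)*"))
```

With the catch-all last, `(`, `|`, `)`, and `*` keep their operator meaning while `λ` and `ß` are lexed as ordinary symbols. Moving the `SYMBOL` rule to the front of the list would turn every character into a symbol.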

This is my very first pull request ever so feel free to guide me.

coveralls commented Jul 18, 2024

Coverage Status

coverage: 99.612% (-0.001%) from 99.613%
when pulling 4fb91df on kunchtler:unicode-regex
into 9ab1a1c on caleb531:develop.

eliotwrobson linked an issue on Jul 27, 2024 that may be closed by this pull request

eliotwrobson (Collaborator) left a comment

@kunchtler thanks for this! One request to make this test a little more robust, but overall I think the change looks good.

tests/test_regex.py (review thread resolved)

eliotwrobson (Collaborator) left a comment

LGTM with the new test!

automata/fa/nfa.py (outdated review thread, resolved)
eliotwrobson merged commit 9993dd1 into caleb531:develop on Sep 23, 2024
15 checks passed
Successfully merging this pull request may close these issues.

Unicode characters with regexp ?
3 participants