[FEA] Support regex lookarounds #3100
Comments
As a design choice, we currently use an NFA engine for regex matching.
Thanks for linking those pages, @fynv. I'll give some of the articles a read to better understand.
@fynv @davidwendt I'm confused by the statement "features like backreferences lookarounds are not possible to be supported". rapidsai/custrings#94 suggests backreferences are already supported, at least in some cases. Can you comment?
Sorry for the confusion. The "backreferences" in rapidsai/custrings#94 refer to the '\1', '\2' elements in the replacement expression ('repl') of nvstrings.replace_with_backrefs(), not in the regular expression itself, although the function does have to work with a regular expression.
Here 'pat' cannot include '\1' or '\2', while 'repl' can.
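To illustrate the distinction, here is a small sketch using Python's standard `re` module (not nvstrings or cuDF, whose APIs may differ). Groups are captured by the pattern and referenced from the replacement string; a backreference inside the pattern itself is a separate regex feature.

```python
import re

# Backreferences in the replacement: the pattern only defines capture
# groups, and the replacement string refers back to them with \1 and \2.
pat = r"(\w+)@(\w+)"
repl = r"\2 at \1"
result = re.sub(pat, repl, "alice@example")
print(result)  # example at alice

# Backreference inside the pattern (a different feature): \1 requires the
# text captured by group 1 to appear again, here a doubled word.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(doubled)  # ['is', 'test']
```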
I have a similar issue with my use case, which I'm currently getting around with a workaround. I would like to find which strings don't start with a word (say
Output:
But the expected output is:
Since negative lookaheads aren't supported, my workaround is to match strings that start with the word
Output:
cc: @beckernick
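The workaround above can be sketched with Python's standard `re` module. The word from the original comment was not preserved, so `"hello"` and the sample strings are hypothetical stand-ins. The first filter uses the negative lookahead this issue asks for; the second avoids lookarounds by matching the strings that do start with the word and inverting the result.

```python
import re

# Hypothetical data: "hello" stands in for the word from the original comment.
strings = ["hello world", "say hello", "hello", "goodbye"]
word = "hello"

# Direct approach (needs a negative lookahead): keep strings that do NOT
# start with the word. Without MULTILINE, ^ only matches at position 0.
neg_lookahead = re.compile(rf"^(?!{re.escape(word)})")
direct = [s for s in strings if neg_lookahead.search(s)]

# Lookaround-free workaround: match strings that DO start with the word,
# then invert the boolean result.
starts_with = re.compile(rf"^{re.escape(word)}")
workaround = [s for s in strings if not starts_with.search(s)]

print(direct)      # ['say hello', 'goodbye']
print(workaround)  # ['say hello', 'goodbye']
```

In a dataframe library the inversion would typically be a boolean negation of a `startswith` or `contains` mask, for example `~s.str.startswith(word)` in pandas.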
I'm closing this as "wontfix" because lookarounds are not compatible with the current NFA regex engine; supporting them would require a new engine. If there is more demand for lookarounds, we recommend dispatching to a CPU regex engine, which will become much faster on next-gen hardware such as the Grace Hopper Superchip.
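A minimal sketch of the CPU fallback, assuming the strings have already been copied to host memory (for a cuDF Series, for example via `to_pandas()`). `contains_on_cpu` is a hypothetical helper, not part of any library; it relies on Python's backtracking `re` engine, which does support lookarounds. The sample pattern uses a positive lookbehind to find digits preceded by a dollar sign.

```python
import re
from typing import Iterable, List


def contains_on_cpu(strings: Iterable[str], pattern: str) -> List[bool]:
    """Hypothetical helper: one boolean per string, True if `pattern`
    matches anywhere, evaluated with Python's CPU regex engine."""
    compiled = re.compile(pattern)
    return [compiled.search(s) is not None for s in strings]


# Positive lookbehind: digits that are immediately preceded by "$".
mask = contains_on_cpu(["$100", "100", "cost: $5"], r"(?<=\$)\d+")
print(mask)  # [True, False, True]
```

The resulting mask could then be copied back to the GPU and used to filter the original data; the cost of that round trip is the trade-off mentioned above.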
Description
As a user, I would like to be able to include positive and negative lookaheads / lookbehinds in my regular expressions.
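For reference, the four lookaround forms are shown below with Python's `re` module, which supports all of them; the sample text is made up. Each pattern reports the start offsets of its matches, and in every case the lookaround part constrains the match without consuming any characters.

```python
import re
from typing import List

text = "apple pie, apple tart, banana pie"


def starts(pattern: str) -> List[int]:
    """Start offsets of all non-overlapping matches of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


pos_ahead = starts(r"apple(?= pie)")    # "apple" followed by " pie"
neg_ahead = starts(r"apple(?! pie)")    # "apple" NOT followed by " pie"
pos_behind = starts(r"(?<=apple )pie")  # "pie" preceded by "apple "
neg_behind = starts(r"(?<!apple )pie")  # "pie" NOT preceded by "apple "

print(pos_ahead, neg_ahead, pos_behind, neg_behind)  # [0] [11] [6] [30]
```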
Example Behavior
As an example of a negative lookahead, see: https://regex101.com/r/0275Hq/1
Given
The regex pattern
q(?!u)
should match the q in qate and in qatar, but not in quit. In words, "Find the q, but only if it's not followed by a u."
For a nice description of lookarounds, see: https://www.regular-expressions.info/lookaround.html
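The example can be reproduced with Python's `re` module; `iraq` is an extra, made-up sample covering a q at the end of the string. The sketch also shows one possible lookaround-free rewrite, `q(?:[^u]|$)`, which only uses constructs an NFA engine would typically support (assuming negated character classes and `$` are available). This is a substitute technique rather than a general replacement: it gives the same yes/no answer for this single-character lookahead, but it consumes the character after the `q`, so extracted matches and replacements differ.

```python
import re

words = ["qate", "qatar", "quit", "iraq"]

# The pattern from this issue: a q not followed by a u.
lookahead = re.compile(r"q(?!u)")
# Lookaround-free alternative: a q followed by a non-u character or by the
# end of the string. Unlike (?!u), it consumes the following character.
no_lookaround = re.compile(r"q(?:[^u]|$)")

results = {
    w: (lookahead.search(w) is not None, no_lookaround.search(w) is not None)
    for w in words
}
print(results)
# {'qate': (True, True), 'qatar': (True, True), 'quit': (False, False), 'iraq': (True, True)}

# Same decision, different matched text:
print(lookahead.search("qate").group(), no_lookaround.search("qate").group())  # q qa
```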