Possible index off by one in matches by the ZERO_PLUS operator #766
Comments
Hi, sorry for the delay getting to this. Two issues here:
Matt
Hi,
@mbatchkarov and I found a bug in the matcher when using a `ZERO_PLUS` operator. There also appears to be an inconsistency in the matches, although we are not sure it is a real one. Take a look at the following code example:
Output:
The obvious bug concerns the index reported in the list of matches. We are not sure whether it comes from a faulty index passed by the matcher or from a faulty match. Since the reported span also covers whatever token follows the actual match, a bad index seems the more likely cause.
Apart from the index, it is not quite clear what the behaviour of the `ZERO_PLUS` operator should be. In the case above we see two interpretations:
- `['Philippe Philippe', 'Philippe']` to match a greedy matching behaviour (like `re.findall('(P+)', 'PP of P')`),
- `['Philippe', 'Philippe Philippe', 'Philippe', 'Philippe']` to produce all possible matches, consistent with how matches from different rules behave.

It is not clear what the logic of the current output is, so maybe it's just the manifestation of another bug.
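To make the two interpretations concrete, here is a small library-independent sketch. It does not use spaCy's matcher at all: `greedy_runs` and `all_spans` are hypothetical helpers written only for illustration, and the token list is an assumed stand-in rather than the text from the original example.

```python
import re


def greedy_runs(tokens, target):
    """Greedy interpretation: one match per maximal run of `target` tokens."""
    runs, current = [], []
    for tok in tokens:
        if tok == target:
            current.append(tok)
        else:
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


def all_spans(tokens, target):
    """Exhaustive interpretation: every contiguous, non-empty span of `target` tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            if tokens[end - 1] != target:
                break  # the span can no longer consist only of `target`
            spans.append(" ".join(tokens[start:end]))
    return spans


# Hypothetical token sequence, chosen to mirror 'PP of P' from the regex analogy.
tokens = ["Philippe", "Philippe", "of", "Philippe"]

print(re.findall('(P+)', 'PP of P'))     # ['PP', 'P']
print(greedy_runs(tokens, "Philippe"))   # ['Philippe Philippe', 'Philippe']
print(all_spans(tokens, "Philippe"))     # ['Philippe', 'Philippe Philippe', 'Philippe', 'Philippe']
```

Under these assumptions the first helper reproduces the greedy list and the second reproduces the exhaustive list, so either would be a self-consistent behaviour for the operator.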
Here is another test case that doesn't work at all:
Output: