-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
match bug #1319
Comments
Oh, forgot to mention that this was originally reported in this StackOverflow question: https://stackoverflow.com/questions/56906725/ripgrep-missing-character-class-repetions |
Replacing the line: ripgrep/grep-regex/src/literal.rs Line 217 in 4858267
with if max.map_or(true, |max| min <= max) { or just dropping the whole |
I wrote a fuzzer that uses the regex_generate crate to produce matching inputs for a given fuzz-generated regular expression and then checks if |
@jakubadamw That's awesome! Do you want to submit a PR? If not, I'll get to this eventually! |
@BurntSushi, sure! 🙂 |
This appears to be another transcription bug from copying this code from the prefix literal detection from inside the regex crate. Namely, when it comes to inner literals, we only want to treat counted repetition as two separate cases: the case when the minimum match is 0 and the case when the minimum match is more than 0. In the former case, we treat `e{0,n}` as `e*` and in the latter we treat `e{m,n}` where `m >= 1` as just `e`. We could definitely do better here. e.g., This means regexes like `(foo){10}` will only have `foo` extracted as a literal, where searching for the full literal would likely be faster. The actual bug here was that we were not implementing this logic correctly. Namely, we weren't always "cutting" the literals in the second case to prevent them from being expanded. Fixes #1319, Closes #1367
This matches:
But this doesn't:
To minimize, this doesn't match:
But this does:
The only difference between the latter two is that the latter removes the first
T
from the regex.From inspecting the
--trace
output, I note that from the former regex, itsays this:
But in the latter regex (the one that works), we have this:
Therefore, this is almost certainly a bug in literal extraction. Moreover,
this Rust program correctly prints
true
:Which points the finger at
grep-regex
's inner literal extraction. Sigh.The text was updated successfully, but these errors were encountered: