Non-syntactic way to match from the beginning of text, or "implicit `\A`".
#974
Comments
All righty, I think I'm going to try and respond to this from the bottom up,
So I really do not want to do this because it makes the API surface bigger. One could say, "no we should just have Finally, I'd also say that I specifically did not like how Python setup their
This is probably the path I would be least unhappy with, but I don't really
This can absolutely never happen. It is absolutely critical that crate features
For some added context, it is worth mentioning that "anchored search" and
I don't really get why "Prepend Now the second reason against it is interesting, and having to re-compile
Usually the |
Hi @BurntSushi and thanks a lot for your detailed answer, which makes your position very clear indeed :)

I think I do understand all of it. And I am convinced. I have learned about this subtle difference between […]

In a nutshell, you are pointing me towards #656 again, and I cannot but wave at this very cool initiative. This would solve this particular issue and also so much more.

Regarding that you are also reassuring me with the cost of pattern strings copying vs. regex compilation itself, I am now reconsidering the idea to prepend […]

Just in case this is of interest to anyone, my current plan to enforce this while still avoiding useless regex recompilation and paving the way towards future leveraging of #656 is to newtype […]

So.. I can now move forward. Thank you for your patience :) I am closing this to temper the noise around #656.
From the first lines in the docs, I have learned this:
I have always been happy with this so far, and I am accustomed to using `\A` to anchor my searches to the beginning of input and elide the implicit `.*?`.

Today, I am designing a regex-based, general-purpose "lexing/tokenizing" API meant to help users consume input bit by bit, from left to right, with their own regexes. Users can write their own patterns, but then the semantics of these patterns differ from the above semantics: when a user writes `"a+b*c"` within this context, what they mean is `r"\Aa+b*c"` and not `".*?a+b*c"`.

To work around this, I have considered then dismissed a few options:
- […] `\A`.
- […] `\A`.
- […] `\A` to all the patterns handed out by user.
- […] Strings so as to prepend `\A`.
- […] `Regex::new(&format!(r"\A{}", their_regex.as_str()))`.
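One subtlety with the string-prepending workaround is worth spelling out: a bare `\A` prefix binds only to the first branch of a top-level alternation, so `\Afoo|bar` can still match `bar` anywhere in the haystack. Below is a minimal sketch of a helper that avoids this by wrapping the user's pattern in a non-capturing group. The names `anchor_at_start` and `anchor_at_start_naive` are hypothetical and not part of the `regex` crate; the resulting string would still have to be compiled with `Regex::new`. The sketch also assumes the pattern does not end in a verbose-mode (`(?x)`) comment, which would swallow the closing parenthesis.

```rust
/// Wrap a user-supplied pattern so that it can only match at the very start
/// of the haystack. (Hypothetical helper, not a `regex` crate API.)
///
/// The non-capturing group `(?:...)` keeps `\A` applied to the whole pattern,
/// including every branch of a top-level alternation, and it does not shift
/// the numbering of the user's own capture groups.
fn anchor_at_start(pattern: &str) -> String {
    format!(r"\A(?:{})", pattern)
}

/// The naive variant, shown only for comparison: here `\A` applies to the
/// first alternation branch only.
fn anchor_at_start_naive(pattern: &str) -> String {
    format!(r"\A{}", pattern)
}
```

For a pattern such as `foo|bar`, the first helper produces `\A(?:foo|bar)` while the naive one produces `\Afoo|bar`, which leaves the `bar` branch unanchored.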
As a consequence, and unless I am missing an obvious way to work around this, I am suggesting that maybe "implicit `\A` prefix" vs. "implicit `.*?` prefix" could be something configurable within the `regex` crate itself? This could take any of several forms, sorted here from the most comfortable/flexible to the least comfortable, but I guess the best fit would eventually depend on the actual implementation of `regex`:

- 🌞 Extend the `Regex` interface with e.g. `Regex::find_at_start(haystack)` in addition to `Regex::find(haystack)`. The former would match from the beginning of the input only, in a way similar to Python's `re.match`/`re.search` duo. The same would go for `Regex::is_match_at_start(haystack)`, `Regex::captures_at_start(haystack)`, etc.
- ☁️ Extend the `RegexBuilder` interface with e.g. `RegexBuilder::matches_from_start(bool)`. The semantics of `RegexBuilder::new("pattern").matches_from_start(false).build()` would be today's default, while `RegexBuilder::new("pattern").matches_from_start(true).build()` would introduce an implicit `\A` instead of the implicit `.*?`.
- 🌨️ Extend control over the "implicit `.*?`" semantics at the crate level with a crate feature, so that in my cargo `[dependencies]`, I would rather have `regex = {version = "1.7", features = ["implicit-start-anchor"]}` instead of the current default.

Of course, if this is something the `regex` crate would be willing to feature, then I assume the same issue could be addressed for `\z` in addition to `\A`.
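To illustrate why at-start semantics matter for the lexing use case described in this issue, here is a minimal, std-only sketch of a left-to-right tokenizing loop. It does not use the `regex` crate: the `Matcher` function type and the `number`, `word` and `space` rules are hypothetical stand-ins for regexes searched with an implicit `\A`. Each one inspects only the start of the remaining input and reports how many bytes it matched.

```rust
/// A matcher inspects only the START of its input and returns the length in
/// bytes of the token found there, if any. It stands in for a regex searched
/// with "implicit `\A`" semantics. (Hypothetical type, for illustration.)
type Matcher = fn(&str) -> Option<usize>;

/// Byte length of the leading run of characters satisfying `pred`, or `None`
/// if that run is empty.
fn leading_run(input: &str, pred: fn(char) -> bool) -> Option<usize> {
    let len: usize = input
        .chars()
        .take_while(|&c| pred(c))
        .map(char::len_utf8)
        .sum();
    if len > 0 { Some(len) } else { None }
}

fn number(input: &str) -> Option<usize> {
    leading_run(input, |c| c.is_ascii_digit())
}

fn word(input: &str) -> Option<usize> {
    leading_run(input, |c| c.is_alphabetic())
}

fn space(input: &str) -> Option<usize> {
    leading_run(input, |c| c.is_whitespace())
}

/// Consume `input` from left to right. At each step the first rule that
/// matches at the current position wins; a match located further along the
/// input must not count, which is why every rule needs at-start semantics.
/// Returns the tokens, or the byte offset at which no rule matched.
fn tokenize<'a>(
    input: &'a str,
    rules: &[(&'static str, Matcher)],
) -> Result<Vec<(&'static str, &'a str)>, usize> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let hit = rules
            .iter()
            .find_map(|&(name, matcher)| matcher(rest).map(|len| (name, len)));
        match hit {
            Some((name, len)) => {
                tokens.push((name, &rest[..len]));
                pos += len;
            }
            None => return Err(pos),
        }
    }
    Ok(tokens)
}
```

If the rules were unanchored searches instead, a rule could "match" text beyond the current position and the loop would silently skip input. That is the mismatch between the implicit `.*?` and what a user of such an API would expect.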