-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement capture group references in substitution strings #11849
Conversation
Maybe you could support the limit parameter by constructing a match context and calling |
That's not what match_limit does:
|
Huh, I would have thought there'd be some way to do this (or at least to replace only the first match), but it's very possible there's not. |
It is possible to replace the only first match by not setting the |
What do you think, @vtjnash ? If we want to support a limit parameter besides 1 or infinity, we'll either have to patch pcre's substitute function (it's a pretty trivial change) or port the whole function to julia (it's about 200 lines of C code in pcre2_substitute.c). |
Seems like a good candidate for a patch that we can submit upstream. |
I'm skeptical they'd accept it since it would seem to necessarily involve changing the PCRE API. |
OTOH if even PCRE doesn't support that option, is it really needed in Julia? I'd say remove it, and if it turns out it's really needed it can always be added back later. |
Since the Julia-native |
That would be cool but it seems like you'd need a special string literal type for that. Maybe |
😀 and then, if you want to combine that syntax with string interpolation, you could use the Swift (and others) |
Sounds good @StefanKarpinski . IMO, we might as well use the more traditional Perl and Python backreference substitution syntax (https://docs.python.org/2/library/re.html#re.sub) instead of the ad-hoc once introduced by PCRE2:
|
The functionality is implemented now, mimicking the Python API:
If this API looks good to people, I'll update this PR to include a new section describing it in the manual. |
end | ||
|
||
# Backcapture reference in substitution string | ||
@test replace("abcde", r"(..)(?P<byname>d)", s"\g<byname>xy\1") == "adxybce" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add a test for when the groupname doesn't exist in the regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@StefanKarpinski why does this need a special string literal? why wouldn't general-purpose "literal" string literal be sufficient? |
Maybe to avoid API breaking and performance regression? |
bump @StefanKarpinski |
Sorry, I'm traveling and can't take a look at this (writing this on a plane On Wednesday, July 8, 2015, Jonathan Malmaud notifications@github.com
|
Sorry to keep bugging you @StefanKarpinski , but I'm unbelievably excited to reference capture groups in my substitution strings. |
It's a good change. Sorry my travels are causing a delay here!
|
No problem. I think the only real question is the one that @vtjnash brings up - should we introduce the |
+1 -- I like this. But I agree with @vtjnash that |
On Friday, July 10, 2015, Kevin Squire notifications@github.com wrote:
|
On Tuesday, July 7, 2015, Jonathan Malmaud notifications@github.com wrote:
Nice. I like the API. Good reason to use a custom string literal for |
I think the question is whether |
My understanding is that raw strings wouldn't modify the string at all, and would simply be a generalization of the My main concern about adding julia> Base.r"test"
ERROR: UndefVarError: r not defined Would we really want to interpolate into a |
I'd be on board with using raw strings here to avoid backslash hell, and just using |
I think that always substituting plain strings completely literally – i.e. no |
@github, could you finally fix this horrible UI bug? ... didn't want to substitute. You also have the issue that computed substitutions run the risk of accidentally substituting. |
So just to be clear @StefanKarpinski , you are in favor of a |
if we are going to add a SubstitutionString(raw"a\1b") single character string literals are just too unreadable, and too valuable for packages |
If you're already using On Tue, Jul 14, 2015 at 1:07 PM, Jameson Nash notifications@github.com
|
What about It's quite difficult to find help for single letter things, atm even if your knew it was
|
@malmaud wrote:
Yes, I'm all in favor of this – I actually couldn't see the patch before since I was reading email offline and composing messages to be sent later (have been traveling until recently). I like this PR as it is and I also like
Yes, that's entirely worth it. How many Python programs are buggy because someone didn't call |
Probably fine, but it's worth a grep through registered packages to make sure none define |
@kmsquire Is there an easy way to do that? |
Just clone all the packages in METADATA with a script, or ask someone who has already done so, to grep for you. I think there is great karma to be earned if someone have the skill to properly secure such a search service and provide it to the community. |
At any rate, I added documentation and rebased, so this is g2g once they're consensus on this SubstitutionString/s-prefix business. |
Might be nice to have this in before the .4 RC. |
Clever. It looks like https://github.com/andrewcooke/ParserCombinator.jl On Tue, Jul 21, 2015 at 10:52 AM, Iain Dunning notifications@github.com
|
Strings isn't registered as well |
Still, maybe @vtjnash is right and we should use |
Personally, I like the fact that |
I like ParserCombinator.jl exports julia> s""
Warning: both Bar and Foo export "@s_str"; uses of it in module Main must be qualified
ERROR: UndefVarError: @s_str not defined |
|
Don't let Strings.jl get in the way of using |
@andrewcooke Would it be problematic for you if Base exports |
well it conflicts with ParserCombinator, but i can change it to some other letter (i have all of one user that i know of right now). and also, isn't there some way to choose which you want on conflicts? if not, i assume there will be at some point... (so, no, nothing terrible, but thanks for asking!) |
OK then. Checks pass, should be g2g. |
Implement capture group references in substitution strings
Closes #1820.
Use http://www.pcre.org/current/doc/html/pcre2_substitute.html instead of Julia's native
replace
when the search pattern is a regex. This allows for capture reference in the substitution string.Incidentally, it is ~4x faster than Julia's native
replace
implementation and should probably be used even when the search pattern is a not a regex, although I'll save that for another PR if people are interested.Once potentially breaking change is how empty matches are handled:
Before,
replace("abcd", r"b?", "^") == "^a^c^d^"
.Now,
replace("abcd", r"b?", "^") == "^a^^c^d^"
.There are no set of options to PCRE that could cause it to replicate the old behavior.
Another breaking change is that PCRE doesn't support an option to limit how many replacements are made.