[PHP7] adds support for named captures to mb_ereg* functions #2044
`mb_ereg`, `mb_ereg_search_regs` & `mb_ereg_search_getregs` returned only numbered capturing groups. Now they return both numbered and named capturing groups. Fixes Bug #72704.
ping @hirokawa ?
Named subpatterns are now passed to `mb_ereg_replace_callback`. This commit also adds a subset of the Oniguruma back-reference syntax for replacements:
* `\k<name>` and `\k'name'` for named subpatterns.
* `\k<n>` and `\k'n'` for numbered subpatterns.
These last two notations allow referencing numbered groups where n > 9.
@cmb69 In addition, considering that Oniguruma turns off numbered capturing groups when named capturing groups are used, and disallows mixing numbered and named backreferences, it might be a good idea to reconsider whether filling the [...] IMO it would be better to reflect the behavior of the engine:

    mb_ereg('(a)(b)(c)', 'abc', $matches);
    //=> [0 => "abc", 1 => "a", 2 => "b", 3 => "c"]
    mb_ereg('(?<a>a)(?<b>b)(?<c>c)', 'abc', $matches);
    //=> [0 => "abc", "a" => "a", "b" => "b", "c" => "c"]

So that things like:

    count($matches); //=> 4
    array_values($matches); //=> ["abc", "a", "b", "c"]

would yield more sensible results. What do you think?
I agree that would make sense. However, AIUI that would cause an even greater BC break. Not sure if we should do that for PHP 7.x. :-/
Nice patch. I would like to see this in 7.x.
Since this changes an existing test, it would appear to have BC concerns; as such, nobody is able to merge this without consensus. I request that you start an internals discussion to gather that consensus; alternatively, you may try to introduce the enhancement without BC implications. If you consider this work abandoned, please close this PR.
@ju1ius bump, was a discussion started? Please link back to any discussion that was started. This looks abandoned to me, and will be closed a month hence if no activity is seen on this PR.
Having waited more than a month for feedback on this issue, and since no discussion seems to have materialized, I'm closing this PR. Please take this action as encouragement to open a clean PR and start the discussion as requested.
Hello, Rui
I think there is little BC concern here for a master target, at least if the current implementation rather than #2044 (comment) is used. Assigning myself to review and land.
Thanks @nikic for taking the time to merge and document this feature!
Fixes Bug #72704.
Changes to `mb_ereg`
The third parameter passed to `mb_ereg` now contains both numbered and named references to the capturing groups.

Changes to `mb_ereg_search_regs` and `mb_ereg_search_getregs`
The two functions now return both numbered and named references to the capturing groups (see above).
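
As a minimal sketch of the matching changes described above (the date pattern and variable names are illustrative assumptions, not taken from the PR), a pattern with named groups should expose each capture under both its number and its name on a PHP build that includes this change:

```php
<?php
// Illustrative pattern with two named groups, "year" and "month".
$pattern = '(?<year>\d{4})-(?<month>\d{2})';

// mb_ereg(): the third argument receives the captures.
mb_ereg($pattern, '2016-08', $matches);

// Each capture is reachable by number and by name.
var_dump($matches[1] === $matches['year']);   // expected: bool(true)
var_dump($matches[2] === $matches['month']);  // expected: bool(true)

// mb_ereg_search_regs() and mb_ereg_search_getregs() return the same shape.
mb_ereg_search_init('2016-08', $pattern);
$regs = mb_ereg_search_regs();     // runs the search and returns the captures
$last = mb_ereg_search_getregs();  // returns the captures of the last search

var_dump($regs['year'], $last['month']);
```

Numbered keys remain available, so existing code that reads `$matches[1]` should keep working; code that relies on `count($matches)` would see the extra named entries.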

Changes to `mb_ereg_replace_callback`
Named references to the capturing groups are now passed to `mb_ereg_replace_callback`.

Changes to `mb_ereg_replace`
This PR adds a subset of the Oniguruma back-reference syntax for replacement strings in `mb_ereg_replace`:
* `\k<name>` and `\k'name'` for named subpatterns.
* `\k<n>` and `\k'n'` for numbered subpatterns.
These last two notations allow referencing numbered groups where n > 9, which is not currently implemented with the `\n` notation (and isn't implemented by Ruby either).

Examples:
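
The original examples did not survive on this page. The following is a hedged sketch of the replacement syntax listed above; the subject strings and group names are illustrative assumptions, and the expected results assume a PHP build that includes this change:

```php
<?php
// Named back-references: \k<name> and \k'name' are equivalent.
// Double quotes avoid escaping the single quotes in \k'word';
// "\k" is not a PHP escape sequence, so the backslash is kept literally.
$named = mb_ereg_replace('\s*(?<word>\w+)\s*', "_\k<word>_\k'word'_", ' foo ');
echo $named, PHP_EOL; // expected: _foo_foo_

// Numbered back-references above 9 via \k<n>.
$pattern = '(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)';
$tenth = mb_ereg_replace($pattern, '[\k<10>]', 'abcdefghij');
echo $tenth, PHP_EOL; // expected: [j]

// mb_ereg_replace_callback(): the callback's array also carries named keys.
$upper = mb_ereg_replace_callback(
    '(?<word>\w+)',
    function (array $m) {
        return strtoupper($m['word']);
    },
    'foo bar'
);
echo $upper, PHP_EOL; // expected: FOO BAR
```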
Note that if the pattern contains named subpatterns, numbered references in the replacement string will be ignored (except the `\0` reference):
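
The original snippet for this note is missing here. A small sketch to observe the behavior, assuming an illustrative pattern with one named group, could look like this; the result of the `\1` case is printed rather than asserted, since the description only says the reference is ignored:

```php
<?php
// The pattern uses a named group, so only \0 and \k<...> are expected
// to be honoured in the replacement string.
$whole = mb_ereg_replace('(?<word>\w+)', '[\0]', 'foo');
echo $whole, PHP_EOL; // expected: [foo], since \0 refers to the whole match

// \1 is a numbered reference used with a named group.
// Per the description it is ignored; print the result to see how.
$numbered = mb_ereg_replace('(?<word>\w+)', '[\1]', 'foo');
echo $numbered, PHP_EOL;

// The named form is the way to reference the group here.
$byName = mb_ereg_replace('(?<word>\w+)', '[\k<word>]', 'foo');
echo $byName, PHP_EOL; // expected: [foo]
```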
This behavior is for consistency with Oniguruma, which does not allow numbered backreferences in a pattern using named subpatterns:
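
The pattern-side restriction can be sketched as follows. The patterns are illustrative assumptions; the first call is expected to fail at compile time with a warning from Oniguruma, which is suppressed here so that only the return value is inspected:

```php
<?php
// Numbered back-reference inside a pattern that uses a named group:
// Oniguruma is expected to reject this pattern when compiling it.
$byNumber = @mb_ereg('(?<ch>a)\1', 'aa');
var_dump($byNumber); // expected: bool(false)

// The same pattern written with a named back-reference compiles and matches.
$byName = mb_ereg('(?<ch>a)\k<ch>', 'aa');
var_dump($byName !== false); // expected: bool(true)
```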