Skip to content

Conversation

ju1ius
Copy link
Contributor

@ju1ius ju1ius commented Jul 29, 2016

Fixes Bug #72704.

Changes to mb_ereg

The third parameters passed to mb_ereg now contains both numbered and named references to the capturing groups.

mb_ereg('(?<wsp>\s*)(?<word>\w+)', '', $m);
$m == [0 => "", 1 => "  ", 2 => "", "wsp" => "  ", "word" => ""];

Changes to mb_ereg_search_regs and mb_ereg_search_getregs

The two functions now return both numbered and named references to the capturing groups. (see above)

Changes to mb_ereg_replace_callback

Named references to the capturing groups are now passed to mb_ereg_replace_callback.

mb_ereg_replace_callback('(?<word>\w+)', function($m){ return $m['word']; }, '国国');
//=> "国国"

Changes to mb_ereg_replace

This PR adds a subset of the oniguruma back-reference syntax for replacement strings in mb_ereg_replace:

  • \k<name> and \k'name' for named subpatterns.
  • \k<n> and \k'n' for numbered subpatterns

These last two notations allow referencing numbered groups where n > 9, which is not currently implemented with the \n notation (and isn't implemented by Ruby either).

Examples:

mb_ereg_replace('\s*(?<word>\w+)\s*', "_\k<word>_\k'word'_", '  foo  ' );
// => "_foo_foo_"
mb_ereg_replace('(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(\10)', "_\k<10>_\k'11'_", 'abcdefghijj');
// => "_j_j_"

Note that if the pattern contains named subpatterns, numbered references in the replacement string will be ignored (except the '\0' reference):

mb_ereg_replace('(?<a>A)\k<a>', '-\1-\0', 'AA');
//=> "-\1-AA"

This behavior is for consistency with Oniguruma, which does not allow numbered backreferences in a pattern using named subpatterns:

mb_ereg('(?<a>A)\1', 'AA');
//=> PHP warning:  mb_ereg(): mbregex compile err: numbered backref/call is not allowed. (use name) on line 1

`mb_ereg`, `mb_ereg_search_regs` & `mb_ereg_search_getregs`
returned only numbered capturing groups.
Now they return both numbered and named capturing groups.
Fixes Bug #72704.
@ju1ius ju1ius force-pushed the php7/onig-named-captures branch 4 times, most recently from af8b549 to 0595f37 Compare July 31, 2016 01:06
@ju1ius ju1ius changed the title [PHP7] adds support for named captures to mb_ereg & mb_ereg_search [PHP7] adds support for named captures to mb_ereg* functions Jul 31, 2016
@ju1ius
Copy link
Contributor Author

ju1ius commented Aug 1, 2016

ping @hirokawa ?

Named subpatterns are now passed to `mb_ereg_replace_callback`.

This commit also adds a subset of the oniguruma back-reference syntax
for replacements:
* `\k<name>` and `\k'name'` for named subpatterns.
* `\k<n>` and `\k'n'` for numbered subpatterns
These last two notations allow referencing numbered groups where n > 9.
@ju1ius ju1ius force-pushed the php7/onig-named-captures branch from 0595f37 to 2792f29 Compare August 1, 2016 04:35
@ju1ius
Copy link
Contributor Author

ju1ius commented Aug 4, 2016

@cmb69 In addition, considering that Oniguruma turns off numbered capturing groups when named capturing groups are used, and disallows mixing numbered and named backreferences, it might be a good idea to reconsider whether filling the $regs array with both numbered and named groups is actually the right thing to do.

IMO it would be better to reflect the behavior of the engine:

mb_ereg('(a)(b)(c)', 'abc', $matches);
//=> [0 => "abc", 1 => "a", 2 => "b", 3 => "c"]
mb_ereg('(?<a>a)(?<b>b)(?<c>c)', 'abc', $matches);
//=> [0 => "abc", "a" => "a", "b" => "b", "c" => "c"]

So that things like:

count($matches); //=> 4
array_values($matches); //=> ["abc", "a", "b", "c"]

would yield more sensible results.

What do you think?

@cmb69
Copy link
Member

cmb69 commented Aug 4, 2016

I agree that would make sense. However, AIUI that would even cause a greater BC break. Not sure, if we should do that for PHP 7.x. :-/

@yohgaki
Copy link
Contributor

yohgaki commented Aug 27, 2016

Nice patch. I would like to see this in 7.x.
One possible option without BC is http://php.net/manual/ja/function.mb-regex-set-options.php
Another is use of INI. I'm ok with either one.

@krakjoe
Copy link
Member

krakjoe commented Jan 3, 2017

Since this changes an existing test, it would appear to have BC concerns, as such nobody is able to merge this without consensus.

I request that you start an internals discussion to gather that consensus, alternatively, you may try to introduce the enhancement without BC implications.

If you consider this work abandoned, please close this PR.

@krakjoe
Copy link
Member

krakjoe commented Mar 1, 2017

@ju1ius bump, was a discussion started ? please link back to any discussion that was started.

This looks abandoned to me, and will be closed a month hence if no activity is seen on this PR.

@krakjoe
Copy link
Member

krakjoe commented Apr 3, 2017

Having waited more than a month for feedback on this issue, and since no discussion seems to have materialized, I'm closing this PR.

Please take this action as encouragement to open a clean PR and start the discussion as requested.

@hirokawa
Copy link
Contributor

hirokawa commented Apr 3, 2017

Hello,
Sorry for the delayed response.
I think that this is a great improvement for next release.

Rui

@nikic nikic self-assigned this Apr 3, 2017
@nikic
Copy link
Member

nikic commented Apr 3, 2017

I think there is little BC concern here for a master target -- at least if the current implementation rather than #2044 (comment) is used. Assigning myself to review and land.

@nikic nikic reopened this Apr 3, 2017
@nikic
Copy link
Member

nikic commented Jul 7, 2018

Merged as 212f56b, 8f17826, 41a6625 into master, for the PHP 7.3 release.

Thanks for this great feature and sorry it took so long to merge it!

@nikic nikic closed this Jul 7, 2018
@ju1ius
Copy link
Contributor Author

ju1ius commented Jul 9, 2018

Thanks @nikic for taking the time to merge and document this feature !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants