How are backreferences adjusted? #16

Open
bergus opened this issue Aug 12, 2015 · 7 comments

@bergus
Collaborator

bergus commented Aug 12, 2015

When regex instances are interpolated in blocks, the comment mentions "With back-references adjusted". What does that mean?

The tests don't really help me to understand this:

RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }`                  == /^(#+)([^#\r\n]*)(?:\1)/
RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` == /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/
RegExp.make `^(${ /(.*)/ }\n(#+)\n${ /(.*)/ }\n\2)\n`    == /^((?:(.*))\n(#+)\n(?:(.*))\n\3)\n/
RegExp.make `${ /\1/ }`                                  == /(?:(?:))/

First of all, I don't understand what /\1/ is. If I read the spec (ES6, ES5) correctly, this should throw a SyntaxError, as there are not enough NcapturingParens in the regex. When I test it in my browsers (old Opera, FF), however, it is a valid expression that happily matches "\1" (yes, that's String.fromCharCode(1)).
Neither of these behaviours is reflected in the tests, though. Instead, they expect

  • "Back-reference not scoped to containing RegExp" but instead referencing a group in the result regexp
  • "un-bindable back-reference" to be rewritten to a simple consume-nothing (?:)

which imo both collide with the goal that

RegExp instances are treated like the set of substrings they match

The rewriting of backreferences (both from the template, when "interrupted", and from the interpolation value, to reference the same group as before) seems reasonable in contrast.
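
For reference, both behaviours described above can be reproduced side by side. This is a minimal sketch, assuming an engine that supports the `u` flag, which (as far as I understand) applies the stricter grammar without the legacy web-compat extensions; the variable names are only illustrative.

```javascript
// Without the `u` flag, engines accept /\1/ even though the pattern
// contains no capturing group at all.
const legacy = /\1/;
const matchesControlChar = legacy.test(String.fromCharCode(1));

// With the `u` flag, the same pattern source is rejected when the
// RegExp is constructed.
let unicodeError = null;
try {
  new RegExp('\\1', 'u');
} catch (e) {
  unicodeError = e;
}

console.log(matchesControlChar);                  // true
console.log(unicodeError instanceof SyntaxError); // true
```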

@bergus
Collaborator Author

bergus commented Aug 12, 2015

Ah, I just came across this by chance: They are octal escape sequences, just like the ones in string literals. \0 to \7, \00 to \77, and \000 to \377 make single characters, not backreferences. A thing like /\9/ does however fail to match any input.
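
A small sketch of that distinction, assuming an engine with the legacy (non-`u`) regex grammar: whether `\N` is an octal escape or a backreference depends on how many capturing groups the pattern has.

```javascript
// No capturing groups: \NNN is read as a legacy octal escape.
const octalA = /\101/.test('A');        // octal 101 is code point 65
const octalMax = /\377/.test('\u00ff'); // octal 377 is code point 255

// One capturing group: \1 is a real backreference to group 1.
const backrefRepeats = /(a)\1/.test('aa');
const backrefIsNotOctal = /(a)\1/.test('a\u0001');

console.log(octalA, octalMax, backrefRepeats, backrefIsNotOctal);
// true true true false
```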

@mikesamuel
Owner

Are you satisfied with the handling of back-references?

Would more tests or changes to documentation help others avoid your initial confusion?

What are your thoughts about the semantic gap between

/\1/.exec('\u0001')

and

RegExp.make`()${/\1/}`.exec('\u0001')
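
The gap can be made concrete with plain regex literals. This sketch assumes that RegExp.make would produce /()(?:\1)/ for the second expression, which is what the tests quoted at the top of the thread suggest; it does not call RegExp.make itself.

```javascript
// Standalone: no groups, so \1 is the octal escape for U+0001.
const standalone = /\1/.exec('\u0001');

// Assumed result of the interpolation: \1 now binds to the empty group,
// so it matches the empty string instead of the control character.
const spliced = /()(?:\1)/.exec('\u0001');

console.log(standalone[0].length); // 1
console.log(spliced[0].length);    // 0
```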

@mikesamuel
Owner

Closing. I don't think there's a point of disagreement or change requested here.

@bergus
Collaborator Author

bergus commented Oct 13, 2015

OK, I'd like to request a change to the tests so that they match the draft goals.
Or otherwise get an explanation of how RegExp.make is supposed to behave in the current tests.

RegExp.make `^(#+)([^#\r\n]*)${ /\1/ }` /* should imo
 become  */ /^(#+)([^#\r\n]*)(?:\x01)/ /*
 not     */ /^(#+)([^#\r\n]*)(?:\1)/

RegExp.make `(fo(o))${ /(x)\1(?:\2)/ }bar${ /\1/ }(baz)` /* should imo
 become  */ /(fo(o))(?:(x)\3(?:\x02))bar(?:\x01)(baz)/ /*
 not     */ /(fo(o))(?:(x)\3(?:\2))bar(?:\1)(baz)/

RegExp.make `${ /\1/ }` /* should imo
 become  */ /(?:\x01)/ /*
 not     */ /(?:(?:))/
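
The rewriting proposed in these three examples could be sketched roughly as follows. Everything here is hypothetical: `countCapturingGroups` and `adjustBackrefs` are illustrative names, not functions of RegExp.make, and the sketch assumes the non-`u` grammar and that `offset` is the number of capturing groups that precede the interpolation in the template.

```javascript
// Hypothetical helpers for illustration; not part of RegExp.make.

// Counts capturing groups in a regex source, skipping escapes and
// anything inside a character class.
function countCapturingGroups(source) {
  let count = 0;
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      i++; // skip the escaped character
    } else if (inClass) {
      if (ch === ']') inClass = false;
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      const next = source.slice(i + 1, i + 4);
      const isNamed = /^\?<[^=!]/.test(next); // (?<name>...) but not lookbehind
      if (next[0] !== '?' || isNamed) count++;
    }
  }
  return count;
}

// Shifts backreferences that bind to a group of `source` by `offset`;
// unbindable ones become the equivalent \xHH escape (or throw for \8, \9).
function adjustBackrefs(source, offset) {
  const ownGroups = countCapturingGroups(source);
  let out = '';
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch !== '\\') {
      if (inClass) {
        if (ch === ']') inClass = false;
      } else if (ch === '[') {
        inClass = true;
      }
      out += ch;
      continue;
    }
    const rest = source.slice(i + 1);
    const decimal = inClass ? null : /^[1-9]\d*/.exec(rest);
    if (!decimal) {
      out += ch + rest.charAt(0); // any other escape is copied unchanged
      i++;
      continue;
    }
    const n = Number(decimal[0]);
    if (n <= ownGroups) {
      out += '\\' + (n + offset);
      i += decimal[0].length;
      continue;
    }
    const octal = /^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)/.exec(rest);
    if (!octal) throw new SyntaxError('Unbindable back-reference \\' + decimal[0]);
    out += '\\x' + parseInt(octal[0], 8).toString(16).padStart(2, '0');
    i += octal[0].length;
  }
  return out;
}

const interpolated = adjustBackrefs('(x)\\1(?:\\2)', 2);
const unbindable = adjustBackrefs('\\1', 0);
console.log(interpolated); // (x)\3(?:\x02)
console.log(unbindable);   // \x01
```

Throwing for `\8` and `\9` is one of the two options named in this comment; treating them some other way would be an equally plausible design choice.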

In short: capturing groups and backreferences should only refer to each other within the same regex or template, and not clash with interpolation.
"unbindable backreferences" should either lead to a SyntaxError or be treated like an octal escape.

@mikesamuel
Owner

Fair enough. I'll note it in the doc. I think I probably agree with you, but there's no other way of specifying capturing groups right now in strict mode code since

`\1`

is an octal escape.

@mikesamuel mikesamuel reopened this Oct 14, 2015
@bergus
Collaborator Author

bergus commented Oct 14, 2015

Oh, I didn't realize this was done because RegExp.make `(1) \1` doesn't work. Maybe we'd need to relax the template string syntax to allow this, and throw only when the string values are accessed (but not when only .raw is used)?

I'd guess that /\1/ should actually throw as well in strict mode instead of being interpreted as an octal escape.
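
As far as I know, a relaxation along these lines was later standardized for tagged templates in ES2018: an invalid escape such as \1 no longer makes a tagged template a syntax error; instead the cooked string is undefined while .raw keeps the source text. A minimal sketch, assuming an ES2018-capable engine; `inspect` is a hypothetical tag function used only for illustration.

```javascript
// A tag function receives both the cooked strings and the raw source text.
function inspect(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const parts = inspect`(a)\1`;
console.log(parts.cooked); // undefined, because \1 is not a valid template escape
console.log(parts.raw);    // (a)\1

// The raw text is enough to build the intended backreference.
const fromRaw = new RegExp(parts.raw);
console.log(fromRaw.test('aa')); // true
```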

@mikesamuel
Owner

That wasn't the original reason I did it. I didn't read the spec closely enough to realize that /\1/ is equivalent to /\x01/.
