You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This fixes some corner cases in the HIR printer where it would print the
concrete syntax of a regex that does not match the natural
interpretation of the HIR. One such example of this is:
concat(a, alt(b, c))
This would get printed as
ab|c
But clearly, it should be printed as:
a(?:b|c)
The issue here is that the printer only considers the current HirKind
when determining how to print it. Sometimes a group is needed to print
an alt (and even a concat, in the case of 'rep(+, concat(a, b))'), but
sometimes it isn't.
We could address this in a few different ways:
1) Always print concats and alts inside a non-capturing group.
2) Make the printer detect precisely the cases where a non-capturing
group is needed.
3) Make the HIR smart constructors insert non-capturing groups when
needed.
4) Do some other thing to change the HIR to prevent these sorts of
things by construction.
This patch goes with (1). The reason in favor of it is that HIR printer
was always about printing an equivalent regex and never about trying to
print a "nice" regex. Indeed, the HIR printer can't print a nice regex,
because the HIR represents a rigorously simplifed view of a regex to
make analysis easier. (The most obvious such example are Unicode
character classes. For example, the HIR printer never prints '\w'.) So
inserting some extra groups (which it already does) even when they
aren't strictly needed is perfectly okay.
But still, it's useful to say why we didn't do the other choices:
2) Modifying the printer to only print groups when they're actually
needed is pretty difficult. I tried this briefly, and handling this
case requires some notion of what the parent expression is. This
winds up being a possible but hairy change.
3) Making the HIR more complicated to make the printer correct seems
like it's optimizing for the wrong thing. Inserting extra groups in
places just obfuscates HIR values that already have clear semantics.
That is, use concat(a, alt(b, c)) over concat(a, group(alt(b, c))).
4) It's not clear how we would change the HIR to guarantee this sort of
thing wouldn't happen. At the very least, it seems likely it would
require a more complex data type.
At first, I had thought (1) seemed inelegant. But the more I thought
about it, the more it seemed quite consistent with how the HIR printer
already worked. So that's the path I took here.
Closes#516, Closes#731
0 commit comments