Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Spec does not parse 1 + ^==x #13

Closed
js-choi opened this issue Aug 27, 2021 · 10 comments
Closed

Spec does not parse 1 + ^==x #13

js-choi opened this issue Aug 27, 2021 · 10 comments
Assignees

Comments

@js-choi
Copy link
Collaborator

js-choi commented Aug 27, 2021

The specification currently treats %== and %=== as single punctuators (in order to disambiguate with the %= assignment operator in the longest-matching tokenizer; see #2). It essentially uses %== and %=== as prefix expression operators. @waldemarhorwat has pointed out that this does not address expressions containing %== or %=== with which the % is inside a preceding expression.

For example, 1 + %==x should be (1 + %)==x but the specification parses it as 1 + (%==x).

I’m not yet sure what is the best way to go about fixing this. (The Babel plugin will also need a test for 1 + (%==x).)

CC @tabatkins, @nicolo-ribaudo, @JLHwung.

@tabatkins
Copy link
Collaborator

I'm not very familiar with JS's syntax details, but I have to assume that the "correct" solution here is going to be to remove the %= token from the tokenizer entirely and switch to handling that in the grammar as two tokens, rather than leaving it in and hacking around it with more tokenization tweaks. Tokenization precedence just has no particular reason to be consistent with operator precedence, so it's always going to be problematic to try and address this at that level.

@js-choi
Copy link
Collaborator Author

js-choi commented Aug 27, 2021

In other words, the best way forward might be to replace all instances of the %= punctuator with % [contiguous] =, where [contiguous] is a new syntactic-grammar annotation, which would be defined in The Syntactic Grammar:

If the phrase “[contiguous]” appears in the right-hand side of a production of the syntactic grammar, it indicates that the production may not be used if there are any discarded tokens (i.e., simple white space, single-line comments, and any |MultiLineComment| that contains no line terminator) at the indicated position. In other words, a production with “[contiguous]” may be used only if the production immediately preceding the indicated position is, in the original source text, contiguous with the production immediately following the indicated position.

I wonder if the decorator proposal could use a similar [contiguous] annotation for @decorator.

Anyways, my bandwidth is limited this week, but I’ll see about changing the proposal specification as soon as I can…probably after the upcoming meeting.

@tabatkins
Copy link
Collaborator

That's fine, we don't need to fix everything up for the stage advancement so long as there's a plausible plan for fixing it in stage 3.

Your approach sounds plausible to me, tho. I wonder if it's actually needed, tho? It makes sense, certainly, but is it a problem to have x %/**/= 5; be a valid way to modulo-assign? That would be invalid code today anyway.

I'm def not opposed, tho; exposing weird parsing details to authors is usually bad, and excessive flexibility can make conflicts between proposals accidentally more likely.

@ljharb
Copy link
Member

ljharb commented Aug 27, 2021

I’m a bit confused, what does %= syntax do in hack pipes?

@tabatkins
Copy link
Collaborator

You can assign in a pipe body, since assignments are expression-level. So you can write val |> (x %= %) just fine.

The problem is that, since %= is currently a token in the tokenizer, it means that expressions like val |> %==2 are parsed, uh, badly. Current spec tries to fix that by handling the == and === cases explicitly with %== and %=== tokens, making them essentially prefix operators that execute a comparison, but that meshes badly with code like val |> 1 + %==2, which should be equivalent to val |> (1 + %)==2 but isn't with this approach.

@js-choi
Copy link
Collaborator Author

js-choi commented Aug 27, 2021

@ljharb: Yes, basically the big problem is that v|>%==0 would incorrectly get tokenized into v |> %= = 0 because the tokenizer would longest-match %= before %. It’s still theoretically unambiguous, but the tokenizer’s longest matching doesn’t do the right thing in this case.

The specification tries to work around this by adding %== and %=== as tokens, so that v|>%==0 gets tokenized into v |> %== 0, where the %== token is meant to be equivalent to % ==.

But @waldemarhorwat pointed out that this workaround does not work either with v|>1+ %==0.

So in the end we might have to handle this syntactically rather than lexically: to remove %= from the tokenizer and turn the modulo operator into % [contiguous] =.

@js-choi
Copy link
Collaborator Author

js-choi commented Aug 27, 2021

@tabatkins: Yes, I don’t know if allowing x %/**/= 5 would be such a terrible thing, but it would be an inconsistency with the other assignment operators—yet another wart on the language, and one we can avoid.

(I think that the same is probably true for decorators after @—if we force decorator expressions to be contiguous with their @, and prohibit @ foo class {} or @{{line separator}}foo class {}, then that lessens the space for weirdness in case @ might also become an infix operator in the future.)

@js-choi js-choi self-assigned this Aug 27, 2021
@JLHwung
Copy link

JLHwung commented Aug 27, 2021

Note that currently the tokenizer's longest match is not always right: For example both Babel and V8 tokenizes /=2/g into /= 2 / g and then reinterprets them to a regex literal /=2/g. The case is unambiguous but the longest match /= does makes parser a bit more complicated: we can't tell if /= can form an operator without looking ahead, same for %= here.

I agree with @tabatkins that tokenizer precedence is not necessarily relevant to operator precedence, so it is reasonable to handle it in the syntactic level.

js-choi added a commit that referenced this issue Aug 31, 2021
In anticipation of changing `%=` to `%` [contiguous] `=`.
See #13.
@js-choi js-choi changed the title Spec does not parse 1 + %==x Spec does not parse 1 + ^==x Sep 3, 2021
js-choi added a commit that referenced this issue Sep 3, 2021
@js-choi js-choi closed this as completed in 4a17ee8 Sep 3, 2021
@js-choi
Copy link
Collaborator Author

js-choi commented Sep 3, 2021

The spec has been patched, and this parsing issue should now be fixed.

Replacing ^= with ^ [contiguous] = was surprisingly easy. The semantics of the assignment operators match based on “the sequence of Unicode code points associated with assignmentOpText” in a table.

ecmarkup didn’t like [contiguous], though; looks like its annotations are hard-coded. I had to add [contiguous] manually back to the spec’s compiled HTML…

@js-choi
Copy link
Collaborator Author

js-choi commented Sep 4, 2021

There might be a better way to solve this than the new [contiguous] annotation (see tc39/ecmarkup#356 (comment)), but I’ll leave that change for later.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants