RFC: Number value literal lookahead restrictions #601
I staged this PR atop the editorial-greedy-lexer branch since it depends on that fix. I'll update once that's reviewed and landed, but I wanted to get this up for discussion.
This essentially means that we have to look ahead multiple characters in case of
Disallow any name NameStart character except
You should only ever have to look one character ahead. It’s exactly this reason why
This proposes that both Int and Float tokens cannot be followed by a letter. So in the case of 123efg: At position
Updated to rebase on #599 and to include additional prose and non-normative notes explaining the impact of this restriction and its rationale. Suggested test cases (thanks @robzhu for suggesting that we document these): These test cases document scenarios whose behavior changes after applying this RFC (except for the last clarification test case in each set). Don't forget to also look at the test cases from #599 (see this comment).
Implements and adds the tests described by graphql/graphql-spec#601
Implements and adds the tests described by graphql/graphql-spec#601 Replicates graphql/graphql-js@ca1c1df
Some edge cases around numbers were not handled as expected. This commit adds test cases from the two RFCs clarifying the expected behaviour (graphql/graphql-spec#601, graphql/graphql-spec#599) and updates the Lexer to match. This is technically a breaking change, but most cases were likely to lead to validation errors (e.g. "0xF1" being parsed as [0, xF1] when expecting a list of integers).
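The before/after behavior described in this commit message (for example `0xF1` lexing as [`0`, `xF1`]) can be reproduced with a small tokenizer. This is an illustrative sketch only: a regular-expression tokenizer is used purely for brevity, and `tokenize`, `NUMBER`, `NAME`, and `RESTRICTED_NUMBER` are hypothetical names. The proposed restriction is expressed as a negative lookahead; the pattern also excludes a trailing digit so the regex cannot backtrack to a shorter match such as `12` out of `123efg`. Note that the permissive mode lexes `123efg` as (`123`, `efg`), which is what some implementations might do, not what GraphQL.js does.

```python
import re

# Hypothetical patterns approximating GraphQL's lexical grammar (illustration only).
# Name: a NameStart character (letter or "_") followed by letters, digits, or "_".
NAME = r"[_A-Za-z][_0-9A-Za-z]*"
# IntValue / FloatValue: optional "-", IntegerPart, optional fraction, optional exponent.
NUMBER = r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
# Proposed restriction as a negative lookahead: no NameStart character may follow.
# Digits are also excluded so the regex cannot backtrack to a shorter number
# (e.g. matching "12" out of "123efg").
RESTRICTED_NUMBER = NUMBER + r"(?![_0-9A-Za-z])"


def tokenize(source: str, restricted: bool = True) -> list[str]:
    """Split `source` into number, name, and bracket tokens.

    White space and commas are skipped, since both are insignificant in GraphQL.
    Raises ValueError when no token matches at the current position.
    """
    number = RESTRICTED_NUMBER if restricted else NUMBER
    # Numbers are tried before names, mirroring a lexer that dispatches on a
    # leading digit or "-".
    token_re = re.compile(rf"(?P<skip>[\s,]+)|(?P<tok>{number}|{NAME}|[\[\]])")
    tokens: list[str] = []
    pos = 0
    while pos < len(source):
        match = token_re.match(source, pos)
        if match is None:
            raise ValueError(f"Syntax error at position {pos}: {source[pos:]!r}")
        if match.lastgroup == "tok":
            tokens.append(match.group("tok"))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for src in ["[123abc]", "[abc123]", "0xF1", "1_000", "123efg", "1e5"]:
        before = tokenize(src, restricted=False)
        try:
            after = tokenize(src)
        except ValueError as err:
            after = f"error ({err})"
        print(f"{src!r}: before {before}, after {after}")
```

With the restriction, `[123abc]`, `0xF1`, `1_000`, and `123efg` all become lexical errors, while `[abc123]` (a single Name) and `1e5` (a single FloatValue) are unaffected.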
This RFC proposes adding a lookahead restriction to the IntValue and FloatValue lexical grammars that disallows a letter immediately following a number.
Problem:
Currently, the lexing of numbers has some ambiguities and underspecified cases, which each implementation has handled slightly differently.
Because commas are optional and white space isn't required between tokens, these two snippets are equivalent: `[123, abc]`, `[123abc]`. This may be confusing to read, but it should parse correctly. However, the opposite is not true: since digits may belong in a Name, the following two are *not* equivalent: `[abc, 123]`, `[abc123]`. This could lead to mistakes.
Ambiguity and underspecification enter when the Name starts with "e", since "e" indicates the beginning of an exponent in a FloatValue.
`123efg` is a lexical error in GraphQL.js, which greedily starts to lex a FloatValue when it encounters the "e"; however, you might also expect it to lex validly as (`123`, `efg`), and some implementations might do this.
Further, other languages offer variations of numeric literals which GraphQL does not support, such as hexadecimal literals. The input `0x1F` lexes as (`0`, `x1F`), since `x1F` is a valid Name, which very likely surfaces as a confusing syntax or validation error. A similar issue exists for the underscores some languages allow in numbers for readability: `1_000` lexes as (`1`, `_000`), since `_000` is also a valid Name, rather than as a single number.
Proposed Solution:
Add a lookahead restriction to IntValue and FloatValue that disallows any NameStart character (including letters and `_`) from following them.
This makes it clear that `1e5` can only be one FloatValue rather than an IntValue followed by the Name `e5`, specifies lexer errors clearly to remove ambiguity, and provides clear errors for mistaken input.
Precedent
JavaScript applies this same restriction for similar reasons, I believe originally to produce an early error if C-style typed literals were used in a JavaScript program.
https://www.ecma-international.org/ecma-262/10.0/index.html#sec-literals-numeric-literals
Cost of change
While this is technically a breaking change to the language grammar, it seeks to restrict cases that are almost certainly already producing either syntax or validation errors.
This differs from the current behavior of GraphQL.js and, I believe, other parsers, and will require minor implementation updates.
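The "minor implementation update" can be sketched as a hand-written number scanner. This is a hypothetical illustration, not GraphQL.js's code: `read_number`, `is_name_start`, the `(kind, text, end)` return shape, and the error messages are all choices made for this sketch. It consumes digits greedily and then inspects a single following character, which is all the proposed restriction requires.

```python
def is_digit(ch: str) -> bool:
    # The empty string (end of input) is never a digit.
    return "0" <= ch <= "9"


def is_name_start(ch: str) -> bool:
    # NameStart: a letter or "_" (the empty string marks end of input).
    return ch == "_" or "A" <= ch <= "Z" or "a" <= ch <= "z"


def read_number(source: str, start: int) -> tuple[str, str, int]:
    """Lex an IntValue or FloatValue beginning at `start`.

    Returns (kind, text, end_position). Raises ValueError for invalid numbers.
    """

    def peek(i: int) -> str:
        # One character of lookahead; "" at end of input.
        return source[i] if i < len(source) else ""

    def read_digits(i: int) -> int:
        # Greedily consume one or more digits, returning the position after them.
        if not is_digit(peek(i)):
            raise ValueError(f"Invalid number, expected digit but got {peek(i)!r} at {i}")
        while is_digit(peek(i)):
            i += 1
        return i

    pos = start
    is_float = False
    if peek(pos) == "-":
        pos += 1
    if peek(pos) == "0":
        pos += 1
        if is_digit(peek(pos)):
            raise ValueError(f"Invalid number, unexpected digit after 0 at {pos}")
    else:
        pos = read_digits(pos)
    if peek(pos) == ".":
        is_float = True
        pos = read_digits(pos + 1)
    if peek(pos) in ("e", "E"):
        # Like GraphQL.js, start an exponent greedily on "e" or "E".
        is_float = True
        pos += 1
        if peek(pos) in ("+", "-"):
            pos += 1
        pos = read_digits(pos)
    # The proposed restriction: a single character of lookahead decides it.
    if is_name_start(peek(pos)):
        raise ValueError(f"Invalid number, unexpected {peek(pos)!r} after number at {pos}")
    return ("FloatValue" if is_float else "IntValue", source[start:pos], pos)


if __name__ == "__main__":
    for src in ["123 abc", "1e5", "123abc", "123efg", "0x1F", "1_000"]:
        try:
            print(src, read_number(src, 0))
        except ValueError as err:
            print(src, "error:", err)
```

In this sketch `123efg` is rejected while reading the exponent, and `123abc`, `0x1F`, and `1_000` are rejected by the final lookahead check; either way, each input now fails at the lexer instead of producing surprising tokens.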