From 09cdaecf3a4d08c9c53092ce83b0dede40adcba6 Mon Sep 17 00:00:00 2001
From: Lee Byron
Date: Fri, 10 Jan 2020 13:24:34 -0800
Subject: [PATCH] RFC: Number value literal lookahead restrictions (#601)

This RFC proposes adding a lookahead restriction to the IntValue and FloatValue lexical grammars that prevents a number from being immediately followed by a letter.

**Problem:**

Currently, the lexing of numbers is ambiguous and underspecified in places, and each implementation has handled these cases slightly differently.

Because commas are optional and white space isn't required between tokens, these two snippets are equivalent: `[123, abc]`, `[123abc]`. This may be confusing to read, but it should parse correctly. However, the opposite is not true: since digits may appear in a Name, the following two are *not* equivalent: `[abc, 123]`, `[abc123]`. This could lead to mistakes.

Ambiguity and underspecification enter when the Name starts with "e", since "e" indicates the beginning of an exponent in a FloatValue. `123efg` is a lexical error in GraphQL.js, which greedily starts to lex a FloatValue when it encounters the "e"; however, you might also expect it to lex validly as (`123`, `efg`), and some implementations might do this.

Further, other languages offer variations of numeric literals which GraphQL does not support, such as hexadecimal literals. The input `0x1F` lexes as (`0`, `x1F`), which will very likely surface as a confusing syntax error. A similar issue exists for languages that allow underscores in numbers for readability: `1_000` lexes as (`1`, `_000`), since `_` may begin a Name, rather than as a single number.

**Proposed Solution:**

Add a lookahead restriction to IntValue and FloatValue that disallows any NameStart character (including letters and `_`) from following. This makes it clear that `1e5` can only be one FloatValue and not multiple tokens, specifies lexer errors clearly to remove ambiguity, and provides clear errors for mistaken input.
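To make the proposed rule concrete, here is a minimal sketch of a hand-written number lexer that rejects a Digit, `.`, or NameStart character immediately after an IntValue or FloatValue. It is written in Python purely for illustration: the names `lex_number`, `classify`, `NumberToken`, and `LexError` are hypothetical and do not come from GraphQL.js or any other parser, and NameStart is assumed to be an ASCII Letter or `_`.

```python
# Hypothetical sketch: lexing GraphQL IntValue/FloatValue with the proposed
# lookahead restriction (no Digit, `.`, or NameStart may follow a number).
# NameStart is ASSUMED here to be an ASCII letter or `_`.
from typing import NamedTuple


class NumberToken(NamedTuple):
    kind: str  # "IntValue" or "FloatValue"
    text: str  # the source characters that make up the literal
    end: int   # index just past the literal


class LexError(Exception):
    """Raised when the input cannot be lexed as a numeric literal."""


def _char_at(source: str, i: int) -> str:
    # Returns "" at end of input so the checks below never raise IndexError.
    return source[i] if i < len(source) else ""


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def is_name_start(ch: str) -> bool:
    return ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def _skip_digits(source: str, pos: int) -> int:
    while is_digit(_char_at(source, pos)):
        pos += 1
    return pos


def lex_number(source: str, start: int = 0) -> NumberToken:
    """Lex one IntValue or FloatValue beginning at index `start`."""
    pos = start
    is_float = False

    # IntegerPart :: NegativeSign? `0` | NegativeSign? NonZeroDigit Digit*
    if _char_at(source, pos) == "-":
        pos += 1
    if _char_at(source, pos) == "0":
        pos += 1  # a lone `0`; any Digit after it is rejected by the lookahead
    elif is_digit(_char_at(source, pos)):
        pos = _skip_digits(source, pos)
    else:
        raise LexError(f"expected a digit at index {pos}")

    # FractionalPart :: `.` Digit+
    if _char_at(source, pos) == ".":
        is_float = True
        if not is_digit(_char_at(source, pos + 1)):
            raise LexError(f"expected a digit after '.' at index {pos + 1}")
        pos = _skip_digits(source, pos + 1)

    # ExponentPart :: ExponentIndicator Sign? Digit+
    if _char_at(source, pos) in ("e", "E"):
        is_float = True
        pos += 1
        if _char_at(source, pos) in ("+", "-"):
            pos += 1
        if not is_digit(_char_at(source, pos)):
            raise LexError(f"expected a digit in exponent at index {pos}")
        pos = _skip_digits(source, pos)

    # Proposed lookahead: a Digit, `.`, or NameStart may not follow the number.
    nxt = _char_at(source, pos)
    if is_digit(nxt) or nxt == "." or is_name_start(nxt):
        raise LexError(f"unexpected {nxt!r} after number at index {pos}")

    kind = "FloatValue" if is_float else "IntValue"
    return NumberToken(kind, source[start:pos], pos)


def classify(source: str) -> str:
    """Return the token kind if `source` is exactly one numeric literal, else "error"."""
    try:
        token = lex_number(source)
    except LexError:
        return "error"
    return token.kind if token.end == len(source) else "error"


if __name__ == "__main__":
    for src in ["123", "1e5", "123efg", "0x1F", "1_000", "1.23.4", "123L"]:
        print(f"{src!r}: {classify(src)}")
```

In this sketch each problem input above (`123efg`, `0x1F`, `1_000`) is rejected at the number itself, while `123 abc` still lexes a leading IntValue `123`. The sketch also rejects `123abc`, so `[123abc]`, which the current grammar treats like `[123, abc]`, would no longer lex.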
**Precedent**

JavaScript applies this same restriction for similar reasons, I believe originally to produce an early error if C-style typed literals were used in a JavaScript program.

https://www.ecma-international.org/ecma-262/10.0/index.html#sec-literals-numeric-literals

**Cost of change**

While this is *technically* a breaking change to the language grammar, it seeks to restrict cases that are almost certainly already producing either syntax or validation errors. This differs from the current implementation of GraphQL.js and, I believe, other parsers, and will require minor implementation updates.

---
 spec/Appendix B -- Grammar Summary.md | 8 ++++----
 spec/Section 2 -- Language.md | 26 +++++++++++++++++---------
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/spec/Appendix B -- Grammar Summary.md b/spec/Appendix B -- Grammar Summary.md
index a0308e79c..48bb66227 100644
--- a/spec/Appendix B -- Grammar Summary.md
+++ b/spec/Appendix B -- Grammar Summary.md
@@ -68,7 +68,7 @@ Letter :: one of
 Digit :: one of
   `0` `1` `2` `3` `4` `5` `6` `7` `8` `9`

-IntValue :: IntegerPart [lookahead != {Digit, `.`, ExponentPart}]
+IntValue :: IntegerPart [lookahead != {Digit, `.`, NameStart}]

 IntegerPart ::
   - NegativeSign? 0
@@ -79,9 +79,9 @@ NegativeSign :: -
 NonZeroDigit :: Digit but not `0`

 FloatValue ::
-  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, `.`, ExponentIndicator}]
-  - IntegerPart FractionalPart [lookahead != {Digit, `.`, ExponentIndicator}]
-  - IntegerPart ExponentPart [lookahead != {Digit, `.`, ExponentIndicator}]
+  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, `.`, NameStart}]
+  - IntegerPart FractionalPart [lookahead != {Digit, `.`, NameStart}]
+  - IntegerPart ExponentPart [lookahead != {Digit, `.`, NameStart}]

 FractionalPart :: . Digit+
diff --git a/spec/Section 2 -- Language.md b/spec/Section 2 -- Language.md
index 69b07f0c4..e9ec5bfd2 100644
--- a/spec/Section 2 -- Language.md
+++ b/spec/Section 2 -- Language.md
@@ -725,7 +725,7 @@ specified as a variable. List and inputs objects may also contain variables (unl

 ### Int Value

-IntValue :: IntegerPart [lookahead != {Digit, `.`, ExponentIndicator}]
+IntValue :: IntegerPart [lookahead != {Digit, `.`, NameStart}]

 IntegerPart ::
   - NegativeSign? 0
@@ -744,16 +744,18 @@ token is always the longest possible valid sequence. The source characters
 {2}. This also means the source {00} is invalid since it can neither be
 interpreted as a single token nor two {0} tokens.

-An {IntValue} must not be followed by a {.} or {ExponentIndicator}. If either
-follows then the token must only be interpreted as a possible {FloatValue}.
+An {IntValue} must not be followed by a {`.`} or {NameStart}. If either {`.`} or
+{ExponentIndicator} follows then the token must only be interpreted as a
+possible {FloatValue}. No other {NameStart} character can follow. For example
+the sequences `0x123` and `123L` have no valid lexical representations.

 ### Float Value

 FloatValue ::
-  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, `.`, ExponentIndicator}]
-  - IntegerPart FractionalPart [lookahead != {Digit, `.`, ExponentIndicator}]
-  - IntegerPart ExponentPart [lookahead != {Digit, `.`, ExponentIndicator}]
+  - IntegerPart FractionalPart ExponentPart [lookahead != {Digit, `.`, NameStart}]
+  - IntegerPart FractionalPart [lookahead != {Digit, `.`, NameStart}]
+  - IntegerPart ExponentPart [lookahead != {Digit, `.`, NameStart}]

 FractionalPart :: . Digit+
@@ -772,9 +774,15 @@ token is always the longest possible valid sequence. The source characters
 {1.23} cannot be interpreted as two tokens since {1.2} is followed by the
 {Digit} {3}.

-A {FloatValue} must not be followed by a {.} or {ExponentIndicator}. If either
-follows then a parse error occurs. For example, the sequence {1.23.4} cannot be
-interpreted as two tokens ({1.2}, {3.4}).
+A {FloatValue} must not be followed by a {.}. For example, the sequence {1.23.4}
+cannot be interpreted as two tokens ({1.2}, {3.4}).
+
+A {FloatValue} must not be followed by a {NameStart}. For example the sequence
+`0x1.2p3` has no valid lexical representation.
+
+Note: The numeric literals {IntValue} and {FloatValue} both restrict being
+immediately followed by a letter (or other {NameStart}) to reduce confusion
+or unexpected behavior since GraphQL only supports decimal numbers.

 ### Boolean Value