-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Editorial: Simplify parsing UTC offset strings #1917
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1917 +/- ##
=======================================
Coverage 94.91% 94.91%
=======================================
Files 19 19
Lines 10990 10990
Branches 1596 1596
=======================================
Hits 10431 10431
Misses 541 541
Partials 18 18
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes appear to be improvements, but I have several followup questions.
spec/abstractops.html
Outdated
1. Throw a *RangeError* exception. | ||
1. Let _z_, _sign_, _hours_, _minutes_, _seconds_, _fraction_ and _name_ be the parts of _isoString_ produced respectively by the |UTCDesignator|, |TimeZoneUTCOffsetSign|, |TimeZoneUTCOffsetHour|, |TimeZoneUTCOffsetMinute|, |TimeZoneUTCOffsetSecond|, |TimeZoneUTCOffsetFractionalPart|, and |TimeZoneIANAName| productions, or *undefined* if not present. | ||
1. If _z_ is not *undefined*, then | ||
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present. | |
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the respective |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present. |
Also, I finally dug into this part of the grammar and found some potential issues:
- "Z" and "z" are valid expansions of TimeZoneIANAName (relevant for e.g.
Temporal.TimeZone.from
). - "Etc/GMT+hour" strings are not matched by TimeZoneIANAName while "Etc/GMT-hour" strings are, despite both being in the IANA Time Zone Database.
- TimeZoneBracketedAnnotation matches "Etc/GMT±0digit" strings, which are not in the IANA Time Zone Database (the current range is actually asymmetric—Etc/GMT+1 to Etc/GMT+12 and Etc/GMT-1 to Etc/GMT-14 plus Etc/GMT±0, but I don't see a problem with requiring Temporal implementations to support everything up to Etc/GMT±23).
- The result of the preceding two points is ambiguous parsing for "Etc/GMT±0digit" strings from TimeZoneBracketedName. Their special-case treatment should probably be either removed or moved into TimeZoneIANAName with its existing TimeZoneIANANameTail production prohibited to match e.g. TimeZoneGMTName.
- There also seems to be ambiguous parsing for "…±offset[±offset]" strings from TimeZone (matched by both TimeZoneOffsetRequired and TimeZoneNameRequired), although I'm not 100% certain that's a problem.
- The whole tree is so deep and some productions are so long that it's quite difficult to comprehend. I think it could be improved by condensing productions, in some cases by parameterization like replacing
TimeZoneNumericUTCOffset
withTimeZoneUTCOffset[+onlyNumeric]
andASCIISign
withSign[+onlyASCII]
and in others with early errors like "if the source text matched by {TimeZoneIANANameComponent,CalendarNameComponent,FractionalPart} contains more than <N> code points". - What is the reason for distinguishing TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName and their subproductions, all of which seem to match exactly the same input?
- Relatedly, how does this code handle an expansion of TimeZone that includes both TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName (the former through TimeZoneUTCOffset and the latter through TimeZoneBracketedAnnotation)? It seems to discard the TimeZoneUTCOffsetName data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed review, I'm just going through to make sure that these things are captured in issues. I think they are out of scope for this PR.
- "Z" and "z" are valid expansions of TimeZoneIANAName (relevant for e.g.
Temporal.TimeZone.from
).
I don't think this is a problem — Temporal.TimeZone.from('Z')
will fail because there's no such IANA named time zone, just like Temporal.TimeZone.from('Y')
. Ditto for Temporal.ZonedDateTime.from('2022-01-11T19:43Z[Z]')
.
"Etc/GMT+hour" strings are not matched by TimeZoneIANAName while "Etc/GMT-hour" strings are, despite both being in the IANA Time Zone Database.
TimeZoneBracketedAnnotation matches "Etc/GMT±0_digit_" strings, which are not in the IANA Time Zone Database (the current range is actually asymmetric—Etc/GMT+1 to Etc/GMT+12 and Etc/GMT-1 to Etc/GMT-14 plus Etc/GMT±0, but I don't see a problem with requiring Temporal implementations to support everything up to Etc/GMT±23).
The result of the preceding two points is ambiguous parsing for "Etc/GMT±0_digit_" strings from TimeZoneBracketedName. Their special-case treatment should probably be either removed or moved into TimeZoneIANAName with its existing TimeZoneIANANameTail production prohibited to match e.g. TimeZoneGMTName.
The above points are covered in #1993, I think.
- There also seems to be ambiguous parsing for "…±offset[±offset]" strings from TimeZone (matched by both TimeZoneOffsetRequired and TimeZoneNameRequired), although I'm not 100% certain that's a problem.
I don't really think this is a problem either, but for the sake of caution we could fix it by rewriting TimeZone:
TimeZone :
- TimeZoneUTCOffset TimeZoneBracketedAnnotation(opt)
- TimeZoneBracketedAnnotation
- The whole tree is so deep and some productions are so long that it's quite difficult to comprehend. I think it could be improved by condensing productions, in some cases by parameterization like replacing
TimeZoneNumericUTCOffset
withTimeZoneUTCOffset[+onlyNumeric]
andASCIISign
withSign[+onlyASCII]
and in others with early errors like "if the source text matched by {TimeZoneIANANameComponent,CalendarNameComponent,FractionalPart} contains more than code points".
I think this is covered in #1984.
- What is the reason for distinguishing TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName and their subproductions, all of which seem to match exactly the same input?
This is to avoid ambiguity in strings like 1976-11-18T15:23:30+01:00[+01:00]
where TimeZoneNumericUTCOffset matches the first +01:00
and TimeZoneUTCOffsetName matches the second +01:00
. Otherwise the code using these productions in ParseZonedDateTimeString can't refer unambiguously to the correct parse node.
- Relatedly, how does this code handle an expansion of TimeZone that includes both TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName (the former through TimeZoneUTCOffset and the latter through TimeZoneBracketedAnnotation)? It seems to discard the TimeZoneUTCOffsetName data.
It depends on which from()
function is parsing the string. Off the top of my head — in the case of Instant, the bracketed annotation is discarded. In the case of ZonedDateTime and TimeZone, the numeric offset is discarded (in ZonedDateTime, only after verifying that they match). In other cases, both are discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise the code using these productions in ParseZonedDateTimeString can't refer unambiguously to the correct parse node.
That should be possible without introducing distinct nonterminals, see ECMA-262 The if
Statement for an example ("the first Statement" vs. "the second Statement").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including if either of them are optional? (so "first" and "second" are not always the case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've captured this as #2006 to address later. Mind if I go on and merge this?
This fixes one instance of _timeZoneResult_.[[Offset]] which was not a slot on the Record returned by that operation. It should be [[OffsetString]]. Use the name "offset string" more consistently when referring to offset strings, rather than "offset", to avoid confusion with the offset option or offset in nanoseconds.
There is no need to parse the offset string and format it again, only to pass it to CreateTemporalTimeZone which parses it again. Also use the ParseText parsing machinery instead of the handwavy "satisfies the syntax of" language.
…meZoneString We don't need to canonicalize the offset string in this operation, because all callers pass the resulting offset string to ParseTimeZoneOffsetString. Also use the ParseText parsing machinery instead of the handwavy "satisfies the syntax of" language.
I don't think this is a problem, but this removes the possible ambiguity where both TimeZoneOffsetRequired and TimeZoneNameRequired could match the same string.
…rithmetic These operations cannot fail because they values they use as input already come from valid durations (sometimes negated).
tc39/proposal-temporal#1917 tc39/proposal-temporal#1953 Bug: v8:11544 Change-Id: I667980e312248ccbaf826d4e3104fb1ddabef890 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3721464 Commit-Queue: Frank Tang <ftang@chromium.org> Reviewed-by: Shu-yu Guo <syg@chromium.org> Cr-Commit-Position: refs/heads/main@{#81453}
I noticed that in a few places we parse a UTC offset string into a nanosecond offset, only to turn it back into an offset string later. This can be simplified with an editorial change.
While I was working on this, I changed the UTC offset parsing algorithms to use ParseText like @gibson042 did in #1907.