Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Simplify parsing UTC offset strings #1917

Merged
merged 5 commits into from
Jan 19, 2022
Merged

Editorial: Simplify parsing UTC offset strings #1917

merged 5 commits into from
Jan 19, 2022

Conversation

ptomato
Copy link
Collaborator

@ptomato ptomato commented Nov 13, 2021

I noticed that in a few places we parse a UTC offset string into a nanosecond offset, only to turn it back into an offset string later. This can be simplified with an editorial change.

While I was working on this, I changed the UTC offset parsing algorithms to use ParseText like @gibson042 did in #1907.

@ptomato ptomato added the spec-text Specification text involved label Nov 13, 2021
@codecov
Copy link

codecov bot commented Nov 13, 2021

Codecov Report

Merging #1917 (58e30a6) into main (a16fcfa) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1917   +/-   ##
=======================================
  Coverage   94.91%   94.91%           
=======================================
  Files          19       19           
  Lines       10990    10990           
  Branches     1596     1596           
=======================================
  Hits        10431    10431           
  Misses        541      541           
  Partials       18       18           
Flag Coverage Δ
test262 79.42% <ø> (ø)
tests 89.65% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a16fcfa...58e30a6. Read the comment docs.

Copy link
Collaborator

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes appear to be improvements, but I have several followup questions.

1. Throw a *RangeError* exception.
1. Let _z_, _sign_, _hours_, _minutes_, _seconds_, _fraction_ and _name_ be the parts of _isoString_ produced respectively by the |UTCDesignator|, |TimeZoneUTCOffsetSign|, |TimeZoneUTCOffsetHour|, |TimeZoneUTCOffsetMinute|, |TimeZoneUTCOffsetSecond|, |TimeZoneUTCOffsetFractionalPart|, and |TimeZoneIANAName| productions, or *undefined* if not present.
1. If _z_ is not *undefined*, then
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present.
1. Let each of _z_, _offsetString_, and _name_ be the source text matched by the respective |UTCDesignator|, |TimeZoneNumericUTCOffset|, and |TimeZoneIANAName| Parse Node enclosed by _parseResult_, or an empty sequence of code points if not present.

Also, I finally dug into this part of the grammar and found some potential issues:

  • "Z" and "z" are valid expansions of TimeZoneIANAName (relevant for e.g. Temporal.TimeZone.from).
  • "Etc/GMT+hour" strings are not matched by TimeZoneIANAName while "Etc/GMT-hour" strings are, despite both being in the IANA Time Zone Database.
  • TimeZoneBracketedAnnotation matches "Etc/GMT±0digit" strings, which are not in the IANA Time Zone Database (the current range is actually asymmetric—Etc/GMT+1 to Etc/GMT+12 and Etc/GMT-1 to Etc/GMT-14 plus Etc/GMT±0, but I don't see a problem with requiring Temporal implementations to support everything up to Etc/GMT±23).
  • The result of the preceding two points is ambiguous parsing for "Etc/GMT±0digit" strings from TimeZoneBracketedName. Their special-case treatment should probably be either removed or moved into TimeZoneIANAName with its existing TimeZoneIANANameTail production prohibited to match e.g. TimeZoneGMTName.
  • There also seems to be ambiguous parsing for "…±offset[±offset]" strings from TimeZone (matched by both TimeZoneOffsetRequired and TimeZoneNameRequired), although I'm not 100% certain that's a problem.
  • The whole tree is so deep and some productions are so long that it's quite difficult to comprehend. I think it could be improved by condensing productions, in some cases by parameterization like replacing TimeZoneNumericUTCOffset with TimeZoneUTCOffset[+onlyNumeric] and ASCIISign with Sign[+onlyASCII] and in others with early errors like "if the source text matched by {TimeZoneIANANameComponent,CalendarNameComponent,FractionalPart} contains more than <N> code points".
  • What is the reason for distinguishing TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName and their subproductions, all of which seem to match exactly the same input?
  • Relatedly, how does this code handle an expansion of TimeZone that includes both TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName (the former through TimeZoneUTCOffset and the latter through TimeZoneBracketedAnnotation)? It seems to discard the TimeZoneUTCOffsetName data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed review, I'm just going through to make sure that these things are captured in issues. I think they are out of scope for this PR.

  • "Z" and "z" are valid expansions of TimeZoneIANAName (relevant for e.g. Temporal.TimeZone.from).

I don't think this is a problem — Temporal.TimeZone.from('Z') will fail because there's no such IANA named time zone, just like Temporal.TimeZone.from('Y'). Ditto for Temporal.ZonedDateTime.from('2022-01-11T19:43Z[Z]').

  • "Etc/GMT+hour" strings are not matched by TimeZoneIANAName while "Etc/GMT-hour" strings are, despite both being in the IANA Time Zone Database.

  • TimeZoneBracketedAnnotation matches "Etc/GMT±0_digit_" strings, which are not in the IANA Time Zone Database (the current range is actually asymmetric—Etc/GMT+1 to Etc/GMT+12 and Etc/GMT-1 to Etc/GMT-14 plus Etc/GMT±0, but I don't see a problem with requiring Temporal implementations to support everything up to Etc/GMT±23).

  • The result of the preceding two points is ambiguous parsing for "Etc/GMT±0_digit_" strings from TimeZoneBracketedName. Their special-case treatment should probably be either removed or moved into TimeZoneIANAName with its existing TimeZoneIANANameTail production prohibited to match e.g. TimeZoneGMTName.

The above points are covered in #1993, I think.

  • There also seems to be ambiguous parsing for "…±offset[±offset]" strings from TimeZone (matched by both TimeZoneOffsetRequired and TimeZoneNameRequired), although I'm not 100% certain that's a problem.

I don't really think this is a problem either, but for the sake of caution we could fix it by rewriting TimeZone:

TimeZone :

  • TimeZoneUTCOffset TimeZoneBracketedAnnotation(opt)
  • TimeZoneBracketedAnnotation
  • The whole tree is so deep and some productions are so long that it's quite difficult to comprehend. I think it could be improved by condensing productions, in some cases by parameterization like replacing TimeZoneNumericUTCOffset with TimeZoneUTCOffset[+onlyNumeric] and ASCIISign with Sign[+onlyASCII] and in others with early errors like "if the source text matched by {TimeZoneIANANameComponent,CalendarNameComponent,FractionalPart} contains more than code points".

I think this is covered in #1984.

  • What is the reason for distinguishing TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName and their subproductions, all of which seem to match exactly the same input?

This is to avoid ambiguity in strings like 1976-11-18T15:23:30+01:00[+01:00] where TimeZoneNumericUTCOffset matches the first +01:00 and TimeZoneUTCOffsetName matches the second +01:00. Otherwise the code using these productions in ParseZonedDateTimeString can't refer unambiguously to the correct parse node.

  • Relatedly, how does this code handle an expansion of TimeZone that includes both TimeZoneNumericUTCOffset and TimeZoneUTCOffsetName (the former through TimeZoneUTCOffset and the latter through TimeZoneBracketedAnnotation)? It seems to discard the TimeZoneUTCOffsetName data.

It depends on which from() function is parsing the string. Off the top of my head — in the case of Instant, the bracketed annotation is discarded. In the case of ZonedDateTime and TimeZone, the numeric offset is discarded (in ZonedDateTime, only after verifying that they match). In other cases, both are discarded.

Copy link
Collaborator

@gibson042 gibson042 Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the code using these productions in ParseZonedDateTimeString can't refer unambiguously to the correct parse node.

That should be possible without introducing distinct nonterminals, see ECMA-262 The if Statement for an example ("the first Statement" vs. "the second Statement").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including if either of them are optional? (so "first" and "second" are not always the case)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've captured this as #2006 to address later. Mind if I go on and merge this?

This fixes one instance of _timeZoneResult_.[[Offset]] which was not a
slot on the Record returned by that operation. It should be
[[OffsetString]].

Use the name "offset string" more consistently when referring to offset
strings, rather than "offset", to avoid confusion with the offset option
or offset in nanoseconds.
There is no need to parse the offset string and format it again, only to
pass it to CreateTemporalTimeZone which parses it again.

Also use the ParseText parsing machinery instead of the handwavy
"satisfies the syntax of" language.
…meZoneString

We don't need to canonicalize the offset string in this operation, because
all callers pass the resulting offset string to ParseTimeZoneOffsetString.

Also use the ParseText parsing machinery instead of the handwavy
"satisfies the syntax of" language.
I don't think this is a problem, but this removes the possible ambiguity
where both TimeZoneOffsetRequired and TimeZoneNameRequired could match the
same string.
…rithmetic

These operations cannot fail because they values they use as input already
come from valid durations (sometimes negated).
@ptomato ptomato merged commit 05793a8 into main Jan 19, 2022
@ptomato ptomato deleted the editorial branch January 19, 2022 21:51
luyahan pushed a commit to luyahan/v8 that referenced this pull request Jun 30, 2022
tc39/proposal-temporal#1917
tc39/proposal-temporal#1953

Bug: v8:11544
Change-Id: I667980e312248ccbaf826d4e3104fb1ddabef890
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3721464
Commit-Queue: Frank Tang <ftang@chromium.org>
Reviewed-by: Shu-yu Guo <syg@chromium.org>
Cr-Commit-Position: refs/heads/main@{#81453}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec-text Specification text involved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants