Normative: Revert U+2212 in timezone offsets (#3334)
Following ISO-8601, #2781 introduced U+2212 (Unicode minus) as an alias
for the regular ASCII minus sign for use in time zone offsets.

Two new pieces of information lead me to believe that this was a
mistake and that we should revert this change.

The first is that the newly released RFC 9557 (the string format
standard that Temporal uses) disallows non-ASCII characters. Its
predecessor, RFC 3339, also disallows non-ASCII characters. So
strings that follow the current (since 2022) ECMAScript spec
could be rejected by RFC 9557 clients.
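
To illustrate the interoperability concern, here is a minimal Rust sketch
showing that an offset written with U+2212 is not an ASCII string, while the
same offset written with U+002D is. The helper names `is_ascii_only` and
`sign_bytes` are hypothetical and come from no specification or library; only
the standard library is used.

```rust
/// Hypothetical helper: true if every code point in the offset string is ASCII,
/// which is the precondition an ASCII-only parser would impose.
fn is_ascii_only(offset: &str) -> bool {
    offset.is_ascii()
}

/// Hypothetical helper: the UTF-8 bytes of the first code point (the sign).
/// Returns an empty vector for an empty string.
fn sign_bytes(offset: &str) -> Vec<u8> {
    offset
        .chars()
        .next()
        .map(|c| c.to_string().into_bytes())
        .unwrap_or_default()
}
```

With these helpers, `is_ascii_only("-05:00")` is true and
`is_ascii_only("\u{2212}05:00")` is false. The ASCII sign is the single byte
0x2D, whereas U+2212 encodes as the three UTF-8 bytes 0xE2 0x88 0x92.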

The second is feedback from implementers of a Rust version of
Temporal: this single obscure character in the grammar will incur a
performance cost because they must now use Rust strings instead
of plain u8 ASCII data. See
tc39/proposal-temporal#2843 (comment)

This performance issue doesn't seem to be limited to Rust. Any
native implementation would likely benefit from being able to know that
valid date/time input (both Date and Temporal) is always ASCII-only.
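
As a rough illustration of the ASCII-only benefit, the sketch below parses an
offset directly from a byte slice with no UTF-8 decoding. It is not taken from
any real Temporal implementation: `parse_offset_minutes` is a hypothetical
name, and it assumes a reduced subset of the grammar (`±HH`, `±HHMM`, and
`±HH:MM` only, with no seconds or fractional parts).

```rust
/// Hypothetical sketch: parse `±HH`, `±HHMM`, or `±HH:MM` from raw bytes and
/// return the offset in minutes. Anything else, including non-ASCII input,
/// yields `None`.
fn parse_offset_minutes(input: &[u8]) -> Option<i32> {
    let (&sign_byte, rest) = input.split_first()?;
    let sign: i32 = match sign_byte {
        b'+' => 1,
        // Only U+002D HYPHEN-MINUS is accepted. U+2212 starts with the
        // byte 0xE2 in UTF-8, so it falls through to the reject arm.
        b'-' => -1,
        _ => return None,
    };

    // Convert exactly two ASCII digits into a number.
    let two_digits = |b: &[u8]| -> Option<i32> {
        match b {
            [hi, lo] if hi.is_ascii_digit() && lo.is_ascii_digit() => {
                Some(i32::from(*hi - b'0') * 10 + i32::from(*lo - b'0'))
            }
            _ => None,
        }
    };

    let (hours, minutes) = match rest.len() {
        2 => (two_digits(rest)?, 0),
        4 => (two_digits(&rest[..2])?, two_digits(&rest[2..])?),
        5 if rest[2] == b':' => (two_digits(&rest[..2])?, two_digits(&rest[3..])?),
        _ => return None,
    };
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 60 + minutes))
}
```

Because every accepted byte is ASCII, the function can index and compare single
bytes. Accepting U+2212 would instead require recognizing a three-byte sequence
or decoding the input as a string first.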

I don't know whether all engines have actually implemented this 2022
grammar change. But it's a safe bet that real-world usage of this
Unicode character is minimal. So the web-compat risk seems small.

If this PR is accepted, then we'll follow up with a normative Temporal
PR to remove this character from Temporal as well.
justingrant authored and ljharb committed Jul 17, 2024
1 parent 7a23e0e commit e7f1e5d
Showing 1 changed file with 6 additions and 41 deletions.
47 changes: 6 additions & 41 deletions spec.html
@@ -33183,48 +33183,14 @@ <h1>Time Zone Offset String Format</h1>
 <p>
 ECMAScript defines a string interchange format for UTC offsets, derived from ISO 8601.
 The format is described by the following grammar.
-The usage of Unicode code points in this grammar is listed in <emu-xref href="#table-time-zone-offset-string-code-points"></emu-xref>.
 </p>
 
-<emu-table id="table-time-zone-offset-string-code-points" caption="Time Zone Offset String Code Points">
-<table>
-<thead>
-<tr>
-<th>
-Code Point
-</th>
-<th>
-Unicode Name
-</th>
-<th>
-Abbreviation
-</th>
-</tr>
-</thead>
-<tr>
-<td>
-`U+2212`
-</td>
-<td>
-MINUS SIGN
-</td>
-<td>
-&lt;MINUS>
-</td>
-</tr>
-</table>
-</emu-table>
-
 <h2>Syntax</h2>
 <emu-grammar type="definition">
 UTCOffset :::
-TemporalSign Hour
-TemporalSign Hour HourSubcomponents[+Extended]
-TemporalSign Hour HourSubcomponents[~Extended]
-
-TemporalSign :::
-ASCIISign
-&lt;MINUS&gt;
+ASCIISign Hour
+ASCIISign Hour HourSubcomponents[+Extended]
+ASCIISign Hour HourSubcomponents[~Extended]
 
 ASCIISign ::: one of
 `+` `-`
@@ -33298,9 +33264,9 @@ <h1>
 <emu-alg>
 1. Let _parseResult_ be ParseText(_offsetString_, |UTCOffset|).
 1. Assert: _parseResult_ is not a List of errors.
-1. Assert: _parseResult_ contains a |TemporalSign| Parse Node.
-1. Let _parsedSign_ be the source text matched by the |TemporalSign| Parse Node contained within _parseResult_.
-1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS) or U+2212 (MINUS SIGN), then
+1. Assert: _parseResult_ contains a |ASCIISign| Parse Node.
+1. Let _parsedSign_ be the source text matched by the |ASCIISign| Parse Node contained within _parseResult_.
+1. If _parsedSign_ is the single code point U+002D (HYPHEN-MINUS), then
 1. Let _sign_ be -1.
 1. Else,
 1. Let _sign_ be 1.
@@ -50091,7 +50057,6 @@ <h1>Number Conversions</h1>
 <emu-annex id="sec-time-zone-offset-string-format">
 <h1>Time Zone Offset String Format</h1>
 <emu-prodref name="UTCOffset"></emu-prodref>
-<emu-prodref name="TemporalSign"></emu-prodref>
 <emu-prodref name="ASCIISign"></emu-prodref>
 <emu-prodref name="Hour"></emu-prodref>
 <emu-prodref name="HourSubcomponents"></emu-prodref>