fixup: code points instead of code units
ljharb committed Mar 23, 2024
1 parent 6fc628f commit 6db8764
Showing 1 changed file with 19 additions and 13 deletions.
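The commit's title turns on the difference between UTF-16 code units and Unicode code points. Here is a minimal JavaScript illustration, which is not taken from the commit or the proposal. The two views of a string only diverge for characters outside the Basic Multilingual Plane, which JavaScript stores as surrogate pairs:

```javascript
// Illustration only: compare walking a string by UTF-16 code units
// (.length / charCodeAt) with walking it by code points (for...of).
const s = "a\u{1F600}"; // "a" followed by U+1F600, an astral (non-BMP) character

// Code units: U+1F600 is stored as a surrogate pair, so it yields two entries.
const codeUnits = [];
for (let i = 0; i < s.length; i++) {
  codeUnits.push(s.charCodeAt(i).toString(16));
}

// Code points: the string iterator yields whole code points.
const codePoints = [];
for (const ch of s) {
  codePoints.push(ch.codePointAt(0).toString(16));
}

console.log(codeUnits);  // [ '61', 'd83d', 'de00' ]
console.log(codePoints); // [ '61', '1f600' ]
```

Iterating by code point corresponds to what StringToCodePoints does in the spec text.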
spec.emu
@@ -30,12 +30,12 @@ contributors: Jordan Harband
 1. Let _escapedList_ be a new empty List.
 1. For each code point _c_ in _cpList_, do
   1. If _escapedList_ is empty and _c_ is matched by |DecimalDigit|, then
-    1. Append code unit U+005C (REVERSE SOLIDUS) to _escapedList_.
-    1. Append code unit U+0078 (LATIN SMALL LETTER X) to _escapedList_.
-    1. Append code unit U+0033 (DIGIT THREE) to _escapedList_.
+    1. Append code point U+005C (REVERSE SOLIDUS) to _escapedList_.
+    1. Append code point U+0078 (LATIN SMALL LETTER X) to _escapedList_.
+    1. Append code point U+0033 (DIGIT THREE) to _escapedList_.
     1. Append _c_ to _escapedList_.
   1. Else,
-    1. Append the code units in EncodeForRegExpEscape(_c_) to _escapedList_.
+    1. Append the code points in EncodeForRegExpEscape(_c_) to _escapedList_.
 1. Return CodePointsToString(_escapedList_).
 </emu-alg>
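The loop in the first hunk can be sketched in JavaScript roughly as follows. This is an illustration under assumptions, not the proposal's polyfill. `escapeWith` and its `encode` parameter are hypothetical names, `for...of` stands in for iterating over _cpList_, and `String.fromCodePoint` stands in for CodePointsToString. The per-character encoder is passed in so that the block runs on its own.

```javascript
// Hypothetical sketch of the outer escaping loop (post-commit wording).
// The per-code-point encoder is injected so this block is self-contained.
function escapeWith(str, encode) {
  const escaped = []; // plays the role of _escapedList_, a list of code points
  for (const ch of str) { // for...of iterates by code point
    const c = ch.codePointAt(0);
    if (escaped.length === 0 && c >= 0x30 && c <= 0x39) {
      // Leading |DecimalDigit|: emit "\x3" and then the digit itself,
      // so "1" (U+0031) becomes the four characters \x31.
      escaped.push(0x5c, 0x78, 0x33, c);
    } else {
      escaped.push(...encode(c));
    }
  }
  return String.fromCodePoint(...escaped); // CodePointsToString
}

// With an identity encoder, only the leading-digit rule is visible.
const identity = (c) => [c];
console.log(escapeWith("123abc", identity)); // \x3123abc
```

The digit check only applies while the list is still empty, so only a leading digit gets the `\x3` prefix. For example, `escapeWith("a1", identity)` returns `"a1"`.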

@@ -47,32 +47,38 @@ contributors: Jordan Harband
 <emu-clause id="sec-encode" type="abstract operation">
 <h1>
 EncodeForRegExpEscape (
-_c_: a code unit,
-): a List of code units
+_c_: a code point,
+): a List of code points
 </h1>
 <dl class="header">
 <dt>description</dt>
 <dd>If the code unit represents a RegExp punctuator that needs escaping, or ASCII whitespace, it produces the code units for *"\x"* followed by the relevant escape code. If the code unit represents non-ASCII white space, it produces the code units for *"\u"* followed by the relevant escape code. Otherwise, it returns a List containing the original code unit.</dd>
 </dl>

 <emu-alg>
-1. Let _codeUnits_ be a new empty List.
+1. Let _codePoints_ be a new empty List.
 1. Let _punctuators_ be the following String, which consists of every ASCII punctuator except U+005F (LOW LINE): *"(){}[]|,.?\*+-^$=<>\/#&!%:;@~'"`"*.
 1. Let _toEscape_ be StringToCodePoints(_punctuators_).
 1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
-  1. Append code unit U+005C (REVERSE SOLIDUS) to _codeUnits_.
+  1. Append code point U+005C (REVERSE SOLIDUS) to _codePoints_.
   1. Let _hex_ be Number::toString(𝔽(_c_), 16).
   1. If the length of _hex_ is 1 or 2, then
     1. Set _hex_ to StringPad(_hex_, 2, *"0"*, ~start~).
-    1. Append code unit U+0078 (LATIN SMALL LETTER X) to _codeUnits_.
+    1. Append code point U+0078 (LATIN SMALL LETTER X) to _codePoints_.
+    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+  1. Else if the length of _hex_ is > 4, then
+    1. Append code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+    1. Append code point U+007B (LEFT CURLY BRACKET) to _codePoints_.
+    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+    1. Append code point U+007D (RIGHT CURLY BRACKET) to _codePoints_.
   1. Else,
     1. Assert: The length of _hex_ is at most 4.
     1. Set _hex_ to StringPad(_hex_, 4, *"0"*, ~start~).
-    1. Append code unit U+0075 (LATIN SMALL LETTER U) to _codeUnits_.
-    1. Append the code units in _hex_ to _codeUnits_.
+    1. Append code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+    1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
 1. Else,
-  1. Append _c_ to _codeUnits_.
-1. Return _codeUnits_.
+  1. Append _c_ to _codePoints_.
+1. Return _codePoints_.
 </emu-alg>
 </emu-clause>
 </ins>
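The following is a JavaScript sketch of EncodeForRegExpEscape as it reads after this commit. It takes a numeric code point and returns an array of code points. The function name is hypothetical and this is not the proposal's reference implementation. |WhiteSpace| is written out from the ECMAScript definition (TAB, VT, FF, ZWNBSP, or any code point in general category Zs), and `toString(16)` plays the role of Number::toString(𝔽(_c_), 16).

```javascript
// Hypothetical sketch of EncodeForRegExpEscape (post-commit wording).

// Every ASCII punctuator except U+005F (LOW LINE), as in _punctuators_.
const PUNCTUATORS = "(){}[]|,.?*+-^$=<>\\/#&!%:;@~'\"`";
const toCodePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));
const TO_ESCAPE = new Set(toCodePoints(PUNCTUATORS));

// ECMAScript |WhiteSpace|: TAB, VT, FF, ZWNBSP, or any Zs code point.
function isWhiteSpace(c) {
  return c === 0x09 || c === 0x0b || c === 0x0c || c === 0xfeff ||
    /\p{Zs}/u.test(String.fromCodePoint(c));
}

function encodeForRegExpEscape(c) {
  const codePoints = [];
  if (TO_ESCAPE.has(c) || isWhiteSpace(c)) {
    codePoints.push(0x5c); // U+005C REVERSE SOLIDUS
    let hex = c.toString(16); // lowercase hex digits
    if (hex.length <= 2) {
      // "\x" followed by exactly two hex digits
      hex = hex.padStart(2, "0");
      codePoints.push(0x78, ...toCodePoints(hex));
    } else if (hex.length > 4) {
      // "\u{", the hex digits, then "}"
      codePoints.push(0x75, 0x7b, ...toCodePoints(hex), 0x7d);
    } else {
      // "\u" followed by exactly four hex digits
      hex = hex.padStart(4, "0");
      codePoints.push(0x75, ...toCodePoints(hex));
    }
  } else {
    codePoints.push(c); // no escaping needed
  }
  return codePoints;
}

// Convenience wrapper for inspecting results as strings.
const show = (c) => String.fromCodePoint(...encodeForRegExpEscape(c));
console.log(show(0x2e));   // \x2e
console.log(show(0x3000)); // \u3000
console.log(show(0x61));   // a
```

All ASCII punctuators and all |WhiteSpace| code points lie in the Basic Multilingual Plane. With the current _toEscape_ list, the braced `\u{...}` branch therefore does not appear to be reachable. It would only matter if astral code points were ever added to the escaped set.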