Support document.write #6

kmcallister · 2014-07-31T20:47:05Z

The argument to document.write is a sequence of UCS-2 code units and we need a way to interface this with the UTF-8 parser. My plan is:

(Edit: Largely superseded by this proposal)

- Convert to UTF-8 as soon as possible.
- Convert invalid surrogate sequences to U+FFFD REPLACEMENT CHARACTER. This deviates from the spec, but nobody has objected strongly in the course of various discussions. There was even talk of amending the spec to allow this behavior, since it's currently written under the assumption that all parsers use UCS-2 natively.
- If a document.write input ends with a leading surrogate, we can't convert it yet, so save this single u16 in the BufferQueue alongside the UTF-8 buffers.
- If a document.write input starts with a trailing surrogate and there's a saved leading surrogate in the BufferQueue, replace both with the corresponding Unicode character, encoded as UTF-8.
- If the parser receives any other input while a leading surrogate is saved, drop the saved surrogate and prepend U+FFFD to the input. (This happens when a script splits an invalid surrogate sequence across multiple document.write calls, or writes a lone leading surrogate and then finishes.)
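The plan above can be sketched in Rust roughly as follows. This is a minimal, hypothetical illustration: `WriteBuffer`, `write` and `finish` are invented names, not html5ever's real `BufferQueue` API, and the sketch only covers document.write input plus end of input, not other kinds of parser input. It leans on the standard library's `String::from_utf16_lossy`, which already maps unpaired surrogates to U+FFFD.

```rust
/// Hypothetical stand-in for the parser's input queue (NOT html5ever's real
/// BufferQueue): decoded UTF-8 text plus at most one saved leading surrogate.
#[derive(Default)]
struct WriteBuffer {
    utf8: String,
    pending_lead: Option<u16>,
}

fn is_lead(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

fn is_trail(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

impl WriteBuffer {
    /// Feed one document.write argument, given as UTF-16 code units.
    fn write(&mut self, mut units: &[u16]) {
        if units.is_empty() {
            return; // nothing to convert; keep any saved surrogate
        }
        if let Some(lead) = self.pending_lead.take() {
            if is_trail(units[0]) {
                // Saved lead + new trail: emit the combined code point as UTF-8.
                let c = char::decode_utf16([lead, units[0]]).next().unwrap().unwrap();
                self.utf8.push(c);
                units = &units[1..];
            } else {
                // Any other input: the saved lead was unpaired.
                self.utf8.push('\u{FFFD}');
            }
        }
        // A lead surrogate at the very end can't be converted yet; save it.
        if let Some(&last) = units.last() {
            if is_lead(last) {
                self.pending_lead = Some(last);
                units = &units[..units.len() - 1];
            }
        }
        // Convert the rest right away; unpaired surrogates become U+FFFD.
        self.utf8.push_str(&String::from_utf16_lossy(units));
    }

    /// No more input: a still-saved lead surrogate is unpaired.
    fn finish(&mut self) {
        if self.pending_lead.take().is_some() {
            self.utf8.push('\u{FFFD}');
        }
    }
}
```

For example, writing "a" plus the leading half of U+1F600 in one call, then the trailing half plus "b" in the next, should leave "a😀b" in `utf8`; a lone leading surrogate followed by "c" gives "\u{FFFD}c".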


SimonSapin · 2014-09-24T18:48:24Z

As much as I’d like to, I don’t know that we can convince other implementations to replace lone surrogates with U+FFFD. For those that use UCS-2 internally (every one but us), this is pure overhead and has a performance cost.

And it’s not just document.write. Lone surrogates can end up anywhere in the DOM through APIs, and other browsers happily keep them there.

Another solution could be WTF-8: rust-lang/rust#12056 (comment). It’s a superset of UTF-8 (like UTF-8 is a superset of ASCII) that allows surrogates, but only if they’re unpaired. (Concatenating two WTF-8 strings is not just concatenating the bytes: it also needs to check for newly paired surrogates at the boundary and convert them to the UTF-8 representation of a single code point.)
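The boundary check during concatenation can be sketched over raw byte vectors. Assumptions: both inputs are already well-formed WTF-8, and `wtf8_concat`, `surrogate_at` and `encode_surrogate` are hypothetical helper names, not the API from the linked Rust discussion. A surrogate is stored as the generalized 3-byte sequence `ED A0..AF xx` (leading) or `ED B0..BF xx` (trailing); when a leading one ends the left string and a trailing one starts the right string, both are replaced by the 4-byte UTF-8 encoding of the combined code point.

```rust
/// Decode a 3-byte WTF-8 sequence if it encodes a surrogate whose second
/// byte lies in `lo..=hi` (A0..=AF for leading, B0..=BF for trailing).
fn surrogate_at(bytes: &[u8], lo: u8, hi: u8) -> Option<u16> {
    match bytes {
        [0xED, b2, b3] if (lo..=hi).contains(b2) => {
            Some(0xD000 | ((*b2 as u16 & 0x3F) << 6) | (*b3 as u16 & 0x3F))
        }
        _ => None,
    }
}

/// Encode a surrogate code unit as its 3-byte generalized UTF-8 form.
fn encode_surrogate(s: u16) -> [u8; 3] {
    [
        0xE0 | (s >> 12) as u8,
        0x80 | ((s >> 6) & 0x3F) as u8,
        0x80 | (s & 0x3F) as u8,
    ]
}

/// Concatenate two well-formed WTF-8 byte strings (hypothetical helper).
fn wtf8_concat(left: &[u8], right: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    if left.len() >= 3 && right.len() >= 3 {
        let (l_head, l_tail) = left.split_at(left.len() - 3);
        let (r_head, r_tail) = right.split_at(3);
        if let (Some(lead), Some(trail)) =
            (surrogate_at(l_tail, 0xA0, 0xAF), surrogate_at(r_head, 0xB0, 0xBF))
        {
            // Newly paired surrogates: emit one supplementary code point.
            let c = char::decode_utf16([lead, trail]).next().unwrap().unwrap();
            let mut buf = [0u8; 4];
            out.extend_from_slice(l_head);
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            out.extend_from_slice(r_tail);
            return out;
        }
    }
    // No pair across the boundary: plain byte concatenation.
    out.extend_from_slice(left);
    out.extend_from_slice(right);
    out
}
```

Joining "a" + lead(U+D83D) with trail(U+DE00) + "b" should yield exactly the UTF-8 bytes of "a😀b", while a lone leading surrogate followed by ordinary text is left as-is.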

kmcallister · 2014-09-24T22:10:23Z

Is it out of the question that the spec would allow, but not mandate, U+FFFD replacement? When I brought this up before, people seemed to think it was enough of a corner case that we could get away with it (spec wording changes or no).

SimonSapin · 2014-09-24T22:28:38Z

“Allow but not mandate” sounds bad for interop on principle, though I don’t know how much it really matters here. But even if we replace in document.write, surrogates can still get in through DOM or CSSOM APIs.

When replacing in CSSOM was brought up in the CSS WG, the conclusion was "no change". (Though it’s not clear to me that the arguments for change were well represented then. I attended the meeting remotely, audio only, with very bad sound quality.)

SimonSapin · 2014-10-05T16:36:21Z

WTF-8 is a thing now: http://www.mail-archive.com/dev-servo@lists.mozilla.org/msg00921.html

SimonSapin · 2014-10-06T16:04:33Z

I’ve changed my mind on the above. I’d like Servo to try UTF-8 everywhere in the DOM and what you first suggested here for document.write.

http://www.mail-archive.com/dev-servo@lists.mozilla.org/msg00934.html

kmcallister · 2015-03-25T01:01:32Z

https://github.com/kmcallister/tendril encompasses my latest proposal.

nox · 2016-11-29T19:00:22Z

document.write landed.

kmcallister mentioned this issue Jul 31, 2014

Provide hooks Servo needs to implement the document lifecycle #7

Closed


kmcallister mentioned this issue Sep 20, 2014

Accept raw bytes #34

Closed

kmcallister mentioned this issue Oct 16, 2014

Implement document.write servo/servo#3704

Closed

kmcallister mentioned this issue Mar 25, 2015

Implement the "sixth era" string proposal #115

Closed

nox closed this as completed Nov 29, 2016
