-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #301: detail how javascript: return values become response bodies #1107
Conversation
This also does some minor cleanup to clarify that "run a classic script" returns undefined when scripting is disabled.
data-x="concept-response-status">status</span> is <code data-x="">204</code>.</p> | ||
|
||
<p>Otherwise, the result of obtaining the resource for the URL is a <span | ||
data-x="concept-response">response</span> whose <span | ||
data-x="concept-response-header-list">header list</span> consists of <code | ||
data-x="">Content-Type</code>/<code>text/html</code> and whose <span | ||
data-x="concept-response-body">body</span> is the return value converted to a string | ||
value.</p> | ||
data-x="concept-response-body">body</span> is <span>ToString</span>(<var>result</var>).</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems browsers are not really interoperable around this (doh). E.g., http://software.hixie.ch/utilities/js/live-dom-viewer/saved/4099. Firefox seems to serialize the object somehow without invoking ToString()
. Chrome refuses to navigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge also refuses to navigate. Amazing.
Note that the undefined
case also refuses to navigate. (Is that also true for normal links that return 204s, or is there some special casing that I don't understand?)
It seems like throwing doesn't actually make a difference. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4100 also refuses to navigate in Chrome and Edge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so there was an issue with our examples: the JavaScript program
{toString: () => { return 5; }}
is actually a block with a label named toString
, whose completion value is () => { return 5; }
. So Firefox's answer is consistent with ToString().
If you try e.g. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4101 in Firefox, it gives 5
, as expected from ToString()
on the result. So that's good.
But if you try it in Chrome or Edge, they both refuse to render again, so that's not great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, 204 not navigating is normal I think. Not sure how accurately that is defined overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DigiTec could you fill us in on how Edge converts javascript:
URL completion values into strings to use as response bodies? So far Gecko seems straightforward but Blink and EdgeHTML are mysterious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: only Firefox reports uncaught exceptions to the console (http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4104), and presumably to window.onerror.
Should we spec swallowing them to go with the majority? That feels strictly worse, which is sad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Gecko does in terms of conversion to bytes is that it examines the returned string to see whether all charCodeAt()
values are 255 or less. If so, the string is treated as byte-inflated ISO-8859-1 data (and a response is synthesized which has "ISO-8859-1" as its encoding, with the byte-deflated bytes as data). This allows generation of non-text data, dating back to when we supported javascript: in <img>
, say.
Otherwise the return value is treated as a sequence of UTF-16 code units encoding a Unicode string and converted to UTF-8 bytes (insert handwaving about what happens to unpaired surrogates here). The synthesized response has "UTF-8" as its encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Blink does is https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/core/v8/V8StringResource.cpp&sq=package:chromium&l=101&rcl=1461583037 which I haven't tried to wrap my head around.
I did try http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4105 in every browser and they all gave me back the string verbatim, so that didn't reveal much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Edge we specify two code pages for transformation. The first is the calculated code page which is always CP_UCS_2 which translates to Unicode, ISO 10646 according to comments. We then specify as the source code page CPSRC_NATIVEDATA which means native data, known to be CP_UCS_2 so don't allow any sort of fallbacks.
I also confirmed that our current behavior in Edge for the handling of the return value was put in place to match WebKit behavior and that legacy IE does have much more complicated behaviors still (handling of boolean return values, etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@domenic The code you point to in V8StringResource seems to be about converting V8 strings to WebCore strings, not conversion to bytes.
I don't think Blink does conversion to bytes at all here. See the FIXME comment in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentWriter.cpp&l=75&ct=xref_jump_to_def&cl=GROK&gsn=appendReplacingData as called from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentLoader.cpp&sq=package:chromium&l=684&rcl=1461583037 (DocumentLoader::replaceDocumentWhileExecutingJavaScriptURL).
In terms of test matrix, if we need to determine this in a black-box way, it seems to me that the following are somewhat useful cases to test:
- Return string is all ASCII (charCodeAt() < 128 for all indices).
- Return string has charCodeAt() < 256 for all indices, but does not fall into case 1.
- Return string has does not have any charCodeAt values corresponding to UTF-16 surrogate code unit values, but does not fall into cases 1 or 2.
- Return string has surrogate code units which are all paired properly.
- Return string has unpaired surrogate code units.
Each of these should be tested in situations in which the source of the javascript: URL is either UTF-8 or ISO-8859-1/Windows-1252. That is, either an iframe in a document with that encoding with src pointing to a javascript: URL, or a link in a document with that encoding with href pointing to a javascript: URL. Probably test both scenarios.
The tests should look for the following things:
- What is the
document.body.textContent
of the resulting document? - What is the
document.charset
of the resulting document?
I've uploaded new changes that bring this more in line with the browser majority and also adds a warning about the JS string -> byte conversion, saying we're still looking into that. I'd kind of like to merge that and work on figuring out the string -> byte business in a new PR. |
Just to check, that's basically just adding a Type() == string guard and not saying anything about how exceptions during evaluation are handled, right? |
No, exceptions are handled by "If evaluation was unsuccessful, let result be undefined instead." |
Ah, in terms of the navigation behavior. OK, good, I think that matches all UAs. What about the onerror/reporting behavior? In Gecko, we basically report the exception then pretend like execution succeeded and returned undefined. |
Oh, and not reporting the exception would be doable. Reporting it to the console but not firing onerror would be a lot more annoying. |
I realized I was not testing the right thing previously with regard to error reporting. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4107 shows that Chrome, Edge, and Firefox all report the error. Which is good. That is also what the current patch set does, automatically; it calls "run a classic script" which, in the absence of flags, will report any thrown exceptions before returning undefined. So I think the current patch matches everyone on error reporting behavior and everyone-but-Gecko on non-string behavior. |
And the current patch doesn't address the string-or-byte question, though I guess if nobody but Gecko does that we might want to leave that out. It's also a lot less useful now you can no longer use |
the bytes that comprise a <span data-x="concept-response-body">response body</span> is | ||
not yet specified, pending further investigation into user agent behavior. See <a | ||
href="https://github.com/whatwg/html/pull/1107/files/8784407eea43171053a5fe75890b779edaabac20#r60947977">discussion | ||
in pull request #1107</a>.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to merge this but I think that if we want to investigate aligning with Gecko here (not sure why if Gecko is the only browser that does this) we should have a new issue for that and point there. Accountability is harder to track otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #1129. Will merge now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Just curious, not objecting anything here, but do we know why non-Gecko engines don't do the obvious thing and convert the return value effectively to string? Like did anyone try to first file implementation bugs before this spec bug? |
I kind of assumed such reasons were lost in the mists of time. It's not clear to me any particular behavior is more obvious than any other. In JS algorithms, ToString() logic is prevalent, but for transporting JS values into response bodies in the network stack, there's no real precedent. For such an edge case, changing 1 browser seems like a better path to interop than changing 3. |
Or just retrain your fingers to type Ctrl+Shift+J, which is less work than Ctrl+T + "javascript:". ^_^ |
FWIW, this is why we shouldn't be using github for issue tracking: all the relevant discussion and data here was in the line comments on the PR, and github happily clobbered that once an updated version was pushed. :( |
@bzbarsky you can still get to those comments quite easily by expanding the relevant thread. It's a rather great feature when the line comments are nitpicks. It's not super useful if substantive discussion happens there, but perhaps we should learn to not have substantive discussion in line comment threads. |
Uh, I totally missed all the relevant discussion here. /me finds which github link to click to see the interesting comments about this issue... |
I think if you start at #1107 (comment) you should be able to read the relevant bits. |
I'll add a wpt for this in https://bugzilla.mozilla.org/show_bug.cgi?id=1268047 |
@domenic I just did some testing, and at least Chrome and Safari do in fact convert all values to strings (but possibly to "" if the original value is not a string). This includes the undefined value. So this pull request specified something that matches at best only Edge and breaks sites, as we discovered when we implemented it in Gecko. See #1895 |
This also does some minor cleanup to clarify that "run a classic script"
returns undefined when scripting is disabled.