javascript URL: define JS string-to-byte conversion better #1129
Comments
Fixes #301, by aligning with the 3/4 browser majority and checking the type of the completion value, turning non-strings and thrown errors into 204s. (Thrown errors are still reported, however.) While working on this algorithm, we fix #945 by copying the HTTPS state to response. This also does some minor cleanup to clarify that "run a classic script" returns undefined when scripting is disabled. #1129 was opened to track a remaining open issue discovered, which is exactly how the JS string completion value becomes a response body. For now the spec includes a warning saying that this is underspecified.
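The completion-value check described above can be sketched roughly as follows. This is only an illustration, not the spec text: `evaluateJavascriptURL`, `runScript`, and `reportError` are hypothetical names, and the response is modeled as a plain object. The body is deliberately left as a JS string, which is the underspecified step this issue tracks.

```javascript
// Hypothetical sketch: map the outcome of running a javascript: URL's
// script to a response-like object. Names and shapes are assumptions.
function evaluateJavascriptURL(runScript, reportError = () => {}) {
  let completionValue;
  try {
    completionValue = runScript();
  } catch (error) {
    // Thrown errors are still reported, but the navigation gets a 204.
    reportError(error);
    return { status: 204, body: null };
  }
  // Only primitive strings become a response body; anything else is a 204.
  if (typeof completionValue !== "string") {
    return { status: 204, body: null };
  }
  // Open question: how this JS string becomes response body bytes.
  return { status: 200, body: completionValue };
}

console.log(evaluateJavascriptURL(() => "<p>hi</p>").status); // 200
console.log(evaluateJavascriptURL(() => 42).status);          // 204
```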
Here's a WPT for this (you can replace the encoding windows-1252 for the other test):
<meta charset=UTF-8>
<title>javascript URL string return values</title>
<script src=/resources/testharness.js></script>
<script src=/resources/testharnessreport.js></script>
<div id=log></div>
<script>
const testInputs = [
  [0x41],
  [0x80,0xFF],
  [0x80,0xFF,0x100],
  [0xD83D,0xDE0D],
  [0xDE0D,0x41]
];
testInputs.forEach(input => {
  async_test(t => {
    const frame = document.createElement("iframe");
    //t.add_cleanup(() => frame.remove());
    frame.src = "javascript:[" + input + "].map(b => String.fromCharCode(b)).join('')";
    t.step_timeout(() => {
      assert_equals(frame.contentDocument.body.textContent, input.map(b => String.fromCharCode(b)).join(""));
      assert_equals(document.charset, frame.contentDocument.charset);
      t.done();
    }, 200);
    document.body.appendChild(frame);
  });
});
</script>
Chrome and Safari pass these. (That means lone surrogates remain lone surrogates.) Firefox doesn't:
Edge doesn't:
I'll try add the |
For those I get similar results in Firefox and Edge, but both Chrome and Safari cannot be tested in this way because they do not respect the
<meta charset=UTF-8>
<title>javascript URL string return values</title>
<script src=/resources/testharness.js></script>
<script src=/resources/testharnessreport.js></script>
<div id=log></div>
<script>
const testInputs = [
  [0x41],
  [0x80,0xFF],
  [0x80,0xFF,0x100],
  [0xD83D,0xDE0D],
  [0xDE0D,0x41]
];
testInputs.forEach(input => {
  const javascriptURL = "javascript:[" + input + "].map(b => String.fromCharCode(b)).join('')",
        output = input.map(b => String.fromCharCode(b)).join("");
  async_test(t => {
    const frame = document.createElement("iframe");
    t.add_cleanup(() => frame.remove());
    frame.src = javascriptURL;
    t.step_timeout(() => {
      assert_equals(frame.contentDocument.body.textContent, output);
      assert_equals(frame.contentDocument.charset, document.charset);
      t.done();
    }, 200);
    document.body.appendChild(frame);
  });
  async_test(t => {
    const frame = document.createElement("iframe"),
          href = document.createElement("a");
    t.add_cleanup(() => { frame.remove(); href.remove(); });
    frame.name = "hi" + input;
    href.target = "hi" + input;
    href.href = javascriptURL;
    t.step_timeout(() => {
      assert_equals(frame.contentDocument.body.textContent, output);
      assert_equals(frame.contentDocument.charset, document.charset);
      t.done();
    }, 200);
    document.body.appendChild(frame);
    document.body.appendChild(href);
    href.click();
  });
});
</script>
By the way, if we actually need this to be about bytes, my recommendation would be to always use UTF-8 and UTF-8 encode. If we can somehow deal with a response containing a JavaScript string, we could just match Chrome/WebKit. (Possible, but probably ugly. Much better to require UTF-8 encode, I think.)
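As a rough illustration of the "always UTF-8 encode" option, the sketch below runs the test inputs from this thread through the platform `TextEncoder`. The helper names `completionValueToBodyBytes` and `hex` are hypothetical and exist only for this example. Note that this conversion is lossy for unpaired surrogates, which are replaced with U+FFFD.

```javascript
// Hypothetical helper: turn a string completion value into body bytes.
function completionValueToBodyBytes(value) {
  // TextEncoder always emits UTF-8 and replaces lone surrogates with U+FFFD.
  return new TextEncoder().encode(value);
}

// Format a byte sequence as space-separated uppercase hex.
const hex = bytes =>
  Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(completionValueToBodyBytes("\u0041")));             // 41
console.log(hex(completionValueToBodyBytes("\u0080\u00FF\u0100"))); // C2 80 C3 BF C4 80
console.log(hex(completionValueToBodyBytes("\uD83D\uDE0D")));       // F0 9F 98 8D
console.log(hex(completionValueToBodyBytes("\uDE0D\u0041")));       // EF BF BD 41
```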
Issues filed for |
Some updates on the Blink front. After @natechapin's chromium/chromium@9d459e0 (scheduled for Chrome 77), Chrome now always does UTF-8 encoding/decoding for javascript URLs rather than using the JavaScript string directly, which is the best-case scenario. As a result, the last test case (unpaired surrogates) in #1129 (comment) now fails. Additionally, the |
So it looks like @annevk's tests were committed to WPT at some point, yay: results. However I'm trying to spec this and I think I must be confused on the desired behavior for the If we spec the body as doing a UTF-8 encode of the JS string But when you feed that body into a UTF-8 decoder, which is what I presume would happen (we'd force the charset to UTF-8 via Content-Type), then the result would be the JS string However Chrome seems to instead produce the JS string Is my understanding correct? |
@domenic This looks like it may be a Blink bug. When we set a breakpoint where the UTF-16 JavaScript string is converted to UTF-8, we can see that the converted result is actually ED B8 8D 41: the unpaired surrogate is translated to UTF-8 directly, without any checking. This is probably an artifact of kLenientUTF8Conversion, where kStrictUTF8ConversionReplacingUnpairedSurrogatesWithFFFD would do the right thing. Filed https://crbug.com/1221018.
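To make the difference between the two conversions concrete, here is a minimal sketch. `lenientUtf8Encode` is a hypothetical stand-in written for this example; it is not Blink's code and only assumes that a lenient conversion encodes unpaired surrogates as ordinary three-byte sequences. The strict path uses the platform `TextEncoder`, and both byte sequences are then fed to a standard `TextDecoder`.

```javascript
// Hypothetical lenient encoder: encodes each code point as UTF-8 bytes
// without rejecting unpaired surrogates. Illustration only.
function lenientUtf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    // Combine a well-formed surrogate pair into a single code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
      const low = str.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        i++;
      }
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      // Lone surrogates land here and are encoded unchecked.
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return new Uint8Array(bytes);
}

const toHex = bytes =>
  Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

const input = "\uDE0D\u0041";
const lenientBytes = lenientUtf8Encode(input);       // ED B8 8D 41
const strictBytes = new TextEncoder().encode(input); // EF BF BD 41

const decoder = new TextDecoder("utf-8");
const lenientDecoded = decoder.decode(lenientBytes); // "\uFFFD\uFFFD\uFFFDA"
const strictDecoded = decoder.decode(strictBytes);   // "\uFFFDA"

console.log(toHex(lenientBytes), toHex(strictBytes));
console.log(lenientDecoded.length, strictDecoded.length); // 4 2
```

Under these assumptions, the lenient bytes decode to three replacement characters followed by `A`, while the strict bytes decode to a single replacement character followed by `A`.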
Response bodies are bytes, but the algorithm (as of #1107) uses a JS string as a response body.
Black box testing plan
At #1107 (comment) Boris gave a test plan that would allow us to figure out the string -> byte conversion in a black box way:
Relevant implementer reports
Gecko
From #1107 (comment)
Blink
From #1107 (comment) with further analysis by Boris in #1107 (comment):
EdgeHTML
From #1107 (comment)