incorrect URL parsing #2114
@trevnorris so to clarify, you're saying that io.js is impacted by this also (since we're not strictly a 0.12 fork, that's not obvious)?
Yes. The test in the linked issue has the same result in io.js. A possible solution would be to simply parse the string with …
This could be related: #1693
It's probably the result of this change. Before f674b09, headers and the status line were parsed as UTF-8; now they're parsed as ISO-8859-1.
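To make that mismatch concrete, here is a minimal TypeScript sketch. It assumes a client that sends a path as UTF-8 bytes and a receiver that decodes each byte as one ISO-8859-1 character, as described for headers after f674b09. The helpers utf8Bytes and decodeLatin1 are hypothetical illustrations, not io.js APIs.

```typescript
// Hypothetical helper: the UTF-8 bytes of a string. encodeURIComponent
// percent-encodes every non-ASCII character as its UTF-8 bytes, and every
// unescaped character it leaves behind is single-byte ASCII.
function utf8Bytes(s: string): number[] {
  const encoded = encodeURIComponent(s);
  const bytes: number[] = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

// Hypothetical helper: ISO-8859-1 decoding maps each byte 0x00-0xFF to the
// code point with the same value, so one multi-byte UTF-8 sequence turns
// into several unrelated Latin-1 characters.
function decodeLatin1(bytes: number[]): string {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// "/caf\u00e9" is "/café"; e-acute (U+00E9) is the UTF-8 pair 0xC3 0xA9.
const received = decodeLatin1(utf8Bytes("/caf\u00e9"));
// received is "/cafÃ©": the receiver sees two characters instead of one.
```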
@bnoordhuis Parsing the string this way follows the spec more closely. Even though browsers may show the Unicode characters in the URL bar, checking the network request shows that they percent-encode them before firing the request. io.js' http module would also barf on such a request, since we decode incoming headers using ISO-8859-1. @vkurchatkin It does look like the same issue. IMO the options are either to let users know they need to encode their header strings before sending them, or to do that encoding automatically before turning the strings into a buffer.
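The second option (encoding automatically) could look roughly like the TypeScript sketch below, which percent-encodes non-ASCII characters as UTF-8, similar to what browsers do before sending a request. The function name encodeNonAsciiPath is hypothetical and not part of io.js; ASCII characters, including spaces, are deliberately left untouched.

```typescript
// Hypothetical helper (not part of io.js): percent-encode every non-ASCII
// character in a request path as UTF-8 and leave ASCII characters as-is.
// The first alternative matches a full surrogate pair so that astral
// characters (e.g. emoji) become one 4-byte sequence; a lone surrogate
// falls through to the second alternative and makes encodeURIComponent
// throw a URIError.
const NON_ASCII = /[\ud800-\udbff][\udc00-\udfff]|[^\x00-\x7f]/g;

function encodeNonAsciiPath(path: string): string {
  return path.replace(NON_ASCII, (ch) => encodeURIComponent(ch));
}

// encodeNonAsciiPath("/caf\u00e9?q=\u20ac") returns "/caf%C3%A9?q=%E2%82%AC"
```

The catch, as the commit message in this thread points out, is that this silently changes the bytes sent for characters in the U+0080 to U+00FF range, which were previously sent as a single byte.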
I wouldn't say it's a duplicate. There are two problems with UTF-8, parsing and writing, and they seem unrelated.
http would previously accept paths with non-ASCII characters. This proved problematic, because multi-byte characters were encoded as 'binary', that is, the first byte was taken and the remaining bytes were dropped for that character. There is no sensible way to fix this without breaking backwards compatibility for paths containing U+0080 to U+00FF characters. We already reject paths with unescaped spaces with an exception. This commit does the same for paths with non-ASCII characters. The alternative would have been to encode paths in UTF-8, but this would cause the behaviour to silently change for paths with single-byte non-ASCII characters (e.g. the copyright character U+00A9 ©). I find it preferable to add to the existing prohibition of bad paths with spaces. Bug report: nodejs#2114
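The rejection approach in that commit message can be sketched in TypeScript as below. This is an illustration modeled on the description, not the actual io.js code: the names INVALID_PATH_CHARS and assertValidPath and the error message are hypothetical. It throws on an unescaped space (the existing rule) and on any UTF-16 code unit above 0x7F (the new rule).

```typescript
// Hypothetical pattern: a space, or any code unit outside 7-bit ASCII.
const INVALID_PATH_CHARS = /[ \u0080-\uffff]/;

// Hypothetical validator: throw instead of silently mangling the path.
function assertValidPath(path: string): void {
  const match = INVALID_PATH_CHARS.exec(path);
  if (match !== null) {
    throw new TypeError(
      `Request path contains an unescaped character at index ${match.index}`
    );
  }
}

// assertValidPath("/caf%C3%A9") passes: the caller already percent-encoded it.
// assertValidPath("/caf\u00e9") throws, and so does assertValidPath("/a b").
```

Throwing keeps the failure loud: a caller who relied on "©" being sent as the single byte 0xA9 gets an exception rather than a different request on the wire.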
Is there consensus at this time that one of these two options is superior to the other?
@Trott Nope. May just want to throw this into the CTC meeting for a quick vote and fast resolution.
Should be addressed by the WHATWG URL impl here: #7448
@jasnell Does that change what …
@ChALkeR ... actually no, it doesn't, you're right.
Does this need to remain open?
Closing given the lack of any further progress on this. It's not even clear whether this is still an issue.
I've created a very similar issue (with a failing test case) here: #13296
A regression was introduced between v0.10 and v0.12 in the URL parsing of an http.get() request. Basically, multi-byte characters are decoded as 'binary' instead of either …
Test and additional information are located at nodejs/node-v0.x-archive#25634 (comment)
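To show what the 'binary' conversion does to such a path, here is a small TypeScript sketch. The helpers encodeBinary and toHex are hypothetical. encodeBinary assumes that 'binary' (an alias for 'latin1' in Node's Buffer encodings) keeps only the low 8 bits of each UTF-16 code unit, which matches the "remaining bytes were dropped" description in the commit message above.

```typescript
// Hypothetical helper mimicking the 'binary' (latin1) string-to-bytes
// conversion: each UTF-16 code unit keeps only its low 8 bits.
function encodeBinary(s: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    bytes.push(s.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// Hypothetical helper: format bytes as space-separated two-digit hex.
function toHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}

// "\u20ac" is the euro sign; only its low byte 0xac survives, so the
// server receives U+00AC instead:
// toHex(encodeBinary("/\u20ac")) returns "2f ac"
// "\u00a9" (the copyright sign) fits in one byte and survives intact:
// toHex(encodeBinary("/\u00a9")) returns "2f a9"
```

The second case is why paths with U+0080 to U+00FF characters happened to work, and why changing their encoding would break backwards compatibility.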