Undocumented punycode validation exception #14994
Comments
ah yes... what you would need to do in this case is convert the URL to ascii first and then pass it in to |
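A minimal sketch of the "convert to ASCII first" idea, assuming the documented `url.domainToASCII()` function is an acceptable way to do the conversion. The helper name `toAsciiHostname` is hypothetical and does not come from this thread.

```javascript
const url = require('url');

// Hypothetical helper: returns the ASCII (Punycode) form of a hostname,
// or null when url.domainToASCII() rejects the domain (it returns '').
function toAsciiHostname(hostname) {
  const ascii = url.domainToASCII(hostname);
  return ascii === '' ? null : ascii;
}

console.log(toAsciiHostname('español.com'));     // xn--espaol-zwa.com
console.log(toAsciiHostname('xn--iñvalid.com')); // null
```

Returning null rather than an empty string makes the invalid case harder to pass on to an HTTP call by accident.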
Marking this as a feature request... specifically, URLs passed in to http.get should not be converted to punycode in the host header. |
It's other way around - I pass an ascii url and it gets converted from punycode. |
Lines 333 to 340 in d14c132
I did a little research and it turns out this line of code does the opposite. process.binding('icu').toASCII('xn--a--a', true) adds that nasty unicode character.
|
We ruled against that in the URL Standard after whatwg/url#309 (comment) and whatwg/url#309 (comment). I'll take a look. |
To add some details, Also, how should |
It's consistent with how other malformed URLs are handled (e.g. It sounds like this is a documentation issue. |
OP, I take it you can and do access The frustrating thing here is that implementers from the various browser teams have been rather unresponsive to these concerns. If this turns out to be a blocker please raise an issue over at https://github.com/whatwg/url/issues/new and I'll have another go at poking everyone. |
Yes, but only if the internal behaviour is correct. It's ok to throw an exception if documented, but now it throws because toASCII injects unicode into Host: field and header validator then throws. Isn't it more logical either to throw in the Punycode validator or not return unicode from the function named
I don't care about what browsers do. In fact, I don't have any target I want Node to be consistent with. Moreover, you can choose whatever reaction to this URL you want, as long as you find it logical and consistent. I'm not a specialist in Punycode/IDNA, so it's up to you, as long as you provide enough documentation and explain the rationale behind your decisions in this issue thread.
If you are interested in my opinion, I find it strange that URL normalization is done inside the HTTP client. I'd like it to be done outside, to separate the concerns, so that the HTTP client would only accept fully escaped URLs, e.g. valid punycode in domain names and percent-style urlencoding in the URI part. But I think it's too late to change the design, so you must follow the design decisions taken earlier, which I'm not aware of.
I think the logical problem is that now (after punycode, and with a human-readable non-ASCII URI part in browsers' address bars) there are really 2 URL formats: human-readable and machine-level, and But as I said above, I'll accept anything as a solution as long as you say it's consistent. I see 3 inconsistencies now:
|
Your understanding about ToASCII only applies to ToUnicode. ToASCII can and will fail at times. (I agree that in theory it would be nice if the HTTP code took a parsed URL and not a string.) |
Oh sorry. I meant "toASCII returns punycode". Since you don't understand me, let's try step by step. Open node and enter process.binding('icu').toASCII('xn--a--a', true). You see that nasty char at the end. If you examine it closer with Is it correct behaviour of |
Not as far as I know. |
Great! So it's the first part of the bug. The second is that https://nodejs.org/dist/latest-v8.x/docs/api/http.html doesn't document that
Do you agree? |
@nponeccop Yes, I would say the replacement character appended to the end of the domain name is a bug. However, I'm just not sure if we can fix it, given that we delegate IDNA handling to ICU. The most we can do is to error out on it, which may break existing use cases and a certain test in our test suite (which was why I added the lenient mode in the first place). @annevk The correct behavior is to error out on invalid domain names. We don't do that currently for compatibility. |
Should I file a bug with ICU then? Where in the sources do you call ICU? The bug only happens in lenient mode. Does the lenient mode implementation come from ICU too? |
That's a good point and I take back what I said; I noticed that the ICU and non-ICU builds behave differently here as the latter uses
|
Your example also lacks lenient mode which is used in |
I think it's also worth noting that this is a fundamental difference between
For instance:
> new url.URL('http://xn--a--a/testing')
TypeError: Invalid URL: http://xn--a--a/testing
at Object.onParseError (internal/url.js:90:17)
at parse (internal/url.js:99:11)
at new URL (internal/url.js:193:5)
at repl:1:1
at ContextifyScript.Script.runInThisContext (vm.js:23:33)
at REPLServer.defaultEval (repl.js:339:29)
at bound (domain.js:280:14)
at REPLServer.runBound [as eval] (domain.js:293:12)
at REPLServer.onLine (repl.js:536:10)
at emitOne (events.js:101:20)
> url.parse('http://xn--a--a/test')
Url {
protocol: 'http:',
slashes: true,
auth: null,
host: 'xn--a--a�',
port: null,
hostname: 'xn--a--a�',
hash: null,
search: null,
query: null,
pathname: '/test',
path: '/test',
href: 'http://xn--a--a�/test' }
>
The code in From my perspective this really isn't an issue with ICU or IDNA, it's an issue with the fact that the URL is not valid, the legacy In the long run, I would personally prefer But that's just my opinion. |
Yes, this is a complex issue with many intertwining parts. For example, |
Recommend closing this due to inactivity for a year. The WHATWG URL parser should work as an alternative. |
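A minimal sketch of using the WHATWG URL parser as a pre-check, assuming the caller prefers a null result over an exception. `parseOrNull` is a hypothetical helper name. Earlier in this thread, `new url.URL('http://xn--a--a/testing')` is reported to throw a TypeError, so the second call is expected to be rejected, although the exact behaviour may depend on the Node.js version.

```javascript
// Hypothetical helper: parse with the global WHATWG URL class and turn a
// parse failure into a null return value instead of a thrown TypeError.
function parseOrNull(input) {
  try {
    return new URL(input);
  } catch (err) {
    return null;
  }
}

const good = parseOrNull('http://example.com/test');
console.log(good && good.hostname);

// The URL from this issue; reported above to be rejected by the parser.
const bad = parseOrNull('http://xn--a--a/testing');
console.log(bad === null ? 'rejected' : bad.hostname);
```

A URL that passes this check can then be handed to the HTTP client, so that invalid host names are caught at parse time rather than at header validation.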
The code above prints error for goodUrl and throws an exception for badUrl. However, this throwing is not explained in the documentation of the http.get() and http.request() methods, nor in that of the Url constructor etc.
The problem seems to stem from the fact that url.parse('http://xn--a--a').hostname.codePointAt(8) is 65533, i.e. url.parse tries to decode punycode, which doesn't play well with Host header validation.
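The check described here can be sketched as a small guard, assuming the hostname string has already been obtained from the parser. The literal below reproduces the hostname value reported in this issue rather than calling url.parse, and `isPrintableAscii` is a hypothetical helper, not part of Node.js.

```javascript
const REPLACEMENT_CHARACTER = 0xfffd; // 65533, U+FFFD

// Hypothetical guard: true only when the hostname is non-empty and every
// character is printable ASCII, so it cannot carry U+FFFD into a header.
function isPrintableAscii(hostname) {
  if (hostname.length === 0) return false;
  for (const ch of hostname) {
    const cp = ch.codePointAt(0);
    if (cp < 0x21 || cp > 0x7e) return false;
  }
  return true;
}

// The hostname value reported in this issue: 'xn--a--a' plus U+FFFD.
const decoded = 'xn--a--a\uFFFD';
console.log(decoded.codePointAt(8) === REPLACEMENT_CHARACTER); // true
console.log(isPrintableAscii(decoded));       // false
console.log(isPrintableAscii('example.com')); // true
```

Running such a guard before the request is made would surface the problem as an explicit check instead of an undocumented exception.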