-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
url: treat special characters in hostnames more strictly in url.parse() #45046
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
lib/url.js
Outdated
this.host = rest.slice(start, nonHost); | ||
// WHATWG URL removes tabs, newlines, and carriage returns. Let's do that too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does it everywhere, not only on the host component. I'm not sure if url.parse()
should do the same and how many breaking changes are tolerable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does it everywhere, not only on the host component. I'm not sure if
url.parse()
should do the same and how many breaking changes are tolerable.
I agree in principle, but I think narrowly scoping to hostname to start (and maybe even stopping there) makes a lot of sense as hostname spoofing is a bigger security concern than path or scheme spoofing. (Or at least that's my assumption. If anyone else thinks that's incorrect, I'm open to being persuaded.)
This comment was marked as resolved.
This comment was marked as resolved.
@Trott Can we run a benchmark to see the performance impact of this pull request? |
Sure. But I think I'm OK if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when this goes out in a release, let's be sure to include details on the performance impact of these changes. The perf regression is a key reason why these kinds of changes weren't made before. What we're essentially saying is that we are no longer committing to maintaining the same perf on url.parse()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Left a small/non-blocking comment.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
To unbreak one of the benchmark URL, we use toAscii()/punycode when we find % in a domain, like WHATWG URL. |
This comment was marked as outdated.
This comment was marked as outdated.
Here are all 55 of the three-star/highest-confidence results from the benchmark run. 😱
|
Benchmark with
|
769ffa2
to
105b71a
Compare
Throw if ^, |, and some other special characters are in the hostname, similar to WHATWG URL. Use punycode/toAscii when % appears in a hostname, like WHATWG URL.
Benchmark on latest changes: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1206/ |
Throw if ^, |, and some other special characters are in the hostname, similar to WHATWG URL.