Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http{s}: fix connecting to localhost if URL is invalid #2967

Closed

Conversation

thefourtheye
Copy link
Contributor

If the URL passed to http.request is not properly parsable by url.parse, we
fall back to use localhost and port 80. This creates confusing error messages
like in this question http://stackoverflow.com/q/32675907/1903116.

This patch throws an error message, if url.parse fails to parse the URL.

Previous Discussion: #2966

@thefourtheye thefourtheye added the http Issues or PRs related to the http subsystem. label Sep 20, 2015
@thefourtheye thefourtheye added https Issues or PRs related to the https subsystem. semver-major PRs that contain breaking changes and should be released in the next major version. labels Sep 20, 2015
@thefourtheye thefourtheye changed the title http: fix connecting to localhost if URL is invalid http{s}: fix connecting to localhost if URL is invalid Sep 20, 2015
@ChALkeR
Copy link
Member

ChALkeR commented Sep 20, 2015

Is this really a semver-major? The old behavior looks like a bug, is there really a case when it's desired?

Ah, now we throw an Error immediately when before it was passed to the callback. Seems reasonable.

@thefourtheye
Copy link
Contributor Author

@ChALkeR hmmm, to think of it, you may be right. Nobody would have preferred connecting to localhost when they actually meant to connect to a different URL. So, I guess this would not break a lot of code out there. I'll remove the major tag.

@thefourtheye thefourtheye removed the semver-major PRs that contain breaking changes and should be released in the next major version. label Sep 20, 2015
@ChALkeR
Copy link
Member

ChALkeR commented Sep 20, 2015

@thefourtheye Yes, even with the «pass error to callback» → «throw immediately» change it looks more like a bugfix to me. Let's see what the others will say.

@nodejs/collaborators

@targos
Copy link
Member

targos commented Sep 20, 2015

There are probably applications out there that request('localhost'). They get what they expect for the wrong reason but this will still break them.

@thefourtheye
Copy link
Contributor Author

@targos That's a good thing to have in the tests. I included them. Thanks :-)

@trevnorris
Copy link
Contributor

This does come closer to the wep API new URL() to throw on invalid url's, and it's the path of least surprise. Though it may currently may have been undefined behavior, but that doesn't mean it's a bug fix. I'm +1 for the change, but am going to lend towards semver-major, but am open for discussion.

@targos
Copy link
Member

targos commented Sep 25, 2015

I also think it should be semver-major

@yosuke-furukawa
Copy link
Member

+1 semver-major
+1 throw an error immediately
But if urls are invalid, url.parse should throw the error.

@thefourtheye
Copy link
Contributor Author

Looks like this is seen mostly as semver-major change. @ChALkeR Are you okay with landing this in major and #2966 in master now?

@@ -22,6 +22,9 @@ function ClientRequest(options, cb) {

if (typeof options === 'string') {
options = url.parse(options);
if (!options.host && !options.hostname) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just check options.hostname? Is there a reason for allowing empty hostnames at this point?
Example: url.parse('http://:80/'): host: ':80', hostname: '', the only difference per doc between them is the presence of port in host.

Btw, how will the following code behave in such case (when the effective host includes the port, but not the domain)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChALkeR Good point. I also think we need to check only the hostname here. I changed the check and included a test as well.

Btw, how will the following code behave in such case (when the effective host includes the port, but not the domain)?

I tried http.get('http://:80/') and got the error Error: getaddrinfo ENOTFOUND :80 :80:80

@ChALkeR
Copy link
Member

ChALkeR commented Sep 28, 2015

@thefourtheye

Looks like this is seen mostly as semver-major change. @ChALkeR Are you okay with landing this in major and #2966 in master now?

LGTM (as a semver-major) with notes and if tests pass. Btw, don't semver-major entries go directly into master now? If they do, then what is the purpose for landing #2966 in master?

Not sure about #2966. If it gets merged, it would make sense to merge that directly into 4.x branch and backport to 0.12/0.10 if applicable, and put a notice that this behavior is changed in 5.x. But I see no actual need in that, not sure about others. Let's move that discussion to #2966.

@ChALkeR
Copy link
Member

ChALkeR commented Sep 28, 2015

@targos
request\(.localhost regex:

min-request-1.4.1.tgz/test.js:29:               request('localhost:' + port, {body: data}, function(err, res, body) {
min-request-1.4.1.tgz/test.js:36:               request('localhost:' + port, {
mitm-servers-1.0.3.tgz/test/test.js:66:    return request('localhost:1234')
rest-stop-framework-1.0.0.tgz/test/server_test.js:17:    chai.request('localhost:3000')
rest-stop-framework-1.0.0.tgz/test/server_test.js:30:    chai.request('localhost:3000')
rest-stop-framework-1.0.0.tgz/test/server_test.js:42:    chai.request('localhost:3000')
rest-stop-framework-1.0.0.tgz/test/server_test.js:55:    chai.request('localhost:3000')
rest-stop-framework-1.0.0.tgz/test/server_test.js:68:    chai.request('localhost:3000')

request\(.127\.0\.0\.1 regex:

cadvisor-to-metric-server-0.0.4.tgz/demo/test.js:92:request('127.0.0.1:8080/api/v1.0/machine', function(err, machineInfo) {
cadvisor-to-metric-server-0.0.4.tgz/demo/test.js:94:    request('127.0.0.1:8080/api/v1.2/docker/' + id, function(err, data) {

@thefourtheye
Copy link
Contributor Author

don't semver-major entries go directly into master now

@ChALkeR Yes, they do. But if the PR is tagged as semver-major, it will be picked only during a major release cut. Till we release v5, we may want to tell users that this is the expected behavior. That is why #2966 is needed.

@ChALkeR
Copy link
Member

ChALkeR commented Oct 4, 2015

@thefourtheye Once again: this change LGTM, and once you decide to land it, it should be landed to master. There is no separate «major» branch, nor there is a reason for delaying landing this till the release.

Also, this change could include #2966 without the note:

 `options` can be an object or a string. If `options` is a string, it is
-automatically parsed with [url.parse()][].
+automatically parsed with [url.parse()][] and it must be a valid complete URL,
+including protocol and complete domain name or IP address.

@ChALkeR ChALkeR added the semver-major PRs that contain breaking changes and should be released in the next major version. label Oct 4, 2015
@ChALkeR
Copy link
Member

ChALkeR commented Oct 4, 2015

Ah, also cc @rvagg for semver-major.

@@ -22,6 +22,9 @@ function ClientRequest(options, cb) {

if (typeof options === 'string') {
options = url.parse(options);
if (!options.hostname) {
throw new Error('Unable to determine the domain name');
}
} else {
options = util._extend({}, options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also need the check here? Why not just check after these if statements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fishrock123 At this place, if user doesn't pass hostname, then the default value undefined will be used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't that default to localhost again? Isn't that what this is trying to prevent? I don't understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fishrock123 If the user doesn't specify any hostname in the options, then defaulting to localhost is expected. But when I give google.com (without http://), it should not take me to localhost.

If the URL passed to `http{s}.request` or `http{s}.get` is not properly
parsable by `url.parse`, we fall back to use `localhost` and port 80.
This creates confusing error messages like in this question
http://stackoverflow.com/q/32675907/1903116.

This patch throws an error message, if `url.parse` fails to parse the
URL properly.

Previous Discussion: nodejs#2966
PR-URL: nodejs#2967
@Trott
Copy link
Member

Trott commented Oct 26, 2015

cc @nodejs/http

@dougwilson
Copy link
Member

I'm 👍 for this change.

@evanlucas
Copy link
Contributor

LGTM if the CI is happy

@jasnell
Copy link
Member

jasnell commented Oct 26, 2015

LGTM if CI is green

@thefourtheye
Copy link
Contributor Author

thefourtheye added a commit that referenced this pull request Oct 27, 2015
If the URL passed to `http{s}.request` or `http{s}.get` is not properly
parsable by `url.parse`, we fall back to use `localhost` and port 80.
This creates confusing error messages like in this question
http://stackoverflow.com/q/32675907/1903116.

This patch throws an error message, if `url.parse` fails to parse the
URL properly.

Previous Discussion: #2966
PR-URL: #2967

Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@thefourtheye
Copy link
Contributor Author

The failures seem unrelated to this change. Thanks for the review. Landed at 437930c

@thefourtheye thefourtheye deleted the http-fix-invalid-urls branch October 27, 2015 03:19
@thefourtheye
Copy link
Contributor Author

It would be better if we get more eyes on #2966 as well. @jasnell Will this be backported to LTS as well?

@jasnell
Copy link
Member

jasnell commented Oct 27, 2015

@thefourtheye .. not as a semver-major

@thefourtheye
Copy link
Contributor Author

@jasnell oops right...

@rvagg
Copy link
Member

rvagg commented Oct 28, 2015

@thefourtheye can we avoid http{s} in future please and stick to vanilla names in commit simmaries? It makes sorting and grokking changes a bit tricky and http almost always means https anyway.

@thefourtheye
Copy link
Contributor Author

@rvagg Oh, sorry about that. I'll use only proper subsystem names now on.

rvagg pushed a commit to rvagg/io.js that referenced this pull request Oct 29, 2015
If the URL passed to `http{s}.request` or `http{s}.get` is not properly
parsable by `url.parse`, we fall back to use `localhost` and port 80.
This creates confusing error messages like in this question
http://stackoverflow.com/q/32675907/1903116.

This patch throws an error message, if `url.parse` fails to parse the
URL properly.

Previous Discussion: nodejs#2966
PR-URL: nodejs#2967

Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@chalkers
Copy link

chalkers commented Nov 10, 2016

I believe this PR introduced a bug.

In the following example code in Node.js 4.0.0 or earlier...

const request = http.get("localhost", res => { });

request.on('error', e => console.error(e.message));

...requires a try catch now in 5.0.0 or later

try {
  const request = http.get("localhost", res => { });
  request.on('error', e => console.error(e.message));
} catch (e) {
  console.error(e.message);
}

I would expect all parse error would be emitted like all the other errors.

Happy to do a pull request if you agree :)

@bnoordhuis
Copy link
Member

@chalkers Mmm, it's actually closer to the core design philosophy of throwing early on bad inputs. Asynchronous error events are for operational errors, not logic errors.

@chalkers
Copy link

@bnoordhuis Interesting. Do you have any other examples in the Node.js API that work that way?

I.E.

try {
   //some event emitter
}  catch (e) {
 
}

I ask because it's confusing a lot of students or new users to Node.js. I've been teaching Node.js on Treehouse for over two years and we started on 0.12. If people follow along they tend to have upgraded to 6+ (and we did naively in our interactive coding environment) expecting this would still be compatible.

If there are more examples of this pattern it would be good to know as an educator. I'm also refreshing the Node.js Basics course right now. If this is an unintended consequence of this patch, as I said before, I'd be more than happy to create a pull request resolving the issue.

@bnoordhuis
Copy link
Member

@chalkers As a rule of thumb:

  • passing bad arguments to an API function throws an exception
  • passing good arguments that fail at run-time emits an asynchronous 'error' event.

Most APIs in Node.js work this way (although there are shades of gray, of course.) Taking this particular issue as an example:

  • a malformed URL throws an exception
  • a valid URL with a hostname that doesn't resolve to a DNS record results in an 'error' event

The fs module blatantly deviates from that rule with its 'does this path contain nul bytes?' check - it emits an asynchronous 'error' event even though it's clearly a bad input - but that is the only real outlier I can think of (and I still think that was a stupid move.)

@chalkers
Copy link

Thanks for clearing that up - I'll be sure to mention the philosophy in the update :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.