Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whatwg protocol setter only seems to support special schemes #13523

Closed
sebakerckhof opened this issue Jun 7, 2017 · 22 comments
Closed

Whatwg protocol setter only seems to support special schemes #13523

sebakerckhof opened this issue Jun 7, 2017 · 22 comments
Labels
doc Issues and PRs related to the documentations. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@sebakerckhof
Copy link

sebakerckhof commented Jun 7, 2017

Currently the new whatwg url class doesn't seem to work with any other scheme except so called 'special schemes' in the spec:
https://url.spec.whatwg.org/

E.g. this works:

const { URL } = require('url');
const url = new URL('http://foo.bar');
url.protocol = 'ftp';
url.toString(); // => ftp://foo.bar

But this doesn't work:

const { URL } = require('url');
const url = new URL('http://foo.bar');
url.protocol = 's3';
url.toString(); // http://foo.bar => should be s3://foo.bar

Even schemes defined here: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml don't work.

Not that this does work:

const { URL } = require('url');
const url = new URL('s3://foo.bar');
url.protocol; // = s3:
@sebakerckhof sebakerckhof changed the title Whatwg url only seems to support special urls. Whatwg protocol setter only seems to support special schemes Jun 7, 2017
@mscdex mscdex added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Jun 7, 2017
@mscdex
Copy link
Contributor

mscdex commented Jun 7, 2017

/cc @nodejs/url

@TimothyGu
Copy link
Member

By spec you are not allowed to change a special-scheme URL to a non-special-scheme, or vice versa. See scheme state step 2.1.1-2. However you can change from one special scheme to another (as you have found out), or from one non-special scheme to another:

const { URL } = require('url');
const url = new URL('s3://foo.bar');
url.protocol = 's4';
url.toString(); // => s4://foo.bar

The spec change was made in whatwg/url@5533c8d. See there for more details about it.

@TimothyGu TimothyGu added the question Issues that look for answers. label Jun 8, 2017
@sam-github
Copy link
Contributor

Isn't this a bug in node's URL docs, or do they mention this special case?

@jasnell
Copy link
Member

jasnell commented Jun 8, 2017

Not a bug as much as not documented. The docs can be expanded.

@gibfahn gibfahn reopened this Jun 8, 2017
@gibfahn
Copy link
Member

gibfahn commented Jun 8, 2017

Reopening as a request to document this then.

@broofa
Copy link

broofa commented Jun 26, 2017

@TimothyGu Are you sure that's how the spec is to be interpreted? My reading of https://url.spec.whatwg.org/#scheme-state, indicates that the parse method should only return in cases where state override is set. Otherwise it should just go ahead and set the scheme regardless of it being special/unspecial scheme. Am I reading that wrong? And what exactly is the state override argument for? 'Pretty abstract name, that.

It's worth noting that Chrome does not have this issue/behavior. It sets the scheme regardless
image

In practice, I would argue that silently failing to do what is very obviously the desired action here (change the protocol) is arguably the worst behavior possible. At a minimum, there should be a console warning of some sort, probably even an thrown exception. I know this is probably more appropriate for discussion on the WhatWG forum, but this just wasted an hour and a half of my time and I'm not interested in throwing any more of my time into that particular sink hole. ☹️

@TimothyGu
Copy link
Member

the parse method should only return in cases where state override is set

Yes, and:

The protocol attribute’s setter must basic URL parse the given value, followed by U+003A (:), with context object’s url as url and scheme start state as state override.

It's worth noting that Chrome does not have this issue/behavior.

Then it is Chrome's bug for not implementing whatwg/url@5533c8d.

In practice, I would argue that silently failing to do what is very obviously the desired action here (change the protocol) is arguably the worst behavior possible.

While I'm not exactly a huge fan of silently ignoring errors, we are unfortunately in a bind here, as that is what browsers as well as the URL Standard are doing for all other attributes (for example, setting url.port to something out of range like 65536 will reset the port to 0 in Chrome). On the other hand, the URL Standard has the responsibility to maintain compatibility between URL and HTMLAnchorElement, as IIRC URL is intended to be a DOM-independent drop-in replacement of HTMLAnchorElement.

@broofa
Copy link

broofa commented Jun 27, 2017

What about this?

The url and state override arguments are only for use by various APIs. [HTML]

Since Node isn't HTML, doesn't this mean "no state override"?

@TimothyGu
Copy link
Member

You have to take the entire quote into context:

The encoding override argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]

It's my belief that the [HTML] link only refers to the first sentence, while by "various APIs" they mean the URL interface (and the getters/setters of its attributes) that is visible in JavaScript.

@jasnell
Copy link
Member

jasnell commented Jun 27, 2017

ping @domenic ... thoughts on this?

@domenic
Copy link
Contributor

domenic commented Jun 27, 2017

@TimothyGu's got it right.

You can always add console warnings, however; those are not governed by any spec.

@broofa
Copy link

broofa commented Jun 27, 2017

It's my belief that the [HTML] link only refers to the first sentence, while by "various APIs" they mean the URL interface (and the getters/setters of its attributes) that is visible in JavaScript.

Why would the spec make an exemption ("only for use by various APIs") that is meant to be interpreted as "every API implementing the URL interface"? That seems... incongruous.

Getting back to what you wrote previously about [lack of] error reporting ...

While I'm not exactly a huge fan of silently ignoring errors, we are unfortunately in a bind here, as that is what browsers as well as the URL Standard are doing for all other attributes

That seems at odds with the spec. For example, for port state (and most other URL components addressed in the parsing verbiage) we see the phrase "validation error, return failure":

If port is greater than 216 − 1, validation error, return failure.

Under validation error we have (emphasis mine) :

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere

Doesn't that pretty clearly say that user agents should not fail silently?

You can always add console warnings, however; those are not governed by any spec.

@domenic, who were you addressing with this comment? 'Not sure if you were suggesting the URL api should warn in this case, or if consumers of the API should be left to their own devices to figure out how to implement the warning.

@TimothyGu
Copy link
Member

TimothyGu commented Jun 28, 2017

Why would the spec make an exemption ("only for use by various APIs") that is meant to be interpreted as "every API implementing the URL interface"? That seems... incongruous.

Algorithms in spec-land have two primary consumers:

  • Other algorithms
  • JS APIs

For WHATWG/W3C specs at least, when they refer to "APIs" the JavaScript part is implied, so by "various APIs" they mean "not other algorithms but something directly observable from JS." And no, not only every API implementing the URL interface but the Location and HTMLHyperlinkElementUtils interfaces also make use of state override.


That seems at odds with the spec. For example, for port state (and most other URL components addressed in the parsing verbiage) we see the phrase "validation error, return failure":

(I'll address "validation error later...) Usually it is the "return failure" part that makes the JS function throw. Indeed, the new URL constructor throws on every parsing errors like this one because it contains the following verbiage:

  1. Let parsedURL be the result of running the basic URL parser on url with parsedBase.
  2. If parsedURL is failure, throw a TypeError exception.

On the other hand, most attribute setters (other than href) ignores the failure. Like the port attribute's setter:

  1. Otherwise, basic URL parse the given value with context object’s url as url and port state as state override.

EOF

Note that unlike JS, failures don't bubble up automatically in specs (this is true even for the ECMAScript spec).


Under validation error we have (emphasis mine) :

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere

Doesn't that pretty clearly say that user agents should not fail silently?

"encouraged to" ≠ "must". In fact, none of the browsers I tested (Firefox and Chrome) have any indication for validation errors.

But even if we were to implement validation errors, we would have nowhere to show them. @domenic suggested console warnings, but that can be considered to be fairly spammy. I guess we could add some util.debuglog that's only enabled when NODE_DEBUG includes url, but I don't see an immediate need to. Especially since, as you can see in the spec, the case where specialness of schemes mismatches (the original complaint of this issue) isn't even designated a validation error.

@broofa
Copy link

broofa commented Jun 28, 2017

@TimothyGu , I appreciate your patience and explanations here. I'm content to let this die, but can you explain why this distinction between special and non-special schemes is important? I'm genuinely puzzled by how arbitrary this feels. You've said it's important to DOM APIs like Location and HTMLAnchorElement... perhaps you could elaborate a bit more on that?

One thing that's surprising to me is that, if there's a line to be drawn, I would expect it to be between secure .vs. insecure schemes like http and https. Something like that would be more logical, but there's no such distinction. Instead it's drawn between "schemes we've heard of" and "schemes we haven't". I feel like I'm missing some important insight.

@TimothyGu
Copy link
Member

@broofa It can be gleaned from whatwg/url@d5470b8 and whatwg mailing list that initially "special scheme" are meant to be those that

  • need to handle three slashes after the scheme as two, can treat lack of slashes as relative, etc.

  • cannot have an empty host (as that would lead to reparsing issues)

  • [support] backslash replacement and the encoding override

The same e-mail mentions:

I also think we should change the API such that you cannot change anything for non-relative URLs (setters are no-ops, already largely the case). And that you cannot change the scheme from a special URL to a relative URL. Given the different handling of hosts that might lead to security issues.

I'm sure @annevk (author of the aforementioned email and spec changes), @domenic, and the WHATWG bug tracker can give more insightful explanations.

@TimothyGu TimothyGu added doc Issues and PRs related to the documentations. and removed question Issues that look for answers. labels Jun 28, 2017
@annevk
Copy link

annevk commented Jun 28, 2017

I feel like I'm missing some important insight.

The main reason we have "special schemes" is because URL parsers in browsers historically (and some to this day) are more per-scheme than they are generic across all URLs. That has led to subtle things being allowed for what the URL Standard calls "special schemes" that are now hard to remove as you'd break the web.

So it's basically a special case necessary for compatibility and to be able to move everyone to generic URL parsers.

@sam-github
Copy link
Contributor

I follow this with interest. One of the complaints about node's traditional url API is that it silently ignores invalid input makes a "best effort" to do something, it would be good to have the new URL API do better.

But even if we were to implement validation errors, we would have nowhere to show them.

URL could throw exceptions.

Its not how browsers work, but browsers discard unhandled exceptions. As @mikeal has pointed out in the discussion about Promises, node chose to diverge from browser behaviour with unhandled exceptions, something that is generally considered a strength of node, not a weakness.

@annevk
Copy link

annevk commented Jun 28, 2017

If there is sufficient interest it might be worth considering adding something like URL.isValid(input) to the URL Standard, though it might take a while before that would ship in all browsers, as only Safari has a compliant implementation thus far.

@jasnell
Copy link
Member

jasnell commented Jun 28, 2017

I am definitely very -1 on adding validation errors that are not implemented in browsers or defined by the spec.

I'd be +1 to updating the spec if necessary.

@TimothyGu
Copy link
Member

TimothyGu commented Jun 28, 2017

URL could throw exceptions.

We already throw exceptions for many fatal errors ("return failure" in spec-speak), but "validation errors" denote a class of URLs that are not strictly correct but still commonly seen in the wild and reasonably well-formed. In other words, we are already stricter than url.parse by a large margin. One example is how the WHATWG API actually parses the host, and throws an error if the IP address or Punycode encoding is malformed, etc.

I think deviating from browser behavior by throwing validation errors in the URL parser is much less justifiable than Promises handling, which is implicated directly by what Node.js is and is not. However, as mentioned above, I do think adding appropriate debuglogs could be helpful if performance impact is not too significant.

Also of note is that this issue is about attribute setters, not the initial parsing step, as the conversation seems to have veered towards.

@sam-github
Copy link
Contributor

#13523 (comment) would be useful then, @annevk's suggestion.

@apapirovski
Copy link
Member

@TimothyGu @jasnell Should this stay open? I read through the thread but unclear on whether anything actionable came out or was taken to the spec side?

jasnell added a commit to jasnell/node that referenced this issue Aug 11, 2018
targos pushed a commit that referenced this issue Aug 12, 2018
Fixes: #13523

PR-URL: #22261
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

No branches or pull requests

10 participants