http module does not enforce spec #18405
Comments
Thanks for the report, but since there are several open issues and pull requests about this particular topic already, I'm going to close out this one.
@bnoordhuis Can you link to some of them?
I could more easily if GH's search was less terrible... at any rate, #13296 is one.
I've done some more investigation into this. It seems that there are multiple reasons for the way we act right now.
Thanks for the explanation. All of the above points make sense, but I think it's worth stating their counter arguments:
const requestTarget = /^(?:(?:(https?):\/\/|(?=[^/?*]+$))((?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)|(?:[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\.)*[a-z](?:[a-z\d-]{0,61}[a-z\d])?\.?)(?::([1-9]\d*|0)?)?(?!\*)|(?=[/*]))((?:\/(?:[\w.~:@!$&'()*+,;=-]|%[\da-f]{2})+)+\/?|\/|\*$)?(?:\?((?:[\w.~:@!$&'()*+,;=/?-]|%[\da-f]{2})*))?$/i;
const match = url.match(requestTarget);
if (match === null) {
  // The url is not a valid `request-target`
  throw new Error('Invalid request-target');
}
// Otherwise, we can split it into its parts:
const [, scheme, host, port, path, query] = match;
// If `scheme` is truthy, the url is in `absolute-form`
// If `scheme` is falsy and `host` is truthy, the url is in `authority-form`
// If both `scheme` and `host` are falsy, the url is in `origin-form` or `asterisk-form`
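As a rough illustration of how the pattern above could be used, here is a minimal sketch that wraps it in a helper. The name `classifyRequestTarget` and the shape of the returned object are assumptions made for this example and are not part of Node.js. The regular expression is copied unchanged from the snippet above; telling `asterisk-form` apart by checking for a `*` path is an addition that the original comments leave open.

```javascript
// Regular expression copied verbatim from the snippet above.
const requestTarget = /^(?:(?:(https?):\/\/|(?=[^/?*]+$))((?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)|(?:[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\.)*[a-z](?:[a-z\d-]{0,61}[a-z\d])?\.?)(?::([1-9]\d*|0)?)?(?!\*)|(?=[/*]))((?:\/(?:[\w.~:@!$&'()*+,;=-]|%[\da-f]{2})+)+\/?|\/|\*$)?(?:\?((?:[\w.~:@!$&'()*+,;=/?-]|%[\da-f]{2})*))?$/i;

// Hypothetical helper: returns null for an invalid request-target,
// otherwise the detected form together with the captured parts.
function classifyRequestTarget(url) {
  const match = url.match(requestTarget);
  if (match === null) {
    return null; // not a valid request-target according to this pattern
  }
  const [, scheme, host, port, path, query] = match;
  let form;
  if (scheme) {
    form = 'absolute-form';
  } else if (host) {
    form = 'authority-form';
  } else if (path === '*') {
    form = 'asterisk-form'; // assumption: a bare "*" is the only asterisk-form
  } else {
    form = 'origin-form';
  }
  return { form, scheme, host, port, path, query };
}

console.log(classifyRequestTarget('/foo?bar=1').form);      // origin-form
console.log(classifyRequestTarget('example.com:443').form); // authority-form
console.log(classifyRequestTarget('/foo bar'));             // null
```

Returning `null` instead of throwing keeps the decision about how to respond (for example with a 400) in the caller's hands.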
So it doesn't make sense to me to take their standards too seriously in other contexts, such as back-end applications. Implementation of their APIs should be considered optional for non-web-browser environments.
There are many differences between your regular expression and real-world URL parsers:
Aside from regex vs. state machine, I'm interested in investigating the 5x performance difference you are observing between
There is an important distinction between APIs and URL semantics. While implementing the exact APIs is certainly optional (though helpful for folks writing isomorphic applications), WHATWG, as effectively the only organization that is still evolving the URL standard, should certainly be seen as an (if not the) authoritative source.
The HTTP spec (RFC 7230) has the following definition for the `request-target` (URI) of a request:

We can follow this further to find that each form has similar requirements, all defined here, which references RFC 3986. We then see the definitions of `authority` (hostname and port) and `path-abempty` (pathname).

The specification restricts these values to specific sets of characters, and for good reason! However, the `http` module in Node.js simply allows any characters to appear here. Even the `url` module does not enforce the specification. Is it expected that anyone wishing to build an http server in Node.js should painstakingly examine these RFCs and do the protocol-level validation themselves?
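To make the question concrete, here is a minimal sketch of what such hand-rolled validation might look like for the common `origin-form` case only. The names `originForm`, `isValidOriginForm`, and `handler` are hypothetical, and the character allowlist is assembled by hand from the RFC 3986 `pchar` and `query` rules, so treat it as an approximation rather than a vetted implementation. A real server would also need to consider `absolute-form`, `authority-form`, and `asterisk-form`.

```javascript
const http = require('http');

// Hypothetical allowlist for origin-form: 1*( "/" segment ) [ "?" query ],
// where a segment is *pchar (unreserved / pct-encoded / sub-delims / ":" / "@")
// and a query additionally allows "/" and "?".
const originForm = /^(?:\/(?:[A-Za-z0-9._~!$&'()*+,;=:@-]|%[0-9A-Fa-f]{2})*)+(?:\?(?:[A-Za-z0-9._~!$&'()*+,;=:@\/?-]|%[0-9A-Fa-f]{2})*)?$/;

function isValidOriginForm(target) {
  return originForm.test(target);
}

// Rejects any request whose target falls outside the allowlist.
function handler(req, res) {
  if (!isValidOriginForm(req.url)) {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request\n');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('ok\n');
}

// The server is created but not started; call server.listen(port) to use it.
const server = http.createServer(handler);
```

Even this narrow check requires reading two RFCs closely, which is the burden the question above is pointing at.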