Incorrect handling of empty host in absolute URL #821
Comments
Hi, if you could send a PR with just a test that could then be marked as xfail, it'd be useful.
Done! I'm not too familiar with the codebase, but I imagine the fix is to delete lines 187 and 188, which read:

```python
if host is None:
    raise ValueError("Invalid URL: host is required for absolute urls")
```

Under RFC 3986, a host is always permitted to be empty unless scheme-specific rules say otherwise.
This is a bit more nuanced: the http scheme treats an empty host as invalid, but other schemes may allow it. See https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2
Yep! See my previous message:
The "http" scheme is an example of a case in which scheme-specific rules do indeed say otherwise. So a correct fix for this issue would ensure that both
The big challenge is maintaining a list of schemes that records which ones allow an empty host and which ones don’t.
What’s the real-world use case you are solving for here? I’m assuming
This library already special-cases a few schemes to give them default port numbers. This seems like it could be handled in the same way.
Minimization of programmer surprise :)
Let me ask the question another way: which schemes do you care about?
All of them. I care about the parser being correct, not any individual scheme or schemes.
We can't realistically add support for every conceivable scheme or hope to maintain a list of which ones require a host and which ones don't unless there is some external data source that can provide this for us.
I think we're all in agreement that the current behavior is not correct for a few reasons:
Fixing either one of these problems requires enumerating which URL schemes disallow empty authorities or hosts. We already special-case certain schemes for default port numbers, so maintaining another list of schemes seems to be within scope.
The WHATWG URL spec is this data source. It specifies that "ftp", "http", "https", "ws", and "wss" (i.e., all special schemes except "file") require nonempty hosts, and everything else allows empty hosts. This is the rule that Ada and libcurl enforce, and is the most reasonable position in my opinion.
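Taking that rule at face value, the check itself is small. Below is a minimal sketch, assuming the WHATWG list is the whole rule. `NONEMPTY_HOST_SCHEMES` and `check_host` are hypothetical names, not yarl API, and the sketch uses the standard library's `urllib.parse.urlsplit` rather than yarl's own parser.

```python
from urllib.parse import urlsplit

# Per the WHATWG URL Standard, every "special" scheme except "file" requires a
# non-empty host; all other schemes allow an empty one.
# Hypothetical table and helper, for illustration only (not part of yarl).
NONEMPTY_HOST_SCHEMES = frozenset({"ftp", "http", "https", "ws", "wss"})


def check_host(url: str) -> None:
    """Raise ValueError if url has an empty host and its scheme forbids one."""
    parts = urlsplit(url)  # urlsplit also lowercases the scheme
    if not parts.scheme:
        return  # relative reference: no scheme-specific rule applies
    # urlsplit reports netloc == "" both for "a:path" (no authority) and for
    # "a://" (empty authority), so look for "//" right after "scheme:".
    _, _, rest = url.partition(":")
    if not rest.startswith("//"):
        return  # no authority component, so there is no host to check
    # .hostname is None when the host part of the authority is empty.
    if parts.hostname is None and parts.scheme in NONEMPTY_HOST_SCHEMES:
        raise ValueError(f"Invalid URL: {parts.scheme!r} URLs require a non-empty host")
```

The design mirrors the existing per-scheme default-port table: a short list of strict schemes, with every other scheme falling through to RFC 3986's permissive default. Under this sketch `a://:1` and `file:///etc/hosts` pass, while `http://:80` is rejected.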
If the answer is that we only care about enforcing non-empty host on
Let me know if #1136 is what you were looking for.
Looks good to me!
Describe the bug
Absolute URLs are permitted to have empty hosts in RFC 3986.
Relevant grammar rules:
Thus, a URL like `a://:1` conforms to the standard. However, yarl rejects this URL.
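One way to see where the empty host comes from is to split the URL by hand. The sketch below uses the regular expression from RFC 3986, Appendix B, to extract the authority, then separates it with a simplified authority pattern. That pattern is an assumption rather than the full RFC grammar, and `split_authority` and `AUTHORITY_RE` are illustrative names, not yarl API. For comparison, the sketch also shows what CPython's `urllib.parse.urlsplit` reports.

```python
import re
from urllib.parse import urlsplit

# Regular expression from RFC 3986, Appendix B. Group 2 is the scheme,
# group 4 the authority, 5 the path, 7 the query, and 9 the fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Simplified (hypothetical) pattern for authority = [ userinfo "@" ] host [ ":" port ].
# The host is either a bracketed IP-literal or a run of non-":" characters;
# that run may be empty, mirroring reg-name = *( ... ) in RFC 3986.
AUTHORITY_RE = re.compile(r"(?:([^@]*)@)?(\[[^\]]*\]|[^:]*)(?::(\d*))?")


def split_authority(uri: str) -> tuple:
    """Return (scheme, userinfo, host, port) for a URI that has an authority."""
    m = URI_RE.match(uri)  # always matches: every group is optional
    scheme, authority = m.group(2), m.group(4)
    if authority is None:
        raise ValueError(f"no authority component in {uri!r}")
    a = AUTHORITY_RE.fullmatch(authority)
    if a is None:
        raise ValueError(f"malformed authority: {authority!r}")
    userinfo, host, port = a.groups()
    return scheme, userinfo, host, port


print(split_authority("a://:1"))  # → ('a', None, '', '1')

# CPython's urllib.parse accepts the same URL; it reports an empty host as None.
parts = urlsplit("a://:1")
print(parts.netloc, parts.hostname, parts.port)  # → :1 None 1
```

The host comes back as an empty string alongside a valid port, which is a well-formed authority rather than a missing one.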
`urllib3`, CPython's `urllib`, `rfc3986`, `furl`, and `hyperlink` all correctly handle this situation.
To Reproduce
Try running the following snippet:
Expected behavior
The parse should have succeeded, resulting in `URL('a://:1')`.
Logs/tracebacks
multidict Version
yarl Version
OS
Arch Linux
Additional context
No response
Code of Conduct