Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File URL reparse bug with percent encoded windows drive letter #559

Closed
rmisev opened this issue Oct 26, 2020 · 14 comments
Closed

File URL reparse bug with percent encoded windows drive letter #559

rmisev opened this issue Oct 26, 2020 · 14 comments

Comments

@rmisev
Copy link
Member

rmisev commented Oct 26, 2020

There is a reparse bug, when parsing URL with percent encoded windows drive letter (C|) in the host. URLs to test:

file://%43%7C
file://C%7C
file://%43|

Any of them parses to file://c|/, the latter - to file:///c:/.

Let's see how browsers parse them:

Browser first pass second pass
Chrome: file://c%7C/ file://c%7C/
Firefox: file:/// file:///
Safari: file://c|/ file:///c:/

Safari has the same bug as spec. Chrome suggests possible resolution.

Some ways how to fix bug:

  1. Percent encode | in the host after host parsing (as Chrome does)
  2. Fail if host contains | after host parsing.
  3. Check for Windows drive letter after host parsing.
@annevk
Copy link
Member

annevk commented Oct 26, 2020

Nice find. I kinda like 3 (and then returning failure or explicitly dropping it). 1 and 2 seem a bit too generic.

@rmisev
Copy link
Member Author

rmisev commented Oct 27, 2020

I think, if | is allowed in hosts, then it may be a bit strange why c| host is dropped or failed. So I prefer Chrome's behavior.
Percent encoding lets use c| as host (because percent encoded c| will not be treated as windows drive letter).
To not be too generic, I think | can be percent encoded if 3 succeeds.

Yet another example, which leads to the same bug:

var u = new URL("file://h/")
u.hostname = "c|"
// u.href == "file://c|/"

@achristensen07
Copy link
Collaborator

Does your table include Chrome on Linux or Chrome on Windows? There are some functional differences in Chromium's interpretation of Windows drive letters in URLs on different operating systems, as seen by looking for C|/foo/bar on https://wpt.fyi/results/url/url-constructor.html . I think it's important when considering Windows drive letters what happens on Windows. What is the behavior of new and old Edge? What about IE?

While I do think URL parsing should be idempotent, I don't have a strong opinion here. I initially think all those URLs should be canonicalized to file:///c:/ and parse successfully.

@rmisev
Copy link
Member Author

rmisev commented Oct 28, 2020

Does your table include Chrome on Linux or Chrome on Windows?

I tested both. The same result: file://c%7C/

What is the behavior of new and old Edge? What about IE?

Browser first pass second pass
new Edge: file://c%7C/ file://c%7C/
old Edge: file://c%7c/ file://c%7c/
IE: file://c%7c/ file://c%7c/

@achristensen07
Copy link
Collaborator

After tinkering around a bit more, I think we should have file://c%7C/ canonicalize to file:///c:/ and not return failure because file://c|/ canonicalizes to file:///c:/ in IE and Edge and the current specification has us percent-decode file hosts. It's worth making this change to stay idempotent.

@rmisev
Copy link
Member Author

rmisev commented Oct 30, 2020

I think I can agree with you, but what result must be here:

var u = new URL("file://h/path")
u.hostname = "c|"

Currently result (u.href) is file://c|/path, and after reparse: file:///c:/path.
But maybe it is better to reject such host name value? I think it may not be acceptable to change path by setting host name.

@achristensen07
Copy link
Collaborator

Ah, yes. I think we should reject a windows drive letter or something that would percent decode to a windows drive letter in the hostname setter but not in the main parser.

@alwinb
Copy link
Contributor

alwinb commented Nov 1, 2020

Coincidentally, step 1.3.1 in the file host state, states that file-hosts should be parsed as non-special, i.e. as opaque hosts, I think that's a typo.

@alwinb
Copy link
Contributor

alwinb commented Nov 16, 2020

It might not be an unreasonable idea to just add | to the set of forbidden host code points. It is not an URL code point, so it is invalid in hostnames as per the current standard already. RFC 3986 and 3987 do not allow it either, so it seems unlikely that there are URLs with a | in the hostname out there.

@achristensen07
Copy link
Collaborator

I tried implementing forbidding '|' in hosts in https://bugs.webkit.org/show_bug.cgi?id=220778

webkit-commit-queue pushed a commit to WebKit/WebKit that referenced this issue Jan 26, 2021
https://bugs.webkit.org/show_bug.cgi?id=220778

Patch by Alex Christensen <achristensen@webkit.org> on 2021-01-26
Reviewed by Youenn Fablet.

LayoutTests/imported/w3c:

* web-platform-tests/url/a-element-expected.txt:
* web-platform-tests/url/a-element-origin-expected.txt:
* web-platform-tests/url/a-element-origin-xhtml-expected.txt:
* web-platform-tests/url/a-element-xhtml-expected.txt:
* web-platform-tests/url/failure-expected.txt:
* web-platform-tests/url/resources/urltestdata.json:
* web-platform-tests/url/url-constructor-expected.txt:
* web-platform-tests/url/url-origin-expected.txt:

Source/WTF:

This is one of the proposed solutions to whatwg/url#559
and RFC 3986 and 3987 forbid such characters, so let's try forbidding it.

* wtf/URLParser.cpp:
(WTF::isC0Control):
(WTF::isForbiddenHostCodePoint):

LayoutTests:

* fast/url/file-http-base-expected.txt:
* fast/url/file-http-base.html:

Canonical link: https://commits.webkit.org/233360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@271899 268f45cc-cd09-0410-ab3c-d52691b4dbfc
bertogg pushed a commit to Igalia/webkit that referenced this issue Feb 1, 2021
https://bugs.webkit.org/show_bug.cgi?id=220778

Patch by Alex Christensen <achristensen@webkit.org> on 2021-01-26
Reviewed by Youenn Fablet.

LayoutTests/imported/w3c:

* web-platform-tests/url/a-element-expected.txt:
* web-platform-tests/url/a-element-origin-expected.txt:
* web-platform-tests/url/a-element-origin-xhtml-expected.txt:
* web-platform-tests/url/a-element-xhtml-expected.txt:
* web-platform-tests/url/failure-expected.txt:
* web-platform-tests/url/resources/urltestdata.json:
* web-platform-tests/url/url-constructor-expected.txt:
* web-platform-tests/url/url-origin-expected.txt:

Source/WTF:

This is one of the proposed solutions to whatwg/url#559
and RFC 3986 and 3987 forbid such characters, so let's try forbidding it.

* wtf/URLParser.cpp:
(WTF::isC0Control):
(WTF::isForbiddenHostCodePoint):

LayoutTests:

* fast/url/file-http-base-expected.txt:
* fast/url/file-http-base.html:

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@271899 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@achristensen07
Copy link
Collaborator

Is there a consensus on changing the spec to forbid '|' in hosts to solve this issue?

@valenting
Copy link
Collaborator

Firefox already forbids | in the hostname.
I expect when we get around to actually allowing hostnames for file:// URLs that will still be the case.
So that's a 👍 from me.

@annevk
Copy link
Member

annevk commented Mar 17, 2021

@achristensen07 want to make the change to the specification as well?

achristensen07 added a commit to achristensen07/url that referenced this issue Mar 17, 2021
achristensen07 added a commit to achristensen07/url that referenced this issue Mar 17, 2021
@achristensen07
Copy link
Collaborator

Github found my PR to this repo. web-platform-tests/wpt#28118 is my PR for the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants