Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is file:///C|/foo/bar meant to be a syntax violation? #209

Closed
dd8 opened this issue Jan 16, 2017 · 8 comments
Closed

Is file:///C|/foo/bar meant to be a syntax violation? #209

dd8 opened this issue Jan 16, 2017 · 8 comments

Comments

@dd8
Copy link

dd8 commented Jan 16, 2017

file:///C:/foo/bar parses ok using the state machine, but file:///C|/foo/bar flags a syntax violation in the path state because | is not a URL code point while it's scanning for the / at end of the C| path segment:

  1. Otherwise, run these steps:
    If c is not a URL code point and not "%", syntax violation.

Is that intended? C| is not the normalised window drive letter form, but it's not clear from any of the narrative it's a syntax violation.

@dd8 dd8 changed the title Is file:///C|/foo/bar supposed to be a syntax violation? Is file:///C|/foo/bar meant to be a syntax violation? Jan 16, 2017
@jasnell
Copy link
Collaborator

jasnell commented Jan 16, 2017

Yes, C| has historically been used (rather unfortunately) as an alternative for C: in some contexts. It is a syntax violation but for the sake of being forgiving, it is still accepted by the parser

@dd8
Copy link
Author

dd8 commented Jan 16, 2017

It's probably be a good idea to add a note to that effect to https://url.spec.whatwg.org/#windows-drive-letter definition (otherwise you need to step through the state machine to find this out)

I was uncertain because there are bunch of test suite files in https://github.com/w3c/web-platform-tests/tree/master/conformance-checkers and https://github.com/validator/tests showing file:///C|/foo/bar as a must-be-valid case.

I'll raise an issue in the web-platform-tests repo

@domenic
Copy link
Member

domenic commented Jan 16, 2017

This is supposedly covered by https://url.spec.whatwg.org/#url-syntax which is pretty clear that the | form is disallowed. But I guess there's no overall note explaining that that section is meant to provide a conformance guideline.

@annevk
Copy link
Member

annevk commented Jan 17, 2017

So duplicate of #118 then?

@annevk
Copy link
Member

annevk commented Jan 17, 2017

I guess one thing that's novel to this issue is where those terms in the Infrastructure section should go. They're mostly there because I couldn't find a better place, but maybe they should go into the top-level section 4.

@dd8
Copy link
Author

dd8 commented Jan 17, 2017

I think one thing that could be clearer in the current spec is the high level division of URLs into:

  • conforming URLs
  • non-conforming, but parsable URLs (syntax violations)
  • non-parsable URLs (return failure)

Adding some syntax violations to the examples table would make this clearer. Would be good to add file: URLs to the examples table (both conforming and non-conforming) since the long history of interop problems with file: URLs suggests they're a source of much confusion, so need most clarity.

annevk added a commit that referenced this issue Feb 2, 2017
Also point out that the variant with a trailing “|” is not conforming.

Fixes part of #209.
annevk added a commit that referenced this issue Feb 2, 2017
@annevk
Copy link
Member

annevk commented Feb 2, 2017

I created some PRs that go somewhat towards those goals @dd8. Hope they help.

The main problem with the table at the moment is that it's specific to the parser. I guess you're suggesting we should generalize it? We discussed that a bit in #177 but didn't go for it. I guess I can do that once the above two PRs are accepted as improving the status quo and close this issue with that table change.

annevk added a commit that referenced this issue Feb 7, 2017
Also point out that the variant with a trailing “|” is not conforming.

Fixes part of #209.
annevk added a commit that referenced this issue Feb 8, 2017
annevk added a commit that referenced this issue Feb 9, 2017
annevk added a commit that referenced this issue Feb 9, 2017
annevk added a commit that referenced this issue Feb 10, 2017
In particular, do away with the word "syntax" as that causes lots of confusion and focus on validity instead. Also explain the relationship between the parser, serializer, representation, and (valid) input.

"Syntax violation" is now known as "validation error".

Fixes #118 and fixes part of #209.
annevk added a commit that referenced this issue Feb 10, 2017
Also make sure the base URLs are actually represented in serialized
form by adding trailing slashes.

Fixes #209.
@annevk
Copy link
Member

annevk commented Feb 10, 2017

I posted the final PR, but I forgot to add file URL examples. Will do that now. If anyone has suggestions for ones they think should be included, that would be most welcome.

annevk added a commit that referenced this issue Feb 10, 2017
Also add a couple more examples for file URLs and various URLs discussed in #118.

Also make sure the base URLs are actually represented in serialized form by adding trailing slashes.

Fixes #209.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants