Explain how syntax relates to the parser for hosts and URLs #228
I think this is a good start in the right direction, but I am concerned about two things, given the massive confusion we've seen elsewhere:
- Lack of global explanation of the difference between conformant and parseable. Some abbreviated version of https://html.spec.whatwg.org/multipage/introduction.html#conformance-requirements-for-authors, or maybe just a link to it, might be a good idea. I would envision a sibling section to "Parsers" (and into which "syntax violation" would move). It might also be good to use this to give concrete examples of the usefulness of conformance checkers for URLs; one that comes to mind is text-entry software that only recognizes or autolinks conformant URLs.
- Lack of local clarity while reading specific sections. I touch on this in the review, by suggesting renames like "URL string" and "URL syntax" to include "conformant" as a prefix so that when you read them without first reading the intros you're less confused. I also think an introductory sentence reiterating the fact that this is about conformance and not parsing would be good to add to the URL syntax section.
I think there is a lot of value in the work done to separate these and maintain both concepts, but as we've seen, the spec doesn't make it easy for people to appreciate that.
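To make the conformant-versus-parseable distinction concrete, here is a minimal sketch using the `URL` class that browsers and Node.js expose. The helper name `parseOrNull` and the sample inputs are hypothetical, chosen only for illustration. The first two inputs are not conformant (backslashes, an embedded tab), yet the parser still returns a URL; only the third is a real parse failure.

```typescript
// Hypothetical helper: returns the serialized URL, or null when parsing fails.
function parseOrNull(input: string): string | null {
  try {
    return new URL(input).href;
  } catch {
    return null;
  }
}

// Backslashes are not conformant in a special URL, but the parser
// treats them like forward slashes and carries on.
const backslashes = parseOrNull("https://example.com\\a\\b");

// ASCII tabs and newlines are not conformant either; the parser strips them.
const embeddedTab = parseOrNull("https://exa\tmple.com/");

// A space in the host is fatal: the parser returns failure.
const spaceInHost = parseOrNull("https://exa mple.com/");

console.log(backslashes); // "https://example.com/a/b"
console.log(embeddedTab); // "https://example.com/"
console.log(spaceInHost); // null
```

A conformance checker would flag all three inputs, while software that only calls the parser would notice just the last one.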
url.bs
Outdated
<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
<a for=/>host</a>. (This <a for=/>host</a> cannot be an <a>opaque host</a>, those can only be
comma should be semicolon; each half is a complete sentence
url.bs
Outdated
<a>host serializer</a> relate as follows:

<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
I wonder if you want to link to infra for strings
url.bs
Outdated
<li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns a string. (If that string
is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the
<a lt="URL serializer">serialized</a> <a for=/>host</a>.)
copypasta "host"
url.bs
Outdated
@@ -823,6 +842,27 @@ unified model would be, please file an issue.
<!-- History behind URL as term:
https://lists.w3.org/Archives/Public/uri/2012Oct/0080.html -->

<p>At a high level, a <a for=/>URL</a>, <a>URL string</a>, <a>URL parser</a>, and
Reading this makes me wonder if it should be "conformant URL string" or "valid URL string" instead. A web programmer probably thinks of "a URL string" as "a string that is a URL", with an ambiguous meaning of "is" that doesn't necessarily have the nuance of conformance involved.
Similarly maybe renaming the "URL syntax" section to "Conformant URL syntax"
Related: whatwg/html#2318
Since HTML uses "valid URL" (and "valid foo" in general), maybe URL spec should use "valid URL string", and HTML can use that term directly?
Thoughts thus far:
So I'm not entirely sure if I want to make more drastic changes at this point. Things I'm open to doing:
|
Maybe renaming "Infrastructure" to "Shared concepts" then. I disagree that "syntax violation" is a parser concept; it's detected during the course of parsing, but it's much more interesting in the context of conformance checking, and the fact that conformance checking reuses the parser infrastructure is almost incidental. You could imagine a world where conformance checking uses a grammar instead, for example.
I don't think the confusion has passed for HTML over time. We see implementers unaware of the difference on almost a weekly basis. To me anything we can do to make it more explicit helps the situation. Anyway, it's fine if you don't want to change much; what's in this PR is already an improvement and we shouldn't block it.
Yeah, you're right that folks get confused by HTML too and I was wrong that I followed HTML here. HTML calls the whole setup syntax, and then distinguishes between writing and parsing. Maybe that's what URL should do. (See https://html.spec.whatwg.org/multipage/#toc-syntax.)
b1e3159 to 19e3ec7
url.bs
Outdated
<div class="note no-backref">
<p>A <a>writing violation</a> does not mean that the parser terminates. Termination of a parser is
always stated explicitly, E.g., through a return statement.
lowercase e.g.
url.bs
Outdated
always stated explicitly, E.g., through a return statement.

<p>It is useful to signal <a>writing violations</a> as error-handling can be non-intuitive, legacy
user agents might not implement correct error-handling, the intent of what is written might be
missing "and"
url.bs
Outdated
@@ -74,6 +74,22 @@ DOM, Encoding, IDNA, and Web IDL Standards.
number.


<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
Eh, this turn of phrase just seems awkward... a "violation of writing"? I think it's OK for the section to be about writing URLs, but I would still call it a "conformance violation" or "syntax violation".
url.bs
Outdated
<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
input and writing requirements. User agents, especially conformance checkers are encouraged to
Missing comma after "conformance checkers"
url.bs
Outdated
<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
input and writing requirements. User agents, especially conformance checkers are encouraged to
Again "writing requirements" is a bit of an odd turn of phrase. It could work with some explanation, probably... Maybe "requirements for writing URLs" would be enough?
Wouldn't work for hosts.
<var>result</var>.
</ol>


<h3 id=host-syntax>Host syntax</h3>
<h3 id=host-writing oldids=host-syntax>Host writing</h3>
"Writing hosts" or "Writing conformant hosts" maybe?
All other sections lead with "Host".
I guess "Book writing" is not that much worse than "Writing books".
url.bs
Outdated
<li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns a string. (If that string
is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the
<a lt="URL serializer">serialized</a> <a for=/>URL</a>.)
I don't think it will equal the serialized URL; the serialized URL is a string. Maybe "the URL that was serialized".
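The round trip being described can be sketched with the `URL` class from browsers and Node.js. That API does not expose URL records or the spec's "equal" operation directly, so this sketch compares serializations as a stand-in, which is an assumption of the example and not something the spec text says. The input string is a hypothetical sample.

```typescript
// Parse a non-normalized input; the resulting URL object stands in for the URL record.
const original = new URL("HTTPS://EXAMPLE.com:443/a/../b?q#f");

// Serialize it. This value is a string, not a URL.
const serialized: string = original.href;

// Parse the serialized string again.
const reparsed = new URL(serialized);

// Comparing serializations is used here as a proxy for URL equality.
const roundTrips: boolean = reparsed.href === serialized;

console.log(serialized); // "https://example.com/b?q#f"
console.log(roundTrips); // true
```

Here `serialized` is a string and `reparsed` is a URL, which is why "the URL that was serialized" is the right thing to compare against.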
url.bs
Outdated
<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
<a for=/>host</a>. (This <a for=/>host</a> cannot be an <a>opaque host</a>; those can only be
returned through the <a>URL parser</a>.)
cdbcce6 made it so that the host parser can return an opaque host, so this note is no longer true.
Excellent point, thanks!
e889fbb to 56b60c5
So we discussed terms before in #60 with @sideshowbarker and decided to rename from "parse error" then because of #59 (comment). There's additionally the question of whether to signify both non-fatal and fatal the same way. I think we probably should signify them the same way (since it's still a requirements mismatch), but fatal additionally has the failure return value. "Conformance violation" would work, but I don't like that we then have both "valid" and "conformance". And "conforming" seems overall like a more complicated word. Does anyone have any good ideas?
"validity error" or "validation error"? Not sure if that crosses over into the line of "bad" that "validator" does.
I still don't really understand the problem with "validator". "Validation error" seems reasonable. @sideshowbarker?
Yes, agreed.
Thank you both! Getting somewhere. Commit message:
|
cb49e6f to 8fc66a8
LGTM, although re-reading this reminds me of #219 and how we should work on solving that at some point.
I kinda addressed that issue in #228 (comment). I think the duplication is fine and more clear, especially with the revised wording.
Fixes #118 and fixes part of #209.