Explain how syntax relates to the parser for hosts and URLs #228

Merged: 9 commits, Feb 10, 2017

Member

@annevk annevk commented Feb 2, 2017

Fixes #118 and fixes part of #209.



Member

@domenic domenic left a comment

I think this is a good start in the right direction, but I am concerned about two things, given the massive confusion we've seen elsewhere:

  • Lack of global explanation of the difference between conformant and parseable. Some abbreviated version of https://html.spec.whatwg.org/multipage/introduction.html#conformance-requirements-for-authors, or maybe just a link to it, might be a good idea. I would envision a sibling section to "Parsers" (and into which "syntax violation" would move). It might also be good to use this to give concrete examples of the usefulness of conformance checkers for URLs; one that comes to mind is text-entry software that only recognizes or autolinks conformant URLs.
  • Lack of local clarity while reading specific sections. I touch on this in the review, by suggesting renames like "URL string" and "URL syntax" to include "conformant" as a prefix so that when you read them without first reading the intros you're less confused. I also think an introductory sentence reiterating the fact that this is about conformance and not parsing would be good to add to the URL syntax section.

I think there is a lot of value in the work done to separate these and maintain both concepts, but as we've seen, the spec doesn't make it easy for people to appreciate that.
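
To make the conformant-versus-parseable distinction concrete, here is a minimal TypeScript sketch. It assumes a runtime that exposes the standard `URL` constructor (a browser or Node.js); `tryParse` is a hypothetical helper, not something from the spec. The API does not report non-fatal violations, so the sketch can only show that non-conformant input still parses.

```typescript
// Hypothetical helper: returns the parsed URL, or null when the parser
// returns failure (the URL constructor surfaces failure as a thrown error).
function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch (e) {
    return null;
  }
}

// Parseable but not conformant: the surrounding spaces and the backslashes
// are violations, yet parsing still succeeds and normalizes the input.
const lenient = tryParse("  https://example.com\\a\\b  ");
console.log(lenient ? lenient.href : "failure"); // "https://example.com/a/b"

// Neither conformant nor parseable: a space inside the host is fatal.
const broken = tryParse("https://exa mple.com/");
console.log(broken ? broken.href : "failure"); // "failure"
```

Autolinking software of the kind mentioned above would want to reject both inputs, even though a browser would happily navigate to the first.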

url.bs Outdated

<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
<a for=/>host</a>. (This <a for=/>host</a> cannot be an <a>opaque host</a>, those can only be
Member

comma should be semicolon; each half is a complete sentence

url.bs Outdated
<a>host serializer</a> relate as follows:

<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
Member

I wonder if you want to link to infra for strings

url.bs Outdated

<li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns a string. (If that string
is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the
<a lt="URL serializer">serialized</a> <a for=/>host</a>.)
Member

copypasta "host"

url.bs Outdated
@@ -823,6 +842,27 @@ unified model would be, please file an issue.
<!-- History behind URL as term:
https://lists.w3.org/Archives/Public/uri/2012Oct/0080.html -->

<p>At a high level, a <a for=/>URL</a>, <a>URL string</a>, <a>URL parser</a>, and
Member

Reading this makes me wonder if it should be "conformant URL string" or "valid URL string" instead. A web programmer probably thinks of "a URL string" as "a string that is a URL", with an ambiguous meaning of "is" that doesn't necessarily have the nuance of conformance involved.

Member

Similarly maybe renaming the "URL syntax" section to "Conformant URL syntax"

Member

Related: whatwg/html#2318

Since HTML uses "valid URL" (and "valid foo" in general), maybe URL spec should use "valid URL string", and HTML can use that term directly?

Member Author

annevk commented Feb 8, 2017

Thoughts thus far:

  • A sibling section to "Parsers" doesn't really work since syntax doesn't require infrastructure. A "syntax violation" is really a parser concept since only the parser calls them out.
  • If we start renaming syntax to "Valid URLs" or "URL validity" we'd need a whole lot of accompanying changes. We'd also no longer match how HTML talks about this. I'm not sure that's an improvement.
  • I'm not entirely convinced the confusion is as widespread as it's made to appear. There are indeed a couple of folks with a shared background who are confused, but we've had that with HTML as well and as time passed those objections passed.

So I'm not entirely sure if I want to make more drastic changes at this point.

Things I'm open to doing:

  • Add a non-normative paragraph at the start of the syntax sections explaining who they are for, similar to what HTML does.

Member

domenic commented Feb 8, 2017

> A sibling section to "Parsers" doesn't really work since syntax doesn't require infrastructure. A "syntax violation" is really a parser concept since only the parser calls them out.

Maybe renaming "Infrastructure" to "Shared concepts" then. I disagree that "syntax violation" is a parser concept; it's detected during the course of parsing, but it's much more interesting in the context of conformance checking, and the fact that conformance checking reuses the parser infrastructure is almost incidental. You could imagine a world where conformance checking uses a grammar instead, for example.

> I'm not entirely convinced the confusion is as widespread as it's made to appear. There are indeed a couple of folks with a shared background who are confused, but we've had that with HTML as well and as time passed those objections passed.

I don't think the confusion has passed for HTML over time. We see implementers unaware of the difference on almost a weekly basis. To me anything we can do to make it more explicit helps the situation.

Anyway, it's fine if you don't want to change much; what's in this PR is already an improvement and we shouldn't block it.

Member Author

annevk commented Feb 8, 2017

Yeah, you're right that folks get confused by HTML too and I was wrong that I followed HTML here. HTML calls the whole setup syntax, and then distinguishes between writing and parsing. Maybe that's what URL should do. (See https://html.spec.whatwg.org/multipage/#toc-syntax.)

annevk force-pushed the annevk/concept-relations branch from b1e3159 to 19e3ec7 on February 8, 2017 at 15:27
url.bs Outdated

<div class="note no-backref">
<p>A <a>writing violation</a> does not mean that the parser terminates. Termination of a parser is
always stated explicitly, E.g., through a return statement.
Member

lowercase e.g.
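
A toy TypeScript sketch of what the note in this hunk describes: a non-fatal violation is recorded and parsing continues, while termination is always an explicit `return`. `parseToyPath` and its result shape are hypothetical and far simpler than the actual URL parser; treating a space as fatal is an arbitrary choice for illustration only.

```typescript
// Hypothetical result shape: not taken from the URL Standard.
type ToyResult =
  | { kind: "ok"; path: string; violations: string[] }
  | { kind: "failure"; violations: string[] };

function parseToyPath(input: string): ToyResult {
  const violations: string[] = [];
  let path = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charAt(i);
    if (c === "\\") {
      // Non-fatal: signal the violation, recover, and keep parsing.
      violations.push("backslash at index " + i);
      path += "/";
    } else if (c === " ") {
      // Fatal in this toy: signal the violation, then terminate explicitly.
      violations.push("space at index " + i);
      return { kind: "failure", violations };
    } else {
      path += c;
    }
  }
  return { kind: "ok", path, violations };
}

console.log(parseToyPath("a\\b")); // ok, path "a/b", one violation recorded
console.log(parseToyPath("a b").kind); // "failure"
```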

url.bs Outdated
always stated explicitly, E.g., through a return statement.

<p>It is useful to signal <a>writing violations</a> as error-handling can be non-intuitive, legacy
user agents might not implement correct error-handling, the intent of what is written might be
Member

missing "and"

url.bs Outdated
@@ -74,6 +74,22 @@ DOM, Encoding, IDNA, and Web IDL Standards.
number.


<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
Member

Eh, this turn of phrase just seems awkward... a "violation of writing"? I think it's OK for the section to be about writing URLs, but to call it a "conformance violation" or "syntax violation" still.

url.bs Outdated
<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
input and writing requirements. User agents, especially conformance checkers are encouraged to
Member

Missing comma after "conformance checkers"

url.bs Outdated
<h3 id=writing>Writing</h3>

<p>A <dfn oldids=syntax-violation>writing violation</dfn> indicates a non-fatal mismatch between
input and writing requirements. User agents, especially conformance checkers are encouraged to
Member

Again "writing requirements" is a bit of an odd turn of phrase. It could work with some explanation, probably... Maybe "requirements for writing URLs" would be enough?

Member Author

Wouldn't work for hosts.

<var>result</var>.
</ol>


-<h3 id=host-syntax>Host syntax</h3>
+<h3 id=host-writing oldids=host-syntax>Host writing</h3>
Member

"Writing hosts" or "Writing conformant hosts" maybe?

Member Author

All other sections lead with "Host".

Member

I guess "Book writing" is not that much worse than "Writing books".

url.bs Outdated

<li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns a string. (If that string
is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the
<a lt="URL serializer">serialized</a> <a for=/>URL</a>.)
Member

I don't think it will equal the serialized URL; the serialized URL is a string. Maybe "the URL that was serialized".
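
A minimal TypeScript sketch of the round trip under discussion, assuming a runtime with the standard `URL` constructor (a browser or Node.js). The API has no equality operation for URL objects, so the sketch compares serializations via `href`, which only approximates the spec-level "equal".

```typescript
// Parse an input that the parser will normalize: uppercase scheme and host,
// a default port, and a dot segment in the path.
const original = new URL("HTTP://EXAMPLE.com:80/a/../b");

// Serialize: the result is a string, not a URL.
const serialized: string = original.href;
console.log(serialized); // "http://example.com/b"

// Parse the serialization again; the result should match the URL that was
// serialized, which we check here by comparing serializations.
const reparsed = new URL(serialized);
console.log(reparsed.href === serialized); // true
```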

url.bs Outdated
<ul>
<li><p>The <a>host parser</a> takes an arbitrary string and returns either failure or a
<a for=/>host</a>. (This <a for=/>host</a> cannot be an <a>opaque host</a>; those can only be
returned through the <a>URL parser</a>.)
Contributor

cdbcce6 made it so that the host parser can return an opaque host, so this note is no longer true.

Member Author

Excellent point, thanks!

annevk force-pushed the annevk/concept-relations branch from e889fbb to 56b60c5 on February 9, 2017 at 11:17
Member Author

annevk commented Feb 9, 2017

So we discussed terms before in #60 with @sideshowbarker and decided to rename from "parse error" then because of #59 (comment).

There's additionally the question of whether to signify both non-fatal and fatal mismatches the same way. I think we probably should (since it's still a requirements mismatch), but a fatal one additionally has the failure return value.

"Conformance violation" would work, but I don't like that we then have both "valid" and "conformance". And "conforming" seems overall like a more complicated word.

Does anyone have any good ideas?

Member

domenic commented Feb 9, 2017

"validity error" or "validation error"? Not sure if that crosses the same line into "bad" that "validator" does.

Member Author

annevk commented Feb 9, 2017

I still don't really understand the problem with "validator". "Validation error" seems reasonable. @sideshowbarker?

@sideshowbarker
Contributor

"Validation error" seems reasonable.

Yes, agreed.

Member Author

annevk commented Feb 9, 2017

Thank you both! Getting somewhere.

Commit message:

Attempt to explain valid input better

In particular, do away with the word "syntax" as that causes lots of confusion and focus on validity instead. Also explain the relationship between the parser, serializer, model, and (valid) input.

"Syntax violation" is now known as "validation error".

Fixes #118 and fixes part of #209.

annevk force-pushed the annevk/concept-relations branch from cb49e6f to 8fc66a8 on February 9, 2017 at 18:09
Member

@domenic domenic left a comment

LGTM, although re-reading this reminds me of #219 and how we should work on solving that at some point.

Member Author

annevk commented Feb 10, 2017

I kinda addressed that issue in #228 (comment). I think the duplication is fine and clearer, especially with the revised wording.

@annevk annevk merged commit 50cb9ab into master Feb 10, 2017
@annevk annevk deleted the annevk/concept-relations branch February 10, 2017 08:10
rmisev added a commit to upa-url/upa that referenced this pull request May 24, 2020
rmisev added a commit to upa-url/upa that referenced this pull request May 24, 2020
Development

Successfully merging this pull request may close these issues.

It's not immediately clear that "URL syntax" and "URL parser" conflict
6 participants