It's not immediately clear that "URL syntax" and "URL parser" conflict #118
URL syntax is a model for valid URLs: basically "authoring requirements". The URL parser section allows parsing URLs which do not follow URL syntax. An easy example is `https://////example.com`, which is disallowed because the portion after `https:` contradicts the URL syntax, yet the URL parser accepts it. /cc @bagder via some Twitter confusion this morning.
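For concreteness, here is a quick sketch of the conflict using the WHATWG `URL` class (implemented in browsers and in Node.js); this snippet is an illustration of observed behavior:

```ts
// The extra slashes after "https:" are a validation error per the URL
// syntax, but the parser only flags them and then ignores them, so the
// invalid-by-syntax input still parses, and serializes with two slashes.
const url = new URL("https://////example.com");
console.log(url.href); // "https://example.com/"
```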
Is there really any reason for accepting more than two slashes for non-file: URLs? I mean, apart from this spec saying that the parser should accept them.
The fact that all browsers do.
I tested Safari on a recent OS X version and it doesn't even accept three slashes. Not in the address bar and not in Location: headers in a redirect. It handles one or two slashes, no more. So I refute your claim.
That's exactly the sort of mindset that will prevent the WHATWG URL spec from ever becoming the universal URL spec. URLs need to be defined to work in more contexts than browsers.
They are, no? Handling multiple slashes or not seems orthogonal to that. If Safari does not do it there might be wiggle room, or Safari might hit compatibility issues similar to curl's.
That's a large question and too big of a subject for me to address fully here. URLs in the loose sense of the term are used all over the place. URLs by the WHATWG definition are probably not used by much else than a handful of browsers, no. In my view (wearing my curl goggles), there are several reasons why we can't expect that to change much short-term going forward either. Like this slash issue shows. I would love a truly universal and agreed URL syntax, but in my view we've never been further away from that than today.
I'm sorry for the imprecision. We often use "all browsers" to mean "the consensus browser behavior, modulo minor deviations and bugs." The URL Standard defines URLs for software that wants to be compatible with browsers, and participate in the ecosystem of content which produces and consumes URLs meant for browsers. If cURL does not want to be part of that ecosystem, then yes, the URL Standard is probably not a good fit for cURL. But we've found over time that most software (e.g. servers which wish to interact with browsers, or scraping tools which wish to be able to scrape the same sites as browsers visit) wants to converge on those rules.
This made me also go and check IE11 on Win7, and you know what? It doesn't support three slashes either. To me, this is important. It shows you've added a requirement to the spec that a notable share of browsers don't support. When I ask why (because it really makes no sense to me), you give a circular answer and say you did this because "all browsers" act like this. Which we now know isn't true. It's just backwards on so many levels.
Being part of that ecosystem does not mean that I blindly just suck up what the WHATWG says a URL is without questioning it and asking for clarification and reasoning. Being here, asking questions, responding, complaining, is part of being in the ecosystem. curl already is and has been part of the ecosystem for a very long time. Deeply, firmly and actively: we have supported and worked with URLs since back when they were still truly standard "URLs" (RFC 1738). I'm here, writing this, because I want an interoperable world where we pass URLs back and forth and agree on what they mean. When you actively decide to break RFC 3986, and by extension RFC 7231 for the Location: header, I would prefer you could explain why. If you want to be a part of the ecosystem.
I wish we worked on a URL standard; then I'd participate and voice my opinions like I do with some other standards work. A URL standard is very much a good idea for curl and for the entire world. A URL works in browsers and outside of browsers. URLs can be printed on posters, parsed and highlighted by terminal emulators or IRC clients, parsed by scripts, and read out loud over the phone by kids to their grandparents. URLs are, or at least could be, truly universal. Limiting the scope to "all browsers" limits their usability. It fragments what a URL is and how it works (or not) in different places and for different uses. If you want a URL standard, you must look beyond "all browsers".
Edge does, however: http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4182 In general, Edge has made changes like this to be compatible with the wider ecosystem of web content. I can't speak for their engineers, but this shows clear convergence. It's good to hear you're interested in participating. That wasn't my impression from your earlier comments, and I welcome the correction.
Why should malformed URLs be parsed? Surely the solution is to simply tell people who are using malformed URLs to... stop using malformed URLs?
In the interest of looking for ways forward, instead of just saying "no", per https://twitter.com/yoavweiss/status/730173495464894465, it might make sense to collect usage data and see if browsers can simplify the URL grammar.
It may be best to ignore that browsers even use URLs, because there are definitely other pieces of software that use URLs. Consider the following URL: `irc://network:port/#channel`
I'd suggest the following plan to any browsers interested in tightening the URL syntax they accept:
Browsers are not the only applications that use URLs.
@JohnMHarrisJr your comment seems irrelevant to my plan for "any browsers interested in tightening the URL syntax they accept".
The syntax for URIs is such that the authority component (`user:password@host:port`) is always separated from the scheme by two slashes, except for some schemes that do not require them. The path may only begin with `//` if the authority component is present, and in that case it must begin with a slash. So there is no possible case where there would be more than three slashes after the colon following the URI scheme. HTTP in particular requires the two slashes between the URI scheme and the authority component, so there should always be exactly two slashes between the scheme and the host. In other words, URLs being a subset of URIs, it would only make sense to follow the standards that have already been established for a long time, especially since they make sense.
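As a minimal sketch of that strict reading for http(s) URLs (a hypothetical regex, not taken from either spec, and assuming HTTP's requirement of a non-empty host):

```ts
// scheme, ":", exactly "//", non-empty authority, then an optional path
// starting with a single "/": under this grammar a third slash could only
// come from an empty authority, which HTTP forbids.
const strictHttp = /^https?:\/\/[^/?#]+(\/[^?#]*)?(\?[^#]*)?(#.*)?$/;

console.log(strictHttp.test("http://example.com/a"));  // true
console.log(strictHttp.test("http:///example.com"));   // false (empty host)
console.log(strictHttp.test("http:////example.com"));  // false
```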
The main difference between the WHATWG and some other standards organizations is that the WHATWG attempts to describe the world as it is, rather than as folks would like it to be. That means that if a major implementation of URLs implements them in one way, the WHATWG specification needs to allow that implementation. So, it doesn't help to "prove" that browsers are wrong by citing earlier specifications. That said, it seems like there's a good argument to have the URL spec allow implementations to reject too many slashes, since at least one recent browser and several other tools do reject them.
Well, we want interoperable behavior, so it's either accept or reject. There's room for "accept with a console log message telling you you're being bad" (the spec already has this concept), but it would take some cross-browser agreement to move toward "reject".
@domenic @jyasskin This isn't surprising, given that many people interested in the WHATWG use Chrome, have Gmail email addresses, or are Google employees. The others are with Mozilla, probably use Firefox, and probably use Gmail email addresses. This approach of standards as a popularity contest is harming the web: it tries to make tools like curl, which already do URL parsing correctly and very well, behave like the popular web browsers for "interoperability". And the popular web browsers behave like they do only in order to support every unreasonable thing that can be found on web pages, because their market share depends on supporting as many web pages as possible so that users don't switch to another browser. And then other browsers, and tools like curl, are expected to do the same because a spec says to!
Your claims about the WHATWG having a Chrome bias are false. Please be respectful and make on-topic, evidence-based comments, and not ad hominem attacks, or moderation will be required.
@domenic I admit my comment is the result of frustration, and neither on-topic, nor evidence-based, nor respectful.
Thanks for that. We can hopefully keep things more productive moving forward.

At this point I think the thread's original action item (from my OP) still stands: to clarify the authoring conformance requirements for "valid" URLs, versus the user agent requirements for parsing URLs. Besides that, there seems to be some interest from some people in getting browsers (and other software in the ecosystem that wishes to be compatible with browsers) to move toward stricter URL parsing. I think my plan at #118 (comment) still represents the best path there.

As for the particular issue of more than two slashes, I have very little hope that this can be changed to restrict to two slashes, since software like cURL is already encountering compatibility issues, and we can at least guess that the change from IE11 to Edge to support 3+ slashes might also be compatibility-motivated. (Does anyone want to try http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=4182 in Safari Technology Preview to see if they've changed as well?) But, of course, more data could be brought to bear, as I outlined in my plan. I personally don't think three slashes is the worst kind of URL possible out of all the weird URLs in https://github.com/w3c/web-platform-tests/blob/master/url/urltestdata.json (my vote goes to maybe …).
@domenic Considering how many people write software working with existing URL libraries, wouldn't it be more useful to define URL as "whatever the majority of tools support" (like the URL libraries in about every language, framework, command line tool, server framework, etc.)? Sure, what users input should be accepted, but the fact that the input bars of browsers will happily accept any text (and, if search is off, prepend an http://www. and append a .com/) is already a sign that maybe the definition here is wrong. Maybe we need a defined spec for a single correct storage format for an identifier, plus an algorithm for how to get to this identifier from user input. "Google.com" is not a URL, although users think it is one; separating the actual identifier from the user-visible representations might be helpful here (especially for people writing tools, as they can then declare "we accept only the globally defined format" and let other libraries transform user input into that format).
@justjanne the URL Standard does not concern itself with the address bar. That is a misconception. It concerns itself with URLs found in documents and protocols, e.g. in markup attributes and Location: headers.
To be crystal clear, the browser address bar is UX and accepts all kinds of input that is not standardized. And it is totally up to each browser how they want to design that. They could even make it not accept textual input if they wanted to. That code has no influence on how URLs are parsed. |
@annevk It also concerns itself with URLs used for cross-app communication in Android, for IPC in several situations, etc. (Android uses a URL format for intents and for cross-app communication, and doesn't accept more than one or two slashes either.) It also concerns itself with address bars. What I was suggesting is that maybe we should split it into one specific, defined representation (which libraries, tools, Android, cURL, etc. could accept), and one additional definition for how to parse input/Location headers/etc. into that format. Because obviously browsers have to accept a lot more malformed input than other tools, but it's also obvious that not every tool should include a way to try and fix malformed input itself.
That is basically how it is structured today. There's a parser that parses input into a URL record. And that URL record can then be serialized. The URL record and its serialization are much more constrained than the input. I think that cURL basically wants the same parsing library in the end. It has already adopted many parts of it. I'm sure as it encounters more content it will gradually start to do the same thing. There's some disagreement on judgment calls such as whether or not to allow multiple slashes, but given that browsers are converging there (see Edge) and there is content that relies on it (see cURL's experience), I'm not sure why we'd change that.
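That structure is visible in the `URL` class, where the getters expose components of the parsed URL record and `href` is the record's serialization (a sketch of observed behavior):

```ts
// The accepted input is lenient, but the serialization is constrained:
// the extra slashes do not survive the round trip through a URL record.
const record = new URL("https:////example.com/path");
console.log(record.protocol, record.host, record.pathname);
// "https:" "example.com" "/path"
console.log(record.href); // "https://example.com/path"
```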
@annevk That'd be cURL, but do you suggest every tool that handles URLs be rewritten based on this single implementation of a parser? Do you actually want Android to accept single or triple slashes in cross-app communication? You'd add a lot more ambiguity, complexity, and performance overhead to any software working with it. There are use cases where you want to accept malformed input (for example, when it comes from a user), and there are use cases where you don't. The definition of URL should be what you call the "serialization of a URL record". (And, IMO, for cURL it would be better to split the parsing into a URL record out into a separate tool, and do …)
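A minimal sketch of that proposed two-layer split, with hypothetical helper names, using the WHATWG parser for both layers; the strict layer accepts only strings that already equal their own serialization:

```ts
// Strict layer: accept only the canonical serialization of a URL record.
function parseStrict(input: string): URL | null {
  try {
    const url = new URL(input);
    return url.href === input ? url : null; // must round-trip unchanged
  } catch {
    return null;
  }
}

// Lenient layer: repair user-ish input into the canonical form.
function fixUpUserInput(input: string): string {
  return new URL(input).href;
}

console.log(parseStrict("https:////example.com/"));   // null (not canonical)
console.log(fixUpUserInput("https:////example.com")); // "https://example.com/"
```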
@nox Yet, you only contacted browser vendors. https://///url.spec.whatwg.org/ writes that the goal of the WHATWG is to
Replacing the URL spec (RFCs 3986 and 3987) also affects everything that uses it: from offline usage to libraries in languages, from the autolinking regex used on Android to inter-process communication schemes, from file formats that use URLs (every XML implementation) to low-level filesystem APIs in every operating system. If you state that your official goal is to obsolete and replace the standard all of these use, you'll need to open a discussion with a great many more stakeholders. Or you need to go back and rethink whether replacing the entire URL spec, and every usage of it, is such a good idea.
It's easy enough to find emails seeking feedback on this document on various mailing lists, including those controlled by W3C and IETF. Where else would you like us to solicit feedback from?
@annevk let's ask the other way around. WHATWG officially declared the URL RFCs obsolete. So, what format should I use to specify URLs in XML files, in my HTML, on Android for Intents, in my file explorer, and elsewhere? You said your standard hasn't failed, so is there a single place outside of web browsers that has adopted your spec? You say all other specs are obsolete, so can I use your format everywhere now? Not even your issue tracker has adopted your spec: https:////url.spec.whatwg.org/ — how is this not a failure?
@justjanne I don't understand why the issue tracker would have to accept non-conforming URLs. There are all kinds of restrictions and heuristics around plain-text URLs that don't normally apply. (This also goes for the address bar, as @magcius pointed out, which is UX and has vastly different considerations from URLs transmitted via other means.) As an example of adoption outside browsers: Node.js ships an implementation.
@annevk Why is it non-conforming? According to the spec on url.spec.whatwg.org, it's a valid URL. The only spec according to which this isn't a valid URL would be the URL RFCs, but those are obsolete, according to the WHATWG.
It's not valid; please see https://url.spec.whatwg.org/#urls.
Mhm, I just found the definition of the scheme delimiter in the host string. So, as your goal was to define URL parsing for any situation and obsolete all previous specs, but your own issue tracker doesn't use your base URL parser... What should I use to parse URLs out of plain text that could potentially be in any language (including ones that don't leave word boundaries between text and a URL)? I find very little in your spec about these situations, but you said, rather hand-wavily, that there should be additional restrictions.
Also, the format should ideally be universal and accepted by any system a user might ever enter a URL in. As your standard is not that young anymore, and you said it's successful, I trust that it has been adopted by any system a user might enter URLs in, so they will be immediately familiar with my interface.
To hopefully clear up any confusion, neither the URL Standard nor the RFCs it obsoletes provides an algorithm for interpreting an arbitrary string of Markdown text and finding URLs within it. That seems to be what you're wondering about, @justjanne, with your discussion of the issue tracker. I believe that might be specified by CommonMark, but I am not sure. Both the URL Standard and the RFCs it obsoletes only operate on specific string inputs which are identified as URLs, for example in a … For example, as you noted, … On the other hand, … To reiterate: the URL Standard, like the RFCs before it, considers Markdown and the location bar out of scope. The URL Standard's "valid" definition can be helpful in building heuristics for situations like that, but won't suffice by itself. (E.g. without a base URL, …)
@domenic So, you deprecated the URL RFCs, and replaced them with a version that's even more limited in scope? (Because the RFCs at least try to define URLs for all situations — be it XML namespaces, IPC systems, or filesystems.) That's what I'd consider a failure at the goal of universally replacing the RFCs. And it doesn't help me with parsing URLs (according to whatever definition) out of IRC messages.
I'm not sure where you got that conclusion from my comment.
Premise 1: url.spec.whatwg.org defines that its goal is to obsolete and replace all other URL definitions.
Premise 2: the URL RFCs define URL syntax for all use cases, including parsing out of plain text, or in XML namespaces.
Premise 3: you state that the scope of the WHATWG spec is limited to basically only web browsing.
Conclusion: the WHATWG is trying to obsolete and replace a spec with one that's far more limited in scope.
I don't see how you came to premises 2 and 3, especially after what @domenic just wrote down. They're both incorrect.
Premise 3 comes from
For premise 2, I'll just quote RFC 3986:
RFC 3986 — which you claim to obsolete and replace — has an entire section on parsing URLs out of plain text.
See: RFC 3986, Appendix C.
Nothing in this appendix is normative.
So I still don't see how your standard can be a replacement if you consider an entirely different scope than the RFCs. And it worries me greatly that you apparently haven't even read the RFCs while trying to replace them.
@nox And yet, it's part of the scope of the RFC, and the normative parts mention such situations several times, too. The RFCs have a far greater scope than this "replacement" claims to have.
I could quote half of the RFCs' introductions; they make it especially clear that their scope is NOT limited to a subset of resources that a web browser might refer to, but that they are specs for any universal usage of a URI, in any situation.
There are no processing requirements for plain text in that RFC. E.g., for `http://example.com.` there's nothing that states whether the trailing dot is part of the URL or not. I guess you could assume the RFC states it should be included, but almost no tool would want that kind of behavior, so they'd all violate the RFC.
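To make the gap concrete, here is a hypothetical linkifier heuristic (my own illustration; neither the RFC nor the URL Standard defines this):

```ts
// Find a URL-ish run, then strip trailing punctuation that more likely
// belongs to the sentence than to the URL. Whether the trailing dot of
// "http://example.com." is part of the URL is exactly the call the RFC
// leaves open; this heuristic decides it is not.
function extractUrl(text: string): string | null {
  const match = text.match(/https?:\/\/[^\s<>"]+/);
  if (match === null) return null;
  return match[0].replace(/[.,;:!?)\]]+$/, "");
}

console.log(extractUrl("See http://example.com. for details."));
// "http://example.com"
```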
As for your premises, I don't see how considering the address bar out of scope (which I can tell you is also the case for the RFC, as no standard gets to dictate UI) means that the URL Standard is only for web browsing. As I told you earlier, it's implemented by Node.js. As for XML namespaces, they're treated as strings (and required to be by the XML specification), so it really doesn't matter whether they're considered in scope or not. I suppose they're in scope for the URL Standard as far as validity is concerned.
The RFC gives recommendations for parsing plain text, but not strict requirements. Your standard — which claims to obsolete the RFCs — hasn't even tried addressing every stakeholder that currently relies on the RFC before declaring it obsolete. Nor does it provide an adequate replacement with identical scope.
Sure it does; there's a whole section on writing URLs, which is perfectly adequate for plain text.
To be honest, I gave up on this specification after #87 (comment), which explicitly rejects RFC 3986 Normalization and Comparison and ostensibly allows servers to treat … as distinct. The problem could be fixed by defining normalization, which should include specifying a model for addressing invalid input like …
@gibson042 servers are able to treat those as distinct; they don't have to. I wouldn't mind encouraging them not to, if we can find a suitable algorithm to recommend.
@annevk Servers have the ability to treat them distinctly, but in so doing would be non-conforming with RFC 3986 and not web-compatible. As for a remedy, I gave five options in the original post, and directly linked to them above. If you have changed position and are now amenable to including one here, then let me know and I'll submit a PR.
Yeah, I think it's reasonable to recommend normalization for servers (and maybe even expose it in the JavaScript API at some point). We've had some requests for it.
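One cheap approximation of such a recommendation (a sketch, not a defined API; the helper name is hypothetical) is a parse-serialize round trip through the WHATWG parser, which already performs much of RFC 3986's syntax-based normalization:

```ts
// Lowercases the scheme and host, drops the default port, and resolves
// dot segments; it does not decode or re-case existing percent-encodings.
function normalize(input: string): string {
  return new URL(input).href;
}

console.log(normalize("HTTP://EXAMPLE.com:80/a/../b"));
// "http://example.com/b"
```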