Avoid breaking pages that use the URL fragment for routing/state #15
Comments
I actually think the use case we're proposing is very much in line with the intended and specified purpose of fragment identifiers, so I'd rather not invent a new syntax for it. Is the motivation here to avoid breaking SPA routing? My guess would be that if an SPA router is broken by appending '&targetText', it'll also be broken by '%%...'. There's also the post in #7 detailing similar breakage as a result of '##'. I could be convinced if it turns out that extending the fragid syntax to support TextQuoteSelector isn't web-compatible, but that seems like the most promising way forward to me.
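To make that argument concrete: a router that matches the fragment exactly fails on any suffix, whatever the delimiter. The sketch below is a hypothetical router (the `routes` table, `resolveRoute`, and the route names are made up for illustration), not code from any real SPA library.

```typescript
// Hypothetical, deliberately naive hash router: the whole fragment is used
// as an exact route key, so *any* appended suffix defeats the lookup.
const routes = new Map<string, string>([
  ["#!/inbox", "InboxView"],
  ["#!/settings", "SettingsView"],
]);

function resolveRoute(hash: string): string {
  // Unknown fragments fall through to a not-found view.
  return routes.get(hash) ?? "NotFoundView";
}

const base = "#!/inbox";
console.log(resolveRoute(base));                          // InboxView
console.log(resolveRoute(base + "&targetText=example"));  // NotFoundView
console.log(resolveRoute(base + "%%targetText=example")); // NotFoundView
```

A router that parses the fragment into components instead of matching it whole is the kind of "simple code to adapt" that would tolerate either suffix.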
In this suggestion of mine, I'm also trying to recast it as "processing instructions". I did a poor job of making this clear in the title (fixed), but I was specifically trying to make it more programmatic and less declarative. It's not a simple selector, but a processing instruction.
I know the context of the issue might make it seem that way, but SPA router compatibility is actually not a primary driving factor for this: it's a reserved area of extensibility that browsers can iterate on and users can use without risk of breaking websites. In #7, I specifically debunked most of the SPA issues and other related hash (ab)uses, provided those sites drop in some simple code to adapt. I also noted another motivating factor in a parenthetical in the original comment (bold emphasis not in original):
Routers are already broken when you use
In general, I view the existing exceptions thrown in the cases of
I liked that part:
I think having the text fragment available to the app is necessary if this is to work in apps that load content after the initial page load. E.g., there would be no way to scroll to text that's on the second page of an infinite scroller. If the fragment is available, the page can parse it and perform the necessary load. Wikipedia, for example, does this on mobile: the sections are all collapsed by default, and the page opens the section specified in the fragment.
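A minimal sketch of that app-side handling, assuming the draft `#targetText=...` syntax discussed in this thread and a hypothetical `pages` array standing in for content an infinite scroller would only load later. `getTargetText` and `pageContaining` are illustrative names, not an existing API.

```typescript
// Read a text-fragment parameter from the app's own hash and work out how
// much content must be loaded before the browser can scroll to it.
function getTargetText(hash: string): string | null {
  // Drop the leading '#', then parse "a=b&c=d"-style pairs.
  // Note: URLSearchParams also decodes '+' as a space.
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  return params.get("targetText");
}

function pageContaining(pages: string[], text: string): number {
  // Index of the first page whose content includes the text, or -1.
  return pages.findIndex((content) => content.includes(text));
}

// Hypothetical lazily loaded pages of an infinite scroller.
const pages = [
  "Introduction to the topic.",
  "More items loaded on scroll.",
  "The quick brown fox appears here.",
];

const text = getTargetText("#pagestate&targetText=quick%20brown%20fox");
if (text !== null) {
  const index = pageContaining(pages, text);
  // A real app would fetch/render pages 0..index, then scroll to the match.
  console.log(`load through page ${index}`); // load through page 2
}
```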
@bokand Fair. So what about stripping it only from the hash/query/pathname, but always keeping it in
I will throw out there that any extension has the potential to interfere with apps, whether it be
Yep, I'm also concerned about that "potential". I mean, we're making a big assumption here: that there's no SPA that uses
I'm thinking about some way to let website owners opt in to the text highlight feature. Probably,
The reason why it will be popular is that it will be really good for SEO. So, think of it as of
That kind of defeats the whole point of the proposal, which is letting you link to arbitrary sections of content you don't necessarily control. It'd be nice (and mildly preferable) to allow sites to customize that jump to content, like what Wikipedia would do to enable jumping to content in a collapsed subsection on mobile.
@isiahmeadows I agree with you; it is not a great solution, which is why I haven't proposed it and have only shared my thoughts. It's just that website owners should probably have control over the browser behavior in one way or another. We should give them the ability either to enable this highlight behavior or to disable it; it depends on how you look at it. If you think that all websites should support it by default, then you might want to have
Looping back here; sorry for the long delay.
I think I'm coming around to something like
Specifying any hash on this page causes it to load a blank article. I'll look into how feasible this is.
Actually, I think a better solution to the point I brought up might be to implement something like what I mentioned in #2: provide an explicit API that the browser can fill in using the
I'm personally more concerned about breaking the URL than breaking a few poorly written (or at least narrowly designed) SPAs. Introducing
For instance, using the example posted in #5:
Fails with "Could not resolve host: example.com%%targetText". Granted, that might have been a typo. 😃 So...
This one fails with a 404 because it hits the server (as it should).
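Where a `%%targetText` suffix lands depends on where it is written. The "Could not resolve host" error above is curl's own; the sketch below uses the WHATWG URL parser instead (the global `URL` in browsers and Node.js) to show the three cases. The URLs are illustrative, not the exact ones from #5, and `locate` is a made-up helper.

```typescript
// Report which URL component a "%%targetText=..." suffix ends up in.
type Parsed =
  | { ok: true; host: string; pathname: string; hash: string }
  | { ok: false; error: string };

function locate(input: string): Parsed {
  try {
    const url = new URL(input);
    return { ok: true, host: url.host, pathname: url.pathname, hash: url.hash };
  } catch (e) {
    // The parser throws a TypeError for URLs it cannot parse at all.
    return { ok: false, error: String(e) };
  }
}

// Directly after the host: '%' is a forbidden domain code point, so parsing fails.
console.log(locate("https://example.com%%targetText=foo"));
// After a '/': it becomes part of the path, which is sent to the server.
console.log(locate("https://example.com/%%targetText=foo"));
// After a '#': it stays in the fragment, which browsers do not send to the server.
console.log(locate("https://example.com/#%%targetText=foo"));
```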
Routing libraries can be updated more easily than browsers, and certainly more easily than ~30 years of URL usage and design. SPAs will have to deal with new "clutter" in any client-side URL "magic," so perhaps it's best to sit alongside them rather than attempt to avoid each other entirely?
That's a good point, thanks, and I think that rules out any invalid sequence. I wonder if we could use, e.g.
I am also wary of touching URIs, since that feels like a much bigger problem. But we don't have a good sense of how large the SPA problem is, and pages can break with trivial fragments. It also feels like having a section of the URI reserved for the UA (to prevent apps from using it) would be useful for this and future features.
This is essentially what the
Consequently, perhaps the XPointer-style fragments (proposed in #18) could be removed (or put into the
That idea probably needs its own proposal, though... 😃
I was toying with the idea of proposing having
Effectively, we would change the interpretation of
However, this assumes apps don't do their own hash parsing from the full URL, and it adds a bunch of icky magic to URL handling. A far simpler solution would be to specify that after processing a
On navigating to
The major drawback is the
WDYT? Is there anything here I'm missing?
Yes, I like this idea. If I understand correctly, it won't affect any existing apps.
I am not sure I understand the proposal correctly, @bokand, but the way I read it, your proposal is to have a URL string with several '#' characters in it. However, the URL spec disallows this: a valid URL can have one '#' character to separate off the 'url-fragment-string', which consists of code points that do not include '#'. There are a number of URL libraries for different languages that parse URL strings and that, I presume, rely on this and raise errors if there are several '#'s in the URL string. They may all go wrong, and updating all of those (as well as updating the URL spec) seems to be a major uphill battle...
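The rule described above can be checked mechanically. Below is a hand-written TypeScript sketch of the WHATWG URL Standard's "valid URL-fragment string" definition (the helper names are made up). It also shows the other side of the problem: the WHATWG parser used by browsers and Node.js is lenient and still parses these URLs, treating the stray '#' or '%' as non-fatal validation errors, whereas stricter libraries elsewhere may reject them.

```typescript
// Sketch of a "valid URL-fragment string": zero or more URL units,
// i.e. URL code points or "%XX" percent-encoded bytes.
const URL_PUNCTUATION = new Set("!$&'()*+,-./:;=?@_~");

function isUrlCodePoint(cp: number): boolean {
  const isAsciiAlnum =
    (cp >= 0x30 && cp <= 0x39) || (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);
  if (isAsciiAlnum || URL_PUNCTUATION.has(String.fromCodePoint(cp))) return true;
  // Beyond ASCII: U+00A0..U+10FFFD, excluding surrogates and noncharacters.
  if (cp < 0xa0 || cp > 0x10fffd) return false;
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  const isNoncharacter = (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
  return !isSurrogate && !isNoncharacter;
}

function isValidFragmentString(fragment: string): boolean {
  const chars = Array.from(fragment); // iterate by code point, not UTF-16 unit
  for (let i = 0; i < chars.length; i++) {
    if (chars[i] === "%") {
      // A percent-encoded byte is '%' followed by exactly two ASCII hex digits.
      const hex = (chars[i + 1] ?? "") + (chars[i + 2] ?? "");
      if (!/^[0-9A-Fa-f]{2}$/.test(hex)) return false;
      i += 2;
    } else if (!isUrlCodePoint(chars[i].codePointAt(0)!)) {
      return false; // e.g. '#', space, '<', '"'
    }
  }
  return true;
}

// The lenient parser (global URL) still accepts every one of these URLs.
for (const frag of ["pagestate", "pagestate##targetText=example", "route%%q=some%20text"]) {
  const parsed = new URL(`https://example.com/#${frag}`);
  console.log(frag, isValidFragmentString(frag), parsed.hash);
}
```

The same check rejects a bare `%%`, since neither '%' there is followed by two hex digits.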
@bokand Most client-side routers IIUC do full parsing of the URL, by necessity. This is especially necessary when query string parameters get involved. I do feel this compromise wouldn't break existing routers and wouldn't require updating URL specs:
Edit: any suffix string should work.
@iherman
Well... are we sure about the URL libraries in other environments like Python, Java, Rust, you-name-it? Would we break any code if I did that?
Yeah, I noticed just after sending that '#' isn't a valid code point in a fragment. I agree we wouldn't want to introduce an invalid format, since that could break existing libraries; https://indieweb.org/fragmention ran into this exact problem using
I think the core idea of stripping the directive is valid, though, so long as we can find some valid and web-compatible delimiter. That'd require some data gathering, which will take time, but we can do it.
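A minimal sketch of the kind of data gathering this would take, assuming a sample of URL strings is available: for each candidate delimiter, measure the share of existing fragments that already contain it. A high share means pages may already use that sequence for their own state. The sample URLs and candidates are invented placeholders, not measurements, and `collisionRates` is a hypothetical helper.

```typescript
// For each candidate delimiter, the share of sampled URL fragments that
// already contain it. The sample below is invented, not real data.
function collisionRates(urls: string[], candidates: string[]): Map<string, number> {
  const fragments: string[] = [];
  for (const u of urls) {
    try {
      const hash = new URL(u).hash; // "" when there is no (or an empty) fragment
      if (hash) fragments.push(hash.slice(1)); // drop the leading '#'
    } catch {
      // Skip strings the parser rejects outright.
    }
  }
  const rates = new Map<string, number>();
  for (const delimiter of candidates) {
    const hits = fragments.filter((f) => f.includes(delimiter)).length;
    rates.set(delimiter, fragments.length === 0 ? 0 : hits / fragments.length);
  }
  return rates;
}

const sample = [
  "https://example.com/#intro",
  "https://example.com/app#!/route?x=1",
  "https://example.com/#a##b",
  "https://example.com/page",
];
console.log(collisionRates(sample, ["##", "%%", "&"]));
```

Counting per fragment is a simplification; weighting by page loads, as a browser metric would, can give quite different numbers.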
@iherman The concern is client-side, not server-side. Server-side routers never see the hash anyway (browsers never send it to them), and if they do encounter one erroneously, most just ignore it or reject the request as malformed, assuming it doesn't get dropped somewhere along the way to save bandwidth.
I think it'd be bad to break client-side as well. The client might not see our special fragment in its own document, but links on the page would, e.g.
@bokand I wasn't disagreeing with you, just stating that we aren't at high risk of breaking very many servers, especially servers that aren't interpreting
@isiahmeadows we are talking about creating new kinds of URLs, which may be used as identifiers regardless of whether they are used client-side or server-side. If one creates an annotation that is stored in an annotation server or database, those URLs would be out in the wild, subject to processing by other tools.
Right, I don't think the URL bar is constrained in this way. It could contain the
There's a great appeal to this being an invalid URL. It means these new URLs might break some software, but it also means these URLs will not collide with existing ones.
@kevinmarks - https://indieweb.org/fragmention says:
Could you elaborate on your experience here? Are there specific tools you found broke down? How did they break?
I'd have to dig through issues, but we found that some libraries would throw an exception or truncate the URL, particularly when it was in a plain-text string; IRC clients were one example. I think there was one that exited on the parse error.
Part of the value of URLs is that they can pass through intermediate text and still be useful, so I disagree with @tilgovi that breaking them on purpose is a good idea.
As @bokand mentioned on Chromium bug 961440, we added metrics for URL fragments that contain an additional #, and the rate is surprisingly high: 0.3% of page loads. We're trying the double-hash syntax in Chromium M77 (the feature is still behind a flag / canary-dev experiment / origin trial), while still supporting the original syntax. To summarize, our current idea with the double hash is to append ##targetText=example to the existing fragment, if any (e.g. #pagestate##targetText=example), and strip it from the fragment after processing, so the page only sees #pagestate (or an empty hash that it can then use for state, as WebMD mentioned above) and behaves normally. I'll update the explainer with our current ideas on this as well.
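The strip-after-processing behavior described above can be sketched as a pure function over the hash string. This is an illustration under assumptions (the `SplitFragment` shape and the `splitDirective` name are hypothetical), not Chromium's implementation.

```typescript
// The UA splits "##targetText=..." off the fragment, acts on it, and
// leaves the page only what came before it.
interface SplitFragment {
  pageHash: string;         // what the page would see as location.hash
  directive: string | null; // e.g. "targetText=example", or null if absent
}

function splitDirective(hash: string, delimiter = "##"): SplitFragment {
  // Split at the FIRST occurrence; a page whose own state already contains
  // the delimiter would have part of that state misread as a directive.
  const at = hash.indexOf(delimiter);
  if (at === -1) return { pageHash: hash, directive: null };
  return {
    pageHash: hash.slice(0, at), // "" when the URL was just "##directive"
    directive: hash.slice(at + delimiter.length),
  };
}

console.log(splitDirective("#pagestate##targetText=example"));
// pageHash "#pagestate", directive "targetText=example"
console.log(splitDirective("##targetText=example"));
// pageHash "" (empty hash), directive "targetText=example"
```

The comment on the first-occurrence split is where the 0.3% figure bites: any page already using the delimiter in its own fragment is affected.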
FWIW, Twitter's latest release has broken the old hashbang links like http://twitter.com/#!/kevinrose/status/89578599098744832
Add a section on alternative syntax per issue WICG#15.
Just to update: I think there's enough risk with
I've done some digging over a sample of all URLs seen by Google crawlers over the last 5 years with some candidate delimiters. We're going to update the proposal to use
Example:
Of course, we still want it to be part of the fragment for non-implementing UAs, so in the absence of an element-id fragment we must still include a
Related to my comment in #25, I actually really like that this does not start with
I think this issue has been sufficiently addressed in our introduction of the fragment directive and the
Edit: `s/client instruction/processing instruction/g`
Currently, URIs treat `%%` as invalid. Should we maybe extend accepted URIs to support this use case using that token, as effectively "processing instructions" (like "scroll to text") rather than necessarily seeing it as a fragment? These "processing instructions" would not be exposed to users, at least initially, and they wouldn't be included in what's sent to servers. I'm thinking we could view it like this:
- `https://example.com#!/route%%q=some%20text`: search for some text
- `https://example.com#!/route%%q=some%20text%%n=2`: search for the second occurrence of some text
`%%q=some%20text%%n=2` and `%%n=2%%q=some%20text` are equivalent.
Alternatively, you could wrap each "processing instruction" in brackets like `[q=some%20text]`, as suggested in #13, but I feel a double percent sign is probably a little easier to explain and use. (I see potential use in both technical and non-technical circles, so accessibility to non-technical people is a concern of mine.)
This would resolve and/or address numerous existing issues already filed:
- `%%delay=int_ms` or `%%delay=float_s` instruction.
- `"target"` selectors, but I don't agree we should be bound to their format: it's a bit more verbose than what's necessary for this. We're just selecting crap, and we could include all their functionality with less boilerplate and more flexibility.
- `%%q=`, something even non-technical users could potentially recognize right away due to familiarity with the common convention used by search engines.
- `%%select=div>:nth-child(2)>span.whatever` instruction.
- `%%n=int_n` instruction, as proposed above. We could also include range support here, like `1-3` for the first three or similar. (Other concerns like "last 4" or "all but first two" need to be considered.)
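A minimal sketch of how a UA might parse the proposed `%%` instructions, assuming `%%` separates percent-encoded `key=value` pairs. The function names are hypothetical, and the `n` range syntax (`1-3`) follows the suggestion above; nothing here is a shipped browser API.

```typescript
// Hypothetical parser for "%%key=value" processing instructions.
type Instructions = Map<string, string>;

function parseInstructions(fragment: string): Instructions {
  const result: Instructions = new Map();
  const start = fragment.indexOf("%%");
  if (start === -1) return result; // no instructions, only an ordinary fragment
  // Everything after the first "%%" is a "%%"-separated list of key=value pairs.
  for (const part of fragment.slice(start + 2).split("%%")) {
    const eq = part.indexOf("=");
    if (eq <= 0) continue; // skip empty or key-less instructions
    // decodeURIComponent throws a URIError on malformed escapes such as "%zz".
    result.set(part.slice(0, eq), decodeURIComponent(part.slice(eq + 1)));
  }
  return result;
}

// Order-independent comparison: "%%q=...%%n=2" equals "%%n=2%%q=...".
function sameInstructions(a: Instructions, b: Instructions): boolean {
  if (a.size !== b.size) return false;
  for (const [key, value] of a) {
    if (b.get(key) !== value) return false;
  }
  return true;
}

// "n" value: a single occurrence ("2") or an inclusive range ("1-3").
function parseOccurrences(value: string): { from: number; to: number } | null {
  const m = /^(\d+)(?:-(\d+))?$/.exec(value);
  if (!m) return null;
  const from = Number(m[1]);
  const to = m[2] ? Number(m[2]) : from;
  return from >= 1 && to >= from ? { from, to } : null;
}

const a = parseInstructions("#!/route%%q=some%20text%%n=2");
const b = parseInstructions("#!/route%%n=2%%q=some%20text");
console.log(a.get("q"), sameInstructions(a, b)); // some text true
console.log(parseOccurrences("1-3"));
```

Because a map is keyed by instruction name, the ordering equivalence falls out naturally; repeated keys would need an explicit rule (here the last one wins).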