Scope matching algorithm breaks sites that don't end in a slash #1272
Comments
This seems the most palatable and least likely to cause breakage on the web. |
Why do people think matching on path segments would cause breakage? What would be an example breakage? |
You can match scopes against file names or even substrings of filenames today. Making it a path comparison doesn't seem compatible with that? Or maybe I don't understand that proposal. For example, pretty sure we have tests that set scopes "https://foo.com/path/dummy-file" and expect it to control both "dummy-file.js" and "dummy-file.html". |
I believe we also have scopes that use query strings for uniqueness: Probably less chance people are doing that one in the wild, though, but who knows. |
I will also just mention the scope mechanism is kind of lame since storage and permissions are origin based. It would be much nicer if we just had maps.google.com. Edit: For example, when I visited the google maps PWA it asked me for location permission (reasonable), but I have to grant it to all of google.com (unreasonable IMO). |
Got it, thanks for filling in those examples for me. I guess in that sense the "secondary measure" seems like the only way to allow those cases to work while also solving the OP's problem. I'd kind of hope people aren't using either of those patterns in the wild, but I have no data to back up that hope, and imagine that the chance of me being right is low enough that it's not worth collecting data/waiting to fix this. (And, agreed on scopes being lame in general :(.) |
Also, while not a great reason, changing our WPT test corpus to a path-only scoping mechanism would take quite a bit of time. We have a lot of tests that use file based scopes I think. |
I found out we had designed it as the exact-path match plus globbing. (see #287.) We dropped the globbing in favor of removing the complexity posed in the OP of that issue. With the OP of this issue considered, I agree the "secondary measure" is a reasonable option we can take. Any concerns about adding that condition? Or any other good ideas? |
Ping on this proposal. To save time reading the above, the amended proposal is to change scope matching so any scope ending with a slash ("/abc/xyz/") also matches the URL without a slash ("/abc/xyz"), but not with any suffixes after that. So:
This is to allow a site like Google Maps to use the scope "https://www.google.com/maps/", which would now match "https://www.google.com/maps" but not "https://www.google.com/mapsearch". Currently there is no way to do this. |
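For illustration only, the amended rule could be sketched roughly as below. This is not spec text: the function name `matchesWithSlashException` is made up for this example, and it compares plain strings, whereas a real implementation would work on parsed, serialized URLs.

```javascript
// Hypothetical sketch of the amended proposal: keep the existing string
// prefix match, and additionally let a scope that ends in "/" match the
// same URL with that trailing slash removed (and nothing else).
function matchesWithSlashException(scope, url) {
  // Existing behaviour: the URL starts with the scope string.
  if (url.startsWith(scope)) return true;
  // Proposed exception: "/abc/xyz/" also matches exactly "/abc/xyz".
  return scope.endsWith("/") && url === scope.slice(0, -1);
}

const mapsScope = "https://www.google.com/maps/";
const candidates = [
  "https://www.google.com/maps",
  "https://www.google.com/maps/",
  "https://www.google.com/maps/anything",
  "https://www.google.com/mapsearch",
];
for (const url of candidates) {
  console.log(url, matchesWithSlashException(mapsScope, url));
}
```

Under this sketch the document URL can be one character shorter than the scope, which is exactly the assumption the next comment worries about breaking.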
I understand the use case but I'm a bit reluctant to add intelligence/exceptions to the scope-matching algorithm. It could conceivably break some assumptions in code if suddenly document URL can be shorter than the scope. I agree the real solution is to do maps.google.com, or as a workaround can they claim all of "maps*" (i.e., move mapsearch somewhere else?) Another workaround is two service workers: one at maps and one at maps/, and have maps redirect to maps/ (though that will incur two service worker startup costs). |
There is no "mapsearch". This is just a hypothetical example of why the current options are too limited. Note that just because this is hypothetical does not mean it isn't a problem right now. google.com/maps is real. While google.com/mapsearch is not real, the possibility of google.com/maps* in the future makes it dangerous for them to define a service worker with "google.com/maps" as the scope. But they can't define "google.com/maps/" as the scope, because that would exclude their canonical URL! This dilemma affects 100% of applications whose scope is not "/". You could argue that they should use "maps.google.com" instead; I tend to agree, but then why does "scope" exist as a concept (why not just automatically scope SWs to the origin)?
I agree that it might be useful to have more expressive scoping. But the issue I'm talking about here means that "scope" as a concept is fundamentally broken. It just isn't broken too badly which is why not many people are complaining; 1. because most scopes are probably "/", and 2. because everybody else is probably defining over-broad scopes by accident, or excluding their non-slashed URL by accident, which means subtle breakage, not catastrophic. I still think this should be fixed as a priority. |
I think this might be correct. Scopes were never a universally popular thing. I'd still like to consider the other big use cases before making a quick decision here that could limit what we do later or end up adding more complexity. Interestingly, it looks like this was considered in issue 3: #3. |
That may be true, and perhaps the best thing would have been to always scope to the origin and force apps to design around that. But scopes are a thing, so they should work. Having said that, I do think scope was a valuable feature. While it's easy to say to developers, "oh you should just use "maps.google.com" instead of "google.com/maps", the reality is that you'd be telling a gigantic organisation like Google Maps that they have to change all of their URLs before they can begin using your technology. That's going to drastically lower the cost/benefit ratio for implementing Service Workers (and Web App Manifests). So I'd rather we fix scope, than simply tell developers, "best to design your site such that scope is
Interesting. That issue was closed with "Per today's f2f discussion, this is your app's responsibility." It looks like that decision was made back in the day when " |
Sorry for the slow response. Was discussing with others on another team here, as well as @jungkees. I can imagine a few extensions that would address the full range of issues I'm seeing: 1.) An extension for controlled scopes that marks them as "pathComponent" matches to solve the These might come together like:

<html>
<script>
navigator.serviceWorker.register("/sw.js", {
  scope: "/thinger",
  exact: true
}).then(...);
</script>
</html>

// sw.js
self.onactivate = (e) => {
  // Auxiliaries don't affect the registration
  e.addAuxiliaryScope({
    // handle all navigations to `/whatevs/*` but not `/whatevslol` (e.g.)
    scope: "/whatevs",
    pathComponent: true
    // `exact` would also be legal here
  });
  // ...
};

Thoughts? |
I thought the "secondary measure" would be good. But after seeing #1272 (comment), adding an option to opt in seems to be a safer option. From @slightlyoff's proposal, "pathComponent" seems to be able to solve the OP issue. Would we have use cases where "exact" match is required? |
I don't understand the difference between "pathComponent" and "exact" in these two (alternative?) proposals? Seems like they both do the same thing which is "/foo" (without a slash) would only handle "/foo" and "/foo/anything" but not "/foobar". Having it be opt-in is fine, though it adds to the complexity of correctly setting up a service worker. I'd ask that we try to find a solution that has an analogue in Web App Manifest as well, since this issue also affects Manifest.
Do you mean force developers to migrate their app to a sub-origin rather than a path? |
I meant the sub-origins spec proposal, but I already deleted my comment because I decided the answer was probably "no". |
Oh, you're referring to this? I wasn't aware of this proposal. That would be fantastic; if we could tie SW scope, Manifest (app) scope, and perhaps permission scope and a few other things, into the same concept of a sub-origin, without forcing developers to rewrite their URL scheme. But I'm not sure what the status of this proposal is. |
There's a lot of hate for service worker scope from standards folks, but remember that it's what allowed developers to use service workers on github pages, rawgit, WPT, and It feels like the best short-term solution is:
If I don't have a good solution for the multiple TLDs. To fix that we'd need a way to scope a service worker to something greater than an origin, like a cert. Ew. The above is complicated because maps has so many URLs, across so many origins. They have Is this a mess that service worker should be trying to fix? |
The current behavior of ServiceWorker scope matching being a simple string prefix is a bigger problem than just the maps-trailing-slash example. Here's another non-hypothetical example: I want to install a simple ServiceWorker on the Google homepage - www.google.com - in order to speed it up and make some functionality available offline. However, there are a ton of other properties hosted on www.google.com that I don't want this ServiceWorker to be activated for - today any SW that handles requests to www.google.com (or www.google.com/ with the slash) must also intercept every request to www.google.com/maps, www.google.com/flights, www.google.com/search, www.google.com/preferences, and countless more. It's not feasible to contact the unknown number of people who own some random path off of www.google.com and ask them to install an empty ServiceWorker so we don't add latency and potential bugs to their serving path. @slightlyoff's proposal around allowing a SW to specify multiple paths and limiting some to the exact specified path rather than treating it as a prefix solves this problem nicely, as well as the maps-trailing-slash case, as well as some others (such as wanting to register for example.com/myapp and example.com/settings but not example.com/betatestapp), in a relatively straightforward way. Edit: reading through the comment thread again, I think there may be some confusion about how "exact" differs from "pathComponent". "/foo" as a pathComponent would match "/foo", "/foo/", and "/foo/bar/baz", but not "/foobar". On the other hand, "/foo" registered as exact would match only exactly "/foo" and not any of the others. |
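To make the distinction concrete, here is a rough sketch of how the two proposed flags might behave, based only on the description in the comment above. Nothing here is a real API: `exact` and `pathComponent` are the proposed option names from this thread, while `scopeEntryMatches` and `isControlled` are helper names invented for the example, which also works on paths only and ignores origins and query strings.

```javascript
// Hypothetical matcher for a single scope entry of the proposed shape
// { scope, exact?, pathComponent? }.
function scopeEntryMatches(entry, path) {
  const { scope, exact = false, pathComponent = false } = entry;
  if (exact) {
    // "exact": only the scope path itself, nothing beneath it.
    return path === scope;
  }
  if (pathComponent) {
    // "pathComponent": the scope path itself, or anything below it on a
    // path-segment boundary ("/foo/..." but not "/foobar").
    const base = scope.endsWith("/") ? scope.slice(0, -1) : scope;
    return path === base || path.startsWith(base + "/");
  }
  // Default: today's plain string prefix behaviour.
  return path.startsWith(scope);
}

// A worker with several scopes controls a path if any entry matches.
function isControlled(entries, path) {
  return entries.some((entry) => scopeEntryMatches(entry, path));
}

const appScopes = [
  { scope: "/myapp", pathComponent: true },
  { scope: "/settings", pathComponent: true },
];
for (const path of ["/myapp", "/myapp/inbox", "/settings", "/betatestapp"]) {
  console.log(path, isControlled(appScopes, path));
}
```

The homepage case from the comment would be a single entry such as `{ scope: "/", exact: true }`, which under this sketch matches "/" and nothing else.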
@jakearchibald: I think you're fixating on the Maps example too much. Yes, Maps is a mess because they have like 40 non-structurally-related URLs that all redirect to the same domain, and there's not much we can do about it. But a path without a slash being a parent of a path with a slash is universal.
Yes, they are different URLs; redirecting one to the other is a convention, not part of a standard. Note that I am not proposing that they be treated equivalently, or an automatic redirect, but a sensible containership rule.
If this is not possible for breakage reasons, I'd like for it to at least be possible to define this boundary. I agree that making scope
Having the two service workers doesn't buy you anything at all, since you still end up with a SW that handles all URLs that start with
This is not a great solution. Again, this is not a Maps-specific quirk. This affects 100% of SWs that aren't installed at the origin root. Should all such sites be required to install a dummy SW at any paths that share a string prefix with another SW, just because the scoping rules are broken? Also, this solution doesn't work for Web App Manifest (which suffers the same issue, on account of consistency with the SW spec), because unless the user has installed the Whether it's opt-in or default behaviour, I would really like to see a solution that makes it possible to define a sane scope that includes slashless paths. |
But then you have app logic in the SW that may also control
I think it's a bit much to imply the current scoping rules are 'insane'. Can we tone down the rhetoric, please? |
(I'm not ignoring the points in the above posts. I'll try to summarise them to make sure we're all on the same page.) |
Problem 1: An app is hosted at Problem 2: An app hosted at Problem 3: An app hosted at Is that a fair description of the problems? Possible solution 1: Distinct apps should have their own origin, eg However, large parts of the web aren't built with this in mind, so for those sites: Possible solution 2: Sub-origins. These allow arbitrary URLs to become part of another origin, allowing However, since the sub-origin is assigned at response time, it isn't clear to me how this can work with service worker, which needs to select a controlling registration before the request. Possible solution 3: Add secondary scopes with optional exact-matching. This way you could have service workers scoped to:
However, this creates big questions around the expected behaviour of Possible solution 4: Add an option to registration that means "match this scope URL's path component (as in, ignore search), and also match the scope URL + However, this is kinda magic, and doesn't solve problem 2. It could be used in combination with solution 3, as it works around the search component of the URL. Also, although problem 3 is real-world and creates the same user experience issues as 1 & 2, are we happy to WONTFIX that? If so, why are we less bothered about that case? |
Not sure if sub-origins are a feasible solution to the general problem, as the boundaries a developer might want between Service Workers or pages without a SW don't necessarily match up to origin boundaries. For example, for whatever reason I might want a separate SW to manage my app preferences page or script resource caching or whatever, but still want it to share storage and permissions access with my main app SW. (Not to say I'm not a fan of sub-origins, I just don't think they solve this problem in its entirety.) Regarding solution 3, while I have no opinion on getRegistration() behavior, could one solution to two SWs trying to register for the same scope simply be that the install of the second fails with a clear and descriptive error? So long as the additional scopes are defined in register and not onactivate that seems like a straightforward and reasonably practical solution. Really good point about migrating a scope between SWs - we didn't consider that. When we install a SW and give it a cache expiry date does it automatically unregister from its scopes once its cache TTL expires? If so then we may be ok, and I think a reasonable story for migrating a scope such as /login from SW A to SW B would be something like:
Problem 3 seems like a very hard problem. Not sure how to solve it cleanly without something like the conceptual inverse of sub-origins, where two domains can claim to be actually the same and share resources and the like, which mildly terrifies me. We've been thinking about options like a long-term cached page that is just a full-screen iframe to a canonical origin, though that has obvious latency issues. Given the difficulties here it's not clear it makes sense to tie solving this problem to solving the other problems and it might be better to consider it separately. |
Of course, what was I thinking. OK, I withdraw this; there is no "internal" argument (in URL syntax) that Ultimately, this all goes back to the Unix file system, where
From the point of view of which URLs are captured by SWs, having two SWs instead of one doesn't change things. Either way, you still need a third SW (what I'm going to call an "anti-service-worker" since it exists solely to poke a hole in the parent SW) in
Well, I used the word "sane", which sounded softer in tone (to me) than explicitly calling the current behaviour "insane".
Yes, but I believe not all three problems are equally valid. I think Problem 1 is something SW scoping should handle, Problem 2 is a maybe, and Problem 3 is out of scope. I'll try to justify this, but if nothing else, as @davidcblack says, because each problem is significantly harder to solve at the spec level than the last, and if we can easily solve Problem 1, we shouldn't stop because we can't solve the others. The reason to favour solving P1 is:
Ironically, there seems to have at one point been a "Problem 0": defining a scope that includes several sibling directories that start with a common prefix. I consider that way down on my list of "must have" features of a scope matching algorithm, and can't think of any use case for it. Yet it seems to have been prioritized over Problem 1.
I don't agree it creates the same UX issues. First of all, P3 is solvable by the developer, while P1 is not (unless they do the whole "anti-service-worker" trick, and make sure any future developers who play in the same URL space do the same). Secondly, as a user, I barely notice the trailing slash. In fact, I think most users (who know what a URL is at all), and heck probably most developers, think that a trailing slash is semantically identical to not having it (i.e., that "google.com/maps" and "google.com/maps/" are the same URL). I was certainly under that impression for a long time. So it's quite unintuitive if a user types "google.com/maps" and it doesn't load, and then they are told, "oh, you need to add a trailing slash for it to work offline." On the other hand, anyone can tell that "maps.google.com" and "google.com/maps" are different addresses. Even if people can't tell you the difference between them, and they seem to both work the same, you can understand if you're told, "maps.google.com" is a web page that bounces you to "google.com/maps"; even though "google.com/maps" works offline, if you type "maps.google.com" you need to be online for it to work. The possible solutions:
I don't get why matching path segments is "magic". If scope always matched path segments, would you consider it magic? In other words, is the "magic" the path matching algorithm itself, or the fact that it's something the developer has to opt into? If the former: ... why? Matching path segments is the basic way of telling if one URL (or Unix path) is a prefix of another. If the latter: I agree, it sucks that you'd have to opt in to getting the "right" behaviour. But the ship has sailed on having the right behaviour by default. I'd rather tell web devs, "use this flag to opt in to path prefix" than "make sure every URL that ever starts with the four letters 'maps' registers a blank service worker." Now I've finished every email to this thread mentioning Web App Manifest and nobody has responded about it. I'd like to have the same scope matching solution for both SW and Manifest. I'm happy if we come to the conclusion that they have different requirements and should therefore go in separate directions on this. But I think it would be best if there is a consistent approach for both. (Note: I am not really invested in Service Workers here; I'm just trying to ensure consistency between the two specs, since the Manifest spec explicitly cites the SW spec for why it has this behaviour.) |
Nah, your service worker registration lasts, in theory, forever. Taking the example where a SW has scopes
You'd need to update the register call for the This is pretty hard, and it'll behave differently depending on which step happens first, so the issue may not be caught until things go live. It seems more risky, and requires more inter-app awareness than the blank service worker idea I posted earlier. |
Yeah, but that isn't how URLs work. On the file system
I agree it isn't a pretty solution, but we should compare the proposed solutions to it.
Fair enough. It felt implied to me. By wanting a "sane" solution, the implication is that the current solution isn't that, therefore "insane".
That was never a specific use-case. Prefix matching was chosen as it matches how URLs behave. Yes, there are conventions that treat URLs differently, but as we've seen with the maps example, some conventions are pretty weird, like serving the same content from multiple URLs. It's also convention that URLs that end in
Relative URLs behave quite differently.
Is typing URLs in full a use case worth considering? I don't think it is real-world these days. Users are clicking on links, icons, or relying on autocomplete in the address bar. When I type "maps" into my URL bar, the first autocompletion offered to me is If a user clicks a link that points to
Can you name another web API that treats
By "right" you mean differently to how every other web API treats URLs. I'm a little confused that we've been presented with a series of real-life use cases, but now we're being asked to ignore some of them because a solution has been suggested that solves a subsection of the problems. That said, solution 4 seems the least problematic, and doesn't feel like it clashes with possible solutions to problem 2. |
GitHub does the |
https://jakearchibald.github.io/thing.txt and https://jakearchibald.github.io/thing.txt/ are different resources. Update: Hah, no, I was caught out by caching. GitHub gives priority to project names over organisations, so even though https://jakearchibald.github.io/thing.txt exists, it can't be accessed, because https://jakearchibald.github.io/thing.txt/ exists.
Absolutely.

navigator.serviceWorker.register('/thing/sw.js', {
  treatScopeLikeFilesystemPath: true
});

The above would require a service worker that could be scoped to |
As an FYI, sub-origin work has stopped in Chrome and that code is being removed. I don't expect it's going to assist us here unless Mozilla (or someone else) pick it up and champion it generally. |
The Match Service Worker Registration algorithm is a simple string prefix match, rather than a path segment prefix match. This means that if the SW scope does not end in a slash, you get unexpected behaviour, e.g.: if scope is "https://www.google.com/maps", you will match a (hypothetical) URL "https://www.google.com/mapsearch", which is intended to be a different product. The work-around for this issue is to always include a trailing slash in the scope.
I asked the Maps team to do this and they raised an important point: if they did that, then yes, "/mapsearch" would be correctly filtered out of the scope. But then the URL "https://www.google.com/maps" (no trailing slash) would also not be in the SW scope, and thus not hit the fetch handler. Since "https://www.google.com/maps" redirects to "https://www.google.com/maps/", this isn't a big deal, except that it breaks offline support. If the entirety of "https://www.google.com/maps/" works correctly offline, but someone links to "https://www.google.com/maps", the user would need to be online to make a network request to "https://www.google.com/maps" to get a redirect to "https://www.google.com/maps/" which would then be served by the SW.
So essentially, developers are forced to make a decision between two bad choices:
Is there any practical reason why this algorithm is a string prefix match, rather than a path segment prefix match? In a file system, I don't consider "/foobar/baz" to be inside the directory "/foo". Is it possible to change this behaviour now, or is it too late?
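As a rough illustration of the difference being asked about, the two matching strategies could be sketched as follows. This is a simplification under stated assumptions: both function names are invented for the example, URLs are compared as plain strings, and only "/" is treated as a segment boundary (query strings and fragments are not handled).

```javascript
// Current behaviour as described above: a plain string prefix test.
function matchesByStringPrefix(scope, url) {
  return url.startsWith(scope);
}

// Hypothetical alternative: the scope must end on a path-segment boundary,
// the way "/foobar/baz" is not inside the directory "/foo".
function matchesByPathSegment(scope, url) {
  if (!url.startsWith(scope)) return false;
  // A scope ending in "/" already sits on a boundary; an identical URL
  // trivially matches.
  if (scope.endsWith("/") || url.length === scope.length) return true;
  // Otherwise the character right after the scope must start a new segment.
  return url[scope.length] === "/";
}

const slashlessScope = "https://www.google.com/maps";
const urls = [
  "https://www.google.com/maps",
  "https://www.google.com/maps/place",
  "https://www.google.com/mapsearch",
];
for (const url of urls) {
  console.log(
    url,
    "prefix:", matchesByStringPrefix(slashlessScope, url),
    "segment:", matchesByPathSegment(slashlessScope, url)
  );
}
```

The only input on which the two functions disagree here is the "mapsearch" URL, which is the case this issue is about.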
If we can't do a path segment match, as a secondary measure, can we add a rule saying that if the scope ends with '/', it also matches that path without the trailing slash? So scope "https://www.google.com/maps/" matches "https://www.google.com/maps" and "https://www.google.com/maps/anything", but not "https://www.google.com/mapsearch".
Note that the Web App Manifest spec has the same algorithm, and it was deliberately chosen for compatibility with the Service Worker algorithm. This is causing similar problems over there; see w3c/manifest#554; particularly this comment. I am making a similar proposal there.