-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making a concurrent request for navigations #920
Comments
I'd like to know how specific this is to Chrome. From a high-level perspective it seems like a very weird hack. |
@annevk I guess we need feedback from all implementers here, i.e. include Edge team to this conversation. Does anyone know who is working on SW in Microsoft? |
Going to tackle this in two replies, one addressing the problem & another addressing the proposed solution - because I don't think they match up right now. Back when we started work on SW there were calls for higher level APIs, manifests and such, because SW may present performance issues in some cases, and your response to that was roughly "We shouldn't try and solve performance problems until we know we have them, and what shape & size they are". This is still a good plan. I want us to present solid data here that shows the size & shape of the problem, and I want other vendors to verify that SW startup is the bottleneck here, and not something specific to Chrome. There are multiple ways to solve this problem, and the way to pick the best solution is for us all to have good visibility into the issue. |
Without concrete data on the problem, it's difficult to assess solutions. Service worker startup is not free - but we need to assess when that cost significantly detracts from the benefits. It could be: I have no fetch event. I'm paying the startup cost for no reason. We solved this in #718, but Chrome hasn't landed it yet. This is the case for at least one of the large sites reporting this regression. I have a fetch event but it doesn't We have an existing solution that we can apply here: passive listeners. Although we may need to disable reading the request body in this case. I have a fetch event, it sometimes calls I think this is the case the proposed solution is aiming for, but I really want to hear more about sites that use this pattern. self.onactivate = (e) => {
// If unset, preflight requests are sent without special marking
e.setPreflightHeader("X-Site-Specific-Header", "thinger");
// ...
} This suggests the preflight will happen for each and every request (or just navigation requests for some reason?), which feels like a huge change. If I serve a 3gb video from the cache, what happens to the preflight? Is the user going to end up downloading the 3gb again, or will the stream be aborted? Either way, as a developer, I feel like I've just lost a lot of control. self.onfetch = (e) => {
if (e.preflightResponse) {
// This is a navigation fetch which has already been issued.
// If the `preflightResponse` isn't used, then everything proceeds as
// if it hadn't been sent in the first place.
}
} If the fetch event is blocked on a preflight response, you've killed service worker as a means for creating offline-first experiences. One way around this is to make self.onfetch = event => {
event.respondWith(
event.preflightResponse || fetch(event.request)
)
} If Additionally, using the response is still blocked on the SW, as the fetch event is always consulted. Is that ok? Again I want more concrete data. The preflight is always going to happen, but I have to opt into using it. It seems really weird that the preflight isn't an opt-in, there isn't an opt-out, yet using the preflight is opt-in.
I see you put that bit in because I'm in favour of a route-based solution. 😄 But I don't see how routes are a worse solution, and your proposal doesn't feel awfully flexible, easy to reason about, or even address (what seems to be) the route of the problem. A route-based solution should allow a developer to declare "for requests that look like this, do this", in a way that can at the very least started while the service worker boots up. This wins over the proposed preflight solution because:
But again, I think we need clear data before we do something like this. We shouldn't rubber-stamp a scenario-solving solution, especially while we're so vague on the scenario. |
A route-based sketch: self.addEventListener('install', event => {
event.waitUntil(
// populate caches
);
event.declareRoute("/", new FallbackSources(
new CachesSource(),
new FetchSource(),
new FetchEventSource()
));
}); This API is up for debate wayyyyyy beyond the naming, and I still think we need better data before we'd proceed with this. But here's how the above sketch could work:
Attempt each source in series until an acceptable response is found. Alternatives could be
Look for a match in the cache.
Try to get a response from the network.
Defer to the fetch event. So the example above:
The whole thing can complete without the service worker starting up unless we hit step 3, and I'm pretty confident it could be polyfilled. |
Can we tie this "preflight request" to the concept of "preload"? We already have a rel=preload concept in the specs I believe. The PlzNavigate effort to preload the site before the user finishes typing could be seen as the same mechanism. This would give us a consistent mechanism to hang this optimization on without directly tying it to browser-specific mechanisms. This does some like it could be useful, but its complex enough that it would probably only be used by a minority of sites. Maybe its worth it if those sites are extremely high traffic? |
@jakearchibald So, the following would allow service worker startup to happen simultaneous with the network request?
|
Yep! |
But the load would be delayed by bad network connections. I guess maybe your AnySource() solution would allow you to handle this, but its not clear to me exactly how the pre-flight proposal would like to handle flaky network, either. |
Yeah, |
I don't think this is particularly chrome specific. Starting a worker thread, parsing js, interpreting, running the jit if necessary, etc are all overhead to letting the SW handle the fetch event. Coming up with a mechanism to allow this overhead to be performed in parallel with an initial network request would benefit all browsers. I guess ultimately its a tradeoff between complexity and those overhead costs. Firefox might be slightly faster to start a service worker right now (I don't really know though), but we will likely end up with overhead similar to chrome once we fix our infrastructure to handle multiple content processes. If we can come up with something that is not too complex, then it seems like a useful addition. I do kind of agree with Jake, though, that maybe we should wait to implement these kinds of optimizations. Its still early days for people figuring out how they want to use service workers. If we implement this now we might miss out on a useful pattern people find they need or might make it more general purpose (complex) than is needed. |
With the routing proposal... self.addEventListener('install', event => {
event.waitUntil(
// populate caches
);
});
self.addEventListener('fetch', event => {
event.respondWith(
caches.match(event.request).then(r => r || fetch(event.request))
);
}); ...could be replaced with.... self.addEventListener('install', event => {
event.waitUntil(
// populate caches
);
event.declareRoute("/", new FallbackSources(
new CachesSource(),
new FetchSource()
));
}); ...and the SW wouldn't need to boot up for any fetch events. My blog currently composes a streamed response. If the cost of service worker bootup was greater than its benefit, I could do: self.addEventListener('install', event => {
event.waitUntil(
// populate caches
);
event.declareRoute({mode: "navigation"}, new AnySource(
new FetchEventSource(),
new FetchSource()
));
}); ...allowing the browser to race the network and the fetch event for navigations. |
A few things seems missing from the conversation. Many sites we've partnered with have deployed nearly-no-op SWs to do tracking and monitoring. This defeats the no-
Getting UI to users tends to involve "booting up" large amounts of JS, CSS, and associated context. This is true for FB, Inbox (in some modes), Docs (in some cases), and many others. In a no-SW world, the best strategy is to inline fresh content into the response document where the booted code can consume it and render it. With SWs in the mix or server-rendering of snapshots (Inbox & Docs, but only in certain modes), things get more complicated. A service designer of one of these systems wants to be able to:
To handle the second of those, the proposal I've provided lets the server know a few things:
It does this with minimal new infrastructure, enabling both streaming for "static" sites that want to handle HTML partials and not data, as well as allowing sites that deal in data (vs. inline HTML) to do so naturally. Would love to hear from @bmaurer on the alternatives proposed here as well. |
That's a pretty good summary for what we want to do. In the shorter term we're going to cache parts of the page markup and then make a request for the rest, in the much longer run we're going to cache most of the markup but will still need to make a request for page data. In both cases we're going to want to make a request to our server to get data as soon as we can and at the same time start getting code to the client window as soon as we can. Service worker startup isn't free: in Chrome, our in-the-field logging is telling us that service worker startup adds about 200ms to page load time on desktop. It might be slightly better or worse in other browsers, but it's still going to be a significant cost to startup a process and initialize the service worker code. Our site is pretty optimized to deliver resources quickly and in the right order, so our concern is that this extra time might mean that service workers are a regression in some cases. What we don't want to do however is have a race between network and cache without actually opening from the service worker. If we're going to build a version of the site that can be fully loaded from a service worker we will most likely still need to make a request each time the page loads to get the newest content, so simply loading the site from cache won't always be a win if we have to wait to make a request afterwards anyway. The optimization that we need allows us to get the initial request out before the service worker starts up but still allows the fetch event to get and process a request if one was sent out. Having the option to set a header or other fetch options here is important because even in the short term we aren't going to return the same kind of markup when a service worker is there vs when it isn't. Also, making this field nullable in the fetch event seems fine to me. It can be called possiblePreflightRequest or something like that and then you can wait for it just like a normal request if you want to use it. |
@slightlyoff can you answer my questions about your proposal & show why it's better than declarative routing? |
To add a bit more color to Nate's comments -- First, I think it's worth considering the position a site is in as it first adopts service worker. SW is a huge change to how people write websites. Any complex site is going to take a while to get it's feet wet. One thing that Alex mentions is that some people might use SW to do things that don't involve intercepting the root document the user navigates to (eg, caching static resources). A site might gradually use sw for only parts of the root document. Maybe first they just cache their page header. To the extent that using SW causes a perf hit (as Nate mentions there's a startup cost at least in chrome) that makes it hard for a site to dip their feet into using SW. Even if SW eventually enables great wins sites tend to develop incrementally and want each increment to be better on metrics. So in my mind an ideal solution here would be that a service worker that does nothing, has no cost Now let's think about a more advanced SW deployment, taking a site like FB as an example. FB would have two goals:
I think many apps share this set of dual goals -- for example, an email application would want to show existing emails quickly but fetch updated emails. In order to accomplish (2) one needs to communicate to the site's server "this user is opening the app". This message is generally fairly light weight -- at a minimum it needs to identify the user. But it may also need to communicate a small amount of information about the state of the cache. For example, you might need to communicate the last cached newsfeed story. As a slightly more complicated case one could imagine a large class of apps (uber, postmates, etc) that want to communicate the location as quickly as possible. On our mobile apps we go to great lengths to make this notification happen as quickly as possible. Early in the startup process we send out a UDP packet that contains an encrypted user identifier. (https://code.facebook.com/posts/1675399786008080/optimizing-facebook-for-ios-start-time/) Jake's route based proposal makes sense to me here. It seems like a generalization of Alex's preflight request. I think to make Jake's proposal work you'd need a few things:
|
I'll try (although I though I had by clarifying the requirements):
I think we should probably do declarative routing. We should probably go with something not dissimilar from your proposal, @jakearchibald, but I don't think it solves the issues @n8schloss or @bmaurer are raising nor does it really help our PlzNavigate integration. It's also quite a bit larger. As a final thought, I think these can layer together quite nicely. Having the preflight come back as a readable stream that can be directed to the document (in cases where you might use a |
I think that if we want to handle navigation differently (so sites can only do service workers for subresources or some such) it needs to be an opt-in primitive of sorts. We should not take total network control away from the site due to Chrome's PlzNavigate project. |
It's also now more clear than ever that those pushing to make fetch opt-in were correct. 😕 |
Trying to get a concrete picture of the problem here. Someone shout up if this is wrong. The loading strategy is:
Here's some sample code for this: https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/b43199815f451cadde7280aa9b8dea1f84a88387 And the problem is:
It's still unclear to me how that delay compares to the benefits of rendering without the network. Even if it's only the shell that's rendered without the network, getting the shell & JS from the cache should come with a benefit vs the network. But yes, as a site adopts SW, there may be particular navigation routes that simply defer to default browser behaviour. |
But what about a race between the fetch event and a default network response? |
This is why we need a routing approach. Making all-or-nothing behaviour changes doesn't fit into this gradual model.
Like I said in #920 (comment), the spec already caters for this, and impelementation is in progress.
Seems fair. It certainly shouldn't be slower than appcache.
Do you need something beyond
This seems like a separate thing (it's the first time posting has come up). What problem is this solving? As in, when would you want to POST along with an initial navigation? Related: there is a rough plan for background posting/downloads. |
These could easily be options to event.declareRoute({mode: 'navigation'}, new FetchSource({
applyHeaders: {
'X-Site-Specific-Header': "thinger"
}
}), {fireFetchEvent: 'yes'}); I think that's equivalent to your proposal (I don't know for sure as you haven't described how your proposal works),
Are we sure that's the case? Based on the code example above, if we had routing, the following could happen while the SW was booting up:
If that takes longer than 200ms, then we've won. Not only that, but we rendered while that was happening. Contrast this to your preflight proposal, where:
If you're racing
We're already talking about hacking around your proposal. It's too high level, and too all-encompasing. Also, you still haven't answered my questions on how it works. My questions are in #920 (comment). The outstanding questions are:
|
It's already opt-in. If you don't want it, don't add a fetch listener. |
True, we made it opt-in ex post facto through a hack. It still looks weird compared to the other APIs exposed by service workers and makes it harder to add registration options specific to fetch, such as some kind of filtering mechanism. |
Here's the current model for an "app shell" site that uses JS to populate content https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/b43199815f451cadde7280aa9b8dea1f84a88387 Here's what it would look like with the routing proposal https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/c8bdf11ada8a99bb506f369d7d040051ad4dda16 - this doesn't need the SW to boot up at all. @slightlyoff, can you show how your solution would work in this case? (in addition to clarifying the behaviour of your proposal) |
The problem is that even though we win on getting the shell displayed faster, we lose in terms of the amount of time that it takes for the server to start processing our request and to send more data. Keep in mind that today Facebook uses chunked encoding. Within 10 ms of a request touching our server we start sending back instructions on how to load the app shell. So for us the loading of the app shell is already happening in parallel with our server work. For many requests the critical path in displaying new content is the speed of computing that ranking. Even if SW is a win in being able to display the app shell more quickly that may not help us meet the business objective of displaying up to date newsfeed stories more quickly. What we're looking for here is the best of both worlds -- being able to display the app shell without a network connection without increasing the delay between when a user clicks facebook.com and when our servers are able to start giving them new content. |
Thanks! Is there any server rendering of dynamic content going on here, or is content added to the page using JS? If it's the former, a streaming solution seems a better fit. If it's the latter, could |
This would be content to be added via JS. The problem is that rel=preload is going to get loaded way too late. In chrome at least that requires the renderer process to be started. Also, this doesn't give the SW a great chance to save state. Your idea of using etags is interesting but I think it might not capture the kinds of data one would want to send. For example, one may wish to send information about what JS/CSS files are in the cache so the server can render markup for an appropriate version of the site. Ideally what I'd want here is something like "facebook.com/ has a route that will load a cached version of the app shell. In addition, it will send a request to facebook.com/newdata with the header CurState:XYZ where XYZ can be managed by the service worker as its cached state changes". The request to /newdata should have as little delay as possible. |
From a developer perspective it's much much much simpler if you are able to call enable()/disable() during the active event. It'd be a bit confusing to have to call enable() first thing after your SW runs once it's already been activated, or to have to make logic to deal with both the activated/unactivated cases. |
That's covered because in onactivate the worker is already active. This will restrict you from calling it in oninstall (in the case of a new registration. during updates it's callable because there's an active worker). |
During the activate event the worker is already the active worker (at least with the terminology as used in the spec), so I'd expect that to still work |
I think it's important for NPS promises to resolve once the data is successfully stored. As for the timing, I was following the push API's lead here. I'm not against changing as long as we both change. I don't fully understand the complexity in waiting. Assuming… registration.navigationPreload.enable(); …waits for an active worker, how is this more complex than: await registration.ready;
registration.navigationPreload.enable(); (assuming #770) |
I actually didn't really have waiting in mind when talking about complexity above. The complexity was about implementing resolve-after-store without waiting for active. This would likely require (for Chrome) initially storing the NPS outside the registration record if needed, then moving it into the record once that's stored (to avoid regressing the time taken to load a registration). Implementation-wise, I think waiting is doable. The big disadvantage is the It's still simplest to just reject when there is no active worker though. Since Chrome's push API has done this since inception, it's probably OK to keep doing it. |
But store can only happen once there's an active worker. If the tab closes before a worker becomes active, nothing is stored, so I'm not sure why we'd need to store anything outside the registration.
Hmm, yeah, background spec also does the same even though it's spec'd to "wait". @wanderview how do you feel about this change? I'm happy with it. If we agree I'll submit PRs in push, sync & update NPS. |
Works for me as long as we update the spec to match. Thanks. |
Ah yes, this is getting into implementation details that are probably not worth going into. The proposals to reject on non-active or wait for active are OK with me, with a preference for rejection. Again my complexity feedback was about trying to implement a solution that neither waits for active nor rejects on non-active, yet still has the properties that upon resolution:
For this solution, it’s not enough to just set the NPS in memory on our “IO” thread (where live service worker registrations are managed) and expect it to be stored once the registration is stored because of race conditions: the store operation may already be queued up or running on the “DB” thread. So I was thinking the DB thread should always write the newest NPS state: either in the registration if it exists or outside if not, and swap in outside NPS once the registration is stored. But I guess alternatively the DB thread could keep unstored NPS in memory and check that whenever storing a registration, so yes it could be possible. Still a bit clunky though. |
I'm going to update the PR so we reject if there's no active SW. I'll submit PRs for background sync and push to do the same. Update: I'm going to tackle #965 then this |
This matches current implementations w3c/ServiceWorker#920 (comment)
As per agreement in w3c/ServiceWorker#920 (comment)
As per agreement in w3c/ServiceWorker#920 (comment)
Ok, I've updated the spec to throw if there's no active worker. Also filed w3c/push-api#230 and WICG/background-sync#134. |
As per agreement in w3c/ServiceWorker#920 (comment)
F2F: Chrome want to ship this. Feedback good from Facebook, although they don't have solid numbers as it hasn't shipped to Chrome stable yet. No objections to Chrome shipping from other vendors. |
This has shipped in Chrome \o/ |
Is Edge working on this yet? |
Apologies in advance for the length of this issue.
A few weeks ago I was discussing the topic of the upcoming "PlzNavigate" feature with @naskooskov, @n8schloss, and @bmaurer.
The TL;DR of PlzNavigate is that navigation actions in Chromium will not be handled as they currently are -- sending them through a Renderer process which then routes them to the Browser process for eventual dispatch by the network stack -- and will instead be immediately routed to the Browser-side network stack, improving time-to-navigation in the common (non-SW-controlled) case. This is beneficial in the PlzNavigate world which is much more aggressively multi-process oriented. Saving the time to create processes is a big win, particularly on Android which "features" particularly slow native process creation.
In Chromium (and one assumes similarly architected browsers), this means that PlzNavigate-style request optimisation runs afoul of Service Worker handling of these requests. This isn't particularly satisfying as the SW may indeed choose to make a request for the top-level resource from the network. Indeed, waiting to issue these requests on Service Worker startup is being reported by large sites as a regression in the 10s or even hundred+ millisecond range. This is notable on sites which do not handle fetches for top-level documents but only want to use SWs for caching.
What if we could enable PlzNavigate and remove the hit generated by SW startup?
The idea in the following proposal is to allow a style of declarative navigation request decoration for these "preflight" navigation requests, allowing the Service Worker to use (or discard) the response. If no decoration is added and the site's SW decides to handle the request directly (e.g. with a
e.respondWith(fetch(e.request))
), nothing should break. Similarly, it's a goal to avoid sending the results of the "preflight" request to the document without the Service Worker's involvement.To accomplish this, the proposal we sketched out on the whiteboard was to allow the
onfetch
event that corresponds to the navigation to have access to the original (preflight) response. To enable a savvy server-side to repurpose this preflight navigation to, e.g., send up-to-date data in a different format than HTML (imagine JSON or similar), we'd also allow the Service Worker to register a header to pass along with the preflight'd navigation request. All together, the strawman looks roughly like:Obviously the names are bike-sheddable. The goal however isn't to be super declarative about deciding what "routes" are handled in which style. Instead, it's to allow the maximum of flexibility for cooperating servers and clients to eliminate SW startup latency.
Thoughts?
/cc @jakearchibald @wanderview @jungkees @mkruisselbrink
The text was updated successfully, but these errors were encountered: