Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making a concurrent request for navigations #920

Closed
slightlyoff opened this issue Jun 28, 2016 · 143 comments
Closed

Making a concurrent request for navigations #920

slightlyoff opened this issue Jun 28, 2016 · 143 comments
Assignees
Milestone

Comments

@slightlyoff
Copy link
Contributor

Apologies in advance for the length of this issue.

A few weeks ago I was discussing the topic of the upcoming "PlzNavigate" feature with @naskooskov, @n8schloss, and @bmaurer.

The TL;DR of PlzNavigate is that navigation actions in Chromium will not be handled as they currently are -- sending them through a Renderer process which then routes them to the Browser process for eventual dispatch by the network stack -- and will instead be immediately routed to the Browser-side network stack, improving time-to-navigation in the common (non-SW-controlled) case. This is beneficial in the PlzNavigate world which is much more aggressively multi-process oriented. Saving the time to create processes is a big win, particularly on Android which "features" particularly slow native process creation.

In Chromium (and one assumes similarly architected browsers), this means that PlzNavigate-style request optimisation runs afoul of Service Worker handling of these requests. This isn't particularly satisfying as the SW may indeed choose to make a request for the top-level resource from the network. Indeed, waiting to issue these requests on Service Worker startup is being reported by large sites as a regression in the 10s or even hundred+ millisecond range. This is notable on sites which do not handle fetches for top-level documents but only want to use SWs for caching.

What if we could enable PlzNavigate and remove the hit generated by SW startup?

The idea in the following proposal is to allow a style of declarative navigation request decoration for these "preflight" navigation requests, allowing the Service Worker to use (or discard) the response. If no decoration is added and the site's SW decides to handle the request directly (e.g. with a e.respondWith(fetch(e.request))), nothing should break. Similarly, it's a goal to avoid sending the results of the "preflight" request to the document without the Service Worker's involvement.

To accomplish this, the proposal we sketched out on the whiteboard was to allow the onfetch event that corresponds to the navigation to have access to the original (preflight) response. To enable a savvy server-side to repurpose this preflight navigation to, e.g., send up-to-date data in a different format than HTML (imagine JSON or similar), we'd also allow the Service Worker to register a header to pass along with the preflight'd navigation request. All together, the strawman looks roughly like:

self.onactivate = (e) => {
  // If unset, preflight requests are sent without special marking
  e.setPreflightHeader("X-Site-Specific-Header", "thinger");

  // ...
}

self.onfetch = (e) => {
  if (e.preflightResponse) {
    // This is a navigation fetch which has already been issued.
    // If the `preflightResponse` isn't used, then everything proceeds as
    // if it hadn't been sent in the first place.
  }
}

Obviously the names are bike-sheddable. The goal however isn't to be super declarative about deciding what "routes" are handled in which style. Instead, it's to allow the maximum of flexibility for cooperating servers and clients to eliminate SW startup latency.

Thoughts?

/cc @jakearchibald @wanderview @jungkees @mkruisselbrink

@annevk
Copy link
Member

annevk commented Jun 28, 2016

I'd like to know how specific this is to Chrome. From a high-level perspective it seems like a very weird hack.

@NekR
Copy link

NekR commented Jun 28, 2016

@annevk I guess we need feedback from all implementers here, i.e. include Edge team to this conversation. Does anyone know who is working on SW in Microsoft?

@jakearchibald
Copy link
Contributor

jakearchibald commented Jun 28, 2016

Going to tackle this in two replies, one addressing the problem & another addressing the proposed solution - because I don't think they match up right now.

@slightlyoff

Back when we started work on SW there were calls for higher level APIs, manifests and such, because SW may present performance issues in some cases, and your response to that was roughly "We shouldn't try and solve performance problems until we know we have them, and what shape & size they are". This is still a good plan.

I want us to present solid data here that shows the size & shape of the problem, and I want other vendors to verify that SW startup is the bottleneck here, and not something specific to Chrome.

There are multiple ways to solve this problem, and the way to pick the best solution is for us all to have good visibility into the issue.

@jakearchibald
Copy link
Contributor

jakearchibald commented Jun 28, 2016

Without concrete data on the problem, it's difficult to assess solutions. Service worker startup is not free - but we need to assess when that cost significantly detracts from the benefits. It could be:

I have no fetch event. I'm paying the startup cost for no reason.

We solved this in #718, but Chrome hasn't landed it yet. This is the case for at least one of the large sites reporting this regression.

I have a fetch event but it doesn't respondWith. The result is blocked on SW startup, despite the SW having no impact on the result.

We have an existing solution that we can apply here: passive listeners. Although we may need to disable reading the request body in this case.

I have a fetch event, it sometimes calls respondWith. If it does, it sometimes fetches event.request.

I think this is the case the proposed solution is aiming for, but I really want to hear more about sites that use this pattern.

self.onactivate = (e) => {
  // If unset, preflight requests are sent without special marking
  e.setPreflightHeader("X-Site-Specific-Header", "thinger");

  // ...
}

This suggests the preflight will happen for each and every request (or just navigation requests for some reason?), which feels like a huge change. If I serve a 3gb video from the cache, what happens to the preflight? Is the user going to end up downloading the 3gb again, or will the stream be aborted? Either way, as a developer, I feel like I've just lost a lot of control.

self.onfetch = (e) => {
  if (e.preflightResponse) {
    // This is a navigation fetch which has already been issued.
    // If the `preflightResponse` isn't used, then everything proceeds as
    // if it hadn't been sent in the first place.
  }
}

If the fetch event is blocked on a preflight response, you've killed service worker as a means for creating offline-first experiences. One way around this is to make e.preflightResponse undefined if the fetch event can fire before we have the preflight response. This becomes unpredictable and another loss of control. But furthermore:

self.onfetch = event => {
  event.respondWith(
    event.preflightResponse || fetch(event.request)
  )
}

If preflightResponse is fired because the fetch event won the race, we execute fetch(event.request), making the request a second time. You could work around this by making preflightResponse a promise, but things are getting messy. You're still going to get the double request for existing service workers.

Additionally, using the response is still blocked on the SW, as the fetch event is always consulted. Is that ok? Again I want more concrete data.

The preflight is always going to happen, but I have to opt into using it. It seems really weird that the preflight isn't an opt-in, there isn't an opt-out, yet using the preflight is opt-in.

The goal however isn't to be super declarative about deciding what "routes" are handled in which style. Instead, it's to allow the maximum of flexibility for cooperating servers and clients to eliminate SW startup latency.

I see you put that bit in because I'm in favour of a route-based solution. 😄 But I don't see how routes are a worse solution, and your proposal doesn't feel awfully flexible, easy to reason about, or even address (what seems to be) the route of the problem.

A route-based solution should allow a developer to declare "for requests that look like this, do this", in a way that can at the very least started while the service worker boots up. This wins over the proposed preflight solution because:

  • It needn't be restricted to navigation requests
  • It's opt-in, removing the need for redundant requests & bandwidth usage
  • It can reach completion prior to service worker boot up
  • Whether the service worker is needed at all for a particular request can be determined in advance

But again, I think we need clear data before we do something like this. We shouldn't rubber-stamp a scenario-solving solution, especially while we're so vague on the scenario.

@jakearchibald
Copy link
Contributor

jakearchibald commented Jun 28, 2016

A route-based sketch:

self.addEventListener('install', event => {
  event.waitUntil(
    // populate caches
  );

  event.declareRoute("/", new FallbackSources(
    new CachesSource(),
    new FetchSource(),
    new FetchEventSource()
  ));
});

This API is up for debate wayyyyyy beyond the naming, and I still think we need better data before we'd proceed with this. But here's how the above sketch could work:

installEvent.declareRoute(requestDescriptor, action, opts)

  • requestDescriptor - the requests this route this should handle. I'm doing a lot of hand-waving with this param right now, but it should be able to describe specific paths, path prefixes, path suffixes, origin (or any origin), method, maybe more. This could also be used in an API like request.matches(descriptor). Could also include ways to modify the request, such as appendHeaders.
  • action - where to get the response from.
  • opts.fireFetchEvent = "no" - ("yes"/"no"/"passive") - should the fetch event trigger after the action has resolved? The response, if any, will be available in fetchEvent.routeResponse. This allows JS to handle the response either directly or passively.

new FallbackSources(...sources)

Attempt each source in series until an acceptable response is found. Alternatives could be new AnySource(...sources) which races sources until one provides an acceptable response.

new CachesSource(opts)

Look for a match in the cache. opts can handle the request to match on, which would be the current request by default, but could be set to something static (for getting a fallback page from the cache). Other options would cover specific caches, and things like ignoreSearch.

new FetchSource(opts)

Try to get a response from the network. opts could include the request to fetch, which would be the current request by default. opts could also include things like timeouts and such.

new FetchEventSource()

Defer to the fetch event.

So the example above:

  1. Looks for a response in the cache
  2. Else attempts to request from the network
  3. Else fires the fetch event a usual

The whole thing can complete without the service worker starting up unless we hit step 3, and I'm pretty confident it could be polyfilled.

@wanderview
Copy link
Member

Can we tie this "preflight request" to the concept of "preload"? We already have a rel=preload concept in the specs I believe. The PlzNavigate effort to preload the site before the user finishes typing could be seen as the same mechanism. This would give us a consistent mechanism to hang this optimization on without directly tying it to browser-specific mechanisms.

This does some like it could be useful, but its complex enough that it would probably only be used by a minority of sites. Maybe its worth it if those sites are extremely high traffic?

@wanderview
Copy link
Member

@jakearchibald So, the following would allow service worker startup to happen simultaneous with the network request?

  event.declareRoute("/", new FetchSource(), { fireFetchEvent: 'yes' });

@jakearchibald
Copy link
Contributor

Yep!

@wanderview
Copy link
Member

But the load would be delayed by bad network connections. I guess maybe your AnySource() solution would allow you to handle this, but its not clear to me exactly how the pre-flight proposal would like to handle flaky network, either.

@jakearchibald
Copy link
Contributor

Yeah, AnySource would handle this, as would a FallbackSource that favoured the cache

@wanderview
Copy link
Member

I'd like to know how specific this is to Chrome. From a high-level perspective it seems like a very weird hack.

I don't think this is particularly chrome specific. Starting a worker thread, parsing js, interpreting, running the jit if necessary, etc are all overhead to letting the SW handle the fetch event. Coming up with a mechanism to allow this overhead to be performed in parallel with an initial network request would benefit all browsers.

I guess ultimately its a tradeoff between complexity and those overhead costs. Firefox might be slightly faster to start a service worker right now (I don't really know though), but we will likely end up with overhead similar to chrome once we fix our infrastructure to handle multiple content processes.

If we can come up with something that is not too complex, then it seems like a useful addition.

I do kind of agree with Jake, though, that maybe we should wait to implement these kinds of optimizations. Its still early days for people figuring out how they want to use service workers. If we implement this now we might miss out on a useful pattern people find they need or might make it more general purpose (complex) than is needed.

@jakearchibald
Copy link
Contributor

With the routing proposal...

self.addEventListener('install', event => {
  event.waitUntil(
    // populate caches
  );
});

self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(r => r || fetch(event.request))
  );
});

...could be replaced with....

self.addEventListener('install', event => {
  event.waitUntil(
    // populate caches
  );

  event.declareRoute("/", new FallbackSources(
    new CachesSource(),
    new FetchSource()
  ));
});

...and the SW wouldn't need to boot up for any fetch events.

My blog currently composes a streamed response. If the cost of service worker bootup was greater than its benefit, I could do:

self.addEventListener('install', event => {
  event.waitUntil(
    // populate caches
  );

  event.declareRoute({mode: "navigation"}, new AnySource(
    new FetchEventSource(),
    new FetchSource()
  ));
});

...allowing the browser to race the network and the fetch event for navigations.

@slightlyoff
Copy link
Contributor Author

A few things seems missing from the conversation.

Many sites we've partnered with have deployed nearly-no-op SWs to do tracking and monitoring. This defeats the no-onfetch optimisation. Sites like FB and Google Inbox have goals that they go to extraordinary lengths to meet but have difficulty achieving thanks to the complexity of their stacks and the # of folks adding features:

  1. Get UI to users as fast as possible
  2. Get fresh data into that UI, even though the UI is script-mediated (and, if not possible, show stale data)
  3. Monitor the performance of the overall experience end-to-end

Getting UI to users tends to involve "booting up" large amounts of JS, CSS, and associated context. This is true for FB, Inbox (in some modes), Docs (in some cases), and many others. In a no-SW world, the best strategy is to inline fresh content into the response document where the booted code can consume it and render it.

With SWs in the mix or server-rendering of snapshots (Inbox & Docs, but only in certain modes), things get more complicated. A service designer of one of these systems wants to be able to:

  • Get UI to the user reliably; that is to say, if there's a "header" or "shell" that can be served from cache with low variance, that's a good thing to do, even if the total latency may be slightly higher due to SW wakeup costs
  • Get requests for fresh data onto the network ASAP. Assuming you respond with a "shell" for example.com/ (which we might imagine coming from a declaratively routed cache as @jakearchibald suggests, but not necessarily so), it'd be helpful to get a request to the server for new data started as soon in the lifetime as possible.
  • Ensure that everything works sensibly when there's no SW in play

To handle the second of those, the proposal I've provided lets the server know a few things:

  1. A Service Worker is in play and therefore the app shell or "header" resources are already booting up
  2. That, as a result, it's safe to respond with just data, not any cruft or dressing associated with a full-page render. In sever-language, it allows the server to send only a "partial"
  3. That the response won't be dropped on the floor because the SW will be in play to handle (and redirect) it

It does this with minimal new infrastructure, enabling both streaming for "static" sites that want to handle HTML partials and not data, as well as allowing sites that deal in data (vs. inline HTML) to do so naturally.

Would love to hear from @bmaurer on the alternatives proposed here as well.

@n8schloss
Copy link

That's a pretty good summary for what we want to do. In the shorter term we're going to cache parts of the page markup and then make a request for the rest, in the much longer run we're going to cache most of the markup but will still need to make a request for page data. In both cases we're going to want to make a request to our server to get data as soon as we can and at the same time start getting code to the client window as soon as we can.

Service worker startup isn't free: in Chrome, our in-the-field logging is telling us that service worker startup adds about 200ms to page load time on desktop. It might be slightly better or worse in other browsers, but it's still going to be a significant cost to startup a process and initialize the service worker code. Our site is pretty optimized to deliver resources quickly and in the right order, so our concern is that this extra time might mean that service workers are a regression in some cases.

What we don't want to do however is have a race between network and cache without actually opening from the service worker. If we're going to build a version of the site that can be fully loaded from a service worker we will most likely still need to make a request each time the page loads to get the newest content, so simply loading the site from cache won't always be a win if we have to wait to make a request afterwards anyway.

The optimization that we need allows us to get the initial request out before the service worker starts up but still allows the fetch event to get and process a request if one was sent out.

Having the option to set a header or other fetch options here is important because even in the short term we aren't going to return the same kind of markup when a service worker is there vs when it isn't. Also, making this field nullable in the fetch event seems fine to me. It can be called possiblePreflightRequest or something like that and then you can wait for it just like a normal request if you want to use it.

@jakearchibald
Copy link
Contributor

@slightlyoff can you answer my questions about your proposal & show why it's better than declarative routing?

@bmaurer
Copy link

bmaurer commented Jul 3, 2016

To add a bit more color to Nate's comments --

First, I think it's worth considering the position a site is in as it first adopts service worker. SW is a huge change to how people write websites. Any complex site is going to take a while to get it's feet wet. One thing that Alex mentions is that some people might use SW to do things that don't involve intercepting the root document the user navigates to (eg, caching static resources). A site might gradually use sw for only parts of the root document. Maybe first they just cache their page header. To the extent that using SW causes a perf hit (as Nate mentions there's a startup cost at least in chrome) that makes it hard for a site to dip their feet into using SW. Even if SW eventually enables great wins sites tend to develop incrementally and want each increment to be better on metrics. So in my mind an ideal solution here would be that a service worker that does nothing, has no cost

Now let's think about a more advanced SW deployment, taking a site like FB as an example. FB would have two goals:

  1. Display the chrome of the app as fast as possible, start warming up the JIT on core JS frameworks, and render existing data in cache
  2. Get up to date data from the server as quickly as possible (eg an updated newsfeed)

I think many apps share this set of dual goals -- for example, an email application would want to show existing emails quickly but fetch updated emails. In order to accomplish (2) one needs to communicate to the site's server "this user is opening the app". This message is generally fairly light weight -- at a minimum it needs to identify the user. But it may also need to communicate a small amount of information about the state of the cache. For example, you might need to communicate the last cached newsfeed story. As a slightly more complicated case one could imagine a large class of apps (uber, postmates, etc) that want to communicate the location as quickly as possible.

On our mobile apps we go to great lengths to make this notification happen as quickly as possible. Early in the startup process we send out a UDP packet that contains an encrypted user identifier. (https://code.facebook.com/posts/1675399786008080/optimizing-facebook-for-ios-start-time/)

Jake's route based proposal makes sense to me here. It seems like a generalization of Alex's preflight request. I think to make Jake's proposal work you'd need a few things:

  1. The cost of having a route that uses the cache and then fetch should be nearly identical to a site which doesn't use SW at all.
  2. A sw should be able to have some state that is sent with a declaratively specified fetch. Eg, last cached feed story time.
  3. It might be nice if the declarative post can be a POST request that is incomplete. That way once the SW starts up it can add more data to the post but the server can start processing

@slightlyoff
Copy link
Contributor Author

slightlyoff commented Jul 3, 2016

I'll try (although I though I had by clarifying the requirements):

  • First, the route based proposal doesn't appear to allow the server to decide how to overload the first response to serve only data (not UI + data).
  • Next, blocking on the SW is usually OK because now we're in a race. SW startup can be in the 10s to hundreds of ms in the worst cases, but isn't usually worse than that. You burn as much on DNS. As a result, it won't generally be a net loss to wait on the SW for mediating this interaction, particularly if you can shave off overhead of getting a new request out the door.
  • Lastly, the routing based proposal either has to defeat PlzNavigate entirely (if you only specify a FetchSource()) or, as I understand the example given, still wait on the renderer process for handling the CacheSource() entries (it's unclear we'd want the mapping logic for them in the browser process).

I think we should probably do declarative routing. We should probably go with something not dissimilar from your proposal, @jakearchibald, but I don't think it solves the issues @n8schloss or @bmaurer are raising nor does it really help our PlzNavigate integration. It's also quite a bit larger.

As a final thought, I think these can layer together quite nicely. Having the preflight come back as a readable stream that can be directed to the document (in cases where you might use a FetchSource() or a CacheSource()) doesn't seem to be at odds with the declarative proposal. It does mean you'd need to wake the SW to handle it, but that could also be turned into a declarative option I suppose. It's a new model and quite a lot of new API, though, so I have a preference for a minimal intervention.

@annevk
Copy link
Member

annevk commented Jul 4, 2016

I think that if we want to handle navigation differently (so sites can only do service workers for subresources or some such) it needs to be an opt-in primitive of sorts. We should not take total network control away from the site due to Chrome's PlzNavigate project.

@annevk
Copy link
Member

annevk commented Jul 4, 2016

It's also now more clear than ever that those pushing to make fetch opt-in were correct. 😕

@jakearchibald
Copy link
Contributor

jakearchibald commented Jul 4, 2016

Trying to get a concrete picture of the problem here. Someone shout up if this is wrong.

The loading strategy is:

  • Load UI sans-content from the cache
  • JS loads content 'include' from network

Here's some sample code for this: https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/b43199815f451cadde7280aa9b8dea1f84a88387

And the problem is:

  • On initial navigation the service worker bootup time (200ms?) delays cached shell render and delays subsequent network request for content

It's still unclear to me how that delay compares to the benefits of rendering without the network. Even if it's only the shell that's rendered without the network, getting the shell & JS from the cache should come with a benefit vs the network.

But yes, as a site adopts SW, there may be particular navigation routes that simply defer to default browser behaviour.

@jakearchibald
Copy link
Contributor

jakearchibald commented Jul 4, 2016

@n8schloss

What we don't want to do however is have a race between network and cache without actually opening from the service worker.

But what about a race between the fetch event and a default network response?

@jakearchibald
Copy link
Contributor

@bmaurer

A site might gradually use sw for only parts of the root document

This is why we need a routing approach. Making all-or-nothing behaviour changes doesn't fit into this gradual model.

So in my mind an ideal solution here would be that a service worker that does nothing, has no cost

Like I said in #920 (comment), the spec already caters for this, and impelementation is in progress.

The cost of having a route that uses the cache and then fetch should be nearly identical to a site which doesn't use SW at all.

Seems fair. It certainly shouldn't be slower than appcache.

A sw should be able to have some state that is sent with a declaratively specified fetch. Eg, last cached feed story time.

Do you need something beyond last-modified or ETag?

It might be nice if the declarative post can be a POST request that is incomplete. That way once the SW starts up it can add more data to the post but the server can start processing

This seems like a separate thing (it's the first time posting has come up). What problem is this solving? As in, when would you want to POST along with an initial navigation? Related: there is a rough plan for background posting/downloads.

@jakearchibald
Copy link
Contributor

jakearchibald commented Jul 4, 2016

@slightlyoff

the route based proposal doesn't appear to allow the server to decide how to overload the first response

These could easily be options to FetchSource.

event.declareRoute({mode: 'navigation'}, new FetchSource({
  applyHeaders: {
    'X-Site-Specific-Header': "thinger"
  }
}), {fireFetchEvent: 'yes'});

I think that's equivalent to your proposal (I don't know for sure as you haven't described how your proposal works),

blocking on the SW is usually OK because now we're in a race

Are we sure that's the case? Based on the code example above, if we had routing, the following could happen while the SW was booting up:

  • Shell fetched from cache, giving a first render
  • JS fetched from cache
  • JS parsed & executed
  • Fetch called for data

If that takes longer than 200ms, then we've won. Not only that, but we rendered while that was happening.

Contrast this to your preflight proposal, where:

  • Render is blocked by SW startup, and potentially the network request too (you haven't explained how it works)
  • The fetch call is a separate request, so it won't have the preflight there, so I don't see how the problem is solved
  • The fetch call may also be blocked on the network, even if it doesn't need to be.

Lastly, the routing based proposal either has to defeat PlzNavigate entirely (if you only specify a FetchSource()) or, as I understand the example given, still wait on the renderer process for handling the CacheSource() entries (it's unclear we'd want the mapping logic for them in the browser process).

If you're racing FetchSource with other sources it won't be a problem. That's basically what your proposal is doing (I think), but it only does that.

Having the preflight come back as a readable stream that can be directed to the document (in cases where you might use a FetchSource() or a CacheSource()) doesn't seem to be at odds with the declarative proposal. It does mean you'd need to wake the SW to handle it, but that could also be turned into a declarative option I suppose.

We're already talking about hacking around your proposal. It's too high level, and too all-encompasing. Also, you still haven't answered my questions on how it works.

My questions are in #920 (comment). The outstanding questions are:

  • Is the preflight opt-in? If so, by what mechanism?
  • Does the preflight happen for every request, or just navigations?
  • What type is e.preflightResponse is it a Response?
  • What happens if the fetch event can fire before e.preflightResponse is ready?
  • If I do not use the preflight, what happens to it? Eg what if it's a 3gb video?
  • If my SW does fetch(event.request), is that an additional request? That will be the case for most existing SWs
  • If my SW does fetch(event.request.url), is that an additional request?

@jakearchibald
Copy link
Contributor

@annevk

It's also now more clear than ever that those pushing to make fetch opt-in were correct.

It's already opt-in. If you don't want it, don't add a fetch listener.

@annevk
Copy link
Member

annevk commented Jul 4, 2016

True, we made it opt-in ex post facto through a hack. It still looks weird compared to the other APIs exposed by service workers and makes it harder to add registration options specific to fetch, such as some kind of filtering mechanism.

@jakearchibald
Copy link
Contributor

Here's the current model for an "app shell" site that uses JS to populate content https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/b43199815f451cadde7280aa9b8dea1f84a88387

Here's what it would look like with the routing proposal https://gist.github.com/jakearchibald/4927b34bb14c3681a3c1c1ce6ede5b09/c8bdf11ada8a99bb506f369d7d040051ad4dda16 - this doesn't need the SW to boot up at all.

@slightlyoff, can you show how your solution would work in this case? (in addition to clarifying the behaviour of your proposal)

@bmaurer
Copy link

bmaurer commented Jul 4, 2016

The loading strategy is:

And the problem is:

  • On initial navigation the service worker bootup time (200ms?) delays cached shell render and delays subsequent network request for content

It's still unclear to me how that delay compares to the benefits of rendering without the network. Even if it's only the shell that's rendered without the network, getting the shell & JS from the cache should come with a benefit vs the network.

The problem is that even though we win on getting the shell displayed faster, we lose in terms of the amount of time that it takes for the server to start processing our request and to send more data. Keep in mind that today Facebook uses chunked encoding. Within 10 ms of a request touching our server we start sending back instructions on how to load the app shell. So for us the loading of the app shell is already happening in parallel with our server work. For many requests the critical path in displaying new content is the speed of computing that ranking. Even if SW is a win in being able to display the app shell more quickly that may not help us meet the business objective of displaying up to date newsfeed stories more quickly.

What we're looking for here is the best of both worlds -- being able to display the app shell without a network connection without increasing the delay between when a user clicks facebook.com and when our servers are able to start giving them new content.

@jakearchibald
Copy link
Contributor

jakearchibald commented Jul 4, 2016

@bmaurer

Thanks! Is there any server rendering of dynamic content going on here, or is content added to the page using JS? If it's the former, a streaming solution seems a better fit. If it's the latter, could <link rel="preload"> (in combination with the routing) start warming up the server as the shell is booting up?

@bmaurer
Copy link

bmaurer commented Jul 4, 2016

This would be content to be added via JS. The problem is that rel=preload is going to get loaded way too late. In chrome at least that requires the renderer process to be started. Also, this doesn't give the SW a great chance to save state. Your idea of using etags is interesting but I think it might not capture the kinds of data one would want to send. For example, one may wish to send information about what JS/CSS files are in the cache so the server can render markup for an appropriate version of the site.

Ideally what I'd want here is something like "facebook.com/ has a route that will load a cached version of the app shell. In addition, it will send a request to facebook.com/newdata with the header CurState:XYZ where XYZ can be managed by the service worker as its cached state changes". The request to /newdata should have as little delay as possible.

@n8schloss
Copy link

From a developer perspective it's much much much simpler if you are able to call enable()/disable() during the active event. It'd be a bit confusing to have to call enable() first thing after your SW runs once it's already been activated, or to have to make logic to deal with both the activated/unactivated cases.

@mfalken
Copy link
Member

mfalken commented Nov 17, 2016

That's covered because in onactivate the worker is already active. This will restrict you from calling it in oninstall (in the case of a new registration. during updates it's callable because there's an active worker).

@mkruisselbrink
Copy link
Collaborator

From a developer perspective it's much much much simpler if you are able to call enable()/disable() during the active event.

During the activate event the worker is already the active worker (at least with the terminology as used in the spec), so I'd expect that to still work

@jakearchibald
Copy link
Contributor

I think it's important for NPS promises to resolve once the data is successfully stored.

As for the timing, I was following the push API's lead here. I'm not against changing as long as we both change.

I don't fully understand the complexity in waiting. Assuming…

registration.navigationPreload.enable();

…waits for an active worker, how is this more complex than:

await registration.ready;
registration.navigationPreload.enable();

(assuming #770)

@mfalken
Copy link
Member

mfalken commented Nov 20, 2016

I actually didn't really have waiting in mind when talking about complexity above. The complexity was about implementing resolve-after-store without waiting for active. This would likely require (for Chrome) initially storing the NPS outside the registration record if needed, then moving it into the record once that's stored (to avoid regressing the time taken to load a registration).

Implementation-wise, I think waiting is doable. The big disadvantage is the event.waitUntil(registration.navigationPreload.enable()) deadlock footgun in the install event. I guess developer tools can issue a warning in this case, but we'd need to do it for any API that waits for active.

It's still simplest to just reject when there is no active worker though. Since Chrome's push API has done this since inception, it's probably OK to keep doing it.

@jakearchibald
Copy link
Contributor

@mattto

The complexity was about implementing resolve-after-store without waiting for active

But store can only happen once there's an active worker. If the tab closes before a worker becomes active, nothing is stored, so I'm not sure why we'd need to store anything outside the registration.

It's still simplest to just reject when there is no active worker though. Since Chrome's push API has done this since inception, it's probably OK to keep doing it.

Hmm, yeah, background spec also does the same even though it's spec'd to "wait". @wanderview how do you feel about this change? I'm happy with it.

If we agree I'll submit PRs in push, sync & update NPS.

@wanderview
Copy link
Member

Works for me as long as we update the spec to match. Thanks.

@mfalken
Copy link
Member

mfalken commented Nov 22, 2016

The complexity was about implementing resolve-after-store without waiting for active

But store can only happen once there's an active worker. If the tab closes before a worker becomes active, nothing is stored, so I'm not sure why we'd need to store anything outside the registration.

Ah yes, this is getting into implementation details that are probably not worth going into. The proposals to reject on non-active or wait for active are OK with me, with a preference for rejection. Again my complexity feedback was about trying to implement a solution that neither waits for active nor rejects on non-active, yet still has the properties that upon resolution:

  • the NPS has been stored, or
  • the NPS will be stored once the registration is stored (in case of no active worker yet)

For this solution, it’s not enough to just set the NPS in memory on our “IO” thread (where live service worker registrations are managed) and expect it to be stored once the registration is stored because of race conditions: the store operation may already be queued up or running on the “DB” thread. So I was thinking the DB thread should always write the newest NPS state: either in the registration if it exists or outside if not, and swap in outside NPS once the registration is stored. But I guess alternatively the DB thread could keep unstored NPS in memory and check that whenever storing a registration, so yes it could be possible. Still a bit clunky though.

@jakearchibald
Copy link
Contributor

jakearchibald commented Nov 22, 2016

I'm going to update the PR so we reject if there's no active SW. I'll submit PRs for background sync and push to do the same.

Update: I'm going to tackle #965 then this

jakearchibald added a commit to w3c/push-api that referenced this issue Dec 2, 2016
jakearchibald added a commit to jakearchibald/BackgroundSync that referenced this issue Dec 2, 2016
jakearchibald added a commit to jakearchibald/BackgroundSync that referenced this issue Dec 2, 2016
@jakearchibald
Copy link
Contributor

jakearchibald commented Dec 2, 2016

Ok, I've updated the spec to throw if there's no active worker. Also filed w3c/push-api#230 and WICG/background-sync#134.

@jakearchibald
Copy link
Contributor

F2F: Chrome want to ship this.

Feedback good from Facebook, although they don't have solid numbers as it hasn't shipped to Chrome stable yet.

No objections to Chrome shipping from other vendors.

@jakearchibald
Copy link
Contributor

This has shipped in Chrome \o/

@wanderview
Copy link
Member

Is Edge working on this yet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests