[Website] 100% reliable deployments #1821
Labels
[Aspect] Website
[Priority] High
[Type] Enhancement
New feature or request
[Type] Reliability
Playground uptime, reliability, not crashing
Comments
adamziel added the [Type] Enhancement, [Type] Reliability, and [Aspect] Website labels on Sep 28, 2024
adamziel added a commit that referenced this issue on Oct 2, 2024:

…r tabs to prevent fatal errors after new deployments. (#1822)

## Motivation for the change, related issues

Solves fatal Playground breakages after new version deployments by adopting the following version upgrade protocol:

* Playground version is upgraded as early as possible after a new release
* HTTP cache is skipped when fetching new assets
* Stale Playground tabs are forcibly refreshed

Related to #1821

## The problem

Playground was affected by HTTP caching and ended up loading assets from both the old release and the new release. This broke the app's dependency graph and led to fatal errors. See the visualisation below. When Playground v184 is released, the app will only work properly if all the loaded assets come from v184:

![371781531-608a780e-60b8-4ed4-969a-d7497c7500a7](https://github.com/user-attachments/assets/605cba58-8eba-4fdb-b527-8c1f6942ce24)

## The solution

This PR ensures HTTP cache is skipped for assets that are cached offline. This isn't perfect as the browser will sometimes download the same file twice, but it's much better than breaking the app. We'll explore making the most out of both cache layers in the future.

Here's a rundown of the caching strategy implemented in this PR:

* Playground version is upgraded as early as possible after a new release
* HTTP cache is skipped

### Playground version is upgraded as early as possible after a new release

New service workers call `.skipWaiting()`, immediately claim all the clients that were controlled by the previous service worker, and forcibly refresh them.

Why? Because Playground fetches new resources asynchronously and on demand. However, deploying a new version of the webapp destroys the resources referenced in the previous version. Therefore, we can't allow the previous version to run when a new version becomes available.

#### Push notifications

It would be supremely useful to proactively notify the webapp after a fresh deployment. Playground doesn't do that yet but it likely will in the future.

### HTTP cache is skipped

Playground relies on the **Cache only** strategy. It loads assets from the network, caches them, and serves them from the cache. The assumption is that all network requests yield the most recent version of the remote file. This helps us avoid the HTTP cache problem.

#### Cache layers

We're dealing with the following cache layers:

* HTTP cache in the browser
* CacheStorage in the service worker
* Edge Cache on playground.wordpress.net

#### HTTP cache in the browser

This service worker skips the browser HTTP cache for all network requests. This is because the HTTP cache caused a particularly nasty problem in Playground deployments.

Installing a new service worker purged the CacheStorage and requested a new set of assets from the network. However, some of these requests were served from the HTTP cache. As a result, Playground would start loading a mix of old and new assets and quickly error out. What made it worse is that this broken state was cached in CacheStorage, breaking Playground for weeks until the cache was refreshed.

See #1822 for more details.

#### CacheStorage in the service worker

This service worker uses a **Cache only** strategy to ensure all the loaded assets come from the same webapp build.

The **Cache only** strategy means Playground only loads each asset from the network once, caches it, and serves it from the cache from that point on.

The only times Playground reaches out to the network are:

* Before the service worker is installed.
* When the service worker is being activated.
* When a CacheStorage cache miss occurs.

#### Edge Cache on playground.wordpress.net

The remote server (playground.wordpress.net) has an Edge Cache that's populated with all static assets on every webapp deployment. All the assets served by playground.wordpress.net at any point in time come from the same build and are consistent with each other. The deployment process is atomic-ish so the server should never expose a mix of old and new assets.

However, what if a new webapp version is deployed right when someone has downloaded 10 out of 27 static assets required to boot Playground? Right now, they'd end up in an undefined state and likely see an error. Then, on a page refresh, they'd pick up a new service worker that would purge the stale assets and boot the new webapp version. This is not a big problem for now, but it's also not the best user experience.

This can eventually be solved with push notifications. A new deployment would notify all the active clients to upgrade and pick up the new assets.

## Other changes

In addition, this PR:

* Adds E2E tests for app deployments and offline mode
* Updates CI to run Playwright tests:
  * Firefox
  * Safari
  * Chrome
* Fixes a few paper cuts
  * Fixed: Boot halted when OPFS isn't available due to error/success hooks never running (4e0ef74)
  * Fixed: "Save in this browser" option stays available even when there's no OPFS support (f6225a9)

## Paths not taken

* Relying on build-time hashes in the filenames for all caching. We can't rely on that for the most important routes: `/`, `/index.html`, `/remote.html`, `/sw.js` – they need stable URLs for multiple reasons.
* A different caching strategy, such as [network falling back to cache](https://web.dev/articles/offline-cookbook#network-falling-back-to-cache).

## Caveats and follow-up work

* Let's find a way to leverage HTTP cache without breaking the offline cache.
* There's no way to recover from a deployment happening during a page load – let's fix it.
* A new service worker forcibly reloads other browser tabs and destroys their in-memory context. Let's solve it by storing temporary sites in OPFS.

## Testing Instructions (or ideally a Blueprint)

CI. Yes, it sounds like a lame testing plan for such a profound change. However, almost none of these changes can be tested in a local dev environment and a large part of this work was about covering app deployment in our E2E tests.

If you want to try these tests locally and see what they do, you'll need this special setup:

```bash
$ npx nx e2e:playwright:prepare-app-deploy-and-offline-mode playground-website
$ npx playwright test --config=packages/playground/website/playwright/playwright.config.ts ./packages/playground/website/playwright/e2e/deployment.spec.ts --ui
```

## Related resources

* PR that turned off HTTP caching: #1822
* Exploring all the cache layers: #1774
* Cache only strategy: https://web.dev/articles/offline-cookbook#cache-only
* Service worker caching and HTTP caching: https://web.dev/articles/service-worker-caching-and-http-caching

---------

Co-authored-by: Bero <berislav.grgicak@gmail.com>
Co-authored-by: Brandon Payton <brandon@happycode.net>
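The upgrade protocol described in the commit message above (purge the old CacheStorage, skip the HTTP cache, take over open clients) can be sketched roughly as follows. This is a minimal illustration, not the code from #1822: the cache naming scheme, `CACHE_PREFIX`, `cacheNameFor`, `staleCacheNames`, and `SKIP_HTTP_CACHE` are hypothetical names, and `cache: 'no-store'` is just one standard `fetch()` option that bypasses the browser HTTP cache; the PR may use a different one.

```typescript
// Hypothetical naming scheme: one CacheStorage bucket per webapp build.
const CACHE_PREFIX = 'playground-offline-';

function cacheNameFor(build: string): string {
  return `${CACHE_PREFIX}${build}`;
}

// Returns the buckets a newly activated service worker would purge,
// so that assets from an older build can never be served again.
function staleCacheNames(
  existing: readonly string[],
  currentBuild: string
): string[] {
  const keep = cacheNameFor(currentBuild);
  return existing.filter(
    (name) => name.startsWith(CACHE_PREFIX) && name !== keep
  );
}

// Standard fetch() option that bypasses the browser HTTP cache (assumed choice).
const SKIP_HTTP_CACHE = { cache: 'no-store' } as const;

// Rough wiring inside a service worker (comments only, not executed here):
//   install:  self.skipWaiting()
//   activate: for each name in staleCacheNames(await caches.keys(), BUILD)
//               await caches.delete(name)
//             await self.clients.claim()
//   fetch:    on a CacheStorage miss, fetch(event.request, SKIP_HTTP_CACHE)
//             and store the response in caches.open(cacheNameFor(BUILD))
```

Keeping the purge decision in a pure function makes it easy to unit test outside a browser; the actual service worker events only need to call it.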
adamziel added two more commits that referenced this issue on Oct 4, 2024, both carrying the same commit message as the Oct 2, 2024 commit (#1822).
adamziel added a commit that referenced this issue on Oct 7, 2024:

…first strategy (#1849)

## Motivation for the change, related issues

Related to #1821

Changes the webapp upgrade protocol proposed in #1822 to avoid forcibly refreshing the browser tabs with unsaved changes in them.

## Technical implementation

**Before this PR**, the new service worker would clear the offline cache, claim all the active clients, and forcibly refresh them to ensure the latest Playground version is loaded everywhere. This worked, but every webapp upgrade would destroy any work the user may have done in their temporary Playground. We explored [storing temporary Playgrounds in OPFS](#1838), but backtracked because 1) it created an uncanny amount of complexity, and 2) some browsers (e.g. Safari in private mode) don't support OPFS and must rely on a temporary in-memory site.

**After this PR**, the service worker clears the offline cache and claims all the active clients, but it doesn't forcibly refresh them. Instead, it uses the network-first strategy for the `remote.html` route and the `/` route. All the other files are still loaded using the cache-first strategy.

Every Playground that's already open, either temporary or stored, will remain functional. The heavy, asynchronously loaded resources such as PHP.wasm and WordPress.zip were already processed, so there's no user flow that could lead to `import()`-ing a non-existing `php.js` file.

Every newly opened Playground will be loaded using a freshly downloaded `remote.html` file containing references to freshly deployed Playground assets.

## Other changes

This PR inlines the reusable service worker utilities from `packages/php-wasm/web/src/lib/register-service-worker.ts` into `@wp-playground` packages. It turns out they weren't that reusable, and keeping them separate was annoying. I'm now convinced the service worker bits are application-specific and splitting them between multiple packages just isn't useful.

## Testing instructions

Review whether the app deployment E2E tests check what we need them to check, and then confirm they are green in CI.
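The per-route split described in #1849 (network-first for `/` and `remote.html`, cache-first for everything else) might look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `pickStrategy`, `load`, `Fetcher`, and the `Map`-based cache are hypothetical stand-ins for the service worker's real `fetch` handler, `fetch()`, and CacheStorage.

```typescript
type Strategy = 'network-first' | 'cache-first';

// The two routes the PR description names as network-first.
const NETWORK_FIRST_PATHS: ReadonlySet<string> = new Set(['/', '/remote.html']);

function pickStrategy(requestUrl: string): Strategy {
  const { pathname } = new URL(requestUrl);
  return NETWORK_FIRST_PATHS.has(pathname) ? 'network-first' : 'cache-first';
}

// Hypothetical stand-in for fetch(): resolves with the response body.
type Fetcher = (url: string) => Promise<string>;

async function load(
  url: string,
  cache: Map<string, string>, // stand-in for a CacheStorage bucket
  fetcher: Fetcher
): Promise<string> {
  if (pickStrategy(url) === 'network-first') {
    try {
      // Always try the network so a new deployment is picked up immediately.
      const fresh = await fetcher(url);
      cache.set(url, fresh);
      return fresh;
    } catch (error) {
      // Offline: fall back to the last cached copy if there is one.
      const cached = cache.get(url);
      if (cached !== undefined) return cached;
      throw error;
    }
  }
  // Cache-first: only reach the network on a cache miss.
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const fetched = await fetcher(url);
  cache.set(url, fetched);
  return fetched;
}
```

With this split, an already open tab keeps using its cached asset graph, while any newly opened tab fetches a fresh entry point that references the newly deployed assets.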
Playground deployments require clearing the cache and the service worker way too often. Let's ensure a high standard of stability. All deployments should always work on all browsers without having to clear the cache.

## Done is

We have an E2E suite that tests a Playground website deployment from a very old version to a new version, and ensures the following things work:

There should be no intermittent failures, stale fetch() responses, or problems with stale service workers.

## Root cause of the problem

Two reasons are at play:

### Dependency graphs

Deploying a new Playground version does two things:

If the previous version of Playground is still running, it will attempt to fetch the old assets – and fail:

This wasn't a big deal a few months ago, since a page reload would solve it, but then we introduced offline support in #1483.

### Caching

The offline support keeps a copy of all the accessed old assets until the new service worker is installed. This might take 24 hours or sometimes longer! During that time, visiting playground.wordpress.net would load the cached index.html file and the rest of the stale dependency graph from the previous Playground release. Since some files are only loaded on demand, we'd get a mixture of cached assets and network errors – effectively putting the app in an undefined state.
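The failure mode described above can be made concrete with a small simulation. This is a sketch under assumptions: the versioned file names, `resolveAsset`, and `bootOutcomes` are hypothetical and only illustrate how a stale tab ends up with a mix of cached assets and network errors.

```typescript
type Outcome = 'cache' | 'network' | 'error';

// Resolves one asset the way a stale tab would: offline cache first,
// then the server, which only holds the newly deployed build.
function resolveAsset(
  asset: string,
  offlineCache: ReadonlySet<string>,
  deployed: ReadonlySet<string>
): Outcome {
  if (offlineCache.has(asset)) return 'cache';
  if (deployed.has(asset)) return 'network';
  return 'error';
}

function bootOutcomes(
  dependencyGraph: readonly string[],
  offlineCache: ReadonlySet<string>,
  deployed: ReadonlySet<string>
): Map<string, Outcome> {
  const outcomes = new Map<string, Outcome>();
  for (const asset of dependencyGraph) {
    outcomes.set(asset, resolveAsset(asset, offlineCache, deployed));
  }
  return outcomes;
}

// Hypothetical old build: index.html references app-v183.js, which
// loads php-v183.js on demand.
const oldGraph = ['index.html', 'app-v183.js', 'php-v183.js'];
// Only the files the user already accessed are in the offline cache.
const offlineCache = new Set(['index.html', 'app-v183.js']);
// After the deployment, the server only holds the new build.
const deployed = new Set(['index.html', 'app-v184.js', 'php-v184.js']);

const outcomes = bootOutcomes(oldGraph, offlineCache, deployed);
```

In this scenario the entry point and the main script come from the stale cache, while the on-demand file is neither cached nor deployed anymore, which matches the undefined state described above.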
## The solution

I’m 95% convinced we must always force Playground to switch to the new service worker.

However, this would require refreshing all the open tabs and would trash any temporary Playgrounds.

Therefore, we might have to store all Playgrounds in OPFS, even the temporary ones. To maintain good UX, we'd add a cleanup mechanism to hide the "stored temporary" Playgrounds after a regular page refresh, and we'd keep them visible after a page refresh triggered by a new Playground release. We could also add a “Recently archived” button to recover anything archived during the last 24 hours – cc @jarekmorawski for thoughts.

### Solutions without a forced page refresh

I couldn't find any solution that would keep the Playground site working without a forced page refresh:

cc @brandonpayton @bgrgicak