Add native asset caching to Service Worker #556

Jaifroid · 2019-08-22T10:41:25Z

This addresses ~~#441~~ #414. It adds the native Cache API to service-worker.js for the caching of css and js assets. It significantly speeds up SW mode. The cached entries can be checked in the Cache entry of devtools in Chromium or Edge (I didin't try with Firefox). Example of running this on the most recent Ray Charles is shown in the image below (from Chromium / Edge 78). This is not for merging yet, but I would appreciate testing and any comments.

I think the main TODO is to provide a mechanism for deleting / pruning entries. We could just do it on change of ZIM, but as the quota on most devices is quite generous, and as the entries are unique to each ZIM, a more subtle mechanism could be found. We could provide usage statistics on the Configuration page and a button to clear the cache if it is quite large, for example. This would need to be decided.

Info for testing: To delete the cache and Service Worker after testing (in Chromium), delete kiwixjs-cache (right-click and delete) and unregister the Service Worker (under Application -> Service Workers ... ).

If there are code updates, a similar procedure would need to be followed, then a full page refresh (e.g. Ctrl-Shift-R), before testing again.

mossroy · 2019-08-22T20:33:37Z

I suppose this addresses #414, not #441 (wrong link in the description).
As you know, I believe the use of emscripten version of kiwix-lib/libzim is the right way to improve performance (and libzim already has its own cache).
On the other hand, implementing #414 might be easy, maybe easier than #411 or #415. And it could still be useful with kiwix-lib/libzim.
So why not, if it's easy to unplug later (it seems to be the case), if we feel it becomes useless in the future.

It's hard to see the changes in github, that tends to say that everything has been modified.
But I saw the most important parts, which look good.

I quickly tested on Firefox, and it seems to work well too (I see the cache content in devtools, and the log properly says it uses it). This is promising.
I tested in a Chromium extension, but it gives some errors :

Uncaught (in promise) TypeError: Request scheme 'chrome-extension' is unsupported
at service-worker.js:192

It fails because we can not cache.put a request with this scheme.
I don't know if it would be the same in Firefox (with a schema 'moz-extension') because, for now, the SW mode is not available at all with Firefox extensions.
I don't know what is the best way to handle that : blacklist this scheme in the SW cache code? But maybe Google might allow it in the future, who knows.
If we put a whitelist instead, for specific schemes (like file, http and https), we might miss the opportunity to activate the cache if Mozilla allows it (or in some other context).
Maybe we could trap the error, keep inside the SW a list of schemes that failed, add an info log, and then do not attempt to cache any of these schemes. But maybe it becomes a bit complicated.

The Cache API is not available on all platforms, see https://developer.mozilla.org/en-US/docs/Web/API/Cache#Browser_compatibility. I think we don't have any supported platform that supports SW, but not Cache API. But, to be cleaner, it would be better to test for this API availability before using it. If it's not available, we can simply fallback to standard behavior (without cache), and put an information message in the console.

Regarding the cache eviction strategy, things are quite open.
I agree that, in any case, we should add a button in settings to manually clear the SW cache. If it's easy to add the statistics, it would be great.
I also agree that we can keep cache from different ZIM files (not necessary to clear the cache on ZIM change)
If we use the cache only for javascript and CSS, maybe it's not necessary to add an eviction strategy, and simply rely on the standard strategy of the browser (it can clear the cache if it feels it's necessary, and can ask the user if he allows to use more cache in case it becomes big). Because these files are not numerous and won't use a lot of space, it might be enough. Except for PhET, that we should test more.

But we also might activate the cache for some small images (that could be useful for some icons), for which a specific eviction strategy could be used (with a smaller available space). But well, that could be for later, and maybe too complicated for the need.

Jaifroid · 2019-08-22T21:00:28Z

I'm pleased you're interested in this, @mossroy ! It really does make a relatively easy performance jump in SW mode, while we wait for #116. Up till now, jQuery mode has been much faster than SW mode, due to the fact that we have memory caching for jQuery, but none for SW (and also because SW decompresses scripts). This equalizes things, and makes using SW mode much more feasible for daily use.

It's a real shame that the Chrome Extension schema doesn't work with Cache, as Chrome Extension is a major use-case for this app. I wonder if there's a solution for that, or whether it's just outright not supported. It's worth investigating.

Regarding an interface to control the Cache, I did develop one when I worked on a comprehensive cache based on indexDB, and it looks like the image below. This could be adapted to our needs.

I didn't know libzim comes with a cache. But it'll take some time for that code to be developed, so I think it's worth using this optimization.

I agree PhET is a problem. JS is unique to each experiment and shouldn't be cached. I'm sure we can find a way to distinguish.

Thanks for the detailed and thoughtful comments, which all seem very sensible.

Jaifroid · 2019-08-23T08:38:55Z

I've checked support in chrome-extension, and unfortunately it's not supported full stop. Chrome-extension has its own caching strategy that devs can use via chrome.storage, but this is only available in the background script, which seems like a huge complication. People seem to recommend using localStorage or indexedDB as an alternative.

Because localStorage is synchronous, it cannot be accessed from a Service Worker. IndexedDB is, however, asynchronous and apparently can be used inside Service Worker.

So the question is: is it worth adding a fallback of indexedDB caching in SW specifically for Chrome extension usage? I have a ready-made indexedDB function.

It should be noted that the Cache API does not need to be accessed from the SW. It is also available in the window object, and can be accessed from the main thread. However, it seemed to me to be neater to keep it all in the SW (also running on a separate process, so maybe faster).

Jaifroid · 2019-08-23T18:17:30Z

For now, I've added a (volatile) memory cache (assetsCache) to the code for excluded URL schemata. This works well to cache assets during a session, just like cssCache in the main thread, which should be a gain for Chrome extensions.

To test, see the comment in the new code (commit 5f37306) about changing example-extension to http and running the code on localhost. Remember to clear any existing Service Worker first, as it won't auto-update. You should then get messages in console log showing "[SW] Adding EXCLUDED schema URL...". Shown below is the snapshot of assetsCache (which is just a Map). Note we can't actually store a Response in a Map because Responses are special objects that get annulled easily (I wasn't successful storing and retrieving them). Instead I store the constituents of the Response we construct.

Jaifroid · 2019-08-23T18:19:03Z

Ah, the dreaded Firefox 45 Travis error...

mossroy · 2019-08-23T18:36:33Z

No, it's not necessary to add anything when inside the chrome-extension scheme, because Chromium already has a built-in in-memory cache (see #411 (comment)) when inside an extension.
It should even not go through the SW more than once, and bypass it to read its in-memory cache.
No need to make this more complicated.

I think we simply need to properly avoid the console error messages in this case

Jaifroid · 2019-08-23T18:41:27Z

Ah OK, I'd forgotten about that comment (haven't used Chrome extension much). So should I remove the memory chache I've added in the last commit? (It only took me 30mins or so to add it, so no great loss if you don't see a use for it.) The only use I can think of might be to speed up cache access generally.

Jaifroid · 2019-08-23T18:43:24Z

What I mean by that is that, when I was experimenting with IndexedDB last year, I found it was noticeably slower to access assets stored there than from cssCache. So I opted to keep cssCache for fast retrieval, and use IndexedDB to store assets between sessions. But I'm not sure it's the case with Cache API. It seems "fast enough".

mossroy · 2019-08-23T19:29:24Z

IMHO you should remove this last commit and only keep the Cache API (without introducing IndexedDB either)

Jaifroid · 2019-08-24T07:42:02Z

IMHO you should remove this last commit and only keep the Cache API (without introducing IndexedDB either)

Done. I've kept the structural improvements to the code: providing the relevant regexes at the top of the code with a comment for developers on what they do. It's better than having regexes buried further down the code which would be difficult to maintain.

NB this code excludes ^chrome-extension: from caching, so we don't get the console log message (as you suggested above) and unnecessary CPU cycles. It can easily be changed in the future if extensions start to support Cache. See comments at top of file.

Jaifroid · 2019-08-24T12:57:38Z

Just to say that it would be best for me to delay working on adding a cache control panel (like that above) until we've merged #527, because otherwise I'd have to do double work (Bootstrap 3 panels are rather different from Bootstrap 4 cards). But in due course it would be good to know whether that design seems right, and whether you'd be interested in having the "Return to last visited page when you open an archive" functionality. That functionality is useful, for example, in a NWJS app. It's less useful if the user still has to pick the archive, which is the case in most contexts. The Cache used field would show "Memory cache" for jQuery mode and "Cache API" for SW mode.

Jaifroid · 2019-08-28T07:21:38Z

I had some time yesterday evening, so added the cache evacuation infrastructure and UI anyway. The checkbox for remembering the last page between sessions is currently non-functional and can be removed if that functionality is not required, in which case the panel title would change from "Performance and privacy settings" to "Performance settings". Asset counting works for both the memory cache and Cache API, but only the assets in the respective cache being used are shown. However, on deleting assets, the sum of all assets deleted is displayed. Users might end up with both Cache API assets and memory cache assets if they switch from SW to jQuery mid-session, but clearing the caches will clear them both.

Jaifroid · 2019-08-28T07:26:20Z

As a side issue, the "random" Firefox 45 Travis failures must I guess be caused by a race condition somewhere in our code. Is there a way to debug Travis builds? I suppose the Chrome failure for #527 is also a race condition (but a different one).

mossroy · 2019-08-28T18:16:08Z

It would be better to discuss remembering the last page in a different PR

Jaifroid · 2019-08-28T20:58:18Z

It would be better to discuss remembering the last page in a different PR

OK, I've added issue #561 for this.

Jaifroid · 2019-08-29T06:00:20Z

Remember last page checkbox removed. This is what the interface looks like straight after the user turns caching off:

mossroy · 2019-09-04T12:25:58Z

Choose the wording you prefer, no problem.

About the cache status, my goal is mainly to avoid making the code more complicated, duplicated and take the risk to give a wrong information to the user.
Besides that, no "normal user" cares about how the cache behaves and is implemented. So it's no use duplicating code to give this kind of information.

Having an option to disable the cache can still be a good idea for debugging purpose (or in some edge cases), but it's still an "expert setting", and could be moved in the "expert settings" zone.

I understand your concern about code diverging between kiwix-js and kiwix-js-windows.
What about, in the case of kiwix-js-windows, using something like this on the right :

Cache usage (# of assets) : 
Memory cache : 18
Cache API : 23
IndexedDB : 12
LocalStorage : 26

When some API is unavailable, or unusable, you could simply put a "N/A" instead of the asset count
But maybe I'm missing something, as I don't really know what it is like (I did not see it on https://kiwix.github.io/kiwix-js-windows)

Regarding PhET, it's a good use case to test what happens when many assets are put in the cache (at some point, the browser might ask if the user wants to grow the cache or not). But we must not do anything that is PhET specific.
I don't currently see a generic and reliable way to distinguish the javascript files that should be kept for long (like the ones of wikimedia) and the ones that should not (like the ones of PhET).
Except by implementing our own LRU eviction strategy, or something similar.
But I think it's not necessary to do that for now. Let's see how it behaves.

Based on https://developers.google.com/web/fundamentals/instant-and-offline/web-storage/offline-for-pwa , the browsers have their own safe-guards when too much cache is used : it might be enough to rely on them.

Jaifroid · 2019-09-04T13:22:41Z

OK, I'll have a think about what can be done to simplify. As it is, we currently have three different caching strategies: "Memory", "Cache API", and "Custom" (Chrome-Extension, which is a black box). Unless we go for some automatic cache evacuation strategy, I'd like users to be aware that items are being cached potentially for a long time in the case of Cache API and that they can end up taking significant space.

Jaifroid · 2019-09-04T13:28:22Z

PS This system is not implemented in Kiwix JS Windows yet. My plan is to replace the file-based caching currently used there (hard to maintain) with this automatic one (much easier to maintain), when it's ready. But I also have to provide indexedDB for jQuery mode because of the lack of general support for Service Worker in UWP JS apps (SW support only exists for content served from https: , i.e. if I implement it as a PWA inside UWP).

Jaifroid · 2019-09-05T15:01:33Z

So this is my best attempt to prevent duplication of variables, regexes, and functions across app.js and SW, and to simplify the logic. The latest code works like this:

When app.js needs to display cache details, it queries the Service Worker, if it is available, to determine the attributes of the cache in use (type, description, assetCount);
If we are in jQuery mode, then it merely returns the attributes of cssCache (because it cannot presume that it can access SW, even if sometimes it technically could);
When user turns off caching, all caches are deleted (except the internal Chrome extension cache of course);
When user switches from jQuery mode to SW mode, cssCache is emptied, as it is no longer needed;
When user switches from SW mode to jQuery mode, CACHE is emptied, as it will no longer be used, and we should not have items taking significant space that are useless (and that the user may not be aware of) [I realize this is an assumption, but I think it's correct from a user perspective, as users will not switch modes frequently, and the app should not hog memory or storage space];
Therefore, the cache count only shows the count of the cache in use, or '-' if it cannot be detemined (i.e. Chrome extension). It would be useless to show any other counts, because they will always be 0;
CACHE is defined in app.js, and is passed to Service Worker early in its initialization. It has to be this way round, because otherwise we have no way to delete CACHE from app.js when Service Worker has been deactivated (we need the CACHE constant). But we only define CACHE once.

I think this simplifies the decision logic significantly. However, the amount of code in app.js has not decreased by much, because messaging the Service Worker and receiving a response is not completely trivial. The big benefit is that we no longer have to tell DEVs to ensure that duplicated CACHE definitions and regexes have to be kept in sync, because there is no duplication. So the main benefit is maintainability.

Please let me know what you think @mossroy.

PS If testing, please force refresh, then simple refresh your browser to be sure you've loaded the new Service Worker code, or else unregister the old SW in dev tools, exit browser and restart.

mossroy

Thanks for this refactoring. This removes all the duplicated code, which is great.
I've made a few comments but it looks good.

As we have an "API status" zone, maybe the availability of Cache API could be displayed there? (it's minor too)

mossroy · 2019-09-06T19:49:21Z

service-worker.js

+ * The value is defined in app.js and will be passed to Service Worker on initialization (to avoid duplication)
+ * @type {String}
+ */
+var CACHE;


The variable might be renamed to CACHE_NAME or something similar, to avoid assuming it's where the cache is stored

OK, will do.

mossroy · 2019-09-06T19:55:25Z

service-worker.js

+            fetchCaptureEnabled = false;
+        }
+    }
+    if (event.data.useCache) {


It might be good to use the same way of passing information between app.js and the SW.
I initially used the event.data.action variable to distinguish each kind of message app.js passes to the SW, while you test here on other variables.
Both ways are perfectly valid, but it would be better to use the same.
This is minor, and can be dealt with in another issue/PR

I did consider that, but the problem was (if I remember correctly) that event.data.action is a string (or at least it is tested as a string) and I wanted a psuedo-Boolean. However, I can think of a solution that uses event.data.action without having to have two code blocks, so will try that, and if it's easy, will do it in this PR.

It's really not crucial, and my initial implementation might not be the best

It's really not crucial, and my initial implementation might not be the best

It turns out to be very easy to do (I'm about to push a commit with it) and it does make sense to have all "actions" under this key.

service-worker.js

www/js/app.js

service-worker.js

…types

Jaifroid · 2019-09-07T09:54:40Z

Commit dd18a67 implements what was discussed in code review. Given that some changes arose from a small misunderstanding over return types, what's your view on whether a Promise should always be described as a {Promise} in a JSDoc return type annotation, or whether it is better, if we are returning an Object via a Promise to describe the return type as {Object} and then make clear in the description that it's via a Promise. I read people saying we should only use "Primitive JavaScript types", but that may be outdated, and I can't find JSDoc official advice.

Jaifroid · 2019-09-07T10:21:49Z

Last question: should I remove console.log lines (not console.error) from the code before merging, or is it useful to have them for a while and remove them later?

Jaifroid · 2019-09-07T11:18:59Z

It seems the syntax {Promise<T>} has become the accepted usage. (And VSCode JSLint recognizes this as valid.) See 301bee8 .

mossroy · 2019-09-07T12:57:32Z

Maybe we could keep the console.debug lines for now, and remove them later.
Regarding the way to write annotations for Promises, I don't know. Our existing code usually uses @return {Promise} followed by a textual description of what it will return.
If there is a more common and standardized way of doing it, please do (and maybe open another issue to refactor the existing code)

service-worker.js

Jaifroid · 2019-09-07T15:08:13Z

After a bit more testing (including in Chrome extension, but not Firefox extension because I don't know how), can I squash and merge?

mossroy · 2019-09-07T15:12:13Z

Yes you can

Jaifroid added the enhancement label Aug 22, 2019

Jaifroid added this to the v2.7 milestone Aug 22, 2019

Jaifroid requested a review from mossroy August 22, 2019 10:41

Jaifroid self-assigned this Aug 22, 2019

Jaifroid mentioned this pull request Aug 28, 2019

Optionally reopen last visited page if user loads the same ZIM #561

Open

Jaifroid force-pushed the native-caching-of-assets-in-SW branch from 8152392 to a0da824 Compare August 30, 2019 07:01

Jaifroid added 7 commits August 31, 2019 17:37

Add native asset caching to Service Worker

fe8bd3e

Place remaining regExp at head of file

bcb29a6

Add memory cache for excluded extensions

00adfa1

Remove memory cache

d3152f7

Add infrastructure and UI for Cache evacuation

3c5edc0

Remove remember last pages checkbox and controls

09cbf00

Adapt cache control panel to Bootstrap 4

ed6c8a6

Jaifroid force-pushed the native-caching-of-assets-in-SW branch from a0da824 to ed6c8a6 Compare September 1, 2019 12:37

Jaifroid mentioned this pull request Sep 5, 2019

Reading a property of null in some cases in SW mode #520

Closed

Jaifroid added 2 commits September 5, 2019 15:18

Deduplicate variables and functions across app.js and SW

b1c7a9c

Tidy up

4184cb3

Jaifroid added 4 commits September 5, 2019 16:09

Small tidy

24d90a2

Better comments and regexp naming

703a60a

More commenting

718264f

Minor simplification and spacing cleanup

8e3c373

mossroy approved these changes Sep 6, 2019

View reviewed changes

Jaifroid commented Sep 7, 2019

View reviewed changes

service-worker.js Show resolved Hide resolved

Change CACHE name, tidy event.data actions, report underlying return …

dd18a67

…types

Use {Promise<T>} JSDoc syntax

301bee8

Jaifroid mentioned this pull request Sep 7, 2019

Add the promised return type to JSDoc Promise annotations #568

Open

Add one meaningful @throws annotation and use CACHE_NAME in comments

97440ec

mossroy reviewed Sep 7, 2019

View reviewed changes

service-worker.js Outdated Show resolved Hide resolved

Remove @throws annotation and add rejects info to @returns

cfa1a75

Jaifroid merged commit 7226d62 into master Sep 7, 2019

Jaifroid deleted the native-caching-of-assets-in-SW branch September 7, 2019 15:55

Jaifroid mentioned this pull request Sep 7, 2019

Make use of the browser standard cache in ServiceWorker mode #411

Closed

mossroy mentioned this pull request Sep 9, 2019

Switch to ServiceWorker mode by default #196

Closed

Jaifroid mentioned this pull request Sep 10, 2019

Cache resources through localStorage/indexedDB on the backend side #415

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add native asset caching to Service Worker #556

Add native asset caching to Service Worker #556

Jaifroid commented Aug 22, 2019 •

edited

Loading

mossroy commented Aug 22, 2019

Jaifroid commented Aug 22, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

mossroy commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

mossroy commented Aug 23, 2019

Jaifroid commented Aug 24, 2019

Jaifroid commented Aug 24, 2019

Jaifroid commented Aug 28, 2019 •

edited

Loading

Jaifroid commented Aug 28, 2019

mossroy commented Aug 28, 2019

Jaifroid commented Aug 28, 2019

Jaifroid commented Aug 29, 2019

mossroy commented Sep 4, 2019

Jaifroid commented Sep 4, 2019

Jaifroid commented Sep 4, 2019

Jaifroid commented Sep 5, 2019 •

edited

Loading

mossroy left a comment

mossroy Sep 6, 2019

Jaifroid Sep 7, 2019

mossroy Sep 6, 2019

Jaifroid Sep 7, 2019

mossroy Sep 7, 2019

Jaifroid Sep 7, 2019

Jaifroid commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

mossroy commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

mossroy commented Sep 7, 2019

Add native asset caching to Service Worker #556

Add native asset caching to Service Worker #556

Conversation

Jaifroid commented Aug 22, 2019 • edited Loading

mossroy commented Aug 22, 2019

Jaifroid commented Aug 22, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

mossroy commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

Jaifroid commented Aug 23, 2019

mossroy commented Aug 23, 2019

Jaifroid commented Aug 24, 2019

Jaifroid commented Aug 24, 2019

Jaifroid commented Aug 28, 2019 • edited Loading

Jaifroid commented Aug 28, 2019

mossroy commented Aug 28, 2019

Jaifroid commented Aug 28, 2019

Jaifroid commented Aug 29, 2019

mossroy commented Sep 4, 2019

Jaifroid commented Sep 4, 2019

Jaifroid commented Sep 4, 2019

Jaifroid commented Sep 5, 2019 • edited Loading

mossroy left a comment

Choose a reason for hiding this comment

mossroy Sep 6, 2019

Choose a reason for hiding this comment

Jaifroid Sep 7, 2019

Choose a reason for hiding this comment

mossroy Sep 6, 2019

Choose a reason for hiding this comment

Jaifroid Sep 7, 2019

Choose a reason for hiding this comment

mossroy Sep 7, 2019

Choose a reason for hiding this comment

Jaifroid Sep 7, 2019

Choose a reason for hiding this comment

Jaifroid commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

mossroy commented Sep 7, 2019

Jaifroid commented Sep 7, 2019

mossroy commented Sep 7, 2019

Jaifroid commented Aug 22, 2019 •

edited

Loading

Jaifroid commented Aug 28, 2019 •

edited

Loading

Jaifroid commented Sep 5, 2019 •

edited

Loading