Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add native asset caching to Service Worker #556

Merged
merged 26 commits into from
Sep 7, 2019

Conversation

Jaifroid
Copy link
Member

@Jaifroid Jaifroid commented Aug 22, 2019

This addresses #441 #414. It adds the native Cache API to service-worker.js for the caching of css and js assets. It significantly speeds up SW mode. The cached entries can be checked in the Cache entry of devtools in Chromium or Edge (I didin't try with Firefox). Example of running this on the most recent Ray Charles is shown in the image below (from Chromium / Edge 78). This is not for merging yet, but I would appreciate testing and any comments.

image

I think the main TODO is to provide a mechanism for deleting / pruning entries. We could just do it on change of ZIM, but as the quota on most devices is quite generous, and as the entries are unique to each ZIM, a more subtle mechanism could be found. We could provide usage statistics on the Configuration page and a button to clear the cache if it is quite large, for example. This would need to be decided.

Info for testing: To delete the cache and Service Worker after testing (in Chromium), delete kiwixjs-cache (right-click and delete) and unregister the Service Worker (under Application -> Service Workers ... ).

If there are code updates, a similar procedure would need to be followed, then a full page refresh (e.g. Ctrl-Shift-R), before testing again.

@Jaifroid Jaifroid added this to the v2.7 milestone Aug 22, 2019
@Jaifroid Jaifroid requested a review from mossroy August 22, 2019 10:41
@Jaifroid Jaifroid self-assigned this Aug 22, 2019
@mossroy
Copy link
Contributor

mossroy commented Aug 22, 2019

I suppose this addresses #414, not #441 (wrong link in the description).
As you know, I believe the use of emscripten version of kiwix-lib/libzim is the right way to improve performance (and libzim already has its own cache).
On the other hand, implementing #414 might be easy, maybe easier than #411 or #415. And it could still be useful with kiwix-lib/libzim.
So why not, if it's easy to unplug later (it seems to be the case), if we feel it becomes useless in the future.

It's hard to see the changes in github, that tends to say that everything has been modified.
But I saw the most important parts, which look good.

I quickly tested on Firefox, and it seems to work well too (I see the cache content in devtools, and the log properly says it uses it). This is promising.
I tested in a Chromium extension, but it gives some errors :

Uncaught (in promise) TypeError: Request scheme 'chrome-extension' is unsupported
at service-worker.js:192

It fails because we can not cache.put a request with this scheme.
I don't know if it would be the same in Firefox (with a schema 'moz-extension') because, for now, the SW mode is not available at all with Firefox extensions.
I don't know what is the best way to handle that : blacklist this scheme in the SW cache code? But maybe Google might allow it in the future, who knows.
If we put a whitelist instead, for specific schemes (like file, http and https), we might miss the opportunity to activate the cache if Mozilla allows it (or in some other context).
Maybe we could trap the error, keep inside the SW a list of schemes that failed, add an info log, and then do not attempt to cache any of these schemes. But maybe it becomes a bit complicated.

The Cache API is not available on all platforms, see https://developer.mozilla.org/en-US/docs/Web/API/Cache#Browser_compatibility. I think we don't have any supported platform that supports SW, but not Cache API. But, to be cleaner, it would be better to test for this API availability before using it. If it's not available, we can simply fallback to standard behavior (without cache), and put an information message in the console.

Regarding the cache eviction strategy, things are quite open.
I agree that, in any case, we should add a button in settings to manually clear the SW cache. If it's easy to add the statistics, it would be great.
I also agree that we can keep cache from different ZIM files (not necessary to clear the cache on ZIM change)
If we use the cache only for javascript and CSS, maybe it's not necessary to add an eviction strategy, and simply rely on the standard strategy of the browser (it can clear the cache if it feels it's necessary, and can ask the user if he allows to use more cache in case it becomes big). Because these files are not numerous and won't use a lot of space, it might be enough. Except for PhET, that we should test more.

But we also might activate the cache for some small images (that could be useful for some icons), for which a specific eviction strategy could be used (with a smaller available space). But well, that could be for later, and maybe too complicated for the need.

@Jaifroid
Copy link
Member Author

I'm pleased you're interested in this, @mossroy ! It really does make a relatively easy performance jump in SW mode, while we wait for #116. Up till now, jQuery mode has been much faster than SW mode, due to the fact that we have memory caching for jQuery, but none for SW (and also because SW decompresses scripts). This equalizes things, and makes using SW mode much more feasible for daily use.

It's a real shame that the Chrome Extension schema doesn't work with Cache, as Chrome Extension is a major use-case for this app. I wonder if there's a solution for that, or whether it's just outright not supported. It's worth investigating.

Regarding an interface to control the Cache, I did develop one when I worked on a comprehensive cache based on indexDB, and it looks like the image below. This could be adapted to our needs.

I didn't know libzim comes with a cache. But it'll take some time for that code to be developed, so I think it's worth using this optimization.

I agree PhET is a problem. JS is unique to each experiment and shouldn't be cached. I'm sure we can find a way to distinguish.

Thanks for the detailed and thoughtful comments, which all seem very sensible.

image

@Jaifroid
Copy link
Member Author

I've checked support in chrome-extension, and unfortunately it's not supported full stop. Chrome-extension has its own caching strategy that devs can use via chrome.storage, but this is only available in the background script, which seems like a huge complication. People seem to recommend using localStorage or indexedDB as an alternative.

Because localStorage is synchronous, it cannot be accessed from a Service Worker. IndexedDB is, however, asynchronous and apparently can be used inside Service Worker.

So the question is: is it worth adding a fallback of indexedDB caching in SW specifically for Chrome extension usage? I have a ready-made indexedDB function.

It should be noted that the Cache API does not need to be accessed from the SW. It is also available in the window object, and can be accessed from the main thread. However, it seemed to me to be neater to keep it all in the SW (also running on a separate process, so maybe faster).

@Jaifroid
Copy link
Member Author

For now, I've added a (volatile) memory cache (assetsCache) to the code for excluded URL schemata. This works well to cache assets during a session, just like cssCache in the main thread, which should be a gain for Chrome extensions.

To test, see the comment in the new code (commit 5f37306) about changing example-extension to http and running the code on localhost. Remember to clear any existing Service Worker first, as it won't auto-update. You should then get messages in console log showing "[SW] Adding EXCLUDED schema URL...". Shown below is the snapshot of assetsCache (which is just a Map). Note we can't actually store a Response in a Map because Responses are special objects that get annulled easily (I wasn't successful storing and retrieving them). Instead I store the constituents of the Response we construct.

image

@Jaifroid
Copy link
Member Author

Ah, the dreaded Firefox 45 Travis error...

@mossroy
Copy link
Contributor

mossroy commented Aug 23, 2019

No, it's not necessary to add anything when inside the chrome-extension scheme, because Chromium already has a built-in in-memory cache (see #411 (comment)) when inside an extension.
It should even not go through the SW more than once, and bypass it to read its in-memory cache.
No need to make this more complicated.

I think we simply need to properly avoid the console error messages in this case

@Jaifroid
Copy link
Member Author

Ah OK, I'd forgotten about that comment (haven't used Chrome extension much). So should I remove the memory chache I've added in the last commit? (It only took me 30mins or so to add it, so no great loss if you don't see a use for it.) The only use I can think of might be to speed up cache access generally.

@Jaifroid
Copy link
Member Author

What I mean by that is that, when I was experimenting with IndexedDB last year, I found it was noticeably slower to access assets stored there than from cssCache. So I opted to keep cssCache for fast retrieval, and use IndexedDB to store assets between sessions. But I'm not sure it's the case with Cache API. It seems "fast enough".

@mossroy
Copy link
Contributor

mossroy commented Aug 23, 2019

IMHO you should remove this last commit and only keep the Cache API (without introducing IndexedDB either)

@Jaifroid
Copy link
Member Author

IMHO you should remove this last commit and only keep the Cache API (without introducing IndexedDB either)

Done. I've kept the structural improvements to the code: providing the relevant regexes at the top of the code with a comment for developers on what they do. It's better than having regexes buried further down the code which would be difficult to maintain.

NB this code excludes ^chrome-extension: from caching, so we don't get the console log message (as you suggested above) and unnecessary CPU cycles. It can easily be changed in the future if extensions start to support Cache. See comments at top of file.

@Jaifroid
Copy link
Member Author

Just to say that it would be best for me to delay working on adding a cache control panel (like that above) until we've merged #527, because otherwise I'd have to do double work (Bootstrap 3 panels are rather different from Bootstrap 4 cards). But in due course it would be good to know whether that design seems right, and whether you'd be interested in having the "Return to last visited page when you open an archive" functionality. That functionality is useful, for example, in a NWJS app. It's less useful if the user still has to pick the archive, which is the case in most contexts. The Cache used field would show "Memory cache" for jQuery mode and "Cache API" for SW mode.

@Jaifroid
Copy link
Member Author

Jaifroid commented Aug 28, 2019

I had some time yesterday evening, so added the cache evacuation infrastructure and UI anyway. The checkbox for remembering the last page between sessions is currently non-functional and can be removed if that functionality is not required, in which case the panel title would change from "Performance and privacy settings" to "Performance settings". Asset counting works for both the memory cache and Cache API, but only the assets in the respective cache being used are shown. However, on deleting assets, the sum of all assets deleted is displayed. Users might end up with both Cache API assets and memory cache assets if they switch from SW to jQuery mid-session, but clearing the caches will clear them both.

@Jaifroid
Copy link
Member Author

As a side issue, the "random" Firefox 45 Travis failures must I guess be caused by a race condition somewhere in our code. Is there a way to debug Travis builds? I suppose the Chrome failure for #527 is also a race condition (but a different one).

@mossroy
Copy link
Contributor

mossroy commented Aug 28, 2019

It would be better to discuss remembering the last page in a different PR

@Jaifroid
Copy link
Member Author

It would be better to discuss remembering the last page in a different PR

OK, I've added issue #561 for this.

@Jaifroid
Copy link
Member Author

Remember last page checkbox removed. This is what the interface looks like straight after the user turns caching off:

image

@Jaifroid Jaifroid force-pushed the native-caching-of-assets-in-SW branch from 8152392 to a0da824 Compare August 30, 2019 07:01
@Jaifroid Jaifroid force-pushed the native-caching-of-assets-in-SW branch from a0da824 to ed6c8a6 Compare September 1, 2019 12:37
@mossroy
Copy link
Contributor

mossroy commented Sep 4, 2019

Choose the wording you prefer, no problem.

About the cache status, my goal is mainly to avoid making the code more complicated, duplicated and take the risk to give a wrong information to the user.
Besides that, no "normal user" cares about how the cache behaves and is implemented. So it's no use duplicating code to give this kind of information.

Having an option to disable the cache can still be a good idea for debugging purpose (or in some edge cases), but it's still an "expert setting", and could be moved in the "expert settings" zone.

I understand your concern about code diverging between kiwix-js and kiwix-js-windows.
What about, in the case of kiwix-js-windows, using something like this on the right :

Cache usage (# of assets) : 
Memory cache : 18
Cache API : 23
IndexedDB : 12
LocalStorage : 26

When some API is unavailable, or unusable, you could simply put a "N/A" instead of the asset count
But maybe I'm missing something, as I don't really know what it is like (I did not see it on https://kiwix.github.io/kiwix-js-windows)

Regarding PhET, it's a good use case to test what happens when many assets are put in the cache (at some point, the browser might ask if the user wants to grow the cache or not). But we must not do anything that is PhET specific.
I don't currently see a generic and reliable way to distinguish the javascript files that should be kept for long (like the ones of wikimedia) and the ones that should not (like the ones of PhET).
Except by implementing our own LRU eviction strategy, or something similar.
But I think it's not necessary to do that for now. Let's see how it behaves.

Based on https://developers.google.com/web/fundamentals/instant-and-offline/web-storage/offline-for-pwa , the browsers have their own safe-guards when too much cache is used : it might be enough to rely on them.

@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 4, 2019

OK, I'll have a think about what can be done to simplify. As it is, we currently have three different caching strategies: "Memory", "Cache API", and "Custom" (Chrome-Extension, which is a black box). Unless we go for some automatic cache evacuation strategy, I'd like users to be aware that items are being cached potentially for a long time in the case of Cache API and that they can end up taking significant space.

@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 4, 2019

PS This system is not implemented in Kiwix JS Windows yet. My plan is to replace the file-based caching currently used there (hard to maintain) with this automatic one (much easier to maintain), when it's ready. But I also have to provide indexedDB for jQuery mode because of the lack of general support for Service Worker in UWP JS apps (SW support only exists for content served from https: , i.e. if I implement it as a PWA inside UWP).

@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 5, 2019

So this is my best attempt to prevent duplication of variables, regexes, and functions across app.js and SW, and to simplify the logic. The latest code works like this:

  • When app.js needs to display cache details, it queries the Service Worker, if it is available, to determine the attributes of the cache in use (type, description, assetCount);
  • If we are in jQuery mode, then it merely returns the attributes of cssCache (because it cannot presume that it can access SW, even if sometimes it technically could);
  • When user turns off caching, all caches are deleted (except the internal Chrome extension cache of course);
  • When user switches from jQuery mode to SW mode, cssCache is emptied, as it is no longer needed;
  • When user switches from SW mode to jQuery mode, CACHE is emptied, as it will no longer be used, and we should not have items taking significant space that are useless (and that the user may not be aware of) [I realize this is an assumption, but I think it's correct from a user perspective, as users will not switch modes frequently, and the app should not hog memory or storage space];
  • Therefore, the cache count only shows the count of the cache in use, or '-' if it cannot be detemined (i.e. Chrome extension). It would be useless to show any other counts, because they will always be 0;
  • CACHE is defined in app.js, and is passed to Service Worker early in its initialization. It has to be this way round, because otherwise we have no way to delete CACHE from app.js when Service Worker has been deactivated (we need the CACHE constant). But we only define CACHE once.

I think this simplifies the decision logic significantly. However, the amount of code in app.js has not decreased by much, because messaging the Service Worker and receiving a response is not completely trivial. The big benefit is that we no longer have to tell DEVs to ensure that duplicated CACHE definitions and regexes have to be kept in sync, because there is no duplication. So the main benefit is maintainability.

Please let me know what you think @mossroy.

PS If testing, please force refresh, then simple refresh your browser to be sure you've loaded the new Service Worker code, or else unregister the old SW in dev tools, exit browser and restart.

Copy link
Contributor

@mossroy mossroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this refactoring. This removes all the duplicated code, which is great.
I've made a few comments but it looks good.

As we have an "API status" zone, maybe the availability of Cache API could be displayed there? (it's minor too)

* The value is defined in app.js and will be passed to Service Worker on initialization (to avoid duplication)
* @type {String}
*/
var CACHE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable might be renamed to CACHE_NAME or something similar, to avoid assuming it's where the cache is stored

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do.

fetchCaptureEnabled = false;
}
}
if (event.data.useCache) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to use the same way of passing information between app.js and the SW.
I initially used the event.data.action variable to distinguish each kind of message app.js passes to the SW, while you test here on other variables.
Both ways are perfectly valid, but it would be better to use the same.
This is minor, and can be dealt with in another issue/PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that, but the problem was (if I remember correctly) that event.data.action is a string (or at least it is tested as a string) and I wanted a psuedo-Boolean. However, I can think of a solution that uses event.data.action without having to have two code blocks, so will try that, and if it's easy, will do it in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really not crucial, and my initial implementation might not be the best

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really not crucial, and my initial implementation might not be the best

It turns out to be very easy to do (I'm about to push a commit with it) and it does make sense to have all "actions" under this key.

service-worker.js Outdated Show resolved Hide resolved
www/js/app.js Outdated Show resolved Hide resolved
@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 7, 2019

Commit dd18a67 implements what was discussed in code review. Given that some changes arose from a small misunderstanding over return types, what's your view on whether a Promise should always be described as a {Promise} in a JSDoc return type annotation, or whether it is better, if we are returning an Object via a Promise to describe the return type as {Object} and then make clear in the description that it's via a Promise. I read people saying we should only use "Primitive JavaScript types", but that may be outdated, and I can't find JSDoc official advice.

@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 7, 2019

Last question: should I remove console.log lines (not console.error) from the code before merging, or is it useful to have them for a while and remove them later?

@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 7, 2019

It seems the syntax {Promise<T>} has become the accepted usage. (And VSCode JSLint recognizes this as valid.) See 301bee8 .

@mossroy
Copy link
Contributor

mossroy commented Sep 7, 2019

Maybe we could keep the console.debug lines for now, and remove them later.
Regarding the way to write annotations for Promises, I don't know. Our existing code usually uses @return {Promise} followed by a textual description of what it will return.
If there is a more common and standardized way of doing it, please do (and maybe open another issue to refactor the existing code)

service-worker.js Outdated Show resolved Hide resolved
@Jaifroid
Copy link
Member Author

Jaifroid commented Sep 7, 2019

After a bit more testing (including in Chrome extension, but not Firefox extension because I don't know how), can I squash and merge?

@mossroy
Copy link
Contributor

mossroy commented Sep 7, 2019

Yes you can

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants