-
-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect processing of titles with question marks and anchor parameters #807
Fix incorrect processing of titles with question marks and anchor parameters #807
Conversation
Now added processing of anchor parameters on links in JQuery mode. Tested on many titles with question marks and quotation marks, both single and double. Tested on links with anchor parameter (e.g. ../An_article#Heading_to_jump_to) in both SW and JQuery modes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Congratulations for this PR.
I know these issues can be quite difficult to handle.
But I would suggest to try to use the URL object https://developer.mozilla.org/en-US/docs/Web/API/URL instead of regular expressions where/when it's possible (for sure in SW mode, maybe in jQuery mode too).
It would certainly be safer, and handle all cases.
For IE11 (that does not support this object properly), we could fallback to the regexp. And never mind if some edge cases are not properly supported for IE11 (let's not waste time on it)
I've been experimenting a bit with the URL option. It's not as clever as we might like to think. We're supposed to feed it a bare URL string, and it will return a parsed and encoded URL according to RFC3986. So I tried our problematic URL, and we get: It's reproduced the error we were trying to avoid, and it believes So we have to encode the URL, and then decode it: NB This constructor fails with the entity references that our In sum, I can work with this constructor, though it is not doing anything more sophisticated than finding everything after a |
I'm not surprised it's expecting an encoded URL (else it would have no way to see the difference between a question mark for the querystring and a question mark in the path/filename). Wouldn't be enough to do the following?
(untested) My concern is to make sure we don't miss edge cases like the one you found with encoded characters (ex: ' ). Are there any other ones we did not think of? Using the built-in object of the browser makes sure all are covered. But if it's becoming too complicated, never mind. NB : I don't mind if we keep the regexp for jQuery mode (for IE11 compatibility), and use URL object for SW mode |
I agree, and I'm testing a solution along these lines. |
OK, after many frustrating tests trying to create a universal function that could be used in the same way in SW mode and in jQuery mode, I came to the conclusion that the ONLY place that the In jQuery mode, we are working with relative URLs and ZIM URLs, and it is unreliable and complicated to construct artificial fully qualified URLs, since we are inside an iframe which has a fixed So I now agree with you that the most elegant solution is to use the URL function in the Service Worker, which is in fact very simple, and to use our road-tested regexes in jQuery mode. This is what I've done in the latest. I've finessed the stripping of or conserving of the querystring in the SW as well. I had been stripping it for the APP_:CACHE, but this also stripped them for ZIM assets, which is incorrect, given that a querystring on the end of a JS file can be significant. They are now being correctly conserved (there is a test case in StackExchange ZIMs that use MathJax, where a MathJax JS file requires a significant querystring and so it could not be found in the cache before). |
Just to explain why this doesn't work in jQuery mode (or with a dummy URL): because we lose the path information if we add the relative URL on to this dummy. It gets stripped back to a bare ZIM URL in the best scenario, and sometimes loses the namespace in the worst scenario. We in fact take advantage of the relative URL calculations to derive the ZIM URL once we've clicked on a link (but the dummy is added to the base URL, not to the asset URL). However, it is onerous to do it for every link in an article. In SW, we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
And thanks for the unit test, too.
I did not test, but the code looks good
This looks good. |
I just want to do some careful manual testing in both modes and and different browsers, and some inspecting of the caches produced in SW mode to be sure it's all working as intended, as the code dependency is a bit tricky, as we've seen from the number of errors we've had to correct over the years coming from non-obvious gotchas in URLs.. It shouldn't take too long. Then, yes, I agree we can continue with the checklist at #802. I'll also need to add some changelog entries for what this PR does. It fixes two or three things to do with URL parameters, including scrolling to URLs with anchors in jQuery mode. |
My testing has detected an issue with some fonts that are being requested in jQuery mode that are not being detected as ZIM assets and so are being trapped as if they were requests for the app's own files. I need to sort this out before merging this. It's not hugely serious but is causing a lot of notified failures in console.debug. UPDATE: This is something of an edge case. It concerns the PWA (not within a Firefox extension). If a user had been running it in SW mode, but has switched to JQuery mode, then the Service Worker continues to be available to serve the app's own code in case the app is offline (this is basic PWA functionality). If, additionally, the ZIM contains incorrectly referenced assets, or assets that we don't correctly intercept in JQuery moe (e.g. assets referenced inside CSS stylesheets, which we currently don't process), then the Service Worker will see these as Fetch requests, and because they are not prefixed with the ZIM name, it will pass them through and attempt to fetch them from the cache or the Network. The responses, of course, fail. I've put a block on these failed responses being cached, by testing |
It is indeed better in terms of UX to cache a 404 response, as it prevents a slow Network request (slow to fail) when that asset is requested again. The symptom, when the app is offline, is that update of the UI is delayed and slow while the browser is waiting for Network. If we have cached the 404 response, then we retrieve the 404 much faster when the app is offline. To be clear, this issue only affects the PWA if the user switches to jQuery mode, and the Service Worker tries to retrieve an unprocessed asset from the Network (unprocessed because we do not currently process ZIM assets in CSS files, for example). Also, this will not affect the correct extraction of these assets in Service Worker mode, because in that mode the asset URL is prefixed with the ZIM archive's name. Caching 404s from the Network will also not affect the app's own code (APP_CACHE assets) because if one of those fails to fetch when the Service Worker is registered, then the caching of the entire app will throw and report an error (this is part of the PWA spec). So, the changes in d88b523 appear safe to me. But let me know if you have concerns. And let me know if it's OK to merge (pending final tests). |
Apologies, @mossroy, but I'm getting cold feet about caching 404 responses to work around an edge case. I'm worried about unintended consequences, and I note that the Cache API specification bars 404 responses from being cached when the It just feels wrong and is also out of scope of this PR, so I'm going to revert d88b523. |
eb1a119
to
e4503f0
Compare
Interesting error exposed by the rebase of this PR. We are not dealing correctly with a colon in the domain + port, at least in jQuery mode. It might explain #790. Or could be an artefact introduced by rebase. |
OK, this turned out to be nothing. It's just the way Chromium displays the failed network requests for these fonts, which are not processed by us at all. So, I've rebased this on master and have tidied up a bit, and as far as I'm concerned, it's ready to squash and merge. Tested multiple times on Edge Chromium, Firefox and IE11, and as a Chromium extension, both for titles with question marks in and for hash anchors, which with this PR now work in both jQuery and SW modes. @mossroy Let me know if you're happy for me to do squash/merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can merge
Fixes #806.
This initial fix needs very thorough testing.