Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wiktionary archives since 2023-02 contain a redirect which breaks out of iframe #1803

Closed
Jaifroid opened this issue Mar 3, 2023 · 11 comments · Fixed by #1811
Closed

Wiktionary archives since 2023-02 contain a redirect which breaks out of iframe #1803

Jaifroid opened this issue Mar 3, 2023 · 11 comments · Fixed by #1811
Assignees
Milestone

Comments

@Jaifroid
Copy link
Collaborator

Jaifroid commented Mar 3, 2023

Originally posted by @danielzgtg in kiwix/kiwix-js#972 (comment)

Kiwix-js no longer works with wiktionary_en_all_maxi_2023-02.zim. It was working fine with wiktionary_en_all_maxi_2022-09.zim. Now it just redirects to some other URL for the main page that doesn't have the search bar.

Editing the URL manually doesn't work around this. It goes to "404 Not Found nginx." when I replace Wiktionary%3AOffline with wiki. I also shows "404 Not Found" when I press the back button. My browser keeps on showing the page icon as the browser's loading animation for a long time before it switches to no icon at all. Pressing back at the original page goes back to the proper app with the search bar, but then it immediately redirects again. I need to close the tab and open another one to get something usable.

This affects ServiceWorker mode only. There is no redirect problem in JQuery mode and I am able to open articles. However, I do not like JQuery because it doesn't support my browser's CSS :visited, bfcache, nor Dark Reader extensions. This does not affect Android either.

I think the problem is that kiwix-js ServiceWorker mode is failing to intercept the redirect, while the other modes/apps were able to. > Here is a video of the problem:

2023-03-03.06-11-30.mp4
@Jaifroid
Copy link
Collaborator Author

Jaifroid commented Mar 3, 2023

Originally posted by @Jaifroid in kiwix/kiwix-js#972 (comment)

I've determined that this issue is not Kiwix JS specific. It affects https://library.kiwix.org/content/wiktionary_en_all_nopic_2023-02/ as well, where the ZIM breaks out of its frame. There is probably a rogue piece of JS that has crept in. I'll try to determine what, but I think I should move this issue to mwOffliner.

@danielzgtg
Copy link

danielzgtg commented Mar 3, 2023

While I agree that this needs to be fixed here in mwoffliner, I wouldn't say that kiwix-js doesn't need changes.

No other viewer except specifically kiwix-js ServiceWorker mode is having this problem. They tell new computer users to not click on suspicious links. The behavior of kiwix-js failing to remove the redirect is equivalent to making users' computers automatically click on links. That advice applies less to today's web as browsers have improved. Likewise I would prefer a strong sandbox in kiwix-js where I don't have to worry about whether the zim I open is malicious or not. Unless kiwix-js is based on loading each "piece of JS" that you are talking about and it's too hard to change?

This "rouge piece of JS" is likely the one at https://moz-extension.kiwix.org/current/wiktionary_en_all_maxi_2023-02.zim/-/mw/mediawiki.page.ready.js:

mw.loader.implement( "mediawiki.page.ready@", {
    "main": "ready.js",
    "files": {
    "ready.js": function ( require, module, exports ) {
var checkboxShift = require( './checkboxShift.js' );
var config = require( './config.json' );

// Break out of framesets
if ( mw.config.get( 'wgBreakFrames' ) ) {
	// Note: In IE < 9 strict comparison to window is non-standard (the standard didn't exist yet)
	// it works only comparing to window.self or window.window (https://stackoverflow.com/q/4850978/319266)
	if ( window.top !== window.self ) {
		// Un-trap us from framesets
		window.top.location.href = location.href;
	}
}

// ...

I tend to prefer client-side solutions as those would apply to all zims retroactively. It's a little past the 2 year anniversary of me opening #1033 originally for kiwix-android and that only got fixed this year. I expect a 3 month wait if this has to be fixed in mwoffliner and that still wouldn't make that zim usable in kiwix-js ServiceWorker like the other viewers.

@Jaifroid
Copy link
Collaborator Author

Jaifroid commented Mar 3, 2023

@danielzgtg Point taken! I agree that a rigorous sandbox is required, and I put the most rigorous one I could into the PWA version (pwa.kiwix.org), which blocks any access to content from the Web, as can be seen by looking at console.log for almost any ZIM opened (there are often font files, the odd svg, etc. that ZIMs try to access, especially Zimit ZIMs). However there was no consensus on my doing this for Kiwix JS when I last broached it.

Having said that, in this case a sandbox makes no difference: the JS doesn't attempt to navigate outside of the local origin, and no amount of sandboxing could stop this. Hence, three Kiwix apps that use iframes are affected:

  • Kiwix Serve (provided controls are broken out of)
  • Kiwix JS
  • Kiwix JS PWA

I can indeed put in a patch in both JS readers that gets rid of this code (and many thanks for identifying it), and this may be the only way to deal with it quickly, unless @kelson42 thinks this JS can be removed at source (which would be the cleanest option) in a relatively quick timescale? @kelson42 what is your advice?

@Jaifroid
Copy link
Collaborator Author

Jaifroid commented Mar 3, 2023

I opened kiwix/kiwix-js-pwa#376 to test a local patch for this, and re-opened kiwix/kiwix-js#972 so that other users noticing this error will see there is an issue raised.

@danielzgtg
Copy link

PWA version

The rogue redirect JS also affects pwa.kiwix.org

rigorous sandbox is required, [...] which blocks any access to content from the Web

Two hours later, I managed to break out of the sandbox to display an image from the web. That may waste users' data, and a malicious actor would much worse things.

I think we will have to settle with fixing non-malicious zims. A sandbox is only practical when enforced from C++/Java. A sandbox as Javascript code would either be leaky or very slow. I spend hours tying to Object.freeze(window.location) and Object.defineProperty in Tampermonkey, had to resort to <iframe sandbox="..."> when that didn't work.

Patching to disable specific scripts looks ugly, but it seems like it's the only viable solution. Unless you disable all scripts or have C++/Java, scripts can always be crafted to evade detection in a cat-and-mouse game. It will fix the non-malicious zims though. Removing it at source in addition to that is also nice at it will improve compression.

no amount of sandboxing

There is something at kiwix/kiwix-js#972 (comment) . Ideally we need to completely restrict window.top.

@Jaifroid
Copy link
Collaborator Author

Jaifroid commented Mar 4, 2023

@danielzgtg Regarding what is relevant for fixing the ZIM at source, the most obvious thing would be not to scrape that script, which would be relatively simple, as it is not a script that does anything that we need in an offline ZIM, and is in fact harmful (in terms of breaking JS-based controls that use iframes), as you have diagnosed.

The other issues you mention about sandboxing are relevant to specific readers/clients, so let's discuss those in the appropriate issues. NB I have a fix for the PWA version that I'm testing, but will let you know here when I have a publicly accessible build.

@kelson42
Copy link
Collaborator

kelson42 commented Mar 4, 2023

@pavel-karatsiuba Is there an elegant way to solve this without breaking other scrapes with other Mediawiki instances?

@pavel-karatsiuba
Copy link
Contributor

@kelson42
Adding a sandbox attribute to the iframe is the best way.
From mwoffliner side, I can simply replace window.top !== window.self with false. We doing the same with hackStartUpModule function.

@kelson42
Copy link
Collaborator

@kelson42 where exactly is this special behaviour comming from? How is that related to startup module?

@pavel-karatsiuba
Copy link
Contributor

@kelson42
The code with redirects coming from the script is "mediawiki.page.ready.js".
It is not related to the startup module but we can use the same function as we use for the startup module.
We can extend the condition and execute the function here: https://github.com/openzim/mwoffliner/blob/main/src/util/dump.ts#L124

@kelson42
Copy link
Collaborator

@pavel-karatsiuba Please rename the function has it is now more generic and handle the case. Just to be sure: how many times this function is called within a scrape?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants