Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In ServiceWorker mode, the external links are opened inside the iframe #404

Closed
mossroy opened this issue Aug 5, 2018 · 24 comments · Fixed by #862
Closed

In ServiceWorker mode, the external links are opened inside the iframe #404

mossroy opened this issue Aug 5, 2018 · 24 comments · Fixed by #862
Labels
Milestone

Comments

@mossroy
Copy link
Contributor

mossroy commented Aug 5, 2018

Some websites refuse to be opened inside an iframe, which leaves a blank page, and raises an error in the console like :
Refused to display 'https://www.nytimes.com/1984/01/19/us/carter-leaves-hospital.html' in a frame because it set 'X-Frame-Options' to 'sameorigin'.

The external links should be opened in a new tab (like in jQuery mode).

Using a <base target="_blank" /> would probably work, but I suppose it would also apply to internal links, which is not what we want.
I'd like to avoid parsing the DOM or using regexp on the HTML content, but don't see a better way for now...

@mossroy mossroy added this to the v2.4 milestone Aug 5, 2018
@mossroy mossroy added the bug label Aug 12, 2018
@mossroy mossroy modified the milestones: v2.4, v2.5 Sep 22, 2018
@Jaifroid
Copy link
Member

Jaifroid commented Sep 27, 2018

If we think of the extraction engine as a server that can have a rewrite engine or can do server-side parsing, then maybe it's not so awful to filter the html between extraction and posting to Service Worker. It would certainly be efficient and easy. No need to wait till the DOM is loaded:

html = html.replace(/(href\s*=\s*["']https?:\/\/[^"']+["'])/ig, '$1 target="_blank"');

[Code not tested - probably need to add <a\s[^>]* in front to make it specific to anchor links]

@mossroy mossroy modified the milestones: v2.5, v2.6 Jan 9, 2019
@mossroy mossroy modified the milestones: v2.6, v2.7 Jul 14, 2019
@Jaifroid
Copy link
Member

I would just repeat what I wrote above, that IMHO it is perfectly legitimate to think of the extraction engine as a traditional httpd server, which has server-side filtering and transformation options. There are only two solutions to this issue: use a regex on the html between extraction and sending to the Service Worker, or add an on-click event to every hyperlink in the onload event we have defined for the iframe (for Service Worker mode).

@mossroy
Copy link
Contributor Author

mossroy commented Sep 29, 2019

@kelson42 : how is it handled in other kiwix implementations?
On Android, it detects external links and displays a warning message when you click on one.
On Desktop (v2.0), it opens them in an external brower.
How do they detect that?
Are they parsing and modifying the HTML string (first solution suggested by @Jaifroid)?
Do they add javascript events on hyperlinks when the page is loaded (second solution suggested by @Jaifroid)?
Is the WebView (or equivalent) capable of detecting such cases, and gives a way to trap them?
Or some other way?

@Jaifroid
Copy link
Member

I've just seen https://developer.mozilla.org/en-US/docs/Web/API/Clients/openWindow , which might work to process this inside Service Worker (since it's a SW issue)?

@mossroy
Copy link
Contributor Author

mossroy commented Sep 29, 2019

Not sure this API is available in a SW, as it can not access the DOM.
Another complication is to be able to distinguish calls to a web page (which we would like to open in a new tab) and potential calls to external resources like fonts, images, css, js (for which we should decide what to do, but we would not open a new tab)

@Jaifroid
Copy link
Member

Jaifroid commented Sep 29, 2019

It does say that function is part of the Serviceworker API, though may not be available on all browsers (Edge just has a question mark, so I supposed it would need testing). I agree that testing the URL of the Fetch event to decide if it's an external request would not be 100% safe, but it'd probably be no different if we were to use one of the other methods.

EDIT: Though I could see it would be very annoying if we got it wrong, as Windows could start opening all over the place....

@mossroy
Copy link
Contributor Author

mossroy commented Sep 29, 2019

Interesting!
If it works, using this openWindow API would be much better than the first 2 methods, IMHO.

@mossroy mossroy modified the milestones: v2.7, v2.8 Mar 29, 2020
@mossroy mossroy modified the milestones: v2.8, v2.9 Jul 11, 2020
@Jaifroid Jaifroid modified the milestones: v3.0, v3.1 Oct 3, 2020
@Jaifroid Jaifroid modified the milestones: v3.1, v3.2 Dec 6, 2020
@Jaifroid
Copy link
Member

Jaifroid commented Jun 10, 2021

Maybe this has to be fixed before any of the other development on #196? I'm not so convinced that we can use the openWindow API here given the restriction that the URL must be of the same origin as the calling script, and I'm not sure how it could trap a link to an external URL anyway (it would be out of the SW's scope).

Another possibility: we could trap the onbeforeunload event, which would throw up a "Are you sure you want to leave this page?" prompt, but this would only work if the Event gives details of the URL the user wants to navigate to (so that we don't also trap internal links -- surely this is possible). Also, you can't customize the prompt in modern browsers.

One further possibility comes to mind: trap any click on the iframe, and analyse what is being clicked. It avoids touching the DOM, but it does introduce a "layer" between the DOM and the user.

To sum up, I can see three methods:

  1. Preprocessing: change the HTML string with a regex inside the Service Worker (probably safe for Wikipedia, may not be safe for other ZIMs)
  2. Postprocessing: analyse the user's click on the iframe (not much different from 1, but doesn't change DOM, still not necessarily safe for all ZIMs which might do their own thing with clicks)
  3. Navigate away prompt: (if we can get the requested URL) - gives user a chance to back out if navigating away from the ZIM, but prompt could be considered annoying.

Any preference, @mossroy? (I am guessing 3!)

@mossroy
Copy link
Contributor Author

mossroy commented Jun 11, 2021

  • "1. Preprocessing" looks bad. Because it's not safe outside of known ZIM file origins (as you said), and also because it can not handle cases of dynamically created links (in javascript) or dynamic URL changes (in javascript, too)
  • "2. Postprocessing" is much safer than 1. Because it should trap all clicks (even if the link has been created by javascript code) and can not break the HTML. I don't know if there's is a bullet-proof way to add this "layer" in terms of event order (in the case the ZIM content also handles this event and does a stopPropagation(), see https://www.quirksmode.org/js/events_order.html). And this method would also miss cases like <a href="#" onclick="window.location='http://anotherdomain'"> (the event would be trapped but we would not be able to detect that it will open a URL from another domain), or other clicks that would not be done on a <a> tag, but would trigger javascript execution leading to changing the window location.
  • Regarding "3. Navigate away prompt", I'm not sure we can get the requested URL (to know it's another domain). See https://stackoverflow.com/questions/1686687/how-can-i-get-the-destination-url-for-the-onbeforeunload-event or https://stackoverflow.com/questions/7162300/how-can-i-detect-when-the-user-is-leaving-my-site-not-just-going-to-a-different (which comes back to method 2, where we try to catch all the clicks)

Sorry, all this are only bad news and I don't see a good way to handle this for now.

@mossroy
Copy link
Contributor Author

mossroy commented Jun 11, 2021

@kelson42 : could you tell us how this his handled in other kiwix clients? See #404 (comment)

@Jaifroid
Copy link
Member

Jaifroid commented Jun 12, 2021

OK, thanks for the info on 3. I think that effectively rules it out.

On 2, I don't think there's any danger of contents in the ZIM handling the click (or mousedown/mouseup) events of the iframe, because there is no way the contents can (or should) know that it is inside a frame. The issue would be whether we can handle the event first and then release it or not to the content scripts. We can add the event either during the capture phase or the bubble phase, which I understand to be top-down and bottom-up respectively (this may be a simplistic understanding). We probably want the capture phase.

I get that we would miss click events that have been added dynamically to contents that are not, say, a or area, but we have to think what the point of adding this check would be. IMHO, it's a courtesy to the user who might get very annoyed that an external link has taken them away from the application, costing them time to get back. Is it better to be helpful most of the time, and miss a few edge cases, or never to be helpful?!

Obviously, it would only be helpful if we can do it in a way that does not interfere.

@mossroy
Copy link
Contributor Author

mossroy commented Jun 12, 2021

Let's let @kelson42 some time to answer : maybe there is a better approach that we did not think about.

We have to keep in mind that ZIM contents can be much more "dynamic" than the ones from wikimedia (in the sense of a much heavier use of javascript, and much less static HTML)
PhET is an example : on each exercise, there is a menu on the bottom right of the page (unfortunately blocked behind our bottom bar...). If you remove the bottom bar, you can click on the menu and it gives a few options. One of them is to visit the PhET website : it is a dynamically generated <a> tag (that would not be catched with option 1, but would be with option 2), and it has no href attribute : clicking on it seems to run a function with this content : var e=window.open("http://phet.colorado.edu/"+t.locale,"_blank");e.focus(). Here, there is already a _blank target : it's fortunate because we would not have been able to add it with option 2.

I wonder if we already have other ZIM files with content based on a Single Page Application framework (Angular, VueJs, React etc), that would do the same kind of things.

@kelson42
Copy link
Collaborator

kelson42 commented Jun 13, 2021

AFAIK, Kiwix ports for Linux, Windows, iOS, macOS and Android use a custom zim:// protocol sheme to access the ZIM cobtent. Therefore, no HTML DOM transformarion is needed. This is anyway only something we do, only a bit, only in Kiwix Serve. I might be wrong, but this is like this AFAIK.

@Jaifroid
Copy link
Member

It's still necessary to test the link in some way. We don't have a problem knowing whether the link is from the ZIM or from the Web. The problem is knowing how to trap the request when a user clicks on a Web link. We can do it by inspecting the click target (Option 2 above) but it then potentially leaves dynamic links (created by JS in the ZIM) that don't have an href, but which might load external content when clicked, like the case @mossroy points out in PhET ZIMs. We need to trap anything that takes the user to another domain and open it in a new Window, or warn the user with a confirmation prompt and then open a new Window if the user wants to open the link. Otherwise external URLs open inside the iframe, which is suboptimal.

I did a quick prototype using Option 2 on the Check-for-external-links-and-warn branch. In the case of the PhET example you gave, @mossroy, it at least doesn't interfere (as you say, it doesn't see the dynamically generated link).

image

@kelson42
Copy link
Collaborator

@Jaifroid I don't think anything is trapped in previously cited Kiwix ports. Kiwix is simply not able to handle http:// links and then this is forwarded to the OS which has to handle it.

@Jaifroid
Copy link
Member

OK, unfortunately for us (!), the browser can handle external links and puts them in the iframe... which in some cases causes major CORS issues and the whole application stops. It's a poor experience, hence this issue to try to find a 'clean' way (or as clean as possible) to redirect the external links to a separate brower window.

@mossroy
Copy link
Contributor Author

mossroy commented Jun 13, 2021

OK, unfortunately for us (!), the browser can handle external links and puts them in the iframe...

It might be a silly idea, but maybe we could add a https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP that blocks opening other domains inside the iframe? It would avoid this poor/inconsistent experience, while letting the user open the links in a new tab/window (even if it might not be easily discovered by the user)

@Jaifroid
Copy link
Member

Interesting idea, @mossroy . I actually did define a CSP for Kiwix JS Windows, and I can't remember why... It wasn't this specific issue, but something to do with errors that were being thrown in the UWP app which a CSP fixed. It looks like this (in index.html, but also in article.html). Ah, yes it's obviously because inline JS in the ZIM was causing issues once I enabled Service Worker mode:

    <meta http-equiv="Content-Security-Policy" content="default-src http: https: file: data: blob: chrome-extension: 
         ms-appx-web: 'unsafe-inline' 'unsafe-eval';">

In the UWP app, probably for the same reason that @kelson42 outlines, external links "automatically" (without my adding code) open in an external window. (This CSP doesn't affect that, it's because UWP uses a protocol/domain whitelist, so I have whitelisted only ms-appx-web: protocol and https://pwa.kiwix.org in the webapp manifest.)

@Jaifroid
Copy link
Member

Jaifroid commented Jun 13, 2021

It might be similar (or simpler?) in our case to add the sandbox attribute to the iframe, with appropriate permissions defined as we need:

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe#attr-sandbox

It would need to be tested (either an explicit CSP, or the sandbox). If the effect on the end user is that they get an access denied message with no explanation as to how to open the external site they want to open, then I don't think it would be good UX, and probably worse than the current situation. If we can somehow test for this and tell the user to open a new tab with Ctrl-Click instead, it could be a good solution, more universal but slightly less user-friendly (not necessarily, depends on the implementation) than Option 2.

EDIT: This sandbox attribute looks promising:

  • allow-popups-to-escape-sandbox: Lets the sandboxed document open new windows without those windows inheriting the sandboxing. For example, this can safely sandbox an advertisement without forcing the same restrictions upon the page the ad links to.

@mossroy
Copy link
Contributor Author

mossroy commented Jun 13, 2021

This sandbox attribute looks like a good idea!

But I'm not sure it's technically possible to override the browser behavior in case of a cross-origin violation (enforced either by a CSP or the sandbox attribute). I think the browser will silently ignore the request (and log into the console). But it needs to be checked.

If we don't find a better solution, a compromise might be to :

  • implement the sandbox (or CSP), so that no inconsistent, buggy-like or insecure behavior happens : no URL opened inside the iframe, no cross-origin violation that would lead to a blank page, no interaction with the including window (that could modify or replace the DOM of our UI)
  • implement option 2 to handle the most simple ZIM files (if we're sure it can not have side effects : it's probably possible)
  • document in the About section that some links to online sites might not work, and should be manually opened in a new tab/window in this case (with a right-click, a middle-click, a long tap etc)

@Jaifroid
Copy link
Member

@mossroy That sounds like a plan! I may not have time over the next couple of weeks to advance/experiment, but will do after that. Of course in the meantime feel free to investigate further or try out a few things if you like.

@Jaifroid
Copy link
Member

Jaifroid commented Jun 15, 2021

Just a bit of research on this. Due to finding a Google Analytics script running in PhET experiments (from a PhET ZIM -- see openzim/phet#127), I tried out the sandbox element in KJSW to see if it can restrict access to this kind of thing. It's not fine-grained enough: you can either allow scripts or disallow them (from any source), and for something like PhET of course we need to allow scripts to run. Although there is the possibility of imposing a CSP in the iframe element, that option is experimental and only supported by Chromium browsers.

Therefore we would have to fall back to a CSP in the document. For this particular purpose (blocking scripts from disallowed domains inside the iframe -- and I realize it's not the specific issue here) a CSP has to be added to the document being loaded in the iframe, e.g. with a regular expression like this (note we can't add the CSP using DOM methods, becuase by the time the DOM is loaded it is too late: the foreign assets have also been loaded):

            // If there is no CSP, add one to prevent external scripts and content
            if (!/<meta\b[^>]+Content-Security-Policy/i.test(htmlArticle)) {
                htmlArticle = htmlArticle.replace(/(\s*<\/head>)/, '\n    <meta http-equiv="Content-Security-Policy" content="default-src \'self\' data: blob: \'unsafe-inline\' \'unsafe-eval\';"></meta>$1');
            }

This works very well to block the external resources loaded by the PhET experiment:

image

For the purposes of our specific issue here (preventing external links from opening in the iframe), we want to block the iframe from navigating to a disallowed external domain. For that, fortunately, it is enough to define a CSP in index.html, i.e. we don't have to inject it into the iframe document. The good news is that it works (see red text bottom-right of screenshot below). The bad news is that it's terrible UX as I suspected: a big grey error box with the message "This content is blocked: contact the site owner to fix the issue"!

image

@mossroy
Copy link
Contributor Author

mossroy commented Oct 28, 2021

After discussing this today with @kelson42 and @Jaifroid , here is what we decided :

  • we should not modify the HTML code through the ServiceWorker. It would be much too risky
  • we should not modify the DOM either (after the page is rendered). It's still risky too
  • we should trap the click event on the iframe (and ensure we're the last event handler in the bubbling process, to let the ZIM content do whatever it needs to do), test if we're on a <a> tag with a href that points to an external page without any target attribute. And, in this case, warn the user that he's going to on online page, and open it in a new tab if confirmed
  • if it's possible, we also should implement the CSP/sandbox that prevents from opening an external page inside the iframe (Restrict what an iframe can do, for security reasons #753), but it might be not technically possible for now

This is not an ideal solution, but it's the least bad we could find.
It's possible to break some ZIM files that would rely on javascript for internal navigation, while having an external href (which would look like a strange idea).
It's also possible to not trap the opening of an external link, that would rely on javascript (like on PhET).
But this should handle the vast majority of cases.

Not trapping some external links is less bad than breaking a ZIM navigation. To cover this last case, we should add a setting to let the user disable external links trapping. This setting should be considered as a "power-user" setting, and external links trapping should be on by default

@Jaifroid
Copy link
Member

In relation to your comment above, last bullet point, it is possible to prevent all access to external content by the iframe, but ONLY by injecting a CSP into the incoming document, i.e. altering it. I do this in KJSW, and it works well. But more importantly, it is necessary for my current implementation of Zimit file reading. If we can be sure to convert ALL the external links, it might not be a prerequisite for Zimit file reading, but as I cannot be sure of this in my current implementation, blocking access to online font files, etc., is vital. This does not stop a link being opened in a new tab, by the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants