What is an "opaque origin" and why do we care? #321
Comments
See a little further down from the definition for a list of objects that have opaque origins: sandboxed documents, data urls, network schemes, cross-origin images, etc. |
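For readers who want to see what "opaque" looks like in practice, here is a minimal sketch using the standard `URL` API. The helper name `hasOpaqueOrigin` and the sample URLs are illustrative assumptions, not anything taken from the spec. It only inspects the origin derived from a URL; a document can also end up with an opaque origin for other reasons (e.g. sandboxing), which this check cannot see.

```typescript
// Hypothetical helper: true when the URL's origin serializes to "null",
// which is how an opaque origin shows up through the URL API.
function hasOpaqueOrigin(url: string): boolean {
  return new URL(url).origin === "null";
}

// Illustrative inputs only.
const samples: string[] = [
  "https://example.com/book/index.html", // tuple origin (scheme, host, port)
  "data:text/plain,hello",               // data: URLs get an opaque origin
];

for (const sample of samples) {
  console.log(sample, "opaque:", hasOpaqueOrigin(sample));
}
```

The serialized value is always the string "null", so two opaque origins cannot be told apart (or matched) by comparing strings.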
The origin of that reference goes back to the web app manifest. I agree that it may be too much details to include in the lifecycle, but it does not harm either... |
That’s because it’s handled in different specs (HTML, fetch, URL): cf. “When browsers must internally set origin to a value that’ll get serialized as null”. But then I’m afraid you’re finding yourself in the cross/same-origin rabbit hole pretty fast. Note “this is majorly confusing” for a lot of people. |
So, we use the term "opaque origin" in the spec but none of us can define it? LOL What problem does the use of Opaque Origin solve in the spec and is the need something we can use to define it? |
I am fine with reviewing this term altogether. We can always put it back if external reviews (e.g., security) make it necessary. |
For documentation’s sake, here’s a list of references that helped me get the details of the Web Origin Concept:
But yeah there’s no way you can get around it as it’s part of the browsers’ (and WebViews’) security model and you usually discover it the hard way. Hope that can help, independently of removing/keeping the term in the spec of course. Note:
So I guess it’s the default people must refer to anyway. |
That said, it would be nice to have a reworked (i.e. author-understandable) definition in the HTML spec, esp. as it is used pretty often in the issues… |
As far as I can tell, the issue isn't that the term isn't defined (it is, in the HTML standard), it's more that the Web's security model (and associated specs) –for which the 'origin' concept is fundamental– is rather complex and few people have a good understanding of it (I for one consider its details are way above my head 😄).
This term comes from copy/pasting the algorithm from Web App Manifest (there's a little monkey-patching smell to this, btw). As far as I understand they need it to process the In our own algorithm, the origin is used as the value of the manifest URL when the manifest is embedded in the document. I'm not sure exactly what this entails, what are our needs in terms of same-origin checks etc. But rather than rewording, discarding or keeping this term as the result of an uninformed consensus, I think it would be very important to get an expert security review! (in other words, my suggestion is a bit similar to Ivan's but the other way around: get a security review first). |
Nevermind, I edited this post as I don’t want to introduce even more opacity – you can check the history though, if you do want to make it even more opaque. |
To come back to the context in which "opaque origin" is used, i.e. How to obtain a manifest: as
=> therefore I propose that we remove this clause "2." from the algorithm, and don't try to solve HTML issues that are not in our scope. To put it differently:
none, therefore delete. |
Hmmmm, I’d respectfully beg to differ there: the same-origin policy will apply whatever you spec. Nowadays, browsers may not even allow devs to disable it with a flag – that authors must then use CORS to handle some use cases is another issue. But even
If you’re running a local server, it’s If you’re relying on the So it depends how the Reading App handles it. More generally, there’s a significant amount of issues related to origin open, cf. https://github.com/whatwg/html/issues?q=is%3Aissue+label%3A%22topic%3A+origin%22+is%3Aopen So it’s definitely something user agents are paying attention to. [Edit] Sorry, wrong link for |
That said, I’d vastly prefer UAs (e.g. browsers/webviews) to weigh in since they are probably the only ones having a complete understanding of (opaque) origin. |
Update: https://twitter.com/annevk/status/1048642347800649729?s=20 Note the following issue is probably the best one to do so: whatwg/html#2761 This is a personal opinion, really, but I guess you’re currently on the safe side i.e. “if the document can’t be trusted (because of its origin), don’t try to get the manifest.” |
This probably impacts #205 BTW. On a related note, Safari’s Reader Mode, Pocket, etc. let users cache articles so it could be interesting to check how they are dealing with different origins, as regards the bounds of a publication… |
Update: https://twitter.com/annevk/status/1048642347800649729?s=20
Note the following issue is probably the best one to do so: whatwg/html#2761
This is a personal opinion, really, but I guess you’re currently on the safe side i.e. “if the document can’t be trusted (because of its origin), don’t try to get the manifest.”
Taking into account that the issue has been open for more than a year, I would not hold my breath.
Your last sentence seems to be a very good possible replacement text, and it is probably better than either keeping what is currently in the spec or my earlier proposal to simply remove the reference to the opaque origin. At least for the moment, it is a better text and, if at some point the situation becomes clearer on the HTML/URL front, we can do better.
|
@iherman, I'm not sure I understand: are you suggesting we replace step (2) in the "obtaining the manifest" algorithm? If so, with which sentence exactly? Jiminy's text “if the document can’t be trusted (because of its origin), don’t try to get the manifest.” is good to get the gist of the issue, but can't work as a drop-in replacement for step (2). |
Yeah I can confirm that’s an oversimplification. For instance, Anne van Kesteren used the terms “isolated” and “restricted” yesterday. My biggest worry is that the Origin Concept is fundamental – it is already creating issues and/or complications in EPUB for instance, including security issues –, and if it’s not addressed in detail when applicable, that’s something you’ll have to do later anyway – if left to the appreciation of implementers, then there’s a huge risk of interoperability issues. |
@rdeltour I must admit I did not think it through in detail but, I would think, step (2) may be replaced by essentially the sentence of @JayPanoz:
I realize it remains fairly vague, and may need further review (who can do that?), but at least it makes the algorithm a little bit more understandable. |
Maybe it would help (it would certainly help me...) if you could give some examples of issues or complications with EPUB as of today. Thanks. |
I’ll stick to examples I am familiar with because I learnt it the hard way and I have huge scars to show. Note I’ll restrict those examples to Blink + WebKit, as there are enough differences already. Say app A is using the
Say app B is running a local server e.g.
App C therefore uses a custom scheme to solve the Web Storage API’s For the record, I was the one to report the Let’s now turn to cloud readers. Typically, the EPUB resource is loaded in an
So at this point the person in charge of the server is already hating you with a passion. How that goes when the cloud reader and EPUB files are not @ the same company in the first place: typically, the whole HTML document is loaded as It’s worth mentioning I’ve seen some cloud readers fail @ fetching the stylesheet for instance – because of the Content Security Policy. So it’s definitely not trivial, even for backend engineers – and you’ll need those people for WPUB, or else it will be painful. Now how does this translate to WPUB, I honestly can’t tell as I didn’t read the spec with the origin considerations in mind. What’s for sure though, is that if some (say for the lack of better word) features depend on the same-origin policy, no exception will be made. In the best-case scenario, the manifest can’t even have an opaque origin and you can remove it entirely. In the worst-case scenario, a lot of issue resolutions/design choices/etc. are impacted. But it’d be nice to see whether there is a risk in the first place. A quick question to illustrate that: a subdomain is not the same origin as the parent domain by default (i.e. it’s opaque), I could find like 4 instances of the “subdomain” term in 3 issues. But is there any guidance for authors anywhere? Because that one will surprise a lot of people for sure – note you can make it non-opaque with CORS. |
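To illustrate the subdomain question, here is a small sketch of a same-origin comparison with the standard `URL` API. The function name `sameOrigin` and the `example.com` hosts are assumptions made up for the illustration. As far as serialization goes, a subdomain yields its own tuple origin rather than one that serializes to "null"; the practical effect is the one described above: it is not same-origin with the parent domain, so cross-origin rules (and CORS) apply.

```typescript
// Hypothetical helper comparing the origins of two URLs.
function sameOrigin(a: string, b: string): boolean {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  // Opaque origins both serialize to "null" but are never same-origin
  // with one another, so string equality must not be trusted here.
  if (originA === "null" || originB === "null") return false;
  return originA === originB;
}

// Same scheme, host and port: same origin.
console.log(sameOrigin("https://example.com/a", "https://example.com/b"));
// A subdomain is a different host, hence a different origin.
console.log(sameOrigin("https://example.com/", "https://books.example.com/"));
```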
On a superficial sight though – which is provided AS-IS, comment under MIT license –, how I understand it right now: if your website is compromised and the manifest results in an opaque origin – don’t ask me why, I’m not a black hat –, then the UA should abort. edit: if you received that comment by mail, I edited it as I’d once again defer to UAs because I don’t want to oversimplify things but my gut feeling is “if it doesn’t fit into the security model, it will be a huge issue.” |
Since this is a legacy of the app manifest, here’s some context:
At this point though, it becomes difficult, I guess for everyone, to keep track there – even I have issues at times. So I’d personally be in favor of sticking to treat web security model/same-origin policies/etc. as another issue if needed – but to check whether it is, I’m afraid you’ll need someone with a very good understanding of such topics, who could also explain Finally, the more I think about it, the more I dig Anne van Kesteren’s “isolated/restricted” explanation. Essentially, this is what UAs are doing: there are “sandboxing” objects whose origin can’t be trusted under the security policy, and restricting some APIs/features accordingly (e.g. DOM access in an |
@JayPanoz trying to move on with the draft... I proposed to change the draft in #321 (comment) by replacing step (2) with
(i.e., essentially your text), and having a reference to the separate section on security. I am painfully aware that that section is currently empty, and something should be put there at some point, with additional explanation (I would not even dare to do it myself :-)), but it may make the draft editorially cleaner. We could then close this issue and, as you suggest, open a separate issue on this whole problem area... WDYT? (See you soon in Lyon!) |
Hmmm maybe if you want to remove “opaque” entirely, you could use
This would probably be the typical use-case for such a rule, with It drops some other opaque origins (e.g. document created using a |
Note the W3C manifest lifecycle rewriting is really interesting as well. It is being redesigned since Microsoft created a wrapper turning Web Apps into Packaged Web Apps to make them available in their app store – Twitter for Windows 10 is a Progressive Web App for instance. That sounds a lot like Packaged Web Publications. [Edit] See PWA Builder, esp. this doc ( |
See the discussion in #343; propose closing. Cc: @dauwhe? @TzviyaSiegman? @wareid? @GarthConboy? |
Possibly related: |
Also related #104 Browsing contexts and origins are intertwingled. |
@danielweck it is my understanding that it indeed is, cf. audio and video elements, and the terminology for CORS same-origin and CORS cross-origin. |
I'm curious: can a document have an opaque origin and belong to a web publication? An opaque origin document can't be identified as belonging to any web publication, as it can't be identified in the reading order or resource list without an address. Are there scenarios in which an opaque origin document can link to a manifest on another domain, or will sandboxing and security rules prevent this? If not, then the manifest has to be embedded and that leads to a web publication with an opaque address, since the manifest can only be embedded in the entry page... the address of the web publication. A web publication without an address isn't a web publication. The frosted side of me says to hell with restrictions, but the whole wheat side says maybe halting web publication initiation at the first sign of incompatibility is just a good thing to do. |
Consequence: since |
Isn't that a bit inevitable, though? file: URLs come with restrictions on scripting, API access, etc. How much will be crippled even if you can initiate it? |
Yeah objectively this wouldn’t be a bad thing, given it’s not even consistent across browsers (little interop based on lack of standardisation) so it would currently be a bad idea testing Other than that, the typical example others have been using is a web app/feature e.g. payment to be found in an Then just to be sure, I’d like to re-instate as formal-non-spec note that subdomains are opaque origin by default so you must use CORS & al. – and this always, always, raises issues from authors not familiar with the web security policies at first. I couldn’t necessarily keep up as I’ve ironically had to research and document origin for EPUB and all the nasty issues it might create… but the manifest can’t be on a subdomain anyway, right? |
Right, opaque origins are a signal of insecurity/untrustworthiness, so while I agree with @dauwhe that it would make life a lot simpler to be able to test from a file:// url, I'm not sure browsers will initiate a web publication given how they restrict other features. And if it's unrealistic that they would initiate a web publication for a document with an opaque origin (and I'm not the one to ask if they definitively will, of course, but I see it as probable), then we really have no choice but to live with the restriction. We're not enabling anything by removing it. |
@mattgarrish to clarify as I feel it might not have been clear enough, I was just adding this example to your previous comment, which I am agreeing with. |
Ya, I was just expanding on my earlier answer, as I wrote it a bit hastily last night. I'm not unsympathetic to the file: use case, but I think it's probably unrealistic we could enable it even if we wanted, and then the second hurdle is what else gets disabled. Your example just adds to the evidence of how opaque origins will be treated, which reinforces leaving the step that terminates the initiation of a web publication when a document has an opaque origin. This issue came up on the call yesterday and there was an open question at the end whether there were any useful scenarios we were disallowing by having this restriction, so I wanted to spend a bit more time searching for an answer to that. |
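The "terminate when the document has an opaque origin" step being discussed could be sketched roughly as below. This is not the spec's algorithm, just an illustration under stated assumptions: the names `ManifestDecision` and `checkDocumentOrigin` are hypothetical, and the function takes the document's serialized origin as a plain string (in a page that would typically be the value of `self.origin`).

```typescript
// Hypothetical result type for the guard.
type ManifestDecision =
  | { proceed: true }
  | { proceed: false; reason: string };

// Sketch of the guard: an opaque origin serializes to "null",
// in which case we do not try to obtain the manifest at all.
function checkDocumentOrigin(serializedOrigin: string): ManifestDecision {
  if (serializedOrigin === "null") {
    return { proceed: false, reason: "document has an opaque origin" };
  }
  return { proceed: true };
}

console.log(checkDocumentOrigin("https://example.com"));
console.log(checkDocumentOrigin("null"));
```

Keeping the check on the document's own origin (rather than on its URL) matters for cases like sandboxed frames, where the URL looks ordinary but the origin is still opaque.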
To be clear, I'm completely fine with |
In |
Another thought: the opaque origin (or from a browser engine perspective: "internally unique origin that gets serialized to null") generated for a |
Sorry, brain dead right now but isn’t that why you have * which also happens to be a thing in service workers. |
I am not sure I see the problem. The handling of relative url-s is handled by the json-ld spec. |
@JayPanoz is the Web Publication manifest an extension of Web App manifest? |
@iherman what if the Web Publication manifest has no URL to root itself onto? (isn't that a corollary issue to the opaque origin problem that arises from |
PS (sorry, didn't send in my previous message): |
@danielweck not that I know of but some parts happen to be a legacy of web app manifest so some clarification would be welcome – as a disclaimer, I’m lazy, which should explain why I’m also a huge believer in prior art and when several specs are taking the same path, I must admit I’m wondering why it happens to be this way for the sake of being a pita – OCD. |
per json-ld:
The latter is fresh in the json-ld wg, and was also a decision of the TAG recently, after discussion with the json-ld WG. If there are still uncertainties, it must be raised and solved by the json-ld WG, and not by this WG, imho. It is a good time, that wg is currently busy with details of exactly that, and must solve that issue due to the predominance of embedded json-ld in schema.org. |
@iherman indeed, which is why I am asking whether the manifest "obtention" / "processing" algorithm(s) in the Web Publication specification should explicitly terminate if the base URI cannot be computed (as my example, albeit a freak edge-case one, illustrates). Sorry if I am going off on a tangent, but origin and base URL are related concepts, so rather than having two parallel disconnected discussions, I raised the point here (I am opening a new issue nonetheless :) |
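A minimal sketch of the "base URL cannot be computed" case, using the standard `URL` constructor. The helper name `resolveAgainstBase` and the sample values are assumptions for illustration; what a Web Publication processor should actually do on failure is precisely the open question here.

```typescript
// Hypothetical helper: resolve a relative reference from a manifest
// against a base URL, returning null when resolution is impossible.
function resolveAgainstBase(relative: string, base: string): string | null {
  try {
    return new URL(relative, base).href;
  } catch {
    // The URL constructor throws a TypeError when the reference cannot
    // be resolved, e.g. when the base has an opaque path.
    return null;
  }
}

// A hierarchical base works as expected.
console.log(resolveAgainstBase("chapter1.html", "https://example.com/book/"));
// about:blank cannot serve as a base for a relative path.
console.log(resolveAgainstBase("chapter1.html", "about:blank"));
```

Returning null instead of throwing lets the caller decide whether to terminate or fall back, without the helper taking a position on that.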
New issue for base URL: |
As discussed on Feb 4 2019, closing this issue, it is being worked on in #374. |
This comes up in obtaining a manifest and in the life cycle diagrams. But a trip to the HTML spec is no help, defining "opaque origin" as:
How would a document end up having an opaque origin?