-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define target top-level origin for a navigation request #5491
Conversation
@annevk , PTAL, thanks! |
Since environment settings object is a subclass of environment, I think we also need to delete the field from environment settings objects, right? |
The top-level origin in the environment settings object is initialized based on the final origin after navigation commits. It is also different in definition: "...at the time this settings object was set up" vs "...at the time this environment object was set up". |
I think that's pretty confusing. Maybe we could use different names for the two fields? |
A different name "target top-level origin" in the environment class, sounds good. Will update. |
Update the name to "target top-level origin" in the environment class |
Will they be different though? It seems problematic if they can be different. |
I think the only difference would be in the case of top-level navigation request the origin is created using url's origin while in all other cases (subframes and subresources), it is created using determine the origin (which also includes sandbox flags) |
Thanks! So if you navigate to This might well be reasonable, but we should document this tidbit. If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else? |
That's right that the navigation request itself does not use the opaque origin as the key. But I don't see how sandboxed content can poison the HTTP cache.
|
(Replying the 2nd part of the earlier response again for better formatting)
Sure, I will try to document it in this change.
I think after document commit, since it is known that the origin is opaque, we might as well use that information for the network partition to respect the privacy/security boundaries implied by opaqueness. (Additionally, Chromium's implementation does not persist opaque origin partitions in the http cache, since they won't be able to be reused anyways) |
So you navigate to What I mean by poisoned is that |
That's the idea. Implementation wise, Chromium code currently uses the initial origin (without the opaque information that came in the response), but work is ongoing to be able to use the final origin.
Agree that it is not avoidable. |
Okay, that does seem like a better model and ensures better isolation, even if it has some setup weirdness and comes at the cost of more complexity. @hober @johnwilander @englehardt any feedback on this (see #5491 (comment) for an example)? |
Added the target top-level origin comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, modulo nits below, but I'd really like @hober @johnwilander @englehardt to weigh in before we merge this as this is a pretty significant architecture decision.
source
Outdated
data-dfn-for="environment">target top-level origin</dfn></dt> | ||
<dd><p>The <span>origin</span> of the <span | ||
data-x="concept-environment-target-browsing-context">target browsing context</span>'s | ||
<span>top-level browsing context</span> at the time this environment object was set up.</p></dd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't describe the nuance of sometimes using creation URL. It also suggests browsing contexts have an origin, but that's not true and since it's a common misconception we should not add instances of non-normative descriptions suggesting that I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also add a warning here that this is almost always the wrong field to be using. I think it's only needed for networking really. (And maybe service workers, but it seems weird for service workers to create documents that are not same-origin with the service worker.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added that this is only to be used for navigation requests. Let me know if you thing something more needs to be added in terms of the warning.
Addressed feedback, PTAL, thanks! |
@annevk : PTAL at the latest patch, thanks! |
A few questions so I understand the example:
|
It implies an actual navigation.
This change only computes the top-level origin and its consumers will derive a site from it, if needed. For instance, the http cache partitioning changes here makes use of "obtain a site" to derive scheme://eTLD+1, given the top-level origin.
As mentioned above, here the complete origin will be computed including any subdomain.
Have called out in line 86125 that origin is a function of the response and thus not completely created when we are just dealing with the request. I'm ok with either going with this general statement or going into the details if there is an already existing spec section that can be linked to for this special case. @annevk wdyt? |
There's been a recent set of changes to standards (HTML and URL primarily) where we decided to name that concept site (defined at https://html.spec.whatwg.org/#site). We still have "schemelessly same site" to do a comparison that ignores the scheme, but no standalone concept that matches what is commonly understood as eTLD+1 (registrable domain or origin). I strongly suspect we won't need it going forward, but if a case presents itself we could add it.
As @shivanigithub mentions this defines the origin. We can then derive a key from that for usage in partitioning efforts as desired, most likely using "obtain a site" on that top-level origin. But by having the concept itself be the origin we can also allow certain features to have a stricter boundary as might be needed for security. Taking sandboxing into account would ultimately yield a different site, hence me raising this with you all.
A document can sandbox itself using CSP's sandbox functionality. And yes, a framed sandboxed document would typically popup sandboxed documents as well. To answer @shivanigithub's question, I think it's probably useful to note in the note that sandboxing is the reason the two top-level origins can diverge. I think it's the only reason for now, but who knows what the future brings. |
Added sandboxing as the reason for the two top-level origins to diverge. |
I pushed a commit that I think captures the intent of the design I outlined in #5558. It was a lot more work than I anticipated however. |
I tried to clean up the wording a bit to be more similar to what already existed. And also made it more explicit about the null cases. I think what it does not handle well are the cases where you navigate an embedded browsing context to a response (rather than a request) as in that case there reportedly isn't any reserved environment to copy top-level state from. I'm not sure how accurate that is however as it would mean it would bypass service workers and such, as I understand it. I filed #5577 on that. |
I'm going to squash this and then rebase on top of #5583 as that's landed. And will then address remaining comments. |
An environment's top-level origin is null during the initial navigation (before the response arrived) and otherwise represents the origin of the top-level document. It is currently implementation-defined for non-dedicated workers, but hopefully that can be sorted soon. An environment's top-level creation URL is the URL of the top-level document. It is null for workers as they do not need the concept. Needed for whatwg/fetch#943 and whatwg#5558.
Note to self: after this is done, I can try to tackle appropriately creating Windows before Documents and tying them together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started reviewing but then realized some of my previous comments weren't yet addressed; sorry for jumping the gun. But I did have one question.
@annevk this doesn't build, so it's a bit hard to review; could you fix that? |
Sorry, I had missed that Travis was down or some such. It's good now according to PR Preview. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the description of top-level origin still being a bit confusing. Aside from that, everything flows really beautifully now; great stuff.
Also I'm kind of lost on why it's OK for top-level creation URL to be null for workers and worklets? Is that just because we're not planning on using it for those cases? It seems a bit weird from a theoretical perspective.
To restate the design in #5558 from memory, to ensure it still makes sense to me, you all, and is accurate: For documents we need top-level creation URL for a) determining whether we're in a secure context in any document and b) when we haven't created the top-level document yet (we're in the process of fetching the top-level document which allocates all kinds of network caches). These use cases don't apply to workers. And for non-dedicated workers it would be a racy value, even more so if there's no partitioning as then the origin of that value could be different too. Top-level origin we need in all environments (including workers) for fetching and deciding upon network caches. It's null in one scenario, which is where we use top-level creation URL instead (b above). |
Thanks, that's helpful and makes sense. Maybe we should put something like that in the spec, although we could certainly wait until all the pieces fall into place to actually use the values, and maybe also until we figure out the double-keying strategy for shared/service workers. |
@shivanigithub could you take a look as well? |
Thanks @annevk , this looks good to me. |
LGTM too, let's merge this!! (Will let you do the squashing and commit message.) |
I should add here for the record that Mozilla supports a distinct key for sandboxed origins (i.e., the switch that is possible between the initial top-level navigation and the moment where the response arrives). |
This defines the top-level origin for a navigation request based on its target browsing context. This will then be used for cache partitioning changes in whatwg/fetch#943
/browsers.html ( diff )
/browsing-the-web.html ( diff )
/webappapis.html ( diff )
/window-object.html ( diff )
/workers.html ( diff )