Define target top-level origin for a navigation request #5491

shivanigithub · 2020-04-27T11:04:46Z

This defines the top-level origin for a navigation request based on its target browsing context. It will then be used for the cache partitioning changes in whatwg/fetch#943.

shivanigithub · 2020-04-27T11:05:10Z

@annevk, PTAL, thanks!

domenic · 2020-04-27T14:44:46Z

Since environment settings object is a subclass of environment, I think we also need to delete the field from environment settings objects, right?

shivanigithub · 2020-04-27T15:16:20Z

> Since environment settings object is a subclass of environment, I think we also need to delete the field from environment settings objects, right?

The top-level origin in the environment settings object is initialized based on the final origin after navigation commits. It is also different in definition: "...at the time this settings object was set up" vs "...at the time this environment object was set up".
Can we let it be there and behave like an overridden method since it changes its value and definition? (Not too sure of spec rules for inheritance)

domenic · 2020-04-27T15:17:37Z

I think that's pretty confusing. Maybe we could use different names for the two fields?

shivanigithub · 2020-04-27T15:19:48Z

A different name "target top-level origin" in the environment class, sounds good. Will update.

shivanigithub · 2020-04-27T15:57:08Z

Updated the name to "target top-level origin" in the environment class.

annevk · 2020-04-28T13:05:15Z

Will they be different though? It seems problematic if they can be different.

shivanigithub · 2020-04-28T13:19:28Z

> Will they be different though? It seems problematic if they can be different.

I think the only difference would be for a top-level navigation request, where the origin is created from the URL's origin. In all other cases (subframes and subresources), it is created using "determine the origin", which also takes sandbox flags into account.
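
Illustrative sketch (not part of the PR): the difference described above can be modeled in a few lines of Python. All names here (`TupleOrigin`, `OpaqueOrigin`, `url_origin`, `determine_origin`) are hypothetical, and the sandboxing check is reduced to a single boolean standing in for the sandbox flags. The real algorithms handle more cases, for example URLs whose own origin is already opaque.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional, Union
from urllib.parse import urlsplit

_opaque_ids = count(1)


@dataclass(frozen=True)
class TupleOrigin:
    scheme: str
    host: str
    port: Optional[int]


@dataclass(frozen=True)
class OpaqueOrigin:
    # Every opaque origin is unique; the id only makes that visible here.
    id: int = field(default_factory=lambda: next(_opaque_ids))


Origin = Union[TupleOrigin, OpaqueOrigin]


def url_origin(url: str) -> TupleOrigin:
    """Origin derived from the request URL alone (all that is known before the response)."""
    parts = urlsplit(url)
    return TupleOrigin(parts.scheme, parts.hostname or "", parts.port)


def determine_origin(url: str, sandboxed: bool) -> Origin:
    """Simplified stand-in: sandboxing yields a fresh opaque origin."""
    if sandboxed:
        return OpaqueOrigin()
    return url_origin(url)
```

Under these assumptions, the two values agree for an unsandboxed document and diverge only when the sandbox flag is set.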

annevk · 2020-04-28T13:29:49Z

Thanks! So if you navigate to https://site-a.example you would use that as the top-level site for the HTTP cache, despite the response perhaps indicating that it uses an opaque origin. This means that sandboxed content can poison the HTTP cache (and other things).

This might well be reasonable, but we should document this tidbit.

If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?

shivanigithub · 2020-04-28T14:25:22Z

> Thanks! So if you navigate to https://site-a.example you would use that as the top-level site for the HTTP cache, despite the response perhaps indicating that it uses an opaque origin. This means that sandboxed content can poison the HTTP cache (and other things).

It's true that the navigation request itself does not use the opaque origin as the key. But I don't see how sandboxed content can poison the HTTP cache.

> This might well be reasonable, but we should document this tidbit.

Sure, I will try to document it in this change.

> If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?

I think after document commit, since it is known that the origin is opaque, we might as well use that information for the network partition to respect the privacy/security boundaries implied by opaqueness. (Additionally, Chromium's implementation does not persist opaque origin partitions in the HTTP cache, since they won't be able to be reused anyway.)

shivanigithub · 2020-04-28T14:27:44Z

(Replying to the second part of the earlier response again for better formatting)

> This might well be reasonable, but we should document this tidbit.

Sure, I will try to document it in this change.

> If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?

I think after document commit, since it is known that the origin is opaque, we might as well use that information for the network partition to respect the privacy/security boundaries implied by opaqueness. (Additionally, Chromium's implementation does not persist opaque origin partitions in the HTTP cache, since they won't be able to be reused anyway.)

annevk · 2020-04-29T12:32:12Z

So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?

What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

shivanigithub · 2020-04-29T13:13:43Z

> So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?

That's the idea. Implementation-wise, Chromium code currently uses the initial origin (without the opaque information that came in the response), but work is ongoing to be able to use the final origin.

> What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

Agree that it is not avoidable.
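
Illustrative sketch (not part of the PR): the `/sandboxed` scenario can be traced with a toy partitioned cache in Python. `PartitionedCache` and both key variables are hypothetical; a bare `object()` stands in for an opaque origin because it compares equal only to itself. This only models the key switch discussed above, not any real browser cache.

```python
from typing import Dict, Hashable, Optional, Tuple


class PartitionedCache:
    """Toy HTTP cache keyed by (top-level key, request URL)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[Hashable, str], bytes] = {}

    def store(self, top_level_key: Hashable, url: str, body: bytes) -> None:
        self._entries[(top_level_key, url)] = body

    def lookup(self, top_level_key: Hashable, url: str) -> Optional[bytes]:
        return self._entries.get((top_level_key, url))


cache = PartitionedCache()

# Before the response arrives only the request URL is known, so the
# navigation itself is keyed on the tuple derived from that URL.
initial_key = ("https", "example.com")
cache.store(initial_key, "https://example.com/sandboxed", b"<!doctype html>")

# The response turns out to be sandboxed: the committed document has an
# opaque origin, and subsequent requests are keyed on that instead.
opaque_key = object()
cache.store(opaque_key, "https://example.com/app.js", b"console.log(1)")
```

The navigation response stays in the `("https", "example.com")` partition, which is the unavoidable "poisoning" mentioned above, while everything the sandboxed document loads afterwards lands in a partition that no other document can share.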

annevk · 2020-04-29T13:48:07Z

Okay, that does seem like a better model and ensures better isolation, even if it has some setup weirdness and comes at the cost of more complexity.

@hober @johnwilander @englehardt any feedback on this (see #5491 (comment) for an example)?

shivanigithub · 2020-04-30T13:11:03Z

Added the target top-level origin comment

annevk

This looks good to me, modulo nits below, but I'd really like @hober @johnwilander @englehardt to weigh in before we merge this as this is a pretty significant architecture decision.

annevk · 2020-05-04T08:56:28Z

source

+   data-dfn-for="environment">target top-level origin</dfn></dt>
+   <dd><p>The <span>origin</span> of the <span
+   data-x="concept-environment-target-browsing-context">target browsing context</span>'s
+   <span>top-level browsing context</span> at the time this environment object was set up.</p></dd>


This doesn't describe the nuance of sometimes using the creation URL. It also suggests browsing contexts have an origin, but that's not true. Since it's a common misconception, I think we should not add more non-normative descriptions that suggest it.

We should probably also add a warning here that this is almost always the wrong field to be using. I think it's only needed for networking really. (And maybe service workers, but it seems weird for service workers to create documents that are not same-origin with the service worker.)

Done. Added that this is only to be used for navigation requests. Let me know if you think something more needs to be added in terms of the warning.

shivanigithub · 2020-05-07T13:20:46Z

Addressed feedback, PTAL, thanks!

shivanigithub · 2020-05-12T04:25:39Z

@annevk: PTAL at the latest patch, thanks!
@hober @johnwilander @englehardt: PTAL as per comment, thanks!

johnwilander · 2020-05-12T22:07:54Z

> So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?
>
> What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

A few questions so I understand the example:

Does "you navigate" imply any user activity here, or is it just "a navigation occurs"?
Are we talking about site as in SameSite? If so, the protocol is not included afaik. It's just the registrable domain or eTLD+1. We may want to partition on protocol too but then we should have a name for it or say "site and protocol."
If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?
I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases? When it comes to top level navigations, most think of regular navigations and not window.open(). Is there use in calling out how top frames can become sandboxed?

shivanigithub · 2020-05-13T13:07:29Z

> So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?
> What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.
>
> A few questions so I understand the example:
>
> Does "you navigate" imply any user activity here, or is it just "a navigation occurs"?

It implies an actual navigation.

> Are we talking about site as in SameSite? If so, the protocol is not included afaik. It's just the registrable domain or eTLD+1. We may want to partition on protocol too but then we should have a name for it or say "site and protocol."

This change only computes the top-level origin; its consumers will derive a site from it if needed. For instance, the HTTP cache partitioning changes in whatwg/fetch#943 make use of "obtain a site" to derive scheme://eTLD+1 from the top-level origin.

> If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?

As mentioned above, here the complete origin will be computed including any subdomain.

> I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases? When it comes to top level navigations, most think of regular navigations and not window.open(). Is there use in calling out how top frames can become sandboxed?

I have called out in line 86125 that the origin is a function of the response and thus not completely created when we are just dealing with the request. I'm ok with either going with this general statement or going into the details if there is an already existing spec section that can be linked to for this special case. @annevk wdyt?

annevk · 2020-05-14T12:52:49Z

> We may want to partition on protocol too but then we should have a name for it or say "site and protocol."

There's been a recent set of changes to standards (HTML and URL primarily) where we decided to name that concept site (defined at https://html.spec.whatwg.org/#site).

We still have "schemelessly same site" to do a comparison that ignores the scheme, but no standalone concept that matches what is commonly understood as eTLD+1 (registrable domain or origin). I strongly suspect we won't need it going forward, but if a case presents itself we could add it.

> If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?

As @shivanigithub mentions this defines the origin. We can then derive a key from that for usage in partitioning efforts as desired, most likely using "obtain a site" on that top-level origin. But by having the concept itself be the origin we can also allow certain features to have a stricter boundary as might be needed for security.

Taking sandboxing into account would ultimately yield a different site, hence me raising this with you all.

> I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases?

A document can sandbox itself using CSP's sandbox functionality. And yes, a framed sandboxed document would typically pop up sandboxed documents as well. To answer @shivanigithub's question, I think it's probably useful to note in the note that sandboxing is the reason the two top-level origins can diverge. I think it's the only reason for now, but who knows what the future brings.
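
Illustrative sketch (not part of the PR): deriving a partition key from the top-level origin via "obtain a site", and the "same site" versus "schemelessly same site" comparisons mentioned above, might look roughly like this in Python. The `PUBLIC_SUFFIXES` set is a tiny hypothetical stand-in for the Public Suffix List, and the helper names are assumptions rather than any real API.

```python
from typing import Optional, Tuple, Union

# Hypothetical stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "co.uk"}


class OpaqueOrigin:
    """Stand-in for an opaque origin; equal only to itself."""


TupleOrigin = Tuple[str, str, Optional[int]]  # (scheme, host, port)
Origin = Union[TupleOrigin, OpaqueOrigin]


def registrable_domain(host: str) -> Optional[str]:
    """Return eTLD+1 for host, or None if the host has none."""
    labels = host.split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return None if i == 0 else ".".join(labels[i - 1:])
    return None


def obtain_site(origin: Origin):
    """Opaque origins are their own site; otherwise (scheme, eTLD+1 or host)."""
    if isinstance(origin, OpaqueOrigin):
        return origin
    scheme, host, _port = origin
    domain = registrable_domain(host)
    return (scheme, host if domain is None else domain)


def same_site(a: Origin, b: Origin) -> bool:
    return obtain_site(a) == obtain_site(b)


def schemelessly_same_site(a: Origin, b: Origin) -> bool:
    site_a, site_b = obtain_site(a), obtain_site(b)
    if isinstance(site_a, OpaqueOrigin) or isinstance(site_b, OpaqueOrigin):
        return site_a is site_b
    return site_a[1] == site_b[1]  # compare hosts, ignore scheme
```

Keeping the stored concept as the full origin means a feature that needs a stricter boundary can compare origins directly, while partitioning code can call something like `obtain_site` on the same value.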

shivanigithub · 2020-05-18T12:29:31Z

Added sandboxing as the reason for the two top-level origins to diverge.

annevk · 2020-05-26T13:23:45Z

I pushed a commit that I think captures the intent of the design I outlined in #5558. It was a lot more work than I anticipated however.

annevk · 2020-05-27T12:01:33Z

I tried to clean up the wording a bit to be more similar to what already existed. And also made it more explicit about the null cases.

I think what it does not handle well are the cases where you navigate an embedded browsing context to a response (rather than a request) as in that case there reportedly isn't any reserved environment to copy top-level state from. I'm not sure how accurate that is however as it would mean it would bypass service workers and such, as I understand it. I filed #5577 on that.

annevk · 2020-06-03T16:02:03Z

I'm going to squash this and then rebase on top of #5583 as that's landed. And will then address remaining comments.

An environment's top-level origin is null during the initial navigation (before the response arrived) and otherwise represents the origin of the top-level document. It is currently implementation-defined for non-dedicated workers, but hopefully that can be sorted soon.

An environment's top-level creation URL is the URL of the top-level document. It is null for workers as they do not need the concept.

Needed for whatwg/fetch#943 and #5558.

domenic · 2020-06-03T19:19:19Z

Note to self: after this is done, I can try to tackle appropriately creating Windows before Documents and tying them together.

domenic

I started reviewing but then realized some of my previous comments weren't yet addressed; sorry for jumping the gun. But I did have one question.

domenic · 2020-06-04T15:56:36Z

@annevk this doesn't build, so it's a bit hard to review; could you fix that?

annevk · 2020-06-04T16:26:51Z

Sorry, I had missed that Travis was down or some such. It's good now according to PR Preview.

domenic

LGTM except for the description of top-level origin still being a bit confusing. Aside from that, everything flows really beautifully now; great stuff.

Also I'm kind of lost on why it's OK for top-level creation URL to be null for workers and worklets? Is that just because we're not planning on using it for those cases? It seems a bit weird from a theoretical perspective.

annevk · 2020-06-04T17:14:02Z

To restate the design in #5558 from memory, to ensure it still makes sense to me, you all, and is accurate:

For documents we need top-level creation URL for a) determining whether we're in a secure context in any document and b) when we haven't created the top-level document yet (we're in the process of fetching the top-level document which allocates all kinds of network caches).

These use cases don't apply to workers. And for non-dedicated workers it would be a racy value, even more so if there's no partitioning as then the origin of that value could be different too.

Top-level origin we need in all environments (including workers) for fetching and deciding upon network caches. It's null in one scenario, which is where we use top-level creation URL instead (b above).
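
Illustrative sketch (not part of the PR): one way to read the restated design is as a fallback rule between the two fields. The Python below is an assumption-laden model, not spec text. `Environment`, `OpaqueOrigin`, and `origin_for_network_partition` are hypothetical names, and how consumers such as Fetch actually combine the two fields is defined elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union
from urllib.parse import urlsplit


class OpaqueOrigin:
    """Stand-in for an opaque origin; equal only to itself."""


TupleOrigin = Tuple[str, str, Optional[int]]  # (scheme, host, port)
Origin = Union[TupleOrigin, OpaqueOrigin]


@dataclass
class Environment:
    # URL of the top-level document; None for workers in this model.
    top_level_creation_url: Optional[str]
    # None only while the top-level document is still being fetched.
    top_level_origin: Optional[Origin]


def origin_for_network_partition(env: Environment) -> Origin:
    """Prefer the top-level origin; fall back to the creation URL (case b above)."""
    if env.top_level_origin is not None:
        return env.top_level_origin
    if env.top_level_creation_url is None:
        raise ValueError("environment has neither a top-level origin nor a creation URL")
    parts = urlsplit(env.top_level_creation_url)
    return (parts.scheme, parts.hostname or "", parts.port)
```

In this model the creation URL is consulted only during the initial top-level navigation; once a document has committed, or in a worker, the top-level origin is always the value used.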

domenic · 2020-06-04T17:20:18Z

Thanks, that's helpful and makes sense. Maybe we should put something like that in the spec, although we could certainly wait until all the pieces fall into place to actually use the values, and maybe also until we figure out the double-keying strategy for shared/service workers.

annevk · 2020-06-05T10:29:23Z

@shivanigithub could you take a look as well?

shivanigithub · 2020-06-05T13:23:27Z

Thanks @annevk , this looks good to me.

domenic · 2020-06-05T15:32:07Z

LGTM too, let's merge this!! (Will let you do the squashing and commit message.)

annevk · 2020-06-05T15:44:47Z

I should add here for the record that Mozilla supports a distinct key for sandboxed origins (i.e., the switch that is possible between the initial top-level navigation and the moment where the response arrives).

shivanigithub mentioned this pull request Apr 27, 2020

HTTP cache partitioning whatwg/fetch#943

Merged

annevk added the security/privacy There are security or privacy implications label Apr 28, 2020

annevk mentioned this pull request Apr 30, 2020

Tie state to agent clusters. Fixes #28. privacycg/storage-access#29

Merged

annevk reviewed May 4, 2020

annevk mentioned this pull request May 18, 2020

Secure Contexts integration #5558

Closed

annevk force-pushed the master branch from e7829bf to 05e5c4a Compare May 26, 2020 10:36

annevk requested a review from domenic May 27, 2020 12:02

domenic reviewed May 27, 2020

This was referenced May 28, 2020

Restructure create a new browsing context #5583

Merged

Add cross-origin opener policy #5334

Merged

annevk force-pushed the master branch from 0257916 to 7c6312a Compare June 3, 2020 16:10

make window environment more static

0548f13

domenic reviewed Jun 3, 2020

attempt to address review feedback

1022fd8

annevk requested a review from domenic June 4, 2020 14:56

where is CI when you need it

b798f40

domenic approved these changes Jun 4, 2020

attempt to clarify potential

08bf96a

annevk merged commit 31b264a into whatwg:master Jun 5, 2020
