Always fetch the origin policy on every request #44

Open
annevk opened this issue Jun 3, 2019 · 16 comments

Comments

@annevk

annevk commented Jun 3, 2019

Thinking more about some of the privacy issues I'm wondering if we should require HTTP/2 or later and have a fixed URL for the policy. That way we might be able to address some of the performance issues by fetching the policy in parallel with whatever is requested from that origin.

(If we assume that everyone eventually needs a policy we could even do away with the response header and use 4xx / 200 + application/json as signal, plus HTTP cache semantics for updates?)
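
A minimal sketch of how a client might read that signal, assuming the policy body is a JSON object. The function name `interpret_policy_response` is hypothetical, and treating every other status code as "no policy" is an assumption rather than anything the thread settles.

```python
import json
from typing import Optional


def interpret_policy_response(
    status: int, content_type: Optional[str], body: bytes
) -> Optional[dict]:
    """Return the parsed policy, or None when the response signals "no policy"."""
    # Any 4xx is read as "this origin has no policy".
    if 400 <= status < 500:
        return None
    if status == 200 and content_type is not None:
        # Ignore media type parameters such as "; charset=utf-8".
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "application/json":
            try:
                parsed = json.loads(body)
            except ValueError:
                return None  # malformed JSON or undecodable bytes
            # Assumption: a policy is a JSON object, not an array or scalar.
            return parsed if isinstance(parsed, dict) else None
    # Assumption: everything else (3xx, 5xx, wrong media type) means no policy.
    return None
```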

@michael-oneill

I like it.
Legacy HTTP would still work, but with added latency the first time (presumably the policy would be cached with the target). It also encourages people to move to HTTP/2.

@domenic
Collaborator

domenic commented Oct 24, 2019

Is the implication of this that every HTTP/2 request to an origin for which we haven't yet cached an origin policy is now accompanied by a second request (on the same connection) to /.well-known/origin-policy? (Or maybe just navigation requests?)

That feels expensive, but perhaps that is my HTTP/1 brain thinking...
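
To make the question concrete, here is a rough sketch of the "second request" reading. The helper names `origin_of` and `requests_to_send` are hypothetical, and the origin computation is simplified (it does not normalize default ports or strip userinfo).

```python
from typing import List, Set
from urllib.parse import urlsplit

# Path taken from the comment above.
WELL_KNOWN_PATH = "/.well-known/origin-policy"


def origin_of(url: str) -> str:
    """Simplified origin: scheme plus authority, lowercased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def requests_to_send(url: str, origins_with_cached_policy: Set[str]) -> List[str]:
    """Return the URLs to request for one fetch of `url`."""
    origin = origin_of(url)
    if origin in origins_with_cached_policy:
        return [url]
    # No cached policy for this origin: add the well-known policy request.
    return [url, origin + WELL_KNOWN_PATH]
```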

@domenic
Collaborator

domenic commented Oct 24, 2019

Never mind. I wrote this out in more detail in #47 and I can see how to avoid the extra request in many cases.

However in my writeup I didn't see any reason to restrict this to HTTP/2. It'll just be slower on HTTP/1 since round-trips are more costly and push doesn't exist. That seems fine.

@domenic
Collaborator

domenic commented Oct 25, 2019

In #47 @annevk said:

FWIW, my idea behind requiring H/2 or higher was that we would immediately fetch the policy in parallel with fetching url as sending out an additional request over H/2 that results in a 404 is not that expensive (I think).

Would you do this on every request? (Maybe every navigation request?) Would you do it even if we have cached an origin policy, in order to get potential updates (on the theory that 304s are also cheap)?

@annevk
Author

annevk commented Oct 25, 2019

Ideally, I think it would be for each new origin that the session encounters, starting with the top-level origin. And ideally it's also up-to-date, but perhaps there needs to be room for configuration there down the line. I'm not sure how realistic this is, but I wanted to throw the idea out there as I rather like the simplicity of it.

@domenic
Collaborator

domenic commented Oct 25, 2019

And ideally it's also up-to-date

How would you accomplish this part?

@annevk
Author

annevk commented Oct 25, 2019

@domenic oh sorry, that was meant as a yes to your suggestion. And we could use normal HTTP cache semantics + scope for which the policy won't be updated anyway (if a document stays open for hours and we decide policies are immutable as they well should be pretty please) as a signal when to refetch for a particular origin.
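
As an illustration of using normal HTTP cache semantics as the refetch signal, a sketch might look like the following. The `CachedPolicy` record and `should_refetch` helper are hypothetical, and only an explicit `max-age` lifetime is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedPolicy:
    body: str
    fetched_at: float  # seconds since the epoch when the policy was stored
    max_age: float     # freshness lifetime in seconds (e.g. from max-age)


def should_refetch(entry: Optional[CachedPolicy], now: float) -> bool:
    """Refetch when nothing is cached or the cached policy is no longer fresh."""
    if entry is None:
        return True
    age = now - entry.fetched_at
    # A response is fresh only while its age is below its freshness lifetime.
    return age >= entry.max_age
```

Under the "policies are immutable for an open document" idea, a caller would presumably run this check only when a new document is created, not while an existing one stays open.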

@domenic
Collaborator

domenic commented Oct 25, 2019

Got it. Then yeah, this feels expensive, but I'd like someone more familiar with actual implementation costs to weigh in... I'll try to rustle Chrome networking folks; could you ask some Mozilla ones?

domenic added a commit that referenced this issue Oct 31, 2019
This is inspired by #44, although the base proposal does not send an origin policy request along with every main request. (That _is_ discussed as a future extension, in a section at the bottom.)

This fixes #23 and fixes #40 by eliminating the Sec-Origin-Policy header. It double-keys the origin policy store to solve privacy concerns. It introduces async update, fixing #10. It uses structured header syntax since that seems to be the new hotness. It uses sets of acceptable policies per "considered alternative two" in https://docs.google.com/document/d/1hYCPowNFjESqJWZ3xItDwjMHmvMwuN_0lB3zedDkZaY/edit#heading=h.8699kqd24hbh.
@domenic
Collaborator

domenic commented Oct 31, 2019

So in talking with the Chrome networking folks, the general feeling was that this was expensive, especially potentially for server operators. It still might be worth experimenting with, and there are discussions around potential alternatives (e.g. an "extension frame" is apparently a thing we could use?). But the general sentiment is that it'd be better not to push for this immediately, instead waiting to see how important the sync-update case ends up being.

I tried to capture this all in https://github.com/WICG/origin-policy/blob/master/version-negotiation.md#potential-extension . There I note that this proposal is a compatible extension of the design in https://github.com/WICG/origin-policy/blob/master/version-negotiation.md. (In particular, this proposal doesn't really make the Origin-Policy response header redundant.)

I think as we go to write the spec, we might want to explicitly allow user agents to request the origin policy out of band, or concurrently with the main request, or similar, so they can experiment with strategies like this, or strategies like updating the user's often-visited sites' origin policies.
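
One way a user agent could experiment with the "concurrently with the main request" strategy is sketched below. The `fetch` coroutine is a stand-in that performs no network I/O, and both function names are hypothetical.

```python
import asyncio
from typing import Tuple


async def fetch(url: str) -> str:
    """Stand-in for a real network fetch; it only yields to the event loop."""
    await asyncio.sleep(0)
    return f"response for {url}"


async def fetch_with_policy(url: str, policy_url: str) -> Tuple[str, str]:
    """Issue the main request and the policy request concurrently."""
    main, policy = await asyncio.gather(fetch(url), fetch(policy_url))
    return main, policy
```

A caller could run it with `asyncio.run(fetch_with_policy(url, policy_url))`. Whether the main response is then held until the policy arrives is a separate design decision that this sketch leaves open.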

@annevk
Author

annevk commented Nov 1, 2019

The drawbacks there mention that it doubles server load, but that assumes these policies basically have no lifespan whatsoever. I would expect policies to last quite a bit longer than not at all.

Fair point on Origin-Policy still having value though.

@domenic
Collaborator

domenic commented Nov 1, 2019

Well, most importantly it doubles server load for any server that hasn't been updated to deploy a long-lasting origin policy, i.e. every server in existence today. Over time servers could update themselves, but it might be a rough transition.
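
The disagreement can be put in rough numbers with a deliberately simple model: one policy request for every `n` main requests served within a policy's freshness lifetime. The function `load_multiplier` and the model itself are illustrative assumptions, not measurements.

```python
def load_multiplier(main_requests_per_policy_fetch: int) -> float:
    """Total requests relative to main requests alone, under the simple model."""
    if main_requests_per_policy_fetch < 1:
        raise ValueError("need at least one main request per policy fetch")
    return 1.0 + 1.0 / main_requests_per_policy_fetch
```

With `n = 1` (no usable lifetime, as with servers that have not deployed a policy) the result is 2.0, which is the doubling case. With `n = 100` it is about 1.01, which is the long-lived case.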

@annevk
Author

annevk commented Nov 1, 2019

Well, if there's a 404 we could just wait a day before trying again (unless there's an Origin-Policy header in between).
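
A sketch of that retry rule, assuming a per-origin record of the last 404. The class `NegativePolicyCache` and its method names are hypothetical; the one-day interval comes from the comment above.

```python
from typing import Dict

RETRY_AFTER_404_SECONDS = 24 * 60 * 60  # "wait a day before trying again"


class NegativePolicyCache:
    """Remembers which origins recently answered the policy request with a 404."""

    def __init__(self) -> None:
        self._last_404: Dict[str, float] = {}

    def record_404(self, origin: str, now: float) -> None:
        self._last_404[origin] = now

    def should_fetch(
        self, origin: str, now: float, saw_origin_policy_header: bool = False
    ) -> bool:
        # An Origin-Policy header in the meantime overrides the waiting period.
        if saw_origin_policy_header:
            self._last_404.pop(origin, None)
            return True
        last = self._last_404.get(origin)
        return last is None or now - last >= RETRY_AFTER_404_SECONDS
```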

@domenic
Collaborator

domenic commented Nov 4, 2019

That gets a bit far away from the "just use HTTP semantics" strategy though, and more into the "browser heuristics" territory.

(I did a quick spec check: HTTP leaves it up to the client whether it considers 404s cacheable or not. Chrome currently considers them uncacheable. If the client does cache them, it needs to follow the usual caching rules with regard to respecting headers etc. Note that e.g. https://facebook.com/.well-known/origin-policy has headers cache-control: private, no-cache, no-store, must-revalidate, pragma: no-cache, and expires: Sat, 01 Jan 2000 00:00:00 GMT. I'm not sure how typical that is, but it's at least one data point.)
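
For reference, a client that does cache 404s would have to honor those directives. The sketch below uses a simplified parser that splits on commas and so does not handle quoted arguments containing commas; `parse_cache_control` and `may_store` are hypothetical names.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(cache_control: str) -> bool:
    """A response carrying no-store must not be stored by any cache."""
    return "no-store" not in parse_cache_control(cache_control)
```

Applied to the header value quoted above, `may_store` returns `False`, so a 404 from that URL could not be reused under plain HTTP caching rules.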

@annevk
Author

annevk commented Nov 5, 2019

I'd be curious to hear @mnot's take on that, but I suspect we'll need some additional logic either way.

domenic changed the title from "Fixed .well-known policy URL" to "Always fetch the origin policy on every request" Nov 19, 2019

@domenic
Collaborator

domenic commented Nov 19, 2019

I've renamed this issue to "Always fetch the origin policy on every request", to reflect the part of the discussion that isn't yet in the explainer. The idea of using a single location was incorporated into the version negotiation doc.

@mnot

mnot commented Jan 7, 2020

404 allows clients to heuristically cache responses, meaning that if they don't have an explicit freshness lifetime, the client can synthesise one.

So, OP could specify a heuristic for this resource -- e.g., if there isn't an explicit freshness lifetime, consider it to be one hour.

Also, it's possible to specify a caching layer "above" HTTP -- such as has been done with the image cache. So even if there is an explicit lifetime, you might specify that it has a minimum freshness lifetime of something like ten seconds, or allow it to be used by multiple responses on a page, etc.

WRT Facebook - it looks like they've chosen to make its 404s explicitly uncacheable. shrug. I'm sure there are other examples of sites like this out there, but I suspect they'll adjust (rather quickly).
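
The two suggestions above (a heuristic lifetime when none is given, and a minimum lifetime applied above HTTP) could combine roughly as follows. The constants use the example values from this comment, one hour and ten seconds; the function name is hypothetical.

```python
from typing import Optional

HEURISTIC_LIFETIME_SECONDS = 60 * 60  # "consider it to be one hour"
MINIMUM_LIFETIME_SECONDS = 10         # "something like ten seconds"


def effective_freshness_lifetime(explicit_lifetime: Optional[float]) -> float:
    """Freshness lifetime, in seconds, to apply to a policy response."""
    if explicit_lifetime is None:
        # No explicit freshness lifetime: synthesize one heuristically.
        return float(HEURISTIC_LIFETIME_SECONDS)
    # An explicit lifetime is honored, but never below the floor.
    return float(max(explicit_lifetime, MINIMUM_LIFETIME_SECONDS))
```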
