
FAQ: Why not use SRI-based caching instead? #18

littledan opened this issue Dec 17, 2018 · 12 comments

@littledan (Member)

This idea is so frequently cited that it might be worth including in an FAQ. The idea is to use some form of signature- or SRI-based caching that crosses origins instead (with fixes for versioning and rollbacks). This could bring the performance benefits of built-in modules while not biasing us towards the browser-provided modules, instead deferring to the ecosystem.
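
For concreteness, here is a minimal sketch (not part of any proposal; the function name is made up) of the integrity value such a cache would key on, computed with the standard fetch and WebCrypto APIs:

```ts
// Illustrative only: compute an SRI-style integrity value ("sha384-...")
// for a resource. SRI metadata is the hash algorithm name plus the
// base64-encoded digest of the exact resource bytes.
async function sriIntegrity(url: string): Promise<string> {
  const bytes = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-384", bytes);
  const b64 = btoa(String.fromCharCode(...new Uint8Array(digest)));
  return `sha384-${b64}`;
}
```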

There are two issues I know of with this approach:

  • In practice, the use of different, potentially incompatible versions of the same library limits the potential for reuse through caching.
  • To avoid security and privacy risks, many browsers "double-key" their caches on not just the resource requested but also the origin that requested it. Without including the requesting origin, an "attacker" could time how long the request takes, leaking information about which sites were previously visited. (The two keying schemes are sketched after this list.)
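
As a rough illustration of the second point (function names hypothetical), the two keying schemes might look like:

```ts
// Hypothetical sketch: how a cache key might be formed in each scheme.
type CacheKey = string;

// Single-keyed / content-addressed: every page that knows the hash shares
// the entry, so a hit reveals that *some* previously visited site loaded it.
function singleKey(integrity: string): CacheKey {
  return integrity;
}

// Double-keyed: the requesting origin is part of the key, so entries are
// never shared across sites and the timing side channel goes away.
function doubleKey(requestingOrigin: string, integrity: string): CacheKey {
  return `${requestingOrigin}::${integrity}`;
}
```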

If anyone has good references on these topics, it'd be good to include them.

cc @lukewagner

@ljharb (Member) commented Dec 17, 2018

What are “origins” in a browser-agnostic context? Or would the approach only be addressing browsers?

@Mouvedia commented Dec 17, 2018

By SRI, do you mean Subresource Integrity?

@littledan (Member, Author)

See past discussion of this idea at w3c/webappsec-subresource-integrity#22

@Mouvedia Yes.

@ljharb This is an idea for browsers; for Node or other embedders, the hope might be to enable bytecode caching at all, since they don't have to worry about origin separation. @joyeecheung is working on this (starting with Node core itself).

@bmeck (Member) commented Dec 17, 2018

This seems like a generic issue for all modules. What makes built-in modules special here?

@littledan (Member, Author) commented Dec 17, 2018

@bmeck What makes this related to built-in modules is that part of the motivation for built-in modules is reducing the overhead of downloading them over the network. Cross-origin SRI-based caching is a potential mitigation for that same download-size issue, one which unfortunately doesn't seem to be feasible.

@bmeck (Member) commented Dec 17, 2018

@littledan I'm still unsure why this differs from other modules; I guess I'll wait and see. That the bandwidth/time savings are similar to the ecosystem's is fine, but I remain unclear on why SRI is problematic for built-in modules.

@littledan (Member, Author)

@bmeck (Sorry, I misunderstood your question.) Yes, you're right, cross-origin SRI-based caching faces this barrier whether or not it's caching something that's part of a built-in module polyfill.

@tabatkins

So, here are the problems I've come to understand people have with SRI-based cross-origin caching:

  1. Timing attacks. The full set of libraries a given site uses tends to be fairly unique; while lots of sites might load jQuery (ignoring all the different versions for a moment...), the full set of libraries they additionally load forms a pretty distinctive fingerprint. As such, a hostile page that loads a whole bunch of libraries and times them to figure out which came from cache and which hit the network would be a pretty effective determiner of which sites the user has recently visited. (This is effectively a single-use attack; once one page does it, it poisons the cache for any other page trying the same thing. But it's still considered dangerous.) A sketch of such a probe follows this list.

  2. A library that is likely to be cached is more attractive to use than one which probably needs to be fetched from the network; this encourages a minor "the rich get richer" effect where popular libraries remain popular because they're popular, and newer better libraries have trouble gaining traction.

  3. Cache-poisoning attacks. If you can engineer a hostile file with the same SRI hash as a popular library, you can feed it to users and then have it unexpectedly loaded on other sites, getting a persistent XSS on them without those sites doing anything wrong. While the hashes SRI uses aren't expected to be attackable in this way in the reasonable future, things sometimes change!
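
Roughly, the probe from point 1 might look like this sketch (the URLs, threshold, and names are invented for illustration; this is not a working exploit):

```ts
// Hypothetical sketch of the timing probe described in point 1.
async function likelyCached(url: string): Promise<boolean> {
  const start = performance.now();
  await fetch(url, { mode: "no-cors", cache: "force-cache" });
  // A fast response suggests a cache hit rather than a network fetch.
  return performance.now() - start < 10; // 10 ms threshold: illustrative
}

async function fingerprint(): Promise<boolean[]> {
  // Timing a set of well-known library URLs yields a bit-vector that
  // correlates with which sites the user has recently visited. Note that
  // probing also populates the cache, so the attack is single-use.
  return Promise.all([
    "https://cdn.example/jquery-3.3.1.min.js",
    "https://cdn.example/lodash-4.17.11.min.js",
  ].map(likelyCached));
}
```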


I've given a lot of thought to these, and I think 1 and 2 can be reasonably mitigated by imposing a degree of randomness on the caching behavior (see the sketch after this list). Basically:

  1. Randomly expire libraries from the cache regularly, increasing false-negative errors. Having almost all of the libraries on your page pre-cached automatically is still very worthwhile.
  2. Randomly pre-load libraries into the cache based on usage data, increasing false-positive errors. Prefer libraries with low usage among sites, but used on sites with high usage. (This requires a degree of use-tracking, which would fall under the existing anonymous stat collection browsers already do.)
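
As a rough sketch of those two mitigations (data structures and rates invented for illustration):

```ts
// Hypothetical sketch of the randomized mitigations above.
interface Entry {
  integrity: string; // SRI hash keying the shared cache
  url: string;
}

// Mitigation 1: random expiry introduces false negatives, so a miss no
// longer proves the user hasn't visited a site using the library.
function randomlyExpire(cache: Entry[], evictRate = 0.05): Entry[] {
  return cache.filter(() => Math.random() >= evictRate);
}

// Mitigation 2: random pre-loading introduces false positives, so a hit no
// longer proves the user *has* visited such a site. Candidates would come
// from the anonymous usage stats browsers already collect, preferring
// libraries that are rare across sites but appear on high-traffic ones.
function randomlyPreload(candidates: Entry[], preloadRate = 0.01): Entry[] {
  return candidates.filter(() => Math.random() < preloadRate);
}
```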

However, I don't see how to mitigate 3 without double-keying, which defeats the entire point. That said, we can counter-intuitively recover most of the cross-origin benefits in a double-keyed world if we just continue to use CDNs to load libraries; if everyone gets the library from the same 3rd-party origin, then keying to that origin doesn't defeat caching.

@littledan (Member, Author)

Hmm, I don't know what kind of math to use to understand the relationship between the "amount" of privacy preserved and the performance degradation from those two techniques... This reminds me a bit of screening 1/100 of the people in the airport and letting through 99% of the risk.

@bkardell

@tabatkins Last I checked, though, people don't load these libraries from the same origin, and miss rates are pretty high, right? Wouldn't some of this require a kind of 'official' URL in order to actually work out?

@tabatkins

An "official" url would help, sure. But centralizing effects would occur regardless, due to the value of using the same CDN origin as others. Right now there's not any particular reason to centralize.

@jikkujose

@tabatkins Wow, I was thinking exactly the same thing. I believe the biggest problem with this whole idea is privacy: depending on which downloads hit the cache, an observing system can get a fair understanding of a user's browsing history.

Cache poisoning: is this a serious issue in browser contexts? Manually replacing already-downloaded files is close to impossible, and a hash collision for something like this could be mitigated by moving to a better hashing algorithm, couldn't it?
