Consider tying background fetches to storage buckets #135

Open
asutherland opened this issue May 20, 2019 · 3 comments

Comments

@asutherland

Problem Statement: Managing storage quota for origins in a browser is problematic. Although the storage spec now has a concept of buckets as the atomic unit of storage for eviction purposes, there is still only a single bucket per origin. Browsers potentially have a very good understanding of how much a user uses a site (to the extent the user allows that history to be persisted), but no understanding of how that usage maps to storage. Keeping an origin's storage is all or nothing. As a result, quota management is effectively a combination of prompting (navigator.storage.persist(), Firefox's former 50 MB limit on IndexedDB unless the user said yes to a prompt) and LRU-style eviction with quotas set based on available disk space.

Motivating Concern: While thinking about whatwg/storage#70 as it relates to Firefox, I realized it's hard to cap an origin's quota at a reasonable size by default unless there's a way for the user to grant revocable portions of quota.

Modest Proposal: Tie background fetch downloads to storage buckets. The background fetch spec has already figured out how to expose long-running actions that can use a lot of disk space to the user via the browser's downloads UI. Downloads are something users are (more) able to reason about, both while they're happening and when explicitly cleaning things up.

Spec-wise, we need content to be prepared for data to disappear from disk, and ideally to be able to detect and recover from that. While a browser could treat background-fetch downloads as explicit quota grants, there's little benefit unless there's a way to actually reclaim the space associated with those grants without breaking the origin.

Practical Considerations: Spec'ing a full storage bucket API at this time would be a lot of work, and there are potential ergonomic issues that could doom adoption of background-fetch if consumers have to jump through a ton of hoops. A baby step might be to support branding the returned background-fetch Response with a storage bucket that is also propagated to the Blob returned by response.blob(). The storage bucket would be considered to exist as long as one of the branded objects exists, and it would have an explicit storage grant of the size requested for the download.
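
To make the lifetime rule concrete, here is a rough model of it. Everything in it is hypothetical: `DownloadBucket`, `brand()`, `release()` and the string object IDs are illustrative names, not part of the Background Fetch or Storage specs, and garbage collection of a branded object is modelled as an explicit `release()` call.

```typescript
// Hypothetical model, not a real or proposed API: a storage bucket created for a
// single background-fetch download, carrying an explicit quota grant equal to the
// size requested for that download.
class DownloadBucket {
  readonly id: string;
  readonly grantBytes: number;
  // IDs of objects (the Response, Blobs derived from it) branded with this bucket.
  private readonly branded = new Set<string>();

  constructor(id: string, grantBytes: number) {
    this.id = id;
    this.grantBytes = grantBytes;
  }

  // Brand an object; response.blob() would propagate the brand to the new Blob.
  brand(objectId: string): void {
    this.branded.add(objectId);
  }

  // Stands in for a branded object being garbage collected or deleted.
  release(objectId: string): void {
    this.branded.delete(objectId);
  }

  // The bucket, and its grant, exist only while some branded object exists.
  get alive(): boolean {
    return this.branded.size > 0;
  }
}

const bucket = new DownloadBucket("offline-movie", 1_000_000_000);
bucket.brand("response#1");
bucket.brand("blob#1"); // produced by response.blob()
bucket.release("response#1");
console.log(bucket.alive); // true: the Blob still holds the bucket (and its grant)
bucket.release("blob#1");
console.log(bucket.alive); // false: the grant can be reclaimed
```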

In the event the user or browser chooses to evict the bucket, the following behaviors would occur:

  1. Cache API: The Response would disappear from any Caches in which it exists. Any existing Response handles would error out in the same way they would if the origin bucket was evicted.
  2. IndexedDB: Any existing records referencing the Blob would continue to exist, but the Blob would error out if used, in the same way it would if the origin bucket was evicted. Alternatively, explicit additional state handling could be added.
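
A small in-memory model of those two behaviors, assuming each stored item records the ID of the bucket it is branded with (null for the origin's default bucket). `OriginStorage`, `evictBucket()`, `readBlob()`, the Maps standing in for the Cache API and IndexedDB, and the `NotReadableError` message are all illustrative choices, not spec text.

```typescript
// Hypothetical model of the eviction behaviors above; Maps stand in for the
// Cache API and IndexedDB, and a "blob" is just its bucket ID plus some data.
interface StoredBlob {
  bucketId: string | null; // null = the origin's default bucket
  data: string;
}

class OriginStorage {
  readonly cache = new Map<string, StoredBlob>(); // URL -> cached Response body
  readonly idb = new Map<string, StoredBlob>(); // record key -> Blob field
  private readonly evicted = new Set<string>();

  evictBucket(bucketId: string): void {
    this.evicted.add(bucketId);
    // 1. Cache API: branded Responses disappear from every cache.
    //    (Deleting from a Map while iterating it with for...of is safe in JS.)
    for (const [url, body] of this.cache) {
      if (body.bucketId === bucketId) this.cache.delete(url);
    }
    // 2. IndexedDB: records are left alone; only reading the Blob fails (readBlob).
  }

  readBlob(blob: StoredBlob): string {
    if (blob.bucketId !== null && this.evicted.has(blob.bucketId)) {
      // Error name chosen for illustration only.
      throw new Error("NotReadableError: the blob's bucket was evicted");
    }
    return blob.data;
  }
}

const origin = new OriginStorage();
const movie: StoredBlob = { bucketId: "offline-movie", data: "video bytes" };
origin.cache.set("/movie.mp4", movie);
origin.idb.set("user:1/avatar", movie);
origin.evictBucket("offline-movie");
console.log(origin.cache.has("/movie.mp4")); // false
console.log(origin.idb.has("user:1/avatar")); // true, but readBlob() now throws
```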

The future plan would be that:

  • Explicit storage buckets could be created, each with their own IndexedDB and Cache API factories. UI flow would presumably build on the download model.
  • Bucket-branded responses could be easily used to open a related storage bucket with some level of temporary extra quota budget to process the download. For example, an offline map app processing compressed map tiles into whatever format it actually wants locally.
  • For sanity, branded Blobs/Responses could only be stored in the default bucket for an origin or in their own storage area. Attempting to store a branded Blob/Response into another bucket would just copy the Blob/Response and be charged against the target storage bucket's quota.
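
The last bullet's copy-on-store rule can be sketched as a single function. The `Bucket` and `BrandedBlob` shapes, `storeBlob()`, and the `QuotaExceededError` message are hypothetical; the only behavior taken from the proposal is: store by reference into the origin's default bucket or the blob's own bucket, otherwise copy and charge the target bucket's quota.

```typescript
// Hypothetical shapes for illustration only.
interface Bucket {
  name: string;
  quotaBytes: number;
  usedBytes: number;
}

interface BrandedBlob {
  bucket: string; // name of the bucket whose grant pays for these bytes
  size: number;
}

// Returns the blob that actually ends up stored in `target`.
function storeBlob(target: Bucket, blob: BrandedBlob, defaultBucket: string): BrandedBlob {
  if (target.name === defaultBucket || target.name === blob.bucket) {
    // Stored by reference: the bytes stay charged to the blob's own bucket.
    return blob;
  }
  // Any other bucket gets a copy, charged against the target's own quota.
  if (target.usedBytes + blob.size > target.quotaBytes) {
    throw new Error("QuotaExceededError: target bucket is full");
  }
  target.usedBytes += blob.size;
  return { bucket: target.name, size: blob.size };
}

const tiles: BrandedBlob = { bucket: "map-download", size: 300 };
const processing: Bucket = { name: "map-processing", quotaBytes: 1000, usedBytes: 0 };
storeBlob({ name: "default", quotaBytes: 100, usedBytes: 0 }, tiles, "default"); // by reference
const copy = storeBlob(processing, tiles, "default");
console.log(copy.bucket, processing.usedBytes); // map-processing 300
```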

Discussion Hoped For:

  • How crazy is this? (Is it not crazy enough?)
  • Are there better ways to get us towards having and using storage buckets? Or better ways to avoid always offering quota based on how much free storage there is? I would presume Google's Project Fugu has plans here, but https://bugs.chromium.org/p/chromium/issues/detail?id=897276 currently seems to just intend to grant unlimited storage, as Firefox OS did for installed apps.
  • How can this API evolve into supporting storage buckets if it doesn't start out supporting them?
@jakearchibald
Collaborator

The all-or-nothing approach we have for storage feels easy to explain. If I use my own system to link together an entry in session storage, the cache API, and IndexedDB, it's nice that I can rely on those things either all existing together or none of them existing.

It seems weird that an entry could disappear from my 'users' IDB store just because that entry includes an avatar blob that came from a background fetch. It might not be obvious that the blob originated from a bg fetch, as it might have gone into the cache API, and later been retrieved and put into IDB by independent bits of code.

I guess I'm not fully understanding the problem. Is it something to do with the limits on one origin giving clues to the storage of another?

That said, I like the idea of creating storage with configurable automatic eviction, and it's something folks have asked for in the cache API before, but I think it should be explicitly requested by the developer.

> How can this API evolve into supporting storage buckets if it doesn't start out supporting them?

What would it mean to evict background fetch storage? The lifetime of bgfetch storage is pretty limited. Once the notification is gone, storage can be freed as JS loses references to the associated requests and responses. If the developer stores items somewhere more persistent (eg cache API), then it takes on the persistency of the new storage. So, if we create a way to make an instance/entry in the cache API auto-evictable, it would take on those rules.

I guess an early eviction of background fetch storage would mean aborting the fetch. Is that something the developer would want to happen out-of-the-blue?

@asutherland
Author

> I guess I'm not fully understanding the problem. Is it something to do with the limits on one origin giving clues to the storage of another?

That's the motivation behind my thought process, yes.

Right now Firefox says that the maximum amount of space a group (eTLD+1, shared amongst all origins in the group) can use is ~1/5 of the free storage on the disk, capped at 2GiB, inclusive of space already tracked by quota management. This is somewhat of a cop-out approach, and navigator.storage.persist() ends up as a band-aid on top of it. Deriving the quota limit from free space is also a (somewhat mitigated) side-channel information leak.
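
As arithmetic, the rule described above looks roughly like the function below. This is a sketch of the comment's description, not Firefox's actual code, and reading "inclusive of space already tracked" as "add tracked usage back into the free space before dividing" is an interpretation. It also shows the side channel: a site that can observe its quota (for example via `navigator.storage.estimate()`) learns something about the user's free disk space until the cap is reached.

```typescript
const GiB = 1024 ** 3;

// Sketch of the group limit as described above (not Firefox's real implementation):
// about 1/5 of free disk space, counting space already tracked by quota management
// as available (an interpretation of "inclusive"), capped at 2 GiB. The limit is
// shared by every origin in the eTLD+1 group.
function groupLimitBytes(freeDiskBytes: number, trackedUsageBytes: number): number {
  return Math.min(Math.floor((freeDiskBytes + trackedUsageBytes) / 5), 2 * GiB);
}

// The limit follows the user's free disk space until the cap applies,
// which is what makes it a (partially mitigated) side channel.
console.log(groupLimitBytes(5 * GiB, 0) / GiB); // 1
console.log(groupLimitBytes(50 * GiB, 0) / GiB); // 2 (capped)
```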

Starting from first principles, one might propose a strategy for quota management like:

  1. We'll give each origin a "reasonable" starting size that is decoupled from available disk space.
  2. We'll let the origin request additional storage and the user can decide.
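
The two-step strategy can be modelled as a fixed default plus a set of named, user-approved grants. `OriginQuota`, the 50 MiB default (an illustrative value echoing the 50 MB figure mentioned earlier), and the `userApproves` callback standing in for whatever UI a browser would show are all assumptions. The property the thread is after is that revoking one grant (say, one download) shrinks the limit without touching anything else.

```typescript
// Hypothetical policy sketch; the default value is illustrative only.
const DEFAULT_QUOTA_BYTES = 50 * 1024 * 1024;

class OriginQuota {
  // Named grants the user has approved, e.g. one per download.
  private readonly grants = new Map<string, number>();

  // Step 1: every origin starts from the same limit, independent of free disk space.
  get limitBytes(): number {
    let total = DEFAULT_QUOTA_BYTES;
    for (const bytes of this.grants.values()) total += bytes;
    return total;
  }

  // Step 2: the origin asks for more and the user (modelled as a callback) decides.
  request(grantId: string, bytes: number, userApproves: (bytes: number) => boolean): boolean {
    if (!userApproves(bytes)) return false;
    this.grants.set(grantId, bytes);
    return true;
  }

  // Revoking one grant reclaims only that space; the rest of the origin is untouched.
  revoke(grantId: string): boolean {
    return this.grants.delete(grantId);
  }
}

const q = new OriginQuota();
q.request("season-1", 2 * 1024 ** 3, () => true);
q.request("season-2", 2 * 1024 ** 3, () => false); // the user said no
q.revoke("season-1");
console.log(q.limitBytes === DEFAULT_QUOTA_BYTES); // true
```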

The eternal browser UX problems with prompting the user are of course:

  • The user is trying to do something right now, and prompts are likely to irritate them and result in them clicking "yes" until the pop-ups go away.
  • Unless the site is malicious in nature, the user probably does want to let the site do what it wants to do right now, so asking is somewhat pointless. The bigger issue is that they may not need/want the storage to stick around forever.
  • It's frequently hard to explain a decision to the user in a way that lets them make an informed choice, especially if they're trying to get something else done.

File downloads are an interesting case where I expect users are more likely to understand what's happening. In the ideal case, the user understands that the site wants to store some specific data on their computer and that this will consume network data and use up disk space until deleted. Browsers have also begun adopting UX flows where the download starts automatically and the user is alerted that it has started, but is not necessarily prompted (at least not after the first time, unless they want to be).

File downloads also potentially map exactly onto storage buckets, which we can then explain to the user, allowing them to revoke the quota grant and reclaim the storage space without completely wiping out the origin.

My proposal about magically disappearing Blobs was hand-waving about whether there's a way to take baby steps toward having background-fetch use storage buckets without requiring API consumers to be aware that storage buckets other than the origin default exist. It might be best to ignore that.

> What would it mean to evict background fetch storage? The lifetime of bgfetch storage is pretty limited. Once the notification is gone, storage can be freed as JS loses references to the associated requests and responses. If the developer stores items somewhere more persistent (eg cache API), then it takes on the persistency of the new storage. So, if we create a way to make an instance/entry in the cache API auto-evictable, it would take on those rules.

My question as it relates to this is that if background-fetch lets you download a 1GB file and guarantees it stays alive until the success event... how does that 1GB file interact with the quota system? It's straightforward if the SW is a BitTorrent app and the download is handed off as a Blob to the normal system download mechanism, where the downloaded file exists outside of the browser itself.

But if the SW is part of a video streaming site that allows offlining and it tries to store the download in the Cache API, how does that work? Assuming we boost the quota, how do we get that quota back if we can't tell what is part of that download once it goes into the opaque box of content JS and content storage APIs? Do we end up evicting every origin that was foolish enough not to immediately request navigator.storage.persist() on first load? Do we end up evicting the video streaming site and the many gigs of videos the user just offlined when they browse to Twitter next, because we had to perform all-or-nothing eviction?

Firefox's strategy is very naive LRU-based eviction at this point, so we can certainly become a lot more clever to avoid worst-case pathological eviction cycles, but there are limits to how clever we can be with all-or-nothing eviction. (And the reality is that sites will code against the cleverness of the browsers they develop/test against.)
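
A toy version of the all-or-nothing problem: the same naive LRU policy run over whole origins versus over buckets. The sizes, timestamps, origin names, and the `/download` bucket are invented for illustration, and a real eviction policy would weigh more than recency; the point is only how much collateral data each granularity takes with it.

```typescript
interface EvictionUnit {
  name: string;
  bytes: number;
  lastUsed: number; // smaller = used longer ago
}

// Naive LRU: evict least-recently-used units until at least `needBytes` are freed.
function evictLru(units: EvictionUnit[], needBytes: number): string[] {
  const evicted: string[] = [];
  let freed = 0;
  for (const unit of [...units].sort((a, b) => a.lastUsed - b.lastUsed)) {
    if (freed >= needBytes) break;
    evicted.push(unit.name);
    freed += unit.bytes;
  }
  return evicted;
}

const MiB = 1024 * 1024;

// All-or-nothing: the video site's app data and its offlined videos are one unit.
const byOrigin: EvictionUnit[] = [
  { name: "video.example", bytes: 3020 * MiB, lastUsed: 1 },
  { name: "news.example", bytes: 200 * MiB, lastUsed: 2 },
];

// With buckets, the offlined videos live in their own bucket. In this made-up
// timeline the app's default bucket was touched more recently than the videos.
const byBucket: EvictionUnit[] = [
  { name: "video.example/download", bytes: 3000 * MiB, lastUsed: 1 },
  { name: "news.example/default", bytes: 200 * MiB, lastUsed: 2 },
  { name: "video.example/default", bytes: 20 * MiB, lastUsed: 3 },
];

console.log(evictLru(byOrigin, 1000 * MiB).join(", ")); // video.example
console.log(evictLru(byBucket, 1000 * MiB).join(", ")); // video.example/download
```

At origin granularity the site's own app data goes along with the videos; at bucket granularity only the download is reclaimed.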

@jakearchibald
Collaborator

> My question as it relates to this is that if background-fetch lets you download a 1GB file and guarantees it stays alive until the success event... how does that 1GB file interact with the quota system?

I figured it would be part of the same quota as the cache API. Ideally, the downloadTotal should be used to check space available. I thought I did that, but I can't see it in the spec. Filed #136.
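
A minimal sketch of the kind of pre-flight check described here, written as a pure function so it doesn't depend on a browser. `fitsInQuota()` is hypothetical; `StorageEstimateLike` mirrors the `usage`/`quota` fields that `navigator.storage.estimate()` resolves with, and treating `downloadTotal === 0` as "size unknown, nothing to check" reflects the option's default of 0. Where a browser would actually run such a check is for the spec to decide.

```typescript
// Same shape as the usage/quota fields of navigator.storage.estimate()'s result.
interface StorageEstimateLike {
  usage: number;
  quota: number;
}

// Hypothetical check: would a background fetch declaring `downloadTotal` bytes fit
// in the quota it shares with the Cache API? downloadTotal defaults to 0, treated
// here as "size unknown", so no up-front check is possible and it is let through.
function fitsInQuota(estimate: StorageEstimateLike, downloadTotal: number): boolean {
  if (downloadTotal === 0) return true;
  return estimate.usage + downloadTotal <= estimate.quota;
}

const estimate = { usage: 200_000_000, quota: 1_000_000_000 };
console.log(fitsInQuota(estimate, 1_000_000_000)); // false: a 1GB download won't fit
console.log(fitsInQuota(estimate, 500_000_000)); // true
```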

> But if the SW is part of a video streaming site that allows offlining and it tries to store the download in the Cache API, how does that work? Assuming we boost the quota, how do we get that quota back if we can't tell what is part of that download once it goes into the opaque box of content JS and content storage APIs? Do we end up evicting every origin that was foolish enough not to immediately request navigator.storage.persist() on first load? Do we end up evicting the video streaming site and the many gigs of videos the user just offlined when they browse to Twitter next, because we had to perform all-or-nothing eviction?

This does create potential for storage duplication when adding fetched items into the cache API. I guess browsers could be 'smart' and dedupe a blob stored in two places. Otherwise, I guess we'd need some kind of storage 'transfer'.

Once the bgfetch operation is complete, the browser can delete its storage (unless it's doing the blob deduping above, in which case it just decrements the reference count or whatever).
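
The "smart" dedupe option amounts to reference counting. In this sketch (class and method names are hypothetical), background-fetch storage and a `cache.put()` of the same response each hold a reference to one copy of the bytes, which are counted once and freed only when the last reference is dropped.

```typescript
// Hypothetical sketch: one copy of the bytes, shared by every place that stores
// the blob, freed when the last reference goes away.
class DedupedBlobStore {
  private readonly refs = new Map<string, { size: number; count: number }>();

  // Add a reference (e.g. bgfetch storage, or a cache.put() of the same response).
  addRef(blobId: string, size: number): void {
    const entry = this.refs.get(blobId);
    if (entry) entry.count += 1;
    else this.refs.set(blobId, { size, count: 1 });
  }

  // Drop a reference; the bytes are freed only when the count reaches zero.
  release(blobId: string): void {
    const entry = this.refs.get(blobId);
    if (!entry) return;
    entry.count -= 1;
    if (entry.count === 0) this.refs.delete(blobId);
  }

  // Bytes actually on disk: each blob counts once, however many references it has.
  get bytesOnDisk(): number {
    let total = 0;
    for (const { size } of this.refs.values()) total += size;
    return total;
  }
}

const store = new DedupedBlobStore();
const GB = 1_000_000_000;
store.addRef("movie", GB); // held by the background fetch
store.addRef("movie", GB); // cache.put() of the same response
console.log(store.bytesOnDisk === GB); // true: stored once
store.release("movie"); // the bgfetch operation completes
console.log(store.bytesOnDisk === GB); // true: the cache still references it
store.release("movie"); // the cache entry is deleted
console.log(store.bytesOnDisk); // 0
```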
