Split streaming, content-addressable storage from transactional, mutable storage #704
Description
Instead of having a single storage backend (see #616 and #643), I think it would be better to have two storage backends, one for content-addressable storage (where a given key always refers to the same content, e.g. the tarsummed layer tarballs) and one for mutable storage (e.g. the ‘latest’ tag for a given repository or registry-internal information like dependent tag tracking). For convenience, I've called the content-addressable storage “streaming storage” and the mutable storage “atomic storage”. Earlier thoughts on this issue:
- Initial AtomicStorage / StreamingStorage bifurcation.
- A transactional registry based on transactional atomic storage and garbage collected streaming storage.
- Independent config namespaces for each storage type.
- An API-versioning scheme to allow easy transition to this model if/when we decide that we don't want to keep everything behind a single storage API.
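As a rough illustration of the split described above, here is a minimal in-memory sketch. Only the names StreamingStorage and AtomicStorage come from this issue. The method names (put, get, compare_and_set), the use of SHA-256 as a stand-in for tarsum, and the tag key layout are all assumptions made for the example, not an existing registry API.

```python
import hashlib
import io
import threading


class StreamingStorage:
    """Content-addressable store: a key always maps to the same bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, stream, chunk_size=65536):
        # Hash while reading so the key is derived from the content itself.
        # SHA-256 is an assumption here, standing in for tarsum.
        digest = hashlib.sha256()
        chunks = []
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
            chunks.append(chunk)
        key = "sha256:" + digest.hexdigest()
        # Re-adding identical content is a no-op; entries are never overwritten.
        self._blobs.setdefault(key, b"".join(chunks))
        return key

    def get(self, key):
        return io.BytesIO(self._blobs[key])


class AtomicStorage:
    """Mutable store for small values, updated with compare-and-set."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._values.get(key)

    def compare_and_set(self, key, expected, new):
        # Only write if nobody changed the value since the caller read it.
        with self._lock:
            if self._values.get(key) != expected:
                return False
            self._values[key] = new
            return True


streaming = StreamingStorage()
atomic = AtomicStorage()

# Store the layer once, then point a mutable tag at its content-derived key.
layer_key = streaming.put(io.BytesIO(b"layer tarball bytes"))
tag = "repositories/example/tags/latest"  # hypothetical key layout
atomic.compare_and_set(tag, None, layer_key)
```

The point of the sketch is that the two stores have different contracts: the streaming side never needs locking or overwrite semantics because identical keys imply identical content, while the atomic side only needs to handle small values but must update them safely.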
For similar APIs in other systems, see:
- Git, which has .git/objects for content-addressable data, and .git/logs and .git/refs for mutable data.
- dat, which uses fs-blob-store for content-addressable data and LevelDB for mutable data. You can swap in an alternative content-addressable store, because fs-blob-store implements the abstract-blob-store API and passes its tests. This is the sort of “API and tests in an external module” approach that I've proposed before.
- IPFS, which has a mutable IPNS layer pointing into a content-addressable IPFS layer.
I've opened this issue so we can focus discussion here without diluting other issues. There's a lot of stuff up in the air right now, so I'm fine if this issue just sits as a placeholder until we have time to kick it around some more.