layout: sharding the blob store #449

cyphar · 2016-11-05T23:50:35Z

One issue that I'm quite worried about is the performance impact of having too many blobs inside an OCI image. Now, practically speaking I would be surprised if n > 20 in most cases, but some people have expressed that they would like to have the entire universe bottled into an OCI image. I will refrain from commenting on how good of an idea I think that is, but if it's going to be a "valid usecase" then we should reconsider how we've organised the blob directory.

Namely, the current method of blobs/<algo>/<digest> will cause problems if the number of digests becomes quite large, due to implementation issues of filesystems. Essentially all filesystems are not designed to handle accesses of directories with many files well. If you look at how git, camlistore and many other such projects implement their blob storage it looks more like blobs/<algo>/<prefix>/<suffix> (or in camlistore's case, three sets of <prefix>/).
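To make the difference between the two layouts concrete, here is a minimal sketch of how a digest would map to a path in each. The function names and the two-character prefix length are assumptions for illustration (two hex characters mirrors what git does for loose objects); nothing here is defined by image-spec.

```python
from pathlib import PurePosixPath


def flat_blob_path(digest: str) -> PurePosixPath:
    """Current layout: blobs/<algo>/<digest>."""
    algo, encoded = digest.split(":", 1)
    return PurePosixPath("blobs", algo, encoded)


def sharded_blob_path(digest: str, prefix_len: int = 2) -> PurePosixPath:
    """Sharded layout: blobs/<algo>/<prefix>/<suffix>.

    prefix_len is a hypothetical parameter; the issue does not fix a value.
    """
    algo, encoded = digest.split(":", 1)
    if len(encoded) <= prefix_len:
        raise ValueError("digest too short to shard")
    return PurePosixPath("blobs", algo, encoded[:prefix_len], encoded[prefix_len:])
```

With a two-character hex prefix, the blobs spread over at most 256 subdirectories, so each directory holds roughly 1/256 of the entries that the flat layout would put in one place.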

Naturally this would be a backwards incompatible change (you can't really implement this scheme as well as retaining the old one because then you have an exponential number of ways to read the same blob data, almost certainly leading to countless implementation bugs). So we should probably consider this for post-1.0.0.



wking · 2016-11-06T04:17:58Z

On Sat, Nov 05, 2016 at 04:50:36PM -0700, Aleksa Sarai wrote:

> … but some people have expressed that they would like to have the
> entire universe bottled into an OCI image…

That may be me ;). I'd rather phrase this as “I'd like the whole
universe in one flat CAS namespace, with individual CAS engines biting
off as large a chunk of that universe as they like”. What I've tried
to supply in opencontainers/runtime-tools#5 is an API that works
regardless of the number of blobs in CAS.

Whether a particular implementation of that API (e.g. image-layout)
scales to huge blob counts (clearly the tar-backed image-layout does
not) is a less important question. Folks will just use a different
ref/CAS engine when they have large stores. But ref/CAS consumers
shouldn't have to worry about that sort of implementation detail.

> Namely, the current method of blobs/<algo>/<digest> will cause
> problems if the number of digests becomes quite large, due to
> implementation issues of filesystems. Essentially all filesystems
> are not designed to handle accesses of directories with many files
> well.

This has come up before in #94 and #208, with the bulk of the
discussion based on 1. The consensus (as I understood it) was that
we shouldn't worry about this for now because modern filesystems don't
mind and tar isn't going to care either way. Having stable, scalable
APIs buffers downstream consumers from any future CAS-storage
optimizations.

philips · 2016-11-16T21:22:22Z

Agreed this is a dupe of #208.

cyphar · 2016-11-17T05:30:12Z

@philips It's not a dupe of #208. #208 was about blobs/sha256/<the full digest> rather than blobs/sha256/<three byte>/<rest of digest> (which is what this is about). But I don't have strong opinions because I don't agree with @wking's wish to stuff everything into a single CAS.

jonboulle · 2016-11-21T11:49:53Z

@cyphar I guess in particular #208 (comment) challenges the premise of this issue

AkihiroSuda · 2017-02-13T07:50:50Z

This does not seem to be a dupe of #208.

Even though a pull operation should never call readdir(), a push may call readdir() depending on the distribution protocol and its implementation, which is likely to result in poor performance.

Also, there can be third-party tools (e.g. malware scanners, backup tools) that are not aware of the OCI manifest and hence end up calling readdir().
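As a rough illustration of what such a tool would see, the sketch below counts how many entries land in each directory when blobs are bucketed by a digest prefix. The blob contents, the function name, and the two-character prefix are all assumptions made up for the example; it measures entry counts only, not actual readdir() latency on any filesystem.

```python
import hashlib
from collections import Counter


def shard_sizes(n_blobs: int, prefix_len: int = 2) -> Counter:
    """Count entries per shard directory for n_blobs synthetic blobs.

    prefix_len=0 models the flat layout: every blob lands in one directory.
    """
    counts = Counter()
    for i in range(n_blobs):
        # Synthetic blob content, used only to obtain a realistic digest.
        encoded = hashlib.sha256(f"blob-{i}".encode()).hexdigest()
        counts[encoded[:prefix_len]] += 1
    return counts


if __name__ == "__main__":
    n = 100_000
    sizes = shard_sizes(n)
    print(f"flat: 1 directory with {n} entries")
    print(f"sharded: {len(sizes)} directories, largest has {max(sizes.values())} entries")
```

A tool that walks the whole store still visits every blob either way; sharding only bounds how many entries any single directory listing has to return.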

Can we reconsider this issue?

AkihiroSuda · 2017-02-13T08:30:09Z

Since the layout of blobs/<algo> can no longer be changed, we might need to come up with some alternative layout.

Some of my ideas, with pros/cons:

1. blobs-sharded/<algo>/<prefix>/<digest>
   - Pro: Does not contaminate the existing blobs directory
   - Con: It may be confusing to have two blob directories (blobs and blobs-sharded)
2. blobs/<algo>-sharded/<prefix>/<digest>
   - Pro: Single blobs directory
   - Con: sha256-sharded looks as if it were an algorithm, which can cause implementation issues
3. blobs/<algo>/<prefix>/<digest> (identical to the original proposal)
   - Pro: Single blobs directory, no algorithm namespace contamination
   - Con: Scanning the content of blobs/sha256 can be 2X slower, because the directory is likely to contain traditional blobs as well for compatibility

My preference is 1.
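For comparison, here is a small sketch of where a given digest would live under each of the three options. The helper name, the dictionary keys, and the two-character prefix length are assumptions for illustration; note that in all three options the file name under the prefix directory is the full digest, not just the remainder.

```python
from pathlib import PurePosixPath
from typing import Dict


def candidate_paths(digest: str, prefix_len: int = 2) -> Dict[str, PurePosixPath]:
    """Return the blob path for each proposed layout, keyed by option number.

    prefix_len is hypothetical; the proposal does not specify a prefix length.
    """
    algo, encoded = digest.split(":", 1)
    prefix = encoded[:prefix_len]
    return {
        # Option 1: separate top-level directory.
        "1": PurePosixPath("blobs-sharded", algo, prefix, encoded),
        # Option 2: "-sharded" suffix on the algorithm directory.
        "2": PurePosixPath("blobs", f"{algo}-sharded", prefix, encoded),
        # Option 3: prefix directories alongside the traditional flat blobs.
        "3": PurePosixPath("blobs", algo, prefix, encoded),
    }
```

Option 3 is the only one where the prefix directories and the traditional flat blobs share a parent directory, which is where the extra scanning cost comes from.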

Also, we would need to define a new field for the list of supported blob layouts in the oci-layout file (or maybe in index.json).

e.g.

{
    "imageLayoutVersion": "42.0.0",
    "supportedBlobLayouts": [ // if empty, "v1compat" is implicitly selected
        "v1compat",
        "sharded"
        // there can be other layouts that are specific to the distribution protocol? (e.g. "ipfs")
    ]
}
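A consumer of that field might look roughly like the sketch below. The supportedBlobLayouts field and the "v1compat" default come from the proposal above and are not part of the released image-spec; the function name is made up, and the input must be plain JSON without the explanatory comments shown in the example.

```python
import json
from typing import List

# Default taken from the proposal: an empty or missing list means "v1compat".
DEFAULT_LAYOUT = "v1compat"


def supported_blob_layouts(oci_layout_text: str) -> List[str]:
    """Read the proposed supportedBlobLayouts field from oci-layout content."""
    doc = json.loads(oci_layout_text)
    layouts = doc.get("supportedBlobLayouts")
    if layouts is None:
        return [DEFAULT_LAYOUT]
    if not isinstance(layouts, list) or not all(isinstance(x, str) for x in layouts):
        raise ValueError("supportedBlobLayouts must be a list of strings")
    # An empty list implicitly selects the traditional flat layout.
    return layouts or [DEFAULT_LAYOUT]
```

Treating a missing field the same as an empty list keeps existing 1.0.0 layouts readable without any change to their oci-layout file.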

For folks who want to diverge as little as possible from things already in image-spec. Downsides to this approach include:

* Non-sharded blobs [1], although it's not clear to me that modern filesystems suffer from having many entries in one directory [2].
* Possible duplicate blobs between two layouts. You can address this with symlinks or similar, but you'd need extra tooling to do that. With a single CAS bucket, there's only one place that the blob could be, so deduping is free (but garbage collection becomes more complicated).

[1]: opencontainers/image-spec#449
[2]: opencontainers/image-spec#94 (comment)

Bump the layout to v1.1 to support this. This makes it possible to distribute layouts that use other protocols, for example new ref-engine protocols or a sharded blob store [1]. You can also reference external ref- and CAS-engines, although obviously the utility of such depends on the availability of those external engines.

[1]: opencontainers#449

Signed-off-by: W. Trevor King <wking@tremily.us>

With this change, users can configure their blob storage once at init time with an optional --blob-uri. Most other commands (which do not need path -> blob conversion) can load the blob location from the oci-layout layout file (the 1.1.0 format is in flight with [1,2]). The only other user-facing change is to 'umoci gc', which gains a --digest-regexp. Folks who customized their blob URI will need to supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all callers to provide the custom blob location [3]. And it is more powerful as well, allowing users to shard their blob storage [4], etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
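The idea of pairing a blob location template with a regular expression that reverses it can be sketched as follows. This is not umoci's implementation: the template syntax, the pattern, and both function names are hypothetical, chosen only to show why a sharded location needs a matching expression to get from a path back to a digest (which is what garbage collection has to do).

```python
import re
from typing import Optional

# Hypothetical template for a sharded store; not umoci's actual --blob-uri syntax.
TEMPLATE = "blobs/{algorithm}/{prefix}/{encoded}"

# Hypothetical counterpart to --digest-regexp: recovers the digest from a path.
DIGEST_REGEXP = re.compile(
    r"^blobs/(?P<algorithm>[a-z0-9]+)/(?P<prefix>[a-f0-9]{2})/(?P<encoded>[a-f0-9]+)$"
)


def blob_path(digest: str) -> str:
    """Expand the template for a digest such as 'sha256:abcd...'."""
    algorithm, encoded = digest.split(":", 1)
    return TEMPLATE.format(algorithm=algorithm, prefix=encoded[:2], encoded=encoded)


def digest_from_path(path: str) -> Optional[str]:
    """Reverse blob_path(); return None for paths that are not blobs."""
    m = DIGEST_REGEXP.match(path)
    if m is None or not m.group("encoded").startswith(m.group("prefix")):
        return None
    return f"{m.group('algorithm')}:{m.group('encoded')}"
```

The round trip digest_from_path(blob_path(d)) == d is the property a custom location and its expression would need to satisfy together.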

cyphar referenced this issue in opencontainers/umoci Nov 6, 2016

image: cas: implement CAS engine

667271b

Signed-off-by: Aleksa Sarai <asarai@suse.com>

philips closed this as completed Nov 16, 2016

xiekeyang mentioned this issue Sep 1, 2017

Proposals of Policies of OCI Image Discovery xiekeyang/oci-discovery#1

Closed

wking mentioned this issue Sep 18, 2017

Include the layout spec if/when this becomes and OCI Project xiekeyang/oci-discovery#20

Open

wking mentioned this issue Oct 18, 2017

*: add ability to use a shared CAS directory opencontainers/umoci#190

Closed

wking mentioned this issue Nov 3, 2017

oci/cas/dir: Load blob URI from oci-layout opencontainers/umoci#214

Closed
