layout: sharding the blob store #449

Closed
cyphar opened this issue Nov 5, 2016 · 6 comments

Comments

@cyphar
Member

cyphar commented Nov 5, 2016

One issue that I'm quite worried about is the performance impact of having too many blobs inside an OCI image. Practically speaking, I would be surprised if n > 20 in most cases, but some people have said that they would like to have the entire universe bottled into an OCI image. I will refrain from commenting on how good an idea I think that is, but if it's going to be a "valid use case" then we should reconsider how we've organised the blob directory.

Namely, the current layout of blobs/<algo>/<digest> will cause problems if the number of digests becomes quite large, due to how filesystems are implemented. Essentially no filesystem is designed to handle accesses to directories with many files well. If you look at how git, camlistore and many other such projects implement their blob storage, it looks more like blobs/<algo>/<prefix>/<suffix> (or in camlistore's case, three sets of <prefix>/).
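
For illustration, the two path schemes above could be computed from a digest roughly as follows. This is a minimal sketch and not anything defined by the spec: the function names, the `prefix_len` parameter and its default of 2 hex characters (the split git uses for loose objects) are assumptions made for the example.

```python
import os


def _split_digest(digest):
    """Split an OCI-style digest "<algo>:<encoded>" into its two parts."""
    algo, sep, encoded = digest.partition(":")
    if not (sep and algo and encoded):
        raise ValueError(f"expected <algo>:<encoded>, got {digest!r}")
    return algo, encoded


def flat_blob_path(layout_root, digest):
    """Current layout: blobs/<algo>/<encoded digest>."""
    algo, encoded = _split_digest(digest)
    return os.path.join(layout_root, "blobs", algo, encoded)


def sharded_blob_path(layout_root, digest, prefix_len=2):
    """Sharded layout: blobs/<algo>/<prefix>/<suffix> (prefix_len is hypothetical)."""
    algo, encoded = _split_digest(digest)
    if len(encoded) <= prefix_len:
        raise ValueError(f"digest too short to shard: {digest!r}")
    return os.path.join(
        layout_root, "blobs", algo, encoded[:prefix_len], encoded[prefix_len:]
    )
```

With a 2-character hex prefix, each blobs/<algo> directory holds at most 256 subdirectories, and the blobs are spread across them.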

Naturally this would be a backwards incompatible change (you can't really implement this scheme as well as retaining the old one because then you have an exponential number of ways to read the same blob data, almost certainly leading to countless implementation bugs). So we should probably consider this for post-1.0.0.

cyphar referenced this issue in opencontainers/umoci Nov 6, 2016
Signed-off-by: Aleksa Sarai <asarai@suse.com>
@wking
Contributor

wking commented Nov 6, 2016

On Sat, Nov 05, 2016 at 04:50:36PM -0700, Aleksa Sarai wrote:

> … but some people have expressed that they would like to have the
> entire universe bottled into an OCI image…

That may be me ;). I'd rather phrase this as “I'd like the whole
universe in one flat CAS namespace, with individual CAS engines biting
off as large a chunk of that universe as they like”. What I've tried
to supply in opencontainers/runtime-tools#5 is an API that works
regardless of the number of blobs in CAS.

Whether a particular implementation of that API (e.g. image-layout)
scales to huge blob counts (clearly the tar-backed image-layout does
not) is a less important question. Folks will just use a different
ref/CAS engine when they have large stores. But ref/CAS consumers
shouldn't have to worry about that sort of implementation detail.

> Namely, the current method of blobs/<algo>/<digest> will cause
> problems if the number of digests becomes quite large, due to
> implementation issues of filesystems. Essentially all filesystems
> are not designed to handle accesses of directories with many files
> well.

This has come up before in #94 and #208, with the bulk of the
discussion based on 1. The consensus (as I understood it) was that
we shouldn't worry about this for now because modern filesystems don't
mind and tar isn't going to care either way. Having stable, scalable
APIs buffers downstream consumers from any future CAS-storage
optimizations.

@philips
Contributor

philips commented Nov 16, 2016

Agreed this is a dupe of #208.

@philips philips closed this as completed Nov 16, 2016
@cyphar
Member Author

cyphar commented Nov 17, 2016

@philips It's not a dupe of #208. #208 was about blobs/sha256/<the full digest> rather than blobs/sha256/<three byte>/<rest of digest> (which is what this is about). But I don't have strong opinions because I don't agree with @wking's wish to stuff everything into a single CAS.

@jonboulle
Contributor

@cyphar I guess in particular #208 (comment) challenges the premise of this issue

@AkihiroSuda
Member

AkihiroSuda commented Feb 13, 2017

This does not seem to be a dupe of #208.

Even though a pull operation should never call readdir(), a push may call readdir() depending on the distribution protocol and its implementation, which is likely to result in poor performance.

Also, there can be third-party tools (e.g. malware scanners, backup tools) that are not aware of the OCI manifest and hence end up calling readdir().
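
To make the distinction concrete, here is a rough sketch of the two access patterns against the existing blobs/<algo> layout. The helper names are made up for the example and it does not model any real distribution protocol: a lookup by known digest touches one path, while a tool with no knowledge of the manifest has to enumerate the whole directory.

```python
import os


def has_blob(layout_root, algo, encoded):
    """Lookup by known digest: checks a single path, no directory listing."""
    return os.path.isfile(os.path.join(layout_root, "blobs", algo, encoded))


def list_blobs(layout_root, algo):
    """Enumeration: reads every entry of blobs/<algo>, as a scanner or
    backup tool that knows nothing about the manifest would have to."""
    directory = os.path.join(layout_root, "blobs", algo)
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

The cost of `list_blobs` grows with the number of entries in the directory, which is the case a sharded layout is meant to help with.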

Can we reconsider this issue?

@AkihiroSuda
Member

Since the layout of blobs/<algo> can no longer be changed, we might need to come up with an alternative layout.

Some of my ideas, with pros and cons:

  1. blobs-sharded/<algo>/<prefix>/<digest>
    Pro: Does not contaminate the existing blobs directory
    Con: Maybe it is confusing to have two blobs directories? (blobs and blobs-sharded)

  2. blobs/<algo>-sharded/<prefix>/<digest>
    Pro: Single blobs directory
    Con: sha256-sharded looks as if it is an algorithm, which can cause implementation issues

  3. blobs/<algo>/<prefix>/<digest> (identical to the original proposal)
    Pro: Single blobs directory, no algorithm namespace contamination
    Con: It can be 2X slower to scan the content of blobs/sha256, because the directory is likely to contain traditional blobs as well for compatibility

My preference is 1.
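
As a rough sketch of how the three options differ on disk, the following maps one digest to the path each option would use, next to the existing flat path. The function name, the dictionary keys and the 2-character `prefix_len` are assumptions for illustration only; none of them come from the spec.

```python
import os


def candidate_blob_paths(layout_root, digest, prefix_len=2):
    """Return the path a blob would have under each proposed option.

    The keys and the prefix length are hypothetical; only the directory
    shapes follow the three options listed above.
    """
    algo, sep, encoded = digest.partition(":")
    if not (sep and algo and len(encoded) > prefix_len):
        raise ValueError(f"expected <algo>:<encoded>, got {digest!r}")
    prefix = encoded[:prefix_len]
    return {
        # Existing layout: blobs/<algo>/<digest>
        "v1compat": os.path.join(layout_root, "blobs", algo, encoded),
        # Option 1: blobs-sharded/<algo>/<prefix>/<digest>
        "option1": os.path.join(layout_root, "blobs-sharded", algo, prefix, encoded),
        # Option 2: blobs/<algo>-sharded/<prefix>/<digest>
        "option2": os.path.join(layout_root, "blobs", algo + "-sharded", prefix, encoded),
        # Option 3: blobs/<algo>/<prefix>/<digest>
        "option3": os.path.join(layout_root, "blobs", algo, prefix, encoded),
    }
```

Only option 3 puts the shard directories into the same blobs/<algo> directory as the existing flat blobs, which is where the extra scanning cost in its "Con" comes from.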

Also, we would need to define a new field for the list of supported blob layouts in the oci-layout file (or maybe in index.json).

e.g.

{
    "imageLayoutVersion": "42.0.0",
    "supportedBlobLayouts": [ // if empty, "v1compat" is implicitly selected
        "v1compat",
        "sharded"
        // there could be other layouts that are specific to the distribution protocol? (e.g. "ipfs")
    ]
}
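
A reader of such a file might handle the implicit default along these lines. This is only a sketch: `supportedBlobLayouts` and the layout names are the hypothetical ones from the example above, and the input is assumed to be plain JSON without the explanatory comments.

```python
import json

DEFAULT_BLOB_LAYOUT = "v1compat"  # hypothetical name taken from the example


def supported_blob_layouts(oci_layout_text):
    """Return the blob layouts declared in an oci-layout document.

    A missing or empty "supportedBlobLayouts" field (a proposed field, not
    part of the current spec) falls back to the implicit "v1compat".
    """
    doc = json.loads(oci_layout_text)
    layouts = doc.get("supportedBlobLayouts") or []
    if not isinstance(layouts, list) or not all(isinstance(x, str) for x in layouts):
        raise ValueError("supportedBlobLayouts must be a list of strings")
    return layouts or [DEFAULT_BLOB_LAYOUT]
```

Treating a missing field the same as an empty one means existing oci-layout files would keep working without changes.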

wking added a commit to wking/oci-discovery that referenced this issue Sep 7, 2017
For folks who want to diverge as little as possible from things
already in image-spec.  Downsides to this approach include:

* Non-sharded blobs [1], although it's not clear to me that modern
  filesystems suffer from having many entries in one directory [2].

* Possible duplicate blobs between two layouts.  You can address this
  with symlinks or similar, but you'd need extra tooling to do that.
  With a single CAS bucket, there's only one place that the blob could
  be, so deduping is free (but garbage collection becomes more
  complicated).

[1]: opencontainers/image-spec#449
[2]: opencontainers/image-spec#94 (comment)
wking added a commit to wking/image-spec that referenced this issue Sep 18, 2017
Bump the layout to v1.1 to support this.  This makes it possible to
distribute layouts that use other protocols, for example new
ref-engine protocols or a sharded blob store [1].  You can also
reference external ref- and CAS-engines, although obviously the
utility of such depends on the availability of those external engines.

[1]: opencontainers#449

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/umoci that referenced this issue Nov 3, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>