Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add ability to use a shared CAS directory #190

Closed
wants to merge 3 commits into from
Closed

*: add ability to use a shared CAS directory #190

wants to merge 3 commits into from

Conversation

jonboulle
Copy link
Contributor

umoci currently always expects blobs to live within the image layout
it's working on, in the aptly-named "blobs" subdirectory. However, the
OCI specification 1 permits this directory to be empty and for
implementations to look in other locations for referenced blobs.
This is particularly useful when working with multiple image layouts and
wanting to share blobs between them.

This patch adds a --shared-cas flag to all umoci commands that work
with image layouts. The flag expects a directory to be passed, and if
set, this directory will be used for all blob operations.

In the first implementation, we extend the dirEngine implementation of
the CAS interface to accept another (optional) sharedCasPath field.
Analogously to the CLI flag, if this field is supplied, it will be used
internally by the dirEngine for all blob-related operations.

Signed-off-by: Jonathan Boulle jonathanboulle@gmail.com

@jonboulle
Copy link
Contributor Author

@cyphar this is a bit of a POC or strawman, would love some feedback on whether you'd take a feature like this and how you'd expect to see it implemented (I can think of about a thousand different approaches so not sure which you'd prefer).

@wking
Copy link
Contributor

wking commented Oct 11, 2017 via email

@cyphar
Copy link
Member

cyphar commented Oct 12, 2017

@wking As I've mentioned before, my issue with casEngines is because of precisely the layer violation you are referring to here. I've rewritten significant portions of parcel's specification, by the way.

@wking
Copy link
Contributor

wking commented Oct 12, 2017 via email

@cyphar
Copy link
Member

cyphar commented Oct 12, 2017

@wking

I expect you're referencing [1]. Can you provide more details on the
issue you see (possibly in an oci-discovery issue, if you feel it's
too off-topic here)?

My issue is that it doesn't "smell" right for semantic information about this sort of "shared CAS" to be stored in descriptors. I'm a huge proponent of doing it top-down, distribution style. Effectively it boils down to that I'm not convinced that mis-using descriptors as a way of trying to do something more peer-to-peer is better in the long run than having a federated distribution system (similar to the existing distribution system for packages).

Is it the lack of blob-media-type routing you
added to parcel [2]?

No, it's the above, but you're right that the reason I added multiple backend support was with the intention of allowing combining blobs at a higher level (similar to what the PR does, but at the pull stage and not at the actual operational stage).

But all of this is quite off-topic. If you want we can discuss this in the parcel issue tracker (or oci-discovery), or email.

@cyphar
Copy link
Member

cyphar commented Oct 12, 2017

@jonboulle I'm not opposed to the idea (as we discussed recently), but I'll have to get back to you on which layer this should be done. Also, out of interest, did you find any additional issues with symlinking blobs other than umoci gc -- or is this purely a make-users-lives-easier thing?

@jonboulle
Copy link
Contributor Author

jonboulle commented Oct 12, 2017

@cyphar manual symlink munging works, but we definitely want to get out of that business if at all possible and punt whatever we can into the OCI tooling - especially if there's any chance that something like opencontainers/image-tools#40 might happen. (You might consider this my attempt to create precedent, with a clear use case and active implementation..)

I should mention that since our particular use doesn't involve particularly long-running systems (i.e. there's a certain level of ephemerality in the CAS usage) we're happy for the implementation to change across some releases if we need to iterate towards a converged solution

// FIXME: We can make this a list.
BlobAlgorithm = digest.SHA256
// DefaultBlobAlgorithm is the default supported digest algorithm for blobs
DefaultBlobAlgorithm = digest.SHA256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecate in favor of go-digest's Canonical?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecate in favor of go-digest's Canonical?

Spun off into #197.

wking referenced this pull request in wking/umoci Oct 18, 2017
This commit hard-codes the blobs/{algorithm}/{encoded} template [1],
but sets the stage for future work to relax that positioning [2].

I'm adding a PutIndex call in the tests, becase the CAS implementation
now has its own temp directory which is not known to the dirEngine.
Casengine's dir implementation does not flock its temporary directory,
but it is protected from Clean by the pruneExpire logic.

The "Deprecated:" syntax is discussed in [3,4,5].

[1]: https://github.com/opencontainers/image-spec/blob/v1.0.0/image-layout.md#blobs
[2]: https://github.com/openSUSE/umoci/pull/190
[3]: https://blog.golang.org/godoc-documenting-go-code
[4]: golang/blog@257114a
[5]: golang/go#10909

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking
Copy link
Contributor

wking commented Oct 18, 2017

With that approach, you could add casEngines entries to the oci-layout file [5] to point at your central CAS engine:

I've walked a bit down this path with wking/umoci@2908a4a. While I still think decoupling the CAS and ref engines is useful, I no longer think you can configure a digest-listing CAS engine with just a URI Template. URI Templates are not reversible in general (jtacoma/uritemplates#2), and even if they were, there are issues with reversing reference resolution. You need to reverse expansion to list stored digests (casengines, umoci). In casengine, I'm getting around that by requiring the caller to provide their own reverser (wking/casengine@76d4062). But since you don't need digest-listing support for most operations and because the base URI for relative expansion is unclear, including a reversing regexp pattern in the engine config seems like too large a hack. That means any CAS engines configured solely via casEngines in oci-layout would not be able to list digests.

However, I'm still not sure we need to go as far as bdd8bdf and pass down location information from every command that might interact with CAS. Operations like umoci unpack … which only need to pull blobs would be fine loading a template URI from oci-layout. I'll continue working in this direction, passing the reversing regexp down from the callers in cases that need reversing logic, although I'm obviously just providing suggestions for umoci and not in charge of what gets landed.

If loading the CAS URI Template or base directory from oci-layout is off the table, an alternative to mitigate the tedium of --shared-cas entries would be to set EnvVar when declaring --shared-cas. That would be convenient for umoci users, but unless the environment variable catches on it wouldn't help other tools trying to pull from layers that used an alternate blob location.

And while the dirEngine.sharedCasPath base-path approach is certainly simpler than using URI Templates, the base-path approach does not allow sharding (opencontainers/image-spec#449), while the URI Template approach does. I'm not sure if modern filesystems care all that much about sharding for directory listings, so maybe this particular point is not a strong argument either way.

@cyphar
Copy link
Member

cyphar commented Oct 19, 2017

@wking

URI Templates are not reversible in general (jtacoma/uritemplates#2), and even if they were, there are issues with reversing reference resolution. You need to reverse expansion to list stored digests (casengines, umoci).

I personally think that ListBlobs should be something that is optionally supported. In parcel, I envision that you might want to have some repos that have a list of the blobs somewhere (which could be given as a URI template in some extension) but that is not necessary for normal operations. We could make umoci gc no-op if ListBlobs gives nothing (which is not the end of the world).

Aside from ListBlobs, I still feel that a URI Template approach is much more flexible and allows you to specify stronger top-down policy decisions on a domain. However, parcel discussions are a bit out-of-place here. 😉 On that topic, parcel is something I'm going to be working on more in the next few weeks (but I have exams coming up, so that's taking up a lot of my time).

@wking
Copy link
Contributor

wking commented Oct 19, 2017 via email

@cyphar
Copy link
Member

cyphar commented Oct 26, 2017

Sorry for the delays. I just re-read the spec and you're completely right that this is supported (though I anticipate that the reason for this wording is to allow for the "external artifact" feature that ACI has). I also now see why @wking was talking so much about parcel and oci-discovery -- it's quite clear if you stare at the text for a while that remote blob fetching would be a valid application (though I think for distribution parcel solves the problem better).

I will re-review this pull request next week (please note that I have some assignments and exams coming up, so I'll have less time to work than usual, so apologies upfront for any delays).

wking referenced this pull request in wking/umoci Nov 3, 2017
This commit hard-codes the blobs/{algorithm}/{encoded} template [1],
but sets the stage for future work to relax that positioning [2].

I'm adding a PutIndex call in the tests, becase the CAS implementation
now has its own temp directory which is not known to the dirEngine.
Casengine's dir implementation does not use .umoci-* temporary
directories (it uses .casengine-* temporary directories), so it's
protected from Clean.  And the .casengine-* implementation does not
currently provide it's own Clean() implementation, although I may add
that in the future.

The "Deprecated:" syntax is discussed in [3,4,5].

Also adjust Close() to return the first error it encounters, but to
continue to optimistically attempt the remaining cleanup, logging any
subsequent errors.

[1]: https://github.com/opencontainers/image-spec/blob/v1.0.0/image-layout.md#blobs
[2]: https://github.com/openSUSE/umoci/pull/190
[3]: https://blog.golang.org/godoc-documenting-go-code
[4]: golang/blog@257114a
[5]: golang/go#10909

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 3, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 4, 2017
This commit hard-codes the blobs/{algorithm}/{encoded} template [1],
but sets the stage for future work to relax that positioning [2].

I'm adding a PutIndex call in the tests, becase the CAS implementation
now has its own temp directory which is not known to the dirEngine.
Casengine's dir implementation does not use .umoci-* temporary
directories (it uses .casengine-* temporary directories), so it's
protected from Clean.  And the .casengine-* implementation does not
currently provide it's own Clean() implementation, although I may add
that in the future.

The "Deprecated:" syntax is discussed in [3,4,5].

Also adjust Close() to return the first error it encounters, but to
continue to optimistically attempt the remaining cleanup, logging any
subsequent errors.

Bumping go-mtree pulls in [6] and gives us a lowercase sirupsen import
that is compatible with oci-discovery and casengine.

[1]: https://github.com/opencontainers/image-spec/blob/v1.0.0/image-layout.md#blobs
[2]: https://github.com/openSUSE/umoci/pull/190
[3]: https://blog.golang.org/godoc-documenting-go-code
[4]: golang/blog@257114a
[5]: golang/go#10909
[6]: vbatts/go-mtree#144

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 4, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 4, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 4, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
wking referenced this pull request in wking/umoci Nov 4, 2017
With this change, users can configure their blob storage once at init
time with an optional --blob-uri.  Most other commands (which do not
need path -> blob conversion) can load the blob location from the
oci-layout layout file (the 1.1.0 format is in flight with [1,2]).
The only other user-facing change is to 'umoci gc', which gains a
--digest-regexp.  Folks who customized their blob URI will need to
supply --digest-regexp to reverse whichever blob URI they're using.

This seems like a more convenient interface to me than requiring all
callers to provide the custom blob location [3].  And it is more
powerful as well, allowing users to shard their blob storage [4],
etc. if they feel moved to do so.

[1]: xiekeyang/oci-discovery#20
[2]: https://github.com/wking/image-spec/blob/ref-engine-discovery-layout/image-layout.md
[3]: https://github.com/openSUSE/umoci/pull/190
[4]: opencontainers/image-spec#449

Signed-off-by: W. Trevor King <wking@tremily.us>
umoci currently always expects blobs to live within the image layout
it's working on, in the aptly-named "blobs" subdirectory. However, the
OCI specification [1][1] permits this directory to be empty and for
implementations to look in other locations for referenced blobs.
This is particularly useful when working with multiple image layouts and
wanting to share blobs between them.

This patch adds a `--shared-cas` flag to all umoci commands that work
with image layouts. The flag expects a directory to be passed, and if
set, this directory will be used for all blob operations.

In the first implementation, we extend the `dirEngine` implementation of
the CAS interface to accept another (optional) `sharedCasPath` field.
Analogously to the CLI flag, if this field is supplied, it will be used
internally by the dirEngine for all blob-related operations.

[1]: https://github.com/opencontainers/image-spec/blob/7c889fafd04a893f5c5f50b7ab9963d5d64e5242/image-layout.md#blobs

Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
@cyphar cyphar added the oci-spec Issue directly related to OCI image-spec. label Mar 22, 2018
@cyphar
Copy link
Member

cyphar commented Jun 7, 2020

This is fairly stale by now, closing. I'd be happy to re-discuss this issue again if it's still an issue you're having with runc.

@cyphar cyphar closed this Jun 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 oci-spec Issue directly related to OCI image-spec.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants