
Add oci-archive-uncompressed-fd:5 #1209

Closed
cgwalters opened this issue Apr 23, 2021 · 8 comments

Comments

@cgwalters
Contributor

See ostreedev/ostree-rs-ext#15

Right now it looks like oci-archive:// creates a temporary oci directory, and then re-tars it up, which is really inefficient.

In order to use containers/image to write to something that's not containers/storage (in my case ostree, but there are people putting raw disk images in container images for host updates too, etc.) it'd be nice to have support for streaming writes to e.g. a pipe.
Probably while we're here we should add oci-archive-fd://5 to output to file descriptor 5 instead of going via oci-archive:///proc/self/fd/5.

Or maybe going farther, add oci-archive-streaming-fd://5 which explicitly requires the caller to verify any layer digests, etc.

@cgwalters
Contributor Author

There's prior work in containers/podman#10075, which exposed a raw API for this, but it would be nice for c/image to handle things like converting to OCI format, manifest lists, etc.

@cgwalters
Contributor Author

Here's a completely different idea: Maybe tools that want to do this type of stuff could expose an oci distribution endpoint, then skopeo copy docker://quay.io/exampleos/exampleos@sha256:a9ba215002080f8c7a699cd05b3d1ffb5447eba1b4574c43b3ce4fe2885f3046 docker://127.0.0.1:9001/exampleos/exampleos would work.

The control flow here is funky because it'd be: app ➡️ skopeo ➡️ app, but it seems manageable.

Ideally we avoid the need for local TCP and can pass down a pipe or socketpair, so this would then look like skopeo copy docker://quay.io/exampleos/exampleos@sha256:a9ba215002080f8c7a699cd05b3d1ffb5447eba1b4574c43b3ce4fe2885f3046 registry-pipe-fd://5 or so.

@mtrmac
Collaborator

mtrmac commented Apr 27, 2021

> In order to use containers/image to write to something that's not containers/storage (in my case ostree

(Note that there already is a native OSTree transport. It might well not fit your needs as is — but using a native c/image transport should be considered as a design option, instead of using an intermediate format. OTOH that might longer-term mean maintaining it inside the c/image repo — right now transports are fully pluggable via types.ImageReference and the like, but we are moving a bit away from that, especially for destinations, so that we have a way to add more features without breaking the public API.)


At a first glance, it should be quite possible to implement the OCI archive creation as a stream producer without an intermediate on-disk stage; we do that for docker-archive: already. (Compare #1072, which I’ve been asking to make more similar to the docker-archive: implementation for other reasons.)

Moving to a streaming model would be somewhat disruptive to the oci-layout: transport implementation, which actually does the format-specific bits, but with the new io/fs.FS and the like it should be easy enough, just a bit of work.


> Ideally we avoid the need for local TCP and can pass down a pipe or socketpair

Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order. For very limited cases it might be possible to kludge around this (e.g. heuristically decide whether an incoming blob is a layer or a config, and stream the layers directly to the destination in some representation that doesn’t require knowing the parent/child layer relationships), but making this work reliably and efficiently, e.g. for multi-arch images, might be rather tricky.

Of course one unavoidable buffering step is not a reason not to try to avoid a second, probably entirely avoidable, buffering step.

(It’s also not obvious to me that a special one-off “multiplex data over a pipe” transport is easier to implement/maintain/debug than running a temporary HTTP server on localhost speaking the docker/distribution protocol, but we can’t know for sure before we do that work…)
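The one unavoidable buffering step can be pictured as a toy reordering stage (pure sketch; held in memory here, where a real tool would spool blobs to temp files keyed by digest): blobs arriving before the manifest are held back, then replayed metadata-first.

```go
package main

import "fmt"

// event models one item coming off the copy pipeline, which delivers
// layer blobs before the manifest.
type event struct {
	kind string // "blob" or "manifest"
	name string
}

// reorderMetadataFirst buffers blobs until the manifest has arrived,
// then replays the manifest first followed by the buffered blobs —
// the order a metadata-first consumer needs.
func reorderMetadataFirst(in []event) []string {
	var blobs, out []string
	for _, e := range in {
		if e.kind == "manifest" {
			out = append(out, e.name)
		} else {
			blobs = append(blobs, e.name)
		}
	}
	return append(out, blobs...)
}

func main() {
	got := reorderMetadataFirst([]event{
		{"blob", "layer1"}, {"blob", "layer2"}, {"manifest", "manifest.json"},
	})
	fmt.Println(got) // [manifest.json layer1 layer2]
}
```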

@cgwalters
Contributor Author

> Note that there already is a native OSTree transport. It might well not fit your needs as is — but using a native c/image transport should be considered as a design option, instead of using an intermediate form

Right, it's almost the inverse of what I want, though: that transport stores a container image in the ostree data store, whereas I want to encapsulate an ostree commit into a container image and have something else be able to unpack it. I think we should ultimately remove the ostree c/storage backend; last I heard that was blocked on some people still using it for the deduplication, but I don't think it's the right long-term approach.

More broadly too, having c/image write to c/storage puts the process combining those two (e.g. skopeo) in a position of total privilege over the host filesystem, whereas I want to move more towards privilege separation. Admittedly, ostree today exactly combines fetching with writing, but I'm trying to avoid replicating that mistake here. The other argument for avoiding c/storage here is that what we're storing is not a container, it's the host OS; and I don't want to scope in host OS management into c/storage (or c/image).

The easiest way to understand this is to forget ostree exists for a second and imagine that we wanted to encapsulate a set of rpms/debs/etc in a container image to pass to yum/apt update.

> At a first glance, it should be quite possible to implement the OCI archive creation as a stream producer without an intermediate on-disk stage; we do that for docker-archive: already. (Compare #1072, which I’ve been asking to make more similar to the docker-archive: implementation for other reasons.)

Ah, I hadn't realized that. Will take a look at that code. Hmm, well I'd wanted my code to just deal with OCI, but eh if I can just use docker-archive: today I may just do that and circle back to oci-archive: later.

> Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order.

Why is that? Why not have a copy send the manifest first?

@mtrmac
Collaborator

mtrmac commented Apr 28, 2021

> Will take a look at that code. Hmm, well I'd wanted my code to just deal with OCI, but eh if I can just use docker-archive: today I may just do that and circle back to oci-archive: later.

Major downsides of docker-archive:

  • Forces conversion to a schema2-like format
  • Forces layer decompression
  • No multi-arch image support
  • No signature support

> Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order.
>
> Why is that? Why not have a copy send the manifest first?

In a general case, the copy may have to modify the layers (to reuse already-present differently-compressed variants, or to compress/decompress/recompress/encrypt), i.e. the manifest contents are unknown until the layers exist; and the manifest might need format conversion (which is determined by trial and error to see what the registry doesn’t reject, and that can again only happen if the layers already exist on the registry).

Yes, there are cases where a byte-for-byte copy is exactly what is desired, but a generally usable transport must be ready to accept layers before metadata in order to support the more general copies.

(If it matters a lot, with ImageSource/ImageDestination it is possible to write a custom copy top-level without the conversion support that just shuffles data around — OTOH at least the c/storage-related parts of ImageDestination are tending to become private, so an external copy top-level would only work well for the fairly dumb transports like docker:// or oci*:. Or maybe the top-level doesn’t even need to be a “copy” — use c/image/docker as a registry client to read very-special-cased images/artifacts that are known to contain exactly one layer of a specific format.)

@cgwalters
Contributor Author

OK cool! With the named pipe hack and docker-archive: I got working streaming here: ostreedev/ostree-rs-ext#22 and it's as expected much faster and better.
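The named-pipe hack boils down to a FIFO rendezvous (a minimal Go sketch on POSIX; the writer goroutine stands in for a `skopeo copy ... docker-archive:<fifo>` invocation): create a FIFO, give its path to the tool that insists on a file path, and read the stream from the other end.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// roundTripViaFifo creates a named pipe, writes payload into it from a
// goroutine (standing in for the external tool that only accepts a
// path), and reads the stream back out — nothing is materialized as a
// regular file.
func roundTripViaFifo(payload []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	fifo := filepath.Join(dir, "archive.pipe")
	if err := syscall.Mkfifo(fifo, 0o600); err != nil {
		return nil, err
	}
	go func() {
		w, err := os.OpenFile(fifo, os.O_WRONLY, 0) // blocks until a reader opens
		if err != nil {
			return
		}
		defer w.Close()
		w.Write(payload)
	}()
	r, err := os.Open(fifo) // blocks until the writer opens
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	data, err := roundTripViaFifo([]byte("tar stream"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data) // tar stream
}
```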

And looking at this, I think we don't want oci-archive:// to be streaming after all - the fact that docker-archive: has uncompressed layers is exactly what I want, because then I don't need to care about gzip vs zstd etc. in my code.

So I'm retitling this; here are the desired improvements:

  • OCI-like layout as a streamed tarball
  • metadata first
  • But do decompress for me, or at least don't decompress and recompress ever
  • Write to a provided file descriptor and not a file path (and skip containers-policy.json stuff? But I do want to support policies for the source path ideally, will dig into this more later. For now I'm just going to use ostree's GPG/ed25519 bits and not the container one)

@cgwalters cgwalters changed the title Change oci-archive:// to be actually streaming Add oci-archive-uncompressed-fd:5 Apr 28, 2021
@cgwalters
Contributor Author

> Yes, there are cases where a byte-for-byte copy is exactly what is desired, but a generally usable transport must be ready to accept layers before metadata in order to support the more general copies.

Conceptually, the goal here isn't a "copy" - it's using container images as a wrapper. That said, it would be nice to support preserving the wrapper data eventually but it's definitely not critical path.

But maybe we just want backends to expose a property bool SendMetadataFirst() and oci-archive-streaming-uncompressed-fd: returns true for that.

@cgwalters
Contributor Author

I think in the end this will be a skopeo API probably.
