
Add oci-archive-uncompressed-fd:5 #1209

Closed
cgwalters opened this issue Apr 23, 2021 · 8 comments

Comments

@cgwalters
Contributor

See ostreedev/ostree-rs-ext#15

Right now it looks like oci-archive:// creates a temporary oci directory, and then re-tars it up, which is really inefficient.

In order to use containers/image to write to something that's not containers/storage (in my case ostree, but there are people putting raw disk images in container images for host updates too, etc.) it'd be nice to have support for streaming writes to e.g. a pipe.
Probably while we're here we should add oci-archive-fd://5 to output to file descriptor 5 instead of going via oci-archive:///proc/self/fd/5.

Or maybe going farther, add oci-archive-streaming-fd://5 which explicitly requires the caller to verify any layer digests, etc.

@cgwalters
Contributor Author

There's prior work in containers/podman#10075, which exposed a raw API for this, but it would be nice for c/image to handle things like converting to OCI format, manifest lists, etc.

@cgwalters
Contributor Author

Here's a completely different idea: Maybe tools that want to do this type of stuff could expose an oci distribution endpoint, then skopeo copy docker://quay.io/exampleos/exampleos@sha256:a9ba215002080f8c7a699cd05b3d1ffb5447eba1b4574c43b3ce4fe2885f3046 docker://127.0.0.1:9001/exampleos/exampleos would work.

The control flow here is funky because it'd be: app ➡️ skopeo ➡️ app, but it seems manageable.

Ideally we avoid the need for local TCP and can pass down a pipe or socketpair, so this would then look like skopeo copy docker://quay.io/exampleos/exampleos@sha256:a9ba215002080f8c7a699cd05b3d1ffb5447eba1b4574c43b3ce4fe2885f3046 registry-pipe-fd://5 or so.

@mtrmac
Collaborator

mtrmac commented Apr 27, 2021

> In order to use containers/image to write to something that's not containers/storage (in my case ostree

(Note that there already is a native OSTree transport. It might well not fit your needs as is — but using a native c/image transport should be considered as a design option, instead of using an intermediate format. OTOH that might longer-term mean maintaining it inside the c/image repo — right now transports are fully pluggable via types.ImageReference and the like, but we are moving a bit away from that, especially for destinations, so that we have a way to add more features without breaking the public API.)


At a first glance, it should be quite possible to implement the OCI archive creation as a stream producer without an intermediate on-disk stage; we do that for docker-archive: already. (Compare #1072, which I’ve been asking to make more similar to the docker-archive: implementation for other reasons.)

Moving to a streaming model would be somewhat disruptive to the oci-layout: transport implementation, which actually does the format-specific bits, but with the new io/fs.FS and the like it should be easy enough, just a bit of work.


> Ideally we avoid the need for local TCP and can pass down a pipe or socketpair

Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order. For very limited cases it might be possible to kludge around this (e.g. heuristically decide whether an incoming blob is a layer or a config, and stream the layers directly to the destination in some representation that doesn’t require knowing the parent/child layer relationships), but making this work reliably and efficiently, e.g. for multi-arch images, might be rather tricky.

Of course one unavoidable buffering step is not a reason not to try to avoid a second, probably entirely avoidable, buffering step.

(It’s also not obvious to me that a special one-off “multiplex data over a pipe” transport is easier to implement/maintain/debug than running a temporary HTTP server on localhost speaking the docker/distribution protocol, but we can’t know for sure before we do that work…)
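The one unavoidable buffering step can be pictured as a toy reordering stage (pure sketch; held in memory here, where a real tool would spool blobs to temp files keyed by digest): blobs arriving before the manifest are held back, then replayed metadata-first.

```go
package main

import "fmt"

// event models one item coming off the copy pipeline, which delivers
// layer blobs before the manifest.
type event struct {
	kind string // "blob" or "manifest"
	name string
}

// reorderMetadataFirst buffers blobs until the manifest has arrived,
// then replays the manifest first followed by the buffered blobs —
// the order a metadata-first consumer needs.
func reorderMetadataFirst(in []event) []string {
	var blobs, out []string
	for _, e := range in {
		if e.kind == "manifest" {
			out = append(out, e.name)
		} else {
			blobs = append(blobs, e.name)
		}
	}
	return append(out, blobs...)
}

func main() {
	got := reorderMetadataFirst([]event{
		{"blob", "layer1"}, {"blob", "layer2"}, {"manifest", "manifest.json"},
	})
	fmt.Println(got) // [manifest.json layer1 layer2]
}
```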

@cgwalters
Contributor Author

> Note that there already is a native OSTree transport. It might well not fit your needs as is — but using a native c/image transport should be considered as a design option, instead of using an intermediate form

Right, it's almost the inverse of what I want, though: that transport stores a container image in the ostree data store, whereas I want to encapsulate an ostree commit into a container image and have something else be able to unpack it. I think we should ultimately remove the ostree c/storage backend; last I heard that was blocked on some people still using it for the deduplication, but I don't think it's the right long-term approach.

More broadly too, having c/image write to c/storage puts the process combining those two (e.g. skopeo) in a position of total privilege over the host filesystem, whereas I want to move more towards privilege separation. Admittedly, ostree today exactly combines fetching with writing, but I'm trying to avoid replicating that mistake here. The other argument for avoiding c/storage here is that what we're storing is not a container, it's the host OS; and I don't want to scope in host OS management into c/storage (or c/image).

The easiest way to understand this is to forget ostree exists for a second and imagine that we wanted to encapsulate a set of rpms/debs/etc in a container image to pass to yum/apt update.

> At a first glance, it should be quite possible to implement the OCI archive creation as a stream producer without an intermediate on-disk stage; we do that for docker-archive: already. (Compare #1072, which I’ve been asking to make more similar to the docker-archive: implementation for other reasons.)

Ah, I hadn't realized that. Will take a look at that code. Hmm, well I'd wanted my code to just deal with OCI, but eh if I can just use docker-archive: today I may just do that and circle back to oci-archive: later.

> Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order.

Why is that? Why not have a copy send the manifest first?

@mtrmac
Collaborator

mtrmac commented Apr 28, 2021

> Will take a look at that code. Hmm, well I'd wanted my code to just deal with OCI, but eh if I can just use docker-archive: today I may just do that and circle back to oci-archive: later.

Major downsides of docker-archive:

  • Forces conversion to a schema2-like format
  • Forces layer decompression
  • No multi-arch image support
  • No signature support

> Just to make sure this is considered, note that a buffering step needs to happen at least once in the current model, because the copy pipeline (and therefore the data going over a pipe) always sends the layer blobs first, and the manifests and other metadata later — while most consumers need to consume data in the opposite order.
>
> Why is that? Why not have a copy send the manifest first?

In a general case, the copy may have to modify the layers (to reuse already-present differently-compressed variants, or to compress/decompress/recompress/encrypt), i.e. the manifest contents are unknown until the layers exist; and the manifest might need format conversion (which is determined by trial and error to see what the registry doesn’t reject, and that can again only happen if the layers already exist on the registry).

Yes, there are cases where a byte-for-byte copy is exactly what is desired, but a generally usable transport must be ready to accept layers before metadata in order to support the more general copies.

(If it matters a lot, with ImageSource/ImageDestination it is possible to write a custom copy top-level without the conversion support that just shuffles data around — OTOH at least the c/storage-related parts of ImageDestination are tending to become private, so an external copy top-level would only work well for the fairly dumb transports like docker:// or oci*:. Or maybe the top-level doesn’t even need to be a “copy” — use c/image/docker as a registry client to read very-special-cased images/artifacts that are known to contain exactly one layer of a specific format.)

@cgwalters
Contributor Author

OK cool! With the named pipe hack and docker-archive: I got working streaming here: ostreedev/ostree-rs-ext#22 and it's as expected much faster and better.
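The named-pipe hack boils down to a FIFO rendezvous (a minimal Go sketch on POSIX; the writer goroutine stands in for a `skopeo copy ... docker-archive:<fifo>` invocation): create a FIFO, give its path to the tool that insists on a file path, and read the stream from the other end.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// roundTripViaFifo creates a named pipe, writes payload into it from a
// goroutine (standing in for the external tool that only accepts a
// path), and reads the stream back out — nothing is materialized as a
// regular file.
func roundTripViaFifo(payload []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	fifo := filepath.Join(dir, "archive.pipe")
	if err := syscall.Mkfifo(fifo, 0o600); err != nil {
		return nil, err
	}
	go func() {
		w, err := os.OpenFile(fifo, os.O_WRONLY, 0) // blocks until a reader opens
		if err != nil {
			return
		}
		defer w.Close()
		w.Write(payload)
	}()
	r, err := os.Open(fifo) // blocks until the writer opens
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	data, err := roundTripViaFifo([]byte("tar stream"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data) // tar stream
}
```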

And looking at this, I think we don't want oci-archive:// to be streaming after all - the fact that docker-archive: has uncompressed layers is exactly what I want, because then I don't need to care about gzip vs zstd etc. in my code.

So I'm retitling this; here are the desired improvements:

  • OCI-like layout as a streamed tarball
  • metadata first
  • But do decompress for me, or at least don't decompress and recompress ever
  • Write to a provided file descriptor and not a file path (and skip containers-policy.json stuff? But I do want to support policies for the source path ideally, will dig into this more later. For now I'm just going to use ostree's GPG/ed25519 bits and not the container one)

@cgwalters cgwalters changed the title Change oci-archive:// to be actually streaming Add oci-archive-uncompressed-fd:5 Apr 28, 2021
@cgwalters
Contributor Author

> Yes, there are cases where a byte-for-byte copy is exactly what is desired, but a generally usable transport must be ready to accept layers before metadata in order to support the more general copies.

Conceptually, the goal here isn't a "copy" - it's using container images as a wrapper. That said, it would be nice to support preserving the wrapper data eventually but it's definitely not critical path.

But maybe we just want backends to expose a property bool SendMetadataFirst() and oci-archive-streaming-uncompressed-fd: returns true for that.

@cgwalters
Contributor Author

I think in the end this will be a skopeo API probably.
