Layer relations/parent layer clarification #190

Closed
sgotti opened this issue Aug 2, 2016 · 20 comments

@sgotti

sgotti commented Aug 2, 2016

Perhaps this is to some extent related to #39 (which was closed, while #102 talks about something different), but, just to dispel any doubt, I'd like to be sure that there's no fixed relation between OCI image layers, i.e. that a layer (if not a base one) is not forced to always have the same parent layer.

To be clear, does the spec permit defining two images where the same upper layer has a different bottom layer in the chain?

Image A manifest:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": ...,
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.serialization.rootfs.tar.gzip",
      "size": 32654,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
    },
    {
      "mediaType": "application/vnd.oci.image.serialization.rootfs.tar.gzip",
      "size": 73109,
      "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736"
    }
  ]
}

Image B manifest:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": ...,
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.serialization.rootfs.tar.gzip",
      "size": 16724,
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b"
    },
    {
      "mediaType": "application/vnd.oci.image.serialization.rootfs.tar.gzip",
      "size": 73109,
      "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736"
    }
  ]
}

(for simplicity I'm skipping the image configs since I hope the manifests will be enough)
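To make the question concrete, here is a minimal Python sketch that lists which layer digests appear with more than one parent across the two manifests above. The helper name `parents_by_layer` is hypothetical and not part of any OCI tooling; the digests are copied from the manifests, bottom layer first.

```python
# Layer digests copied from the Image A and Image B manifests, bottom layer first.
image_a = [
    "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
    "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736",
]
image_b = [
    "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
    "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736",
]


def parents_by_layer(*layer_lists):
    """Map each layer digest to the set of digests seen directly below it.

    A base layer has no parent, which is recorded as None.
    """
    parents = {}
    for layers in layer_lists:
        below = None
        for digest in layers:
            parents.setdefault(digest, set()).add(below)
            below = digest
    return parents


parents = parents_by_layer(image_a, image_b)

# Layers that sit on top of more than one distinct parent.
multi_parent = [d for d, p in parents.items() if len(p) > 1]
```

Here `multi_parent` contains only the shared `sha256:ec4b…` layer, which is exactly the case the question is about.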

For example, someone might want to add an application on top of a set of common libraries (or just a base Linux distribution image) and then upgrade the base libraries without rebuilding the "application" layer.

I'm asking this since the OCI image spec uses the Docker v2.2 distribution manifest as a starting point, and also:

My opinion is that there's no reason for, and nothing in the spec that says, that a layer has a relation with another layer or that a layer (if not a base one) must always have the same parent layer.

@philips
Contributor

philips commented Aug 10, 2016

I see no reason why this wouldn't be allowed.

@philips
Contributor

philips commented Aug 10, 2016

cc @opencontainers/image-spec-maintainers

@stevvooe
Contributor

This is more of a build time property than something we want to build into the specification. However, this requirement is clearly called out with the existence of a DiffID and ChainID. In fact, there are a number of security issues with treating a layer independently from the parent. Without these, your container system will be open to a number of exploits that can allow the injection of malicious code.

From a manifest perspective, this doesn't matter. The manifest just describes resources and provides a rough ordering for ideal resource fetching. The client shouldn't need to know anything about how they are assembled or even what the resources really are. It should just fetch the resources and dispatch them to a handler. It can be said that manifests are agnostic to the format.

The image configuration tells the story about how to assemble these into something usable. That includes the relationships between layers. If you want to switch out a layer, you'll have to fix up these identifiers to comply with these relationships such that they can be verified.
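As a rough illustration of why these identifiers have to be fixed up, here is a minimal Python sketch of the ChainID recursion from the image configuration section of the spec: `ChainID(L0) = DiffID(L0)` and `ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))`. The layer contents are hypothetical stand-ins; a real DiffID is the digest of the layer's uncompressed tar archive.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a sha256 digest string in the "sha256:<hex>" form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """Compute the ChainID of every layer in a stack, bottom layer first."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # The ChainID of the base layer is its DiffID.
            chain.append(diff_id)
        else:
            # Every later ChainID folds in the ChainID of the stack below it.
            chain.append(digest(f"{chain[-1]} {diff_id}".encode("ascii")))
    return chain


# Hypothetical stand-ins for uncompressed layer tar archives.
base_a = digest(b"base libraries v1")
base_b = digest(b"base libraries v2")
app = digest(b"application files")

chain_a = chain_ids([base_a, app])
chain_b = chain_ids([base_b, app])
```

The application layer has the same DiffID in both stacks, but `chain_a[1]` and `chain_b[1]` differ because each depends on everything beneath it.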

From the perspective of the specification, these are really two different images, which happen to share common resources. If you read through "Creating an Image Filesystem Changeset", you'll see why this interdependency is important. If the new layer wasn't built considering the resources in the old layer, it is easy to unintentionally expose extra or malicious data. In practice, there isn't a lot of expense to this, but it will result in a new layer.

From a high level, layers aren't the right place to share common resources for this style of application. That is not to say that containers built on the same layer can't share that layer. This is more to say that this style of composition needs to happen at the container runtime, where these relationships can be expressed through naming, rather than content address. Shoehorning this functionality in at this level is just going to lead to broken stuff. There are simply too many things that can go wrong when you switch out the base layer without rebuilding and testing the application layer.

That said, this style of composition fits in very well at the build level, where these components can be assembled, packaged and verified together, resulting in an immutable artifact. At that stage, names can be used to reference changing artifacts that reflect actual build-time dependencies. By forcing that to be done at build time, you centralize the update, leading to a more secure, more reliable assembly that relies on existing packaging systems that already solve these dependency problems for us.

@sgotti
Author

sgotti commented Aug 23, 2016

@stevvooe thanks for your detailed answer!

I agree that this may not be the correct way to do this; I just tried to find a simple example to explain the question 😄

This is more of a build time property than something we want to build into the specification. However, this requirement is clearly called out with the existence of a DiffID and ChainID.

If you want to switch out a layer, you'll have to fix up these identifiers to comply with these relationships such that they can be verified.

So, let's say that someone wants (ignoring all the warnings) to follow this road and create a build tool that generates the correct DiffID and ChainID (is this possible or am I missing something?). Since the spec doesn't block this, an OCI image implementation (local store, registry) should also handle this case (a layer with different parent layers).
But this, currently, if I'm not wrong, will cause issues in the docker graph drivers and on some registries that are assuming that a layer can have only the same parent layer.

I'm not sure where the line is. Is it the implementation that isn't image-spec compliant, or is the image spec not clear on how an implementation should manage layers?

BTW, the appc spec has the concept of dependencies between images (it doesn't have the layer concept), and, for the same top image, its dependencies may change (if not forced by its digest) since discovery is used to locate them. And, in the end, these images are rendered on disk (in a bit more complex way since it's a DAG and not just a chain) just by extracting the images in the DAG in the correct order and applying whiteouts (PathWhiteList in the appc case) on them.

@stevvooe
Contributor

So, let's say that someone wants (ignoring all the warnings) to follow this road and create a build tool that generates the correct DiffID and ChainID (is this possible or am I missing something?).

I'm considering this to be a "build time" operation.

From the perspective of the OCI specification, the result of this modification would be a separate image.

But this, currently, if I'm not wrong, will cause issues in the docker graph drivers and on some registries that are assuming that a layer can have only the same parent layer.

No. I'm not sure if I'm making my point accurately. The issue is that a layer may have opaque files that mean nothing when applied to an arbitrary parent.
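One way to picture this is with whiteout entries, which the changeset format marks with a `.wh.` filename prefix. The following is a minimal Python sketch, assuming a filesystem modeled as a plain dict of path to content; the function `apply_layer` and all file names are hypothetical, and opaque whiteouts (`.wh..wh..opq`) are deliberately not handled.

```python
WHITEOUT_PREFIX = ".wh."


def apply_layer(rootfs, layer):
    """Apply a changeset (path -> content) on top of rootfs and return the result."""
    result = dict(rootfs)
    for path, content in layer.items():
        directory, _, name = path.rpartition("/")
        if name.startswith(WHITEOUT_PREFIX):
            # A whiteout hides the named path, and anything beneath it, from lower layers.
            target = f"{directory}/{name[len(WHITEOUT_PREFIX):]}"
            for existing in list(result):
                if existing == target or existing.startswith(target + "/"):
                    del result[existing]
        else:
            result[path] = content
    return result


# Two hypothetical base layers that keep the same sensitive file in different places.
base_a = {"/etc/app.conf": "default", "/etc/debug-key": "secret"}
base_b = {"/etc/app.conf": "default", "/etc/keys/debug-key": "secret"}

# An application layer built against base_a: it removes the key and adds a binary.
app_layer = {"/etc/.wh.debug-key": "", "/opt/app/run": "binary"}

on_a = apply_layer(base_a, app_layer)
on_b = apply_layer(base_b, app_layer)
```

On `base_a` the whiteout removes the key as intended. On `base_b` it matches nothing, so `/etc/keys/debug-key` survives in the final filesystem even though the layer author meant to delete it.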

BTW, the appc spec has the concept of dependencies between images (it doesn't have the layer concept), and, for the same top image, its dependencies may change (if not forced by its digest) since discovery is used to locate them. And, in the end, these images are rendered on disk (in a bit more complex way since it's a DAG and not just a chain) just by extracting the images in the DAG in the correct order and applying whiteouts (PathWhiteList in the appc case) on them.

Yes, and this is effectively the same feature in docker and OCI. The difference is that we only point at the parent layer, not the image. The layer is just a tar file and the image is the configuration+layer parent chain.

@vbatts
Member

vbatts commented Aug 30, 2016

There ought to be no issues with pointing to an image rather than later, but the reconciliation of the parent's configuration is an open question (ignore? merge? something else?).

@philips
Contributor

philips commented Aug 30, 2016

@vbatts "rather than later". Having a hard time parsing your response.

@wking
Contributor

wking commented Aug 30, 2016

On Tue, Aug 30, 2016 at 09:40:09AM -0700, Brandon Philips wrote:

@vbatts "rather than later". Having a hard time parsing your response.

I'm pretty sure he meant “rather than a layer”.

@vbatts
Member

vbatts commented Aug 30, 2016

s/later/layer/

@stevvooe
Contributor

@vbatts Theoretically, I agree. Could you show an example?

@vbatts
Member

vbatts commented Sep 7, 2016

A trivial example: reference the object sha256:702ad90f705365227e902b42d91dd1a40e48ca7f67a2f4b2fd052aaa4295cd95, which is provided by https://storage.googleapis.com/golang/go1.7.linux-amd64.tar.gz
By having a child layer be this reference, applying the above archive puts /go/... into the resulting filesystem.

So an application/vnd.oci.image.manifest.v1+json object could look like:

{
    "annotations": null,
    "config": {
        "digest": "sha256:2b8fd9751c4c0f5dd266fcae00707e67a2545ef34f9a29354585f93dac906749",
        "mediaType": "application/vnd.oci.image.serialization.config.v1+json",
        "size": 1459
    },
    "layers": [
        {
            "digest": "sha256:702ad90f705365227e902b42d91dd1a40e48ca7f67a2f4b2fd052aaa4295cd95",
            "mediaType": "application/vnd.oci.image.layer.tar+gzip",
            "size": 81573766
        },
        {
            "digest": "sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f",
            "mediaType": "application/vnd.oci.image.layer.tar+gzip",
            "size": 667590
        }
    ],
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "schemaVersion": 2
}

@vbatts
Member

vbatts commented Sep 7, 2016

(or perhaps with application/tar+gzip mimetype, but it could be applied as application/vnd.oci.image.layer.tar+gzip)

@stevvooe
Contributor

stevvooe commented Sep 7, 2016

@vbatts Wouldn't the example call for an application/vnd.oci.image.manifest.v1+json as one of the layers? The specification should already handle the case that you are talking about.

@vbatts
Member

vbatts commented Sep 7, 2016

@stevvooe i don't follow why a manifest would be one of the layers. Elaborate?

@wking
Contributor

wking commented Sep 7, 2016

On Wed, Sep 07, 2016 at 12:52:15PM -0700, Vincent Batts wrote:

@stevvooe i don't follow why a manifest would be one of the
layers. Elaborate?

My reading of 1 was that you were suggesting:

"layers": [
    {
        "digest": "sha256:abc…",
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        …
    },
    {
        "digest": "sha256:def…",
        "mediaType": "application/vnd.oci.image.layer.tar+gzip",
        …
    }
],

which would have the same effect as a manifest which replaced the
sha256:abc… layer with all the layers contained in the sha256:abc…
manifest (recursively if that manifest in turn referenced other
manifests).

You could also require image-authors to flatten the layers array out
and not allow application/vnd.oci.image.manifest.v1+json layer
entries, but that makes “I just want to stick something small on top
of the image you already trust” less obvious. Still, putting a
reference to sha256:abc… in annotations would accomplish the same
goal, and keep the layers spec simpler, so I don't feel strongly
either way.
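The recursive expansion described above could be sketched roughly as follows in Python. This is only an illustration of the idea being discussed, not something the spec defines: the function `flatten_layers`, the in-memory `blobs` store, and the placeholder digests are all hypothetical.

```python
MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
LAYER_TYPE = "application/vnd.oci.image.layer.tar+gzip"


def flatten_layers(manifest, blobs, _seen=()):
    """Expand manifest-typed entries in "layers" into the layers they contain.

    blobs maps a digest to an already-parsed manifest (hypothetical store).
    """
    flat = []
    for entry in manifest["layers"]:
        if entry["mediaType"] == MANIFEST_TYPE:
            if entry["digest"] in _seen:
                raise ValueError("manifest reference cycle at " + entry["digest"])
            nested = blobs[entry["digest"]]
            # Recurse so that manifests referencing other manifests are expanded too.
            flat.extend(flatten_layers(nested, blobs, _seen + (entry["digest"],)))
        else:
            flat.append(entry)
    return flat


# Placeholder digests, for illustration only.
blobs = {
    "sha256:base-manifest": {
        "layers": [
            {"digest": "sha256:layer-1", "mediaType": LAYER_TYPE},
            {"digest": "sha256:layer-2", "mediaType": LAYER_TYPE},
        ]
    }
}
top = {
    "layers": [
        {"digest": "sha256:base-manifest", "mediaType": MANIFEST_TYPE},
        {"digest": "sha256:layer-3", "mediaType": LAYER_TYPE},
    ]
}

flat_digests = [entry["digest"] for entry in flatten_layers(top, blobs)]
```

The flattened list is what an image author would otherwise have to write out by hand if manifest-typed entries were not allowed in `layers`.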

@stevvooe
Contributor

stevvooe commented Sep 7, 2016

@vbatts I'm not suggesting that, but that seemed to be the request here. My point, under that premise, is that it is odd to place an image in the layers and this should really be a build time fixup.

@vbatts
Member

vbatts commented Sep 13, 2016

Oh I see now. Yeah. Referencing an object that is a manifest with its own objects too. That seems like a valuable use-case.

But how would that be a build time fixup tho?

@stevvooe
Contributor

@vbatts My point is that you can't just swap these without fixing up the chain ids and diff ids to correlate. These are generally build time concerns.

Effectively, I'm saying this is already supported without referencing an image manifest as a layer. Adding this will just create another way to do the same thing without providing much value.

@vbatts
Member

vbatts commented Oct 6, 2016

@sgotti have we confirmed that this is a non-issue?

@stevvooe
Contributor

Closing after two weeks with no activity. Please re-open if there is more to discuss.
