
media-types.md: add foreign layer media type #216

Closed

runcom
Member

@runcom runcom commented Aug 29, 2016

Relates to #169

This patch adds a new media type to tell implementations that the content the descriptor refers to isn't supposed to be pushed or carried in an image layout.

Not sure everyone likes foreign - I don't have a strong opinion either so suggestions are welcome.

/cc @stevvooe @vbatts @philips

Are there other places I should update for this? Is there a place where we have a description of each media type?

Signed-off-by: Antonio Murdaca <runcom@redhat.com>

@runcom runcom changed the title media-types.md: add foreign descritor media type media-types.md: add foreign descriptor media type Aug 29, 2016
@@ -3,6 +3,7 @@
The following `mediaType` MIME types are used by the formats described here, and the resources they reference:

- `application/vnd.oci.descriptor.v1+json`: [Content Descriptor](descriptor.md)
+- `application/vnd.oci.descriptor.foreign.v1+json`: [Content Descriptor](descriptor.md)
Contributor

How do you figure out what the real type is? Docker uses application/vnd.docker.image.rootfs.foreign.diff.tar.gzip, which is clearly a foreign version of their application/vnd.docker.image.rootfs.diff.tar.gzip.

Stepping back, I don't think we need this information for pulling. However, “this content is not licensed for sharing” is a clear property that doesn't rely on vague ideas like “the usual channel”. Maybe we can add a new descriptor.license field (an array of SPDX identifiers?), and refuse to push blobs which don't have such an array (and are therefore not shareable).

Blob license information would be nice (more in #71), but if we want to scope this more narrowly we could add a boolean descriptor.shareable to make life easier for publishers.

Contributor

@wking Types of objects are not detected. The type of an object comes from trusted descriptors or a metadata storage system.

We are not adding licensing information in the course of this PR.

Please keep the conversation focused.

Contributor

On Tue, Aug 30, 2016 at 01:48:59PM -0700, Stephen Day wrote:

> @wking Types of objects are not detected. The type of an object
> comes from trusted descriptors or a metadata storage system.

If you have a descriptor holding type
application/vnd.oci.descriptor.foreign.v1+json, what does it point to?
Does it point to another descriptor, and that descriptor has a type
like application/vnd.oci.image.serialization.rootfs.tar.gzip? That
would be the first case I describe in [1]. The current Docker
approach, as I understand it, is the third case I describe in [1]. Or
are you intending this type be used in a way that I don't describe in
[1] at all?

Contributor

There is no application/vnd.oci.descriptor.foreign.v1+json. That doesn't exist.

Contributor

On Tue, Aug 30, 2016 at 02:39:23PM -0700, Stephen Day wrote:

> There is no application/vnd.oci.descriptor.foreign.v1+json. That doesn't exist.

It's in this PR, in the lines we're discussing now ;). Maybe we
should table this sub-thread until @runcom rerolls the PR?

@philips
Contributor

philips commented Aug 29, 2016

I still don't understand why this is a new mime-type instead of a field on the manifest. cc @stevvooe

@stevvooe
Contributor

@runcom This isn't at the right level. This should be for the layer itself, since the descriptor may be lost.

@philips This is a mediatype so that it doesn't require the datastructure to keep the information about whether or not it should be distributed.

@philips
Contributor

philips commented Aug 29, 2016

@stevvooe right, but would I use this media-type in an Accept: header? What is the problem with keeping a datastructure?

@stevvooe
Contributor

@philips One may store the layer without the data structure. The media type is for the layer itself, not the descriptor pointing to it.

@wking
Contributor

wking commented Aug 29, 2016

On Mon, Aug 29, 2016 at 01:46:08PM -0700, Stephen Day wrote:

> This should be for the layer itself, since the descriptor may be
> lost.

What is “for the layer itself”? Blobs are opaque, so the first
possible place to store metadata like this is at the descriptor level.
This PR currently suggests a media type, and I'd prefer a new
descriptor field [1], but both of those are only stored at the
descriptor level.

@philips
Contributor

philips commented Aug 29, 2016

@stevvooe Right, but the layer won't embed that it is 'foreign' either, right? I just don't understand why it is a media-type when really it is advising people about what they can/cannot do with the contents, not what the content format is.

@stevvooe
Contributor

@philips It is a legal distinction, rather than a technical one. It has nothing to do with being "foreign". It is embedded as media type so that information isn't lost for storage targets that may drop other descriptor fields but not media type.

@philips
Contributor

philips commented Aug 29, 2016

@stevvooe Why can't a storage target be taught to hold on to metadata besides the media-type?

@stevvooe
Contributor

@philips It can, but embedding in the media type removes the requirement. Such images may also have slightly different properties for the platform and will be interpreted differently. It's the difference between using a flag and a type to decide behavior. In this case, we said these layers are a different type.

@wking
Contributor

wking commented Aug 29, 2016

On Mon, Aug 29, 2016 at 03:59:58PM -0700, Stephen Day wrote:

> Such images may also have slightly different properties for the
> platform and will be interpreted differently.

Can this PR grow some docs about those differences?

I'm still missing how you figure out what the actual type of the
referenced content is, since this PR uses a generic descriptor base
[1]. Is the idea to have the following Merkle chain:

  1. Manifest A (type application/vnd.oci.image.manifest.v1+json) with
    an entry in layers pointing at Descriptor B with type
    application/vnd.oci.descriptor.foreign.v1+json.
  2. Descriptor B with type
    application/vnd.oci.image.serialization.rootfs.tar.gzip pointing at
    layer C.
  3. Layer C (the gzipped tar blob).

I'd rather have:

  1. Manifest A (type application/vnd.oci.image.manifest.v1+json) with
    an entry in layers pointing at layer C with type
    application/vnd.oci.image.serialization.rootfs.tar.gzip and a
    descriptor field with the “not licensed for sharing” information
    [1].
  2. Layer C (the gzipped tar blob).

But if you can't bring yourself to add a descriptor field to cover
that case, you can still avoid the inefficiency of an intermediate
Descriptor B by using:

  1. Manifest A (type application/vnd.oci.image.manifest.v1+json) with
    an entry in layers pointing at layer C with type
    application/vnd.oci.image.foreign.serialization.rootfs.tar.gzip and a
    descriptor field with the “not licensed for sharing” information
    [1].
  2. Layer C (the gzipped tar blob).

which is closer to Docker's current implementation [2].
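
A minimal sketch of the three layouts above as plain Python dicts, to make the extra hop in the first one concrete. The digests, sizes, the `shareable` field name, and the helper names are illustrative assumptions; only the media type strings come from this thread, and none of this is spec text.

```python
# Media type strings as named in this thread; everything else is hypothetical.
LAYER = "application/vnd.oci.image.serialization.rootfs.tar.gzip"
FOREIGN_DESCRIPTOR = "application/vnd.oci.descriptor.foreign.v1+json"
FOREIGN_LAYER = "application/vnd.oci.image.foreign.serialization.rootfs.tar.gzip"


def descriptor(media_type, digest, size, **extra):
    """Build a descriptor-like dict: type, address, size, plus any extra fields."""
    return {"mediaType": media_type, "digest": digest, "size": size, **extra}


# Layout 1: the manifest entry points at Descriptor B, which is stored as
# its own blob and in turn points at layer C.
blob_store_1 = {"sha256:bbb": descriptor(LAYER, "sha256:ccc", 1024)}
layout_1 = descriptor(FOREIGN_DESCRIPTOR, "sha256:bbb", 120)

# Layout 2: the manifest entry points straight at layer C and carries a
# (hypothetical) descriptor field with the sharing restriction.
layout_2 = descriptor(LAYER, "sha256:ccc", 1024, shareable=False)

# Layout 3: the manifest entry points straight at layer C and the
# restriction is folded into a foreign variant of the layer media type.
layout_3 = descriptor(FOREIGN_LAYER, "sha256:ccc", 1024)


def fetches_to_reach_layer(entry, blob_store):
    """Count blob fetches needed, after reading the manifest, to get the layer."""
    fetches = 1  # fetch whatever the manifest entry points at
    while entry["mediaType"] == FOREIGN_DESCRIPTOR:
        # The blob just fetched was itself a descriptor; follow it.
        entry = blob_store[entry["digest"]]
        fetches += 1
    return fetches
```

Under these assumptions the first layout needs two fetches after the manifest (Descriptor B, then layer C), while the other two need one.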

@philips
Contributor

philips commented Aug 29, 2016

@wking I agree. We need the "foreign" types added to the compatibility matrix and an explanation of how these types should be handled differently.

I am still skeptical about encoding policy into a mime-type. Particularly policy that might change between platforms.

@stevvooe
Contributor

@philips I agree that this isn't ideal, but I am not sure if this is the right venue for design review.

@wking That documentation of Docker's current implementation is wrong. The relationship between the media type and URLs is incidental, but URLs may be used for any object. I just filed distribution/distribution#1931 to address this inaccuracy.

@stevvooe
Contributor

PR that fixes the upstream docker specification: distribution/distribution#1932

@wking
Contributor

wking commented Aug 30, 2016

On Mon, Aug 29, 2016 at 05:04:10PM -0700, Stephen Day wrote:

> @wking That documentation of docker's current implementation is
> wrong. The relationship between the mediatype and URLs is incidental
> but urls may be used for any object. I just filed
> distribution/distribution#1931 address this
> inaccuracy.

The distinction you're making there and in distribution/distribution#1932
looks good to me.

I still don't see a need to push this information into the media type
(vs. having a new Descriptor field [1]), but perhaps that will become
clearer if/when we get docs explaining the platform differences [2,3].

@philips
Contributor

philips commented Aug 30, 2016

The more I read, even after seeing the Docker notes, the more I think it is clear there should be a "pushable" field that is false to enforce this policy.

@stevvooe
Contributor

@philips

> The more I read, even after seeing the Docker notes, the more I think it is clear there should be a "pushable" field that is false to enforce this policy.

I see the issue: from an engineering perspective, these aren't different things. From a legal standpoint, they are different types, and the handling of each type is different. Pulling this property up to the descriptor level obscures that.

Fundamentally, treating these as different types is correct. Each type has a handler for how it is pulled, pushed and managed. The descriptor only tells the type of a thing, where to get it and how big it is. There is no policy or processing that is specific to a given type within the descriptor itself. All of that is encoded in the type itself. Having this as a type is just an extension of that methodology: for this type, the ability to push it is not defined.

cc @jstarks @RobDolinMS @JLB13
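
One way to read the handler-per-type argument, sketched in Python. The handler classes, the `PushNotDefined` exception, and the registry are hypothetical; the point is only that the push decision hangs off the media type rather than off a descriptor field.

```python
LAYER = "application/vnd.oci.image.serialization.rootfs.tar.gzip"
FOREIGN_LAYER = "application/vnd.oci.image.foreign.serialization.rootfs.tar.gzip"


class PushNotDefined(Exception):
    """Raised when a media type has no push operation."""


class LayerHandler:
    """Hypothetical handler for ordinary layers: pull and push both defined."""

    def pull(self, descriptor):
        return f"pulled {descriptor['digest']}"

    def push(self, descriptor):
        return f"pushed {descriptor['digest']}"


class ForeignLayerHandler(LayerHandler):
    """Same pull behaviour, but push is deliberately left undefined."""

    def push(self, descriptor):
        raise PushNotDefined(descriptor["mediaType"])


# The descriptor only says what the thing is; behaviour is looked up by type.
HANDLERS = {LAYER: LayerHandler(), FOREIGN_LAYER: ForeignLayerHandler()}


def push(descriptor):
    """Dispatch a push purely on the descriptor's media type."""
    return HANDLERS[descriptor["mediaType"]].push(descriptor)
```

A store that keeps nothing but the media type next to each blob can still make this decision, which is the property being argued for here.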

@wking
Contributor

wking commented Aug 30, 2016

On Tue, Aug 30, 2016 at 04:05:12PM -0700, Stephen Day wrote:

>> The more I read, even after seeing the Docker notes, the more I
>> think it is clear there should be a "pushable" field that is false
>> to enforce this policy.
>
> I see the issue: from an engineering perspective, these aren't
> different things…

Engineering issues with the three choices from [1], in the same order
as they're listed in that comment:

a. Adding a generic application/vnd.oci.descriptor.foreign.v1+json
type. This introduces an additional Descriptor, where the
lower-level Descriptor holds the “real” type and the higher-level
descriptor holds application/vnd.oci.descriptor.foreign.v1+json.
Having two descriptors means fetching the Merkle tree has an extra
round-trip of latency.

b. Adding a Descriptor field. No need for an additional Descriptor or
a doubling of known media types. Unicorns dance through
flower-filled meadows ;).

c. Adding foreign versions of existing media types. Folks can use
application/vnd.oci.image.foreign.serialization.rootfs.tar.gzip in
place of application/vnd.oci.image.serialization.rootfs.tar.gzip
when they reference a layer with push restrictions. We need
additional PRs landing additional media types if anyone wants
application/vnd.oci.image.foreign.serialization.config.v1+json or
similar.

So I like (b). If we know (somehow) that the unpushable tag will only
ever apply to layer tarballs, then (c) is reasonable; I just don't see
a reason to make that claim when (b) accomplishes the same thing with
safer scaling. Both (b) and (c) seem better than (a), since the (b) /
(c) tradeoff only impacts image-spec development, while (a) has a
deployed-performance impact.

@stevvooe
Contributor

Option (a) is out, since I think that was a misunderstanding.
Option (b) provides dubious benefit and breaks the descriptor model.
Option (c) is proven.

@wking
Contributor

wking commented Aug 31, 2016

On Tue, Aug 30, 2016 at 04:45:06PM -0700, Stephen Day wrote:

> Option (b) provides dubious benefit and breaks the descriptor model.
> Option (c) is proven.

I think one benefit of (b) (no need to land foreign versions of media
types if we need the same handling for types beyond layer tarballs) is
clear. I'm not clear on how the change breaks descriptors; it just
makes the OCI spec slightly less of a drop-in replacement for the
current Docker spec than it was previously.

But either way, I think the main issue is that a boolean (wherever we
store it) is insufficient to cover “can I push the referenced blob to
store $X?”. I expect the image-spec maintainers will either end up
punting that logic to higher layers (and dropping the foreign boolean)
or providing space for a more detailed push decision to be made in
software. If we go the latter route, the interesting information
includes:

  • Licenses that apply to the referenced blob [1]. Some blobs will be
    dual licensed, and some blobs will contain components with several
    licenses, so we probably need a way to express logical ‘and’ and
    ‘or’ operations while listing licenses.
  • Copyright holders. For blobs with restrictive licenses or blobs
    where all rights are reserved, the copyright holder can still
    push the blob between repositories as they see fit.
  • Export classification. Some blobs can't be exported from the US, or
    cannot be exported to particular embargoed countries [2]. Some
    companies or individuals may wish to impose similar restrictions
    internally, to keep sensitive blobs from leaking into the outside
    world.

I'm guessing that Descriptor.annotations in the spec and “can push?”
hooks in tooling will be the best way to address the complication,
although some fields (licensing?) are likely worth standardizing.
Hashing this out is going to take some time, and both (b) and (c) are
fine as stopgap solutions.
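
A rough sketch of what annotation-driven “can push?” hooks could look like, in Python. The annotation keys, the shape of the `target` dict, and the hook names are all hypothetical, and list-valued annotations are a simplification that a string-valued annotation map would need to encode somehow.

```python
# Hypothetical annotation keys; nothing here is defined by the spec.
LICENSES = "org.example.licenses"            # list of alternatives ("or"),
                                             # each a list that all apply ("and")
HOLDERS = "org.example.copyright-holders"    # parties free to push regardless
EMBARGOED = "org.example.embargoed-countries"


def annotations(desc):
    """Return the descriptor's annotations, or an empty dict if absent."""
    return desc.get("annotations", {})


def license_or_holder_hook(desc, target):
    """Allow the push for a copyright holder, or if the target accepts
    every license in at least one of the listed alternatives."""
    notes = annotations(desc)
    if target["pusher"] in notes.get(HOLDERS, []):
        return True
    # No license information means no alternative can match.
    alternatives = notes.get(LICENSES, [])
    return any(set(alt) <= target["accepted_licenses"] for alt in alternatives)


def export_hook(desc, target):
    """Refuse the push when the target sits in an embargoed country."""
    return target["country"] not in annotations(desc).get(EMBARGOED, [])


def can_push(desc, target, hooks):
    """A push is allowed only if every hook agrees."""
    return all(hook(desc, target) for hook in hooks)
```

Tooling would then call something like `can_push(desc, target, [license_or_holder_hook, export_hook])` before uploading a blob, with each site free to add its own hooks.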

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
@runcom
Member Author

runcom commented Aug 31, 2016

@stevvooe I moved the foreign word to the Layer media type - though I do share @philips's thoughts about this having to be a property/field of the layer and not a new media type.

@vbatts
Member

vbatts commented Aug 31, 2016

I'm game for not having a new mimetype for the same content. Just because it is not resolved in the present blobs directory and may need to be fetched does not make it new content.

@runcom runcom changed the title media-types.md: add foreign descriptor media type media-types.md: add foreign layer media type Aug 31, 2016
@philips
Contributor

philips commented Aug 31, 2016

@runcom can you add more information on "pushed"? We need to be crisp on this and I don't feel anyone has really defined the semantics of "pushing". Can I not mirror it internally? Can it not be pushed into a local cache?

@runcom
Member Author

runcom commented Aug 31, 2016

> @runcom can you add more information on "pushed"? We need to be crisp on this and I don't feel anyone has really defined the semantics of "pushing". Can I not mirror it internally? Can it not be pushed into a local cache?

@philips is uploaded/uploading a better term instead?

@philips
Contributor

philips commented Aug 31, 2016

@runcom right, but uploaded where and why? "Uploaded outside of administrative control"? I understand this is about handling of copyrighted material but I don't know how to phrase this.

@wking
Contributor

wking commented Aug 31, 2016

On Wed, Aug 31, 2016 at 11:47:23AM -0700, Brandon Philips wrote:

> @runcom right, but uploaded where and why? I understand this is
> about handling of copyrighted material but I don't know how to
> phrase this.

I'd guess “pushed to a location from which users other than the
initial requester might access it”. So you're free to cache it on
your laptop (if you're the only one with a login) and clearly can't
upload it to a public store. Whether you can cache it on a multi-user
box falls into the same sort of gray area that downloading other
copyright material falls into. You can legally access it, but aren't
allowed to share it with other users on the system (even root?). You
may be allowed to cache it on the multi-user box (or in the public
store?) if you encrypt it. Or do we want to cover licenses for seats
somehow? There was some discussion of commercial licensing [1].

[1]: Subject: [Food for thought] Licencing issue for enterprise product
     distribution through containers.
     Date: Fri, 23 Oct 2015 01:54:31 -0700 (PDT)
     Message-Id: <8aee4524-ad6a-4f94-bc82-961ee875b12f@opencontainers.org>

@runcom
Member Author

runcom commented Sep 2, 2016

Closing this as per #233

@runcom runcom closed this Sep 2, 2016
@runcom runcom deleted the foreign-mt branch September 2, 2016 13:11
5 participants