spec: describe descriptors and digests #111
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
# OpenContainers Content Descriptors | ||
|
||
OCI has several components that come together to describe an image. | ||
References between components form a [Merkle Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Merkle_tree). | ||
The references in the _Merkle DAG_ are expressed through _Content Descriptors_. | ||
A _Content Descriptor_ or _Descriptor_, describes the disposition of targeted content. | ||
A _Descriptor_ includes the type of content, an independently-verifiable content identifier, known as a "digest" and the byte-size of the raw content. | ||
The “, known as a "digest"” bit makes the three-part content listing hard for me to read. Can we drop this sentence, since we have a detailed “Properties” section below? |
||
|
||
Descriptors SHOULD be embedded in other formats to securely reference external content. | ||
I think this should be turned around to get “Other formats SHOULD use descriptors to securely reference external content”.
+1 on turning it around as this format will stand on its own in #94
Well, +1 on this change but I see below it says they can be independent
On Wed, Jun 01, 2016 at 04:59:36PM -0700, Brandon Philips wrote:
We got the turned-around “Other formats SHOULD use descriptors to |
||
|
||
Other formats SHOULD use descriptors to securely reference external content. | ||
|
||
## Properties | ||
|
||
The following describe the primary set of properties that make up a _Descriptor_. | ||
|
||
- **`mediaType`** *string* | ||
|
||
This REQUIRED property contains the MIME type of the referenced object. | ||
|
||
- **`digest`** *string* | ||
|
||
This REQUIRED property is the _digest_ of the targeted content, meeting the requirements outlined in [Digests and Verification](#digests-and-verification). | ||
Retrieved content SHOULD be verified against this digest when consumed via untrusted sources. | ||
|
||
- **`size`** *int* | ||
This REQUIRED property specifies the size in bytes of the blob. | ||
This property exists so that a client will have an expected size for the content before validating. | ||
If the length of the retrieved content does not match the specified length, the content SHOULD NOT be trusted. | ||
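
The three REQUIRED properties above can be illustrated with a minimal Python sketch. The `Descriptor` class and the `descriptor_from_dict` helper are hypothetical names introduced only for illustration; they are not part of the specification, and the non-negative check on `size` is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical in-memory form of a descriptor (not defined by the spec)."""

    media_type: str  # "mediaType": MIME type of the referenced object
    digest: str      # "digest": content identifier, e.g. "sha256:<hex>"
    size: int        # "size": size of the blob in bytes


def descriptor_from_dict(obj: dict) -> Descriptor:
    """Build a Descriptor, rejecting objects that lack a REQUIRED property."""
    for key in ("mediaType", "digest", "size"):
        if key not in obj:
            raise ValueError(f"missing required property: {key}")
    if not isinstance(obj["mediaType"], str) or not isinstance(obj["digest"], str):
        raise ValueError("mediaType and digest must be strings")
    size = obj["size"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    # Assumption: a negative size is treated as invalid.
    if not isinstance(size, int) or isinstance(size, bool) or size < 0:
        raise ValueError("size must be a non-negative integer")
    return Descriptor(obj["mediaType"], obj["digest"], size)
```

Unknown keys are ignored here on purpose, so that a descriptor extended by another specification still parses.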
|
||
### Reserved | ||
@stevvooe I don't follow. Are you saying some other type is polymorphic? Which one?
In general, we want to allow extension of descriptors but reserve a set of fields that may not be supplanted by other specifications. An example is the Another example is
On Thu, Jun 02, 2016 at 01:11:54PM -0700, Stephen Day wrote:
cough JSON-LD cough 1 ;).
I don't see how JSON-LD is related at all. Seems to be overcomplicated and unfocused in the problems it solves. What benefit do you see in bringing the complexity of XML to JSON? Do you have examples?
On Thu, Jun 02, 2016 at 02:28:46PM -0700, Stephen Day wrote:
But that approach requires you (and anyone adding their own fields, And it looks like my vendor example should have been 1: {
Sure, but what problem does this complexity actually solve?
On Thu, Jun 02, 2016 at 02:58:34PM -0700, Stephen Day wrote:
You get to drop this section and sleep soundly without wondering if
I'm not sure I could sleep soundly unleashing JSON-LD on implementors. Either way, this clause wouldn't protect against vendor-added fields in a descriptor. That is simply not supported. If one drops a random field into a descriptor, it may be overwritten by future versions of the specification. We already have annotations to cover this use case. This statement would only reserve it from use in other OCI specifications that use descriptors.
On Thu, Jun 02, 2016 at 03:33:16PM -0700, Stephen Day wrote:
That is supported by JSON-LD. Here's how you use both at once: { With JSON-LD, the property name (e.g. ‘urls’, ‘vendorUrls’, …) isn't |
||
|
||
So why are these reserved? I'm surprisingly unclear on this. Is it because docker is using them?
Docker is introducing a
right on. Maybe there could be a notion of scoping?
Namespacing is a solid option, as is allowing annotations, but we really need to balance this against the simplicity and compatibility of descriptors. The manifest list descriptor uses Either way, let's see how
I am fine leaving these as reserved. Honestly, I would rather just define them here first but reserving more stuff rather than less in the interest of caution is fine. Over time, my hope is that whatever discussion led to urls being added to the Docker media type happens in OCI post-v1.0 as we all focus on a single shared spec. |
||
The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications: | ||
You're restricting your reservation to “other OCI specifications”? I expect you can trust OCI maintainers to PR this spec if they want a new descriptor field. The folks you want to warn off are external implementations and specs.
And because it doesn't hurt to also warn off other OCI specs, I'd use something generic like “MUST NOT be defined outside this specification”. |
||
|
||
- **`urls`** *array* | ||
|
||
This key is RESERVED for future versions of the specification. | ||
|
||
- **`data`** *string* | ||
|
||
This key is RESERVED for future versions of the specification. | ||
|
||
All other fields may be included in other OCI specifications. | ||
Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification. | ||
I'd drop OCI from both of these lines. |
||
|
||
## Digests and Verification | ||
|
||
The _digest_ component of a _Descriptor_ acts as a content identifier, employing [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage) for the OCI image format. | ||
It uniquely identifies content by taking a collision-resistant hash of the bytes. | ||
Such an identifier can be independently calculated and verified by selection of a common _algorithm_. | ||
If such an identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate it independently and be certain that the correct content was obtained. | ||
Put simply, the identifier is a property of the content. | ||
|
||
To disambiguate from other concepts, we call this identifier a _digest_. | ||
A _digest_ is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion. | ||
The _algorithm_ identifies the methodology used to calculate the digest, which is shared by implementations. | ||
The _hex_ portion is the hex-encoded result of the hash. | ||
|
||
We define a _digest_ string to match the following grammar: | ||
|
||
``` | ||
digest := algorithm ":" hex | ||
algorithm := /[A-Za-z0-9_+.-]+/ | ||
hex := /[A-Fa-f0-9]+/ | ||
``` | ||
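
As a rough illustration of the grammar, the following Python sketch splits a digest string into its two portions. The `parse_digest` helper and `DIGEST_RE` are hypothetical names, and the pattern assumes the _algorithm_ portion may contain any ASCII letter, which is what an algorithm name such as `sha256` requires.

```python
import re
from typing import Tuple

# Assumption: the algorithm portion admits any ASCII letter so that names
# such as "sha256" match; the hex portion is restricted to hex digits.
DIGEST_RE = re.compile(r"([A-Za-z0-9_+.-]+):([A-Fa-f0-9]+)")


def parse_digest(digest: str) -> Tuple[str, str]:
    """Return (algorithm, hex) or raise ValueError if the string is malformed."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"not a valid digest string: {digest!r}")
    return match.group(1), match.group(2)
```

Because `:` cannot appear in either portion, the split is unambiguous and `fullmatch` rejects any trailing or leading garbage.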
|
||
Some examples of _digests_ include the following: | ||
|
||
digest | description | | ||
----------------------------------------------------------------------------------|------------------------------------------------ | ||
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | Common sha256 based digest | | ||
|
||
Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the _digest_. | ||
The size of the content SHOULD be verified to reduce hash collision space. | ||
This (and the |
||
Heavy processing of before calculating a hash SHOULD be avoided. | ||
“processing of before” → “processing before”. And maybe explain that this suggestion is denial-of-service protection.
+1 |
||
Implementations MAY employ some canonicalization to ensure stable content identifiers. | ||
This content-production suggestion seems out-of-scope for a digest spec.
This is a fairly important consideration when building content-addressable systems. This is saying the practice is acceptable but discouraged.
On Wed, Jun 01, 2016 at 04:37:46PM -0700, Stephen Day wrote:
Wait why discouraged? I think it's a good idea. I'm just saying
@stevvooe can you provide a concrete example of canonicalization?
Processing unverified content is problematic. For example, one should not canonicalize a tar file before calculating the content digest. If this is done, it should only be done on the generation side. I'll add some language here to clear that up.
And that will likely result in hash stability problems.
On Wed, Jun 01, 2016 at 05:04:43PM -0700, Stephen Day wrote:
We should definitely be talking about hash stability in image-spec,
In the spec or in response to your comment? |
||
|
||
Include mention of size verification to avoid Length Extension Attack. |
||
### Algorithms | ||
|
||
While the _algorithm_ does allow one to implement a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#SHA-256). | ||
|
||
Let's use a simple example in pseudo-code to demonstrate a digest calculation: | ||
This example is generic “Digests and Verification” stuff, so I think we should move it out of the “Algorithms” subsection.
This is to clarify the algorithm for using |
||
A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by string `<alg>`: | ||
``` | ||
let ID(C) = Descriptor.digest | ||
let C = <bytes> | ||
let D = '<alg>:' + EncodeHex(H(C)) | ||
let verified = ID(C) == D | ||
``` | ||
Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field. | ||
Content `C` is a string of bytes. | ||
Function `H` returns the hash of `C` in bytes, which is passed to function `EncodeHex` and prefixed with `<alg>:` to obtain the _digest_. | ||
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`. | ||
After verification, the following is true: | ||
|
||
``` | ||
D == ID(C) == '<alg>:' + EncodeHex(H(C)) | ||
``` | ||
|
||
The _digest_ is confirmed as the content identifier by independently calculating the _digest_. | ||
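
The pseudo-code can be mirrored in Python roughly as follows. This is a sketch, not a reference implementation: `compute_digest` and `verify` are hypothetical names, `hashlib.new` stands in for `H`, and `hexdigest()` stands in for `EncodeHex`. It assumes the algorithm name in the digest is one that Python's `hashlib` recognises and that the hex portion is lower-case.

```python
import hashlib


def compute_digest(content: bytes, alg: str = "sha256") -> str:
    """D = '<alg>:' + EncodeHex(H(C)).

    hashlib.new raises ValueError for algorithm names it does not know.
    """
    return f"{alg}:{hashlib.new(alg, content).hexdigest()}"


def verify(descriptor_digest: str, content: bytes) -> bool:
    """verified = ID(C) == D, where ID(C) is taken from Descriptor.digest."""
    alg, _, _ = descriptor_digest.partition(":")
    return compute_digest(content, alg) == descriptor_digest
```

For example, `verify(compute_digest(b"hello"), b"hello")` holds, while changing a single byte of the content makes it fail.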
|
||
#### SHA-256 | ||
|
||
[SHA-256](https://tools.ietf.org/html/rfc4634#page-7) is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics. | ||
Implementations MUST implement SHA-256 digest verification for use in descriptors. | ||
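
One way an implementation might verify SHA-256 content without buffering it, while also enforcing the declared size, is sketched below. The `verify_stream` function and its parameters are hypothetical; the early exit when more bytes arrive than the descriptor declares is a design choice of this sketch, not a requirement stated above.

```python
import hashlib
from typing import BinaryIO


def verify_stream(stream: BinaryIO, expected_size: int, expected_hex: str,
                  chunk_size: int = 65536) -> bool:
    """Hash a stream with SHA-256 and check both its size and its hex digest."""
    hasher = hashlib.sha256()
    seen = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        seen += len(chunk)
        if seen > expected_size:
            # More bytes than the descriptor declares: stop reading early.
            return False
        hasher.update(chunk)
    # hexdigest() is lower-case, so normalise the expected value before comparing.
    return seen == expected_size and hasher.hexdigest() == expected_hex.lower()
```

Nothing beyond hashing is done with the bytes until both checks pass, which keeps the work performed on unverified content small.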
|
||
## Examples | ||
|
||
The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", of size 7682 bytes: | ||
|
||
```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson | ||
{ | ||
"mediaType": "application/vnd.oci.image.manifest.v1+json", | ||
"size": 7682, | ||
"digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" | ||
} | ||
``` |
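
Going the other way, a descriptor like the one above could be generated for a blob with a few lines of Python. The `describe` helper and the sample `blob` bytes are hypothetical and only illustrate how the three fields relate to the content; the sketch assumes SHA-256 as the algorithm.

```python
import hashlib
import json


def describe(content: bytes, media_type: str) -> dict:
    """Build a descriptor for raw bytes, using SHA-256 for the digest."""
    return {
        "mediaType": media_type,
        "size": len(content),
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
    }


# Placeholder bytes standing in for a real manifest.
blob = b'{"hypothetical": "manifest bytes"}'
print(json.dumps(describe(blob, "application/vnd.oci.image.manifest.v1+json"), indent=2))
```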
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
{ | ||
"description": "OpenContainer Content Descriptor Specification", | ||
"$schema": "http://json-schema.org/draft-04/schema#", | ||
"id": "https://opencontainers.org/schema/descriptor", | ||
"type": "object", | ||
"properties": { | ||
"mediaType": { | ||
"description": "the mediatype of the referenced object", | ||
"$ref": "defs-image.json#/definitions/mediaType" | ||
}, | ||
"size": { | ||
"description": "the size in bytes of the referenced object", | ||
"type": "integer" | ||
}, | ||
"digest": { | ||
"$ref": "defs-image.json#/definitions/digest" | ||
} | ||
}, | ||
"required": [ | ||
"mediaType", | ||
"size", | ||
"digest" | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,12 +14,12 @@ | |
"$ref": "defs-image.json#/definitions/mediaType" | ||
}, | ||
"config": { | ||
"$ref": "defs-image.json#/definitions/descriptor" | ||
"$ref": "content-descriptor.json" | ||
}, | ||
"layers": { | ||
"type": "array", | ||
"items": { | ||
"$ref": "defs-image.json#/definitions/descriptor" | ||
"$ref": "content-descriptor.json" | ||
I reckon this is fine, but seems just as well to make it an object within a defs*.json file, rather than adding a new file just for a single definition.
ah, but it can have its own schema file to be validated against ... hrm
I suspect we'll have to shuffle the organization a bit, but, yes, this allows individual validation in the current test setup. |
||
} | ||
}, | ||
"annotations": { | ||
|
disposition? seems like an ambiguous word here.

"disposition" precisely means "the way in which something is placed or arranged, especially in relation to other things".
However, "describes the disposition" stutters, as well. I'll tweak this. Any suggestions?

On Fri, Jun 03, 2016 at 11:38:09AM -0700, Stephen Day wrote:
I'd just drop the line. The previous line and rest of the docs make
the idea clear enough.