schema: allow compound algorithm specifiers in digests #654

stevvooe · 2017-04-24T22:01:08Z

While we currently have support for only sha256 in digests, OCI image
processors may encounter other kinds of "compound" digests, such as
tarsum+sha256. This change allows OCI validation stacks to correctly
validate these digest algorithms, then let them report failure based on
the lack of algorithm support.

Future cases include allowing different encodings, such as sha256+b64
and others. While this future-proofs the field, it isn't an endorsement
to ride out and do this today. It just provides some algorithmic
agility that may be required later.

Note that this brings digest in line with what is supported across the
docker ecosystem today.

Signed-off-by: Stephen J Day <stephen.day@docker.com>

AkihiroSuda · 2017-04-25T01:23:52Z

Do we need separators other than +?

AkihiroSuda · 2017-04-25T01:27:02Z

Also, can we allow non-hexadecimal digest values?
e.g.
multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8

zhouhao3 · 2017-04-25T01:42:59Z

schema/manifest_test.go

+    }
+  ]
+}
+`,


Need to add `fail: false,`.

Actually, you don't. Go initializes values to their zero value. The zero value for a boolean is false.

stevvooe · 2017-04-25T18:16:40Z

> Also, can we allow non-hexadecimal digest values?
> e.g.
> multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8

Agreed, but I'll do this in a follow up.

vbatts · 2017-04-25T18:19:28Z

This lgtm, though there is no wording around this structure and what the separators indicate. If a value of sha256+farts+farts+farts:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd is encountered, should it be attempted as sha256 or just errored out if I don't have a hash of sha256+farts+farts+farts?

stevvooe · 2017-04-25T18:28:48Z

The algorithms are treated as hardcoded strings. The allowance of these characters is just for stylization and this PR is just reserving the ability to do this in the future.

sha256+farts+farts+farts would have to be a specific algorithm registered with the digester, contained in at least three levels of farts, by convention. From the perspective of the digest dispatch in go-digest, sha256+farts+farts and sha256+farts+farts+farts are just strings without any relation.
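
A minimal sketch of what "hardcoded strings" means for dispatch, assuming a simple map-based registry. The `registry` and `lookup` names are hypothetical and are not the go-digest API; the point is only that lookup is an exact string match, with no parsing of the separators.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
)

// registry is a hypothetical stand-in for a digester's algorithm table.
// Keys are opaque strings; "+" carries no meaning for the lookup.
var registry = map[string]func() hash.Hash{
	"sha256": sha256.New,
}

// lookup returns a constructor only for an exact match on the full
// algorithm string. It never falls back to a prefix such as "sha256".
func lookup(algorithm string) (func() hash.Hash, bool) {
	newHash, ok := registry[algorithm]
	return newHash, ok
}

func main() {
	for _, alg := range []string{"sha256", "sha256+farts+farts", "sha256+farts+farts+farts"} {
		_, ok := lookup(alg)
		fmt.Printf("%-26s registered=%v\n", alg, ok)
	}
}
```

Under this model an unregistered compound identifier is simply an unknown algorithm, which answers the "attempt as sha256 or error out" question in favor of erroring out.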

AkihiroSuda · 2017-04-25T18:31:48Z

What is the expected meaning of [._-]?
Does it differ from +?

vbatts · 2017-04-25T18:32:28Z

right on. A brief explanation of that, as well as updating the regexp, in https://github.com/opencontainers/image-spec/blob/master/descriptor.md#digests-and-verification would be helpful then.

stevvooe · 2017-04-25T19:25:16Z

@vbatts Widened the character set and updated the specification to match the goal.

@opencontainers/image-spec-maintainers PTAL

wking · 2017-04-27T16:28:58Z

descriptor.md

-hex         := /[a-f0-9]+/
+digest      := algorithm ":" encoded
+algorithm   := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
+encoded     := /[a-zA-Z0-9]+/


If the intention is to allow filesystem safe base 64, you'll want to add _=- to your encoded regexp. If the intention is to allow vanilla base 64, you'll want to add +/=.

oh right. hrm.

= actually isn't needed for base64. I'll add _-.

> = actually isn't needed for base64. I'll add _-.

From the RFC:

> Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

So if you do not allow =, then all specs that use base 64 will need that explicit statement. This spec doesn't currently reference RFC 4648, so we don't need that explicit statement, but I see no reason to forbid = in the encoded part.

@wking The padding is recoverable from the length of the string. It is not needed in this case or most uses of base64. In fact, the RFC says exactly that. Please be clear that we are not adding support for base64 encoded digests. We are ensuring that they are supportable in the future.

> We are ensuring that they are supportable in the future.

Right, but without allowing trailing =, you're allowing them only when:

1. The alg defines the length of the unencoded hash, and
2. The alg spec explicitly states that pad characters MUST NOT be appended.

vbatts · 2017-04-27T19:18:46Z

apart from allowing the charset for base64 in the encoded section, this LGTM

stevvooe · 2017-04-27T20:26:51Z

@vbatts Updated.

erikh · 2017-04-28T12:39:21Z

can you see my comments on this subject on #599? Maybe we can unify this.

stevvooe · 2017-04-28T19:15:29Z

@erikh I'm not sure that is fully related. Digest already has a history of a particular character set.

vbatts · 2017-05-01T18:41:43Z

descriptor.md

-hex         := /[a-f0-9]+/
+digest      := algorithm ":" encoded
+algorithm   := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
+encoded     := /[a-zA-Z0-9_-]+/


base64 has = for padding as well

= is not required for base64, since the length is known.

sorry, how is the length known for arbitrary encoded types?

The length of the digest is a part of the algorithm. Each algorithm has a fixed length.

that feels like an assumption, especially as this is introducing arbitrary encoding types.

But it is not an assumption. The algorithm must encode the length.

i must be missing something. i don't see that outlined anywhere.

crypto.Hash, identified by the algorithm, has a fixed length output. It is a property of cryptographic hash functions. I'm pushing an update that allows this anyway. Do you have any examples of cryptographic hash functions that aren't parameterized by configuration of the algorithm? Effectively, they have to be comparable in the same domain and typically that doesn't exist.

Updated to allow it though.

right right, but crypto.Hash is not what we're binding ourselves to here. It's being opened up to allow arbitrary strings for the algorithm. Just saying that it's an assumption that the algorithm being used has a fixed size.
Thanks for updating. I'll take a look.

stevvooe · 2017-05-02T18:21:18Z

@opencontainers/image-spec-maintainers PTAL

Let's move this forward so we can get the release out.

stevvooe · 2017-05-02T23:20:00Z

Updates:

- Allow = in encoded portion
- Decomposed regexp into components

vbatts · 2017-05-03T18:46:37Z

heh
LGTM

wking · 2017-05-03T20:07:59Z

descriptor.md

-digest                                                                  | algorithm           |
------------------------------------------------------------------------|---------------------|
-sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
+digest                                                                     | algorithm           | Supported |


“Supported” isn't particularly clear to me. I think there are two interesting levels for algorithm identifiers in the spec:

1. Registered (e.g. listed here).
2. Required to be implemented (as we do for sha256 here).

For example, sha512 (in flight with #609) would be “registered” but would not be “required to be implemented”.

I'm in favor of staying DRY and dropping the “Supported” column here, but I'd also be ok with a “Registered” and/or “Required to be implemented” columns. Without a clearer definition, I'm not in favor of a “Supported” column.

When this PR lands, you can submit a wildly complicated scheme to your liking. For now, this is quite clear.

heh. Though "support" has meaning in contexts. This is perhaps clear enough

> This is perhaps clear enough…

It will become more clear once this is rebased around #609 (which has since landed), since we can see what value sha512 gets in this column to distinguish “registered” from “required to be implemented”.

AkihiroSuda · 2017-05-04T05:16:06Z

descriptor.md

+algorithm             := algorithm-component [algorithm-separator algorithm-component]*
+algorithm-component   := /[a-z0-9]+/
+algorithm-separator   := /[+._-]/
+encoded               := /[a-zA-Z0-9=_-]+/


Please consider adding the following sentence:

When encoded denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/. (when algorithm is sha256?)

> When encoded denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/. (when algorithm is sha256?)

This constraint belongs to the algorithm, not this portion of the specification. If you want to qualify this, please submit a second PR.

@AkihiroSuda Maybe, it could be part of the registered table?

vbatts · 2017-05-04T18:54:50Z

pls2rebase

In addition to the changes to allow separators, it was pointed out that we should also accept an expanded character set for the encoded portion of the digest. Again, this is to ensure that future formats validate properly but the result is left to the algorithm implementation. Signed-off-by: Stephen J Day <stephen.day@docker.com>

Signed-off-by: Stephen J Day <stephen.day@docker.com>

After some changes to the schema to open up the character set and add separators to the digest algorithm, this change set ensures we have a consistent definition for the components of a digest. The specification has been updated to clarify this decision as well as ensure the specification matches the validation components across the board. The portion of a digest known as `hex` is now known as `encoded` to correspond with the wider character set allowed. Signed-off-by: Stephen J Day <stephen.day@docker.com>

Signed-off-by: Stephen J Day <stephen.day@docker.com>

The digest grammar is now further decomposed into components to remove reliance on regex grouping syntax. We also extend the character set to include the `=` sign, which enables variable length base64 encoding. Signed-off-by: Stephen J Day <stephen.day@docker.com>

Signed-off-by: Stephen J Day <stephen.day@docker.com>

stevvooe · 2017-05-04T21:11:44Z

> pls2rebase

Done.

vbatts · 2017-05-05T19:25:55Z

descriptor.md

-algorithm   := /[a-z0-9_+.-]+/
-hex         := /[a-f0-9]+/
+digest                := algorithm ":" encoded
+algorithm             := algorithm-component [algorithm-separator algorithm-component]*


maybe a nit, but does this grammar imply that there can be only a single separator+component?

[] is "optional" and * is zero or more. This should match the following productions (A = algorithm-component, S = algorithm-separator):

A ASA ASASA

The following would not be matched:

SA ASAS

Put more succinctly, a separator may only appear sandwiched between two algorithm-components, so separators can never be adjacent or sit at either end.
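
The productions above can be checked mechanically. The following Go sketch assembles a regular expression from the two component patterns in the grammar and tries the A, ASA, SA, and AS shapes; `validAlgorithm` is a hypothetical helper, not the schema's validator.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	component = `[a-z0-9]+` // algorithm-component
	separator = `[+._-]`    // algorithm-separator
)

// algorithmPattern encodes:
//   algorithm := algorithm-component [algorithm-separator algorithm-component]*
var algorithmPattern = regexp.MustCompile(`^` + component + `(?:` + separator + component + `)*$`)

// validAlgorithm reports whether s matches the algorithm production.
func validAlgorithm(s string) bool {
	return algorithmPattern.MatchString(s)
}

func main() {
	for _, s := range []string{
		"sha256",        // A
		"tarsum+sha256", // ASA
		"a.b-c",         // ASASA
		"+sha256",       // SA: rejected
		"sha256+",       // AS: rejected
		"sha256++b64",   // adjacent separators: rejected
	} {
		fmt.Printf("%-14s %v\n", s, validAlgorithm(s))
	}
}
```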

vbatts · 2017-05-05T19:41:19Z

besides the nit question, i reckon this LGTM

vbatts · 2017-05-05T19:44:21Z

silly pullapprove
👍

stevvooe · 2017-05-08T23:33:35Z

@opencontainers/image-spec-maintainers PTAL

philips · 2017-05-09T15:18:49Z

LGTM, thanks @stevvooe

stevvooe requested a review from vbatts April 24, 2017 22:01

stevvooe added the 1.0-release-blocker label Apr 24, 2017

This was referenced Apr 24, 2017

off-by-one-error parsing image_name's tag distribution/distribution#2248

Closed

digest: allow separators in algorithm field opencontainers/go-digest#33

Merged

zhouhao3 reviewed Apr 25, 2017

stevvooe added this to the v1.0.0-rc6 milestone Apr 26, 2017

wking mentioned this pull request Apr 27, 2017

image-layout: fix inconsistent description about external blob store #656

Merged

wking reviewed Apr 27, 2017

vbatts reviewed May 1, 2017

stevvooe force-pushed the future-proof-digest-constraints branch from 430ffca to f92369f Compare May 2, 2017 23:23

wking reviewed May 3, 2017

AkihiroSuda reviewed May 4, 2017

stevvooe added 2 commits May 4, 2017 14:09

stevvooe added 4 commits May 4, 2017 14:09

schema: add test cases to descriptor type

2e9f3dd

Signed-off-by: Stephen J Day <stephen.day@docker.com>

schema/digest: include characters urlsafe base64 encoding

7637741

Signed-off-by: Stephen J Day <stephen.day@docker.com>

stevvooe force-pushed the future-proof-digest-constraints branch from f92369f to 5a6b982 Compare May 4, 2017 21:09

spec: clean up digest example table

b52b2bf

Signed-off-by: Stephen J Day <stephen.day@docker.com>

stevvooe force-pushed the future-proof-digest-constraints branch from 5a6b982 to b52b2bf Compare May 4, 2017 21:11

vbatts reviewed May 5, 2017

stevvooe merged commit 3690645 into opencontainers:master May 9, 2017

stevvooe deleted the future-proof-digest-constraints branch May 9, 2017 19:07

vbatts mentioned this pull request May 19, 2017

Bump version to rc6 #681

Merged
