Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema: allow compound algorithm specifiers in digests #654

Merged

Conversation

stevvooe
Copy link
Contributor

While we currently have support for only sha256 in digests, OCI image
processors may encounter other kinds of "compound" digests, such as
tarsum+sha256. This change allows OCI validation stacks to correctly
validate these digest algorithms, then let them report failure based on
the lack of algorithm support.

Future cases include allow different encoding, such as a sha256+b64
and others. While this future proofs the field against this, this isn't
an endorsement to ride out and do this today. This just provides some
algorithmic agility that may be required later.

Note that this brings digest in line with what is supported across the
docker ecosystem today.

Signed-off-by: Stephen J Day stephen.day@docker.com

@AkihiroSuda
Copy link
Member

Do we need other than +?

@AkihiroSuda
Copy link
Member

Also, can we allow non-hexadecimal digest values?
e.g.
multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8

}
]
}
`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need add fail: false,.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you don't. Go initializes values to their zero value. The zero value for a boolean is false.

@stevvooe
Copy link
Contributor Author

Also, can we allow non-hexadecimal digest values?
e.g.
multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8

Agreed, but I'll do this in a follow up.

@vbatts
Copy link
Member

vbatts commented Apr 25, 2017

This lgtm, though there is no wording around this structure and what the separators indicate. If a value of sha256+farts+farts+farts:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd is encountered, should it be attempted as sha256 or just errored out if I don't have a hash of sha256+farts+farts+farts?

@stevvooe
Copy link
Contributor Author

The algorithms are treated as hardcoded strings. The allowance of these characters is just for stylization and this PR is just reserving the ability to do this in the future.

sha256+farts+farts+farts would have to be specific algorithm registered with the digester, contained in at least three levels of farts, by convention. From the perspective of the digest dispatch in go-digest, sha256+farts+farts and sha256+farts+farts+farts are just strings without any relation.

@AkihiroSuda
Copy link
Member

What is the expected meaning of [._-]?
Does it differ from +?

@vbatts
Copy link
Member

vbatts commented Apr 25, 2017

right on. A brief explanation of that, as well as updating the regexp, in https://github.com/opencontainers/image-spec/blob/master/descriptor.md#digests-and-verification would be helpful then.

@stevvooe
Copy link
Contributor Author

@vbatts Widened the character set and updated the specification to match the goal.

@opencontainers/image-spec-maintainers PTAL

descriptor.md Outdated
hex := /[a-f0-9]+/
digest := algorithm ":" encoded
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
encoded := /[a-zA-Z0-9]+/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intention is to allow filesystem safe base 64, you'll want to add _=- to your encoded regexp. If the intention is to allow vanilla base 64, you'll want to add +/=.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right. hrm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= actually isn't needed for base64. I'll add _-.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= actually isn't needed for base64. I'll add _-.

From the RFC:

Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.

So if you do not allow = than all specs that use base 64 will need that explicit statement. This spec doesn't currently reference RFC 4648, so we don't need that explicit statement, but I see no reason to forbid = in the encoded part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking The padding is recoverable from the length of the string. It is not needed in this case or most uses of base64. In fact, the RFC says exactly that. Please be clear that we are not adding support for base64 encoded digests. We are ensuring that they are supportable in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are ensuring that they are supportable in the future.

Right, but without allowing trailing =, you're allowing them only when:

  • The alg defines the length of the unencoded hash, and
  • The alg spec explicitly states that pad characters MUST NOT be appended.

@vbatts
Copy link
Member

vbatts commented Apr 27, 2017

apart from allowing the charset for base64 in the encoded section, this LGTM

@stevvooe
Copy link
Contributor Author

@vbatts Updated.

@erikh
Copy link
Contributor

erikh commented Apr 28, 2017

can you see my comments on this subject on #599? Maybe we can unify this.

@stevvooe
Copy link
Contributor Author

@erikh I'm not sure that is fully related. Digest already has a history of a particular character set.

descriptor.md Outdated
hex := /[a-f0-9]+/
digest := algorithm ":" encoded
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/
encoded := /[a-zA-Z0-9_-]+/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base64 has = for padding as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= is not required for base64, since the length is known.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, how is the length known for arbitrary encoded types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of the digest is a part of the algorithm. Each algorithm has a fixed length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that feels like an assumption, especially as this is introducing arbitrary encoding types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is not an assumption. The algorithm must encode the length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i must be missing something. i don't see that is not outlined anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crytpo.Hash, identified by the algorithm, has a fixed length output. It is a property of cryptographic hash functions. I'm pushing an update that allows this anyway. Do you have any examples of cryptographic hash functions that aren't parameterized by configuration of the algorithm? Effectively, they have to comparable in the same domain and typically that doesn't exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to allow it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right right, but crypto.Hash is not what we're binding ourselves to here. It's being opened up to allow arbitrary strings for the algorithm. Just saying that it's an assumption that the algorithm being used has a fixed size.
Thanks for updating. I'll take a look.

@stevvooe
Copy link
Contributor Author

stevvooe commented May 2, 2017

@opencontainers/image-spec-maintainers PTAL

Let's move this forward so we can get the release out.

@stevvooe
Copy link
Contributor Author

stevvooe commented May 2, 2017

Updates:

  • Allow = in encoded portion
  • Decomposed regexp into components

@stevvooe stevvooe force-pushed the future-proof-digest-constraints branch from 430ffca to f92369f Compare May 2, 2017 23:23
@vbatts
Copy link
Member

vbatts commented May 3, 2017

heh
LGTM

Approved with PullApprove

descriptor.md Outdated
digest | algorithm |
------------------------------------------------------------------------|---------------------|
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) |
digest | algorithm | Supported |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Supported” isn't particularly clear to me. I think there are two interesting levels for algorithm identifiers in the spec:

  • Registered (e.g. listed here).
  • Required to be implemented (as we do for sha256 here).

For example, sha512 (in flight with #609) would be “registered” but would not be “required to be implemented”.

I'm in favor of staying DRY and dropping the “Supported” column here, but I'd also be ok with a “Registered” and/or “Required to be implemented” columns. Without a clearer definition, I'm not in favor of a “Supported” column.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this PR lands, you can submit a wildly complicated scheme to your liking. For now, this is quite clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh. Though "support" has meaning in contexts. This is perhaps clear enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps clear enough…

It will become more clear once this is rebased around #609 (which has since landed), since we can see what value sha512 gets in this column to distinguish “registered” from “required to be implemented”.

algorithm := algorithm-component [algorithm-separator algorithm-component]*
algorithm-component := /[a-z0-9]+/
algorithm-separator := /[+._-]/
encoded := /[a-zA-Z0-9=_-]+/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider adding the following sentence:

When encoded denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/. (when algorithm is sha256?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When encoded denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/. (when algorithm is sha256?)

This constraint belongs to the algorithm, not this portion of the specification. If you want to qualify this, please submit a second PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda Maybe, it could be part of the registered table?

@vbatts
Copy link
Member

vbatts commented May 4, 2017

pls2rebase

stevvooe added 2 commits May 4, 2017 14:09
While we currently have support for only `sha256` in digests, OCI image
processors may encounter other kinds of "compound" digests, such as
`tarsum+sha256`. This change allows OCI validation stacks to correctly
validate these digest algorithms, then let them report failure based on
the lack of algorithm support.

Future cases include allow different encoding, such as a `sha256+b64`
and others. While this future proofs the field against this, this isn't
an endorsement to ride out and do this today. This just provides some
algorithmic agility that may be required later.

Note that this brings digest in line with what is supported across the
docker ecosystem today.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
In addition to the changes to allow separators, it was pointed out that
we should also accept an expanded character set for the encoded portion
of the digest.

Again, this is to ensure that future formats validate properly but the
result is left to the algorithm implementation.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
stevvooe added 4 commits May 4, 2017 14:09
Signed-off-by: Stephen J Day <stephen.day@docker.com>
After some changes to the schema to open up the character set and add
separators to the digest algorithm, this change set ensures we have a
consistent definition for the components of a digest. The specification
has been updated to clarify this decision as well as ensure the
specification matches the validation components across the board.

The portion of a digest known as `hex` is now known as `encoded` to
correspond with the wider character set allowed.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The digest grammar is now further decomposed into compontents to remove
reliance on regex grouping syntax.

We also extend the character set to include `=` sign which enables
variable length base64 encoding.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
@stevvooe stevvooe force-pushed the future-proof-digest-constraints branch from f92369f to 5a6b982 Compare May 4, 2017 21:09
Signed-off-by: Stephen J Day <stephen.day@docker.com>
@stevvooe stevvooe force-pushed the future-proof-digest-constraints branch from 5a6b982 to b52b2bf Compare May 4, 2017 21:11
@stevvooe
Copy link
Contributor Author

stevvooe commented May 4, 2017

pls2rebase

Done.

algorithm := /[a-z0-9_+.-]+/
hex := /[a-f0-9]+/
digest := algorithm ":" encoded
algorithm := algorithm-component [algorithm-separator algorithm-component]*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a nit, but does this grammar implies that there be only a single separator+component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] is "optional" and * is zero or more. This should match the following productions (A = algorithm-component, S = algorithm-separator):

A
ASA
ASASA

The following would not be matched:

SA
ASAS

Put more succinctly, it allows a separator to appear sandwiched by algorithm-component, non-contiguously.

@vbatts
Copy link
Member

vbatts commented May 5, 2017

besides the nit question, i reckon this LGTM

@vbatts
Copy link
Member

vbatts commented May 5, 2017

silly pullapprove
👍

Approved with PullApprove

@stevvooe
Copy link
Contributor Author

stevvooe commented May 8, 2017

@opencontainers/image-spec-maintainers PTAL

@philips
Copy link
Contributor

philips commented May 9, 2017

LGTM, thanks @stevvooe

Approved with PullApprove

@stevvooe stevvooe merged commit 3690645 into opencontainers:master May 9, 2017
@stevvooe stevvooe deleted the future-proof-digest-constraints branch May 9, 2017 19:07
@vbatts vbatts mentioned this pull request May 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants