-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schema: allow compound algorithm specifiers in digests #654
schema: allow compound algorithm specifiers in digests #654
Conversation
Do we need other than |
Also, can we allow non-hexadecimal digest values? |
schema/manifest_test.go
Outdated
} | ||
] | ||
} | ||
`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need add fail: false,
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you don't. Go initializes values to their zero value. The zero value for a boolean is false.
Agreed, but I'll do this in a follow up. |
This lgtm, though there is no wording around this structure and what the separators indicate. If a value of |
The algorithms are treated as hardcoded strings. The allowance of these characters is just for stylization and this PR is just reserving the ability to do this in the future.
|
What is the expected meaning of |
right on. A brief explanation of that, as well as updating the regexp, in https://github.com/opencontainers/image-spec/blob/master/descriptor.md#digests-and-verification would be helpful then. |
@vbatts Widened the character set and updated the specification to match the goal. @opencontainers/image-spec-maintainers PTAL |
descriptor.md
Outdated
hex := /[a-f0-9]+/ | ||
digest := algorithm ":" encoded | ||
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/ | ||
encoded := /[a-zA-Z0-9]+/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intention is to allow filesystem safe base 64, you'll want to add _=-
to your encoded
regexp. If the intention is to allow vanilla base 64, you'll want to add +/=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right. hrm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=
actually isn't needed for base64
. I'll add _-
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=
actually isn't needed for base64. I'll add_-
.
Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.
So if you do not allow =
than all specs that use base 64 will need that explicit statement. This spec doesn't currently reference RFC 4648, so we don't need that explicit statement, but I see no reason to forbid =
in the encoded
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking The padding is recoverable from the length of the string. It is not needed in this case or most uses of base64. In fact, the RFC says exactly that. Please be clear that we are not adding support for base64 encoded digests. We are ensuring that they are supportable in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are ensuring that they are supportable in the future.
Right, but without allowing trailing =
, you're allowing them only when:
- The alg defines the length of the unencoded hash, and
- The alg spec explicitly states that pad characters MUST NOT be appended.
apart from allowing the charset for base64 in the |
@vbatts Updated. |
can you see my comments on this subject on #599? Maybe we can unify this. |
@erikh I'm not sure that is fully related. Digest already has a history of a particular character set. |
descriptor.md
Outdated
hex := /[a-f0-9]+/ | ||
digest := algorithm ":" encoded | ||
algorithm := /[a-z0-9]+(?:[+._-][a-z0-9]+)*/ | ||
encoded := /[a-zA-Z0-9_-]+/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base64 has =
for padding as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=
is not required for base64
, since the length is known.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, how is the length known for arbitrary encoded types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length of the digest is a part of the algorithm. Each algorithm has a fixed length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that feels like an assumption, especially as this is introducing arbitrary encoding types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is not an assumption. The algorithm must encode the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i must be missing something. i don't see that is not outlined anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crytpo.Hash
, identified by the algorithm, has a fixed length output. It is a property of cryptographic hash functions. I'm pushing an update that allows this anyway. Do you have any examples of cryptographic hash functions that aren't parameterized by configuration of the algorithm? Effectively, they have to comparable in the same domain and typically that doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to allow it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right right, but crypto.Hash
is not what we're binding ourselves to here. It's being opened up to allow arbitrary strings for the algorithm. Just saying that it's an assumption that the algorithm being used has a fixed size.
Thanks for updating. I'll take a look.
@opencontainers/image-spec-maintainers PTAL Let's move this forward so we can get the release out. |
Updates:
|
430ffca
to
f92369f
Compare
descriptor.md
Outdated
digest | algorithm | | ||
------------------------------------------------------------------------|---------------------| | ||
sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b | [SHA-256](#sha-256) | | ||
digest | algorithm | Supported | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“Supported” isn't particularly clear to me. I think there are two interesting levels for algorithm identifiers in the spec:
For example, sha512
(in flight with #609) would be “registered” but would not be “required to be implemented”.
I'm in favor of staying DRY and dropping the “Supported” column here, but I'd also be ok with a “Registered” and/or “Required to be implemented” columns. Without a clearer definition, I'm not in favor of a “Supported” column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this PR lands, you can submit a wildly complicated scheme to your liking. For now, this is quite clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh. Though "support" has meaning in contexts. This is perhaps clear enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps clear enough…
It will become more clear once this is rebased around #609 (which has since landed), since we can see what value sha512
gets in this column to distinguish “registered” from “required to be implemented”.
algorithm := algorithm-component [algorithm-separator algorithm-component]* | ||
algorithm-component := /[a-z0-9]+/ | ||
algorithm-separator := /[+._-]/ | ||
encoded := /[a-zA-Z0-9=_-]+/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding the following sentence:
When encoded
denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/
. (when algorithm
is sha256
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When encoded denotes a hex value, it MUST NOT (SHOULD NOT?) use upper characters, i.e. /[A-F]/. (when algorithm is sha256?)
This constraint belongs to the algorithm, not this portion of the specification. If you want to qualify this, please submit a second PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AkihiroSuda Maybe, it could be part of the registered table?
pls2rebase |
While we currently have support for only `sha256` in digests, OCI image processors may encounter other kinds of "compound" digests, such as `tarsum+sha256`. This change allows OCI validation stacks to correctly validate these digest algorithms, then let them report failure based on the lack of algorithm support. Future cases include allow different encoding, such as a `sha256+b64` and others. While this future proofs the field against this, this isn't an endorsement to ride out and do this today. This just provides some algorithmic agility that may be required later. Note that this brings digest in line with what is supported across the docker ecosystem today. Signed-off-by: Stephen J Day <stephen.day@docker.com>
In addition to the changes to allow separators, it was pointed out that we should also accept an expanded character set for the encoded portion of the digest. Again, this is to ensure that future formats validate properly but the result is left to the algorithm implementation. Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Stephen J Day <stephen.day@docker.com>
After some changes to the schema to open up the character set and add separators to the digest algorithm, this change set ensures we have a consistent definition for the components of a digest. The specification has been updated to clarify this decision as well as ensure the specification matches the validation components across the board. The portion of a digest known as `hex` is now known as `encoded` to correspond with the wider character set allowed. Signed-off-by: Stephen J Day <stephen.day@docker.com>
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The digest grammar is now further decomposed into compontents to remove reliance on regex grouping syntax. We also extend the character set to include `=` sign which enables variable length base64 encoding. Signed-off-by: Stephen J Day <stephen.day@docker.com>
f92369f
to
5a6b982
Compare
Signed-off-by: Stephen J Day <stephen.day@docker.com>
5a6b982
to
b52b2bf
Compare
Done. |
algorithm := /[a-z0-9_+.-]+/ | ||
hex := /[a-f0-9]+/ | ||
digest := algorithm ":" encoded | ||
algorithm := algorithm-component [algorithm-separator algorithm-component]* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a nit, but does this grammar implies that there be only a single separator+component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[]
is "optional" and *
is zero or more. This should match the following productions (A
= algorithm-component, S
= algorithm-separator):
A
ASA
ASASA
The following would not be matched:
SA
ASAS
Put more succinctly, it allows a separator to appear sandwiched by algorithm-component, non-contiguously.
besides the nit question, i reckon this LGTM |
@opencontainers/image-spec-maintainers PTAL |
LGTM, thanks @stevvooe |
While we currently have support for only
sha256
in digests, OCI imageprocessors may encounter other kinds of "compound" digests, such as
tarsum+sha256
. This change allows OCI validation stacks to correctlyvalidate these digest algorithms, then let them report failure based on
the lack of algorithm support.
Future cases include allow different encoding, such as a
sha256+b64
and others. While this future proofs the field against this, this isn't
an endorsement to ride out and do this today. This just provides some
algorithmic agility that may be required later.
Note that this brings digest in line with what is supported across the
docker ecosystem today.
Signed-off-by: Stephen J Day stephen.day@docker.com