-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
descriptor: recognise sha512, tweak wording #609
Conversation
descriptor.md
Outdated
|
||
The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion and a _hex_ portion. | ||
The algorithm identifies the methodology used to calculate the digest; the hex portion is the lowercase hex-encoded result of the hash. | ||
The value of the digest property, the _digest string_, is a serialized hash result, consisting of an _algorithm_ portion (the "algorithm identifier") and a _hex_ portion (the digest). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we refer to the whole string as the “digest” and the part after the semicolon as the “hash” or the “hex” to match our definition and implementation. Calling the hash/hex portion the “digest” is confusing if the whole thing is so similar (the “digest string”) and is represented by the Digest
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to find consistency in the terminology use in this area, but I agree this can be improved; see 23af834
@@ -76,20 +76,17 @@ hex := /[a-f0-9]+/ | |||
|
|||
Some example digest strings include the following: | |||
|
|||
digest | algorithm | | |||
digest string | algorithm | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 23af834 moving us towards consistently using digest/algorithm (identifier)/hex matching our ABNF, I think we may want to stick to “digest” instead of using “digest string”. I don't mind if existing instances of “digest string” are changed to “digest” in this PR or not, but I think this line (and the “Before consuming…” line below) should be left alone in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine like this in context. If you want then submit a follow up that changes those AND the preceding two references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just referred to as a "digest". String here is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll buy that we can resolve the digest vs. digest string thing separately.
This is another ignored-by-PullApprove approval ;). Did you want to LGTM this PR? |
needs a rebase |
Besides the mailing list thread referenced in the diff, this algorithm identifier is supported in go-digest [1]. [1]: https://github.com/opencontainers/go-digest/blob/v1.0.0-rc0/algorithm.go#L33 Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
rebased |
bump (before something happens that requires another rebase!) |
Would have liked to have reviewed this one before going forward... |
@@ -59,10 +59,10 @@ Extended _Descriptor_ field additions proposed in other OCI specifications SHOUL | |||
|
|||
The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage). | |||
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes. | |||
If the identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate the digest independently, and be certain that the correct content was obtained. | |||
If the digest can be communicated in a secure manner, one can retrieve the content from an insecure source, recalculate the digest independently, and be certain that the correct content was obtained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to digest
here is incorrect. We are trying to communicate the concept of calculating a common identifier.
* Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space. | ||
* Heavy processing before calculating a hash SHOULD be avoided. | ||
* Implementations MAY employ some canonicalization of the underlying content to ensure stable content identifiers. | ||
* Implementations MAY employ [canonicalization](canonicalization.md) of the underlying content to ensure stable content identifiers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that the canonicalization is limited to those described in the document. Implementations may employ any kind of canonicalization they want in the generation of content.
|
||
While the _algorithm_ component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use [SHA-256](#sha-256). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the algorithm component of the digest does allow one to utilize a wide variety of algorithms, compliant implementations SHOULD use those specified here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, this moved down.
Implementations MUST implement SHA-256 digest verification for use in descriptors. | ||
|
||
#### SHA-512 | ||
|
||
[SHA-512][rfc4634-s4.2] is a collision-resistant hash function which [may be more perfomant][sha256-vs-sha512] than [SHA-256](#sha-256) on some CPUs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that performance is a good reason to select one digest algorithm over another, when that is not the case at all. The most important factors in the selection of a digest algorithm is that it is common to all implementations.
In practice, the use of sha512
will likely cause the introduction of incompatible images.
This request is a late change as you try to close the Image Spec. If you are focused on finishing a spec I would suggest you either:
|
@stephenrwalli I get calling out that implementations MAY choose others, but this is not a late change. Calling out a hash like sha512 has been a couple year lingering conversation. Besides sha256 isn't immediately going away. This is not a product vs. spec issue. |
As previously discussed in #589, implementations may wish to use SHA-512 for
performance/security reasons, and since our endorsed digest package also
supports it, let's add the algorithm identifier to the recognised ones in the
spec.
At the same time, we keep wording to clarify (and indeed mandate) that SHA-256
MUST be supported by implementations and is preferred for interoperability
reasons.