Permitting arbitrary custom hashes as the basis for Artifact Identifiers should be removed. #69
-
After further discussion in today's WG meeting, I think it may make more sense to specify that additional identifiers based on other hashes are permitted, but that the required identifiers must always be produced as well. So SHA-1DC, SHA-1, and SHA-256 identifiers would always be present (assuming we accept the concurrent suggestion made today to include these three as our core), and additional identifiers could be included in their own manifests. It should probably also be required that all identifiers use the "Git construction", that is, hashing the Git blob header first, followed by the content.
-
Current discussion is leaning toward eliminating official support for custom hashes. If someone wants to use one, there's a fairly clear way to do so, but we don't need to accept the additional complexity, and impose it on all producers and consumers, just to support custom hashes.
-
In Annex A, the spec currently leaves open the possibility of constructing Artifact Identifiers using arbitrary alternative methods besides the GitOID construction with SHA-1 or SHA-256. I believe this is a mistake and should be removed.
A key strength of the design of OmniBOR is that Artifact Identifiers are independently reproducible. They use well-known and widely-implemented hash functions (SHA-1 and SHA-256) with a well-known construction (GitOIDs). In many cases, projects are already using Git, and so have on hand, via Git itself, an easy way to get the Artifact Identifier of any file:
    git hash-object <file>
If custom Artifact Identifier constructions are permitted, then producers and consumers need some way to agree on the construction of those identifiers. For example, if a project decides to use BLAKE3 with no header info in identifier construction, they need some way to communicate that to consumers of the identifiers and manifests they produce for their software. This protocol is left undefined, and thus is left for consumers to figure out.
Consumers are then also put in a tough spot, since in theory they need to support an arbitrary number of identifier constructions. If they consume an OmniBOR manifest based on an unknown construction, they need some way to discover how those identifiers were produced, and their consumption tooling must be augmented to support that identifier construction.
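As a rough illustration of the consumer's position, the sketch below shows a verifier that only knows the two GitOID constructions. Everything here is hypothetical (the `SUPPORTED` table, the `verify` function, the construction labels such as "blake3-noheader"); the spec does not define any of it. The point is only that an unknown construction leaves the tool with nothing to do but fail.

```python
import hashlib


class UnknownConstruction(Exception):
    """Raised when an identifier uses a construction this tool cannot reproduce."""


def _git_blob(algorithm: str):
    """Build a function that computes a GitOID-style hash with the given algorithm."""
    def construct(content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        return hashlib.new(algorithm, header + content).hexdigest()
    return construct


# Hypothetical table of constructions this consumer can reproduce.
# Each additional custom construction would need a new entry, in every consumer.
SUPPORTED = {
    "sha1": _git_blob("sha1"),
    "sha256": _git_blob("sha256"),
}


def verify(content: bytes, construction: str, expected_hex: str) -> bool:
    """Recompute the identifier for content and compare it to the expected value."""
    try:
        compute = SUPPORTED[construction]
    except KeyError:
        raise UnknownConstruction(construction) from None
    return compute(content) == expected_hex
```

With a fixed set of constructions the table is closed and every consumer can carry the same one; with arbitrary custom constructions it has no upper bound.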
This may be feasible when an organization produces and consumes only its own software, but that is not a normal use case. Even where that is the intent, we should expect that manifests will sometimes cross to the outside world, where they will not be understood or consumable.
I think leaving this possibility open is a recipe for complexity in the consumption of OmniBOR manifests, for minimal gain, and that it trades away a core goal of OmniBOR: that identifiers be independently reproducible.