-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in-toto format for items with hash of hashes #264
Add in-toto format for items with hash of hashes #264
Conversation
e411db8
to
a92b6cc
Compare
a92b6cc
to
8875185
Compare
for descriptor in manifest.resource_descriptors(): | ||
hasher.update(descriptor.digest.digest_value) | ||
subjects.append({ | ||
"name": descriptor.identifier, | ||
"digest": descriptor.digest.digest_hex, | ||
"algorithm": descriptor.digest.algorithm, | ||
}) | ||
|
||
digest = {"sha256": hasher.compute().digest_hex} | ||
descriptor = statement.ResourceDescriptor(digest=digest).pb | ||
|
||
return statement.Statement( | ||
subjects=[descriptor], | ||
predicate_type=predicate_type, | ||
predicate={predicate_top_level_name: subjects}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the subject be a list of the file hashes?
It feels a bit like misusing the Statement format when the digests are put into the predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see the verification argument. It applies to every format you roll with, if there is another implementation there is always the risk of doing it wrong/different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just implemented all manifest ideas discussed in #111 so we can experiment with them and see which one is more ergonomic, better suited for our needs, less prone to failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this specific PR is what's being discussed in #248 (comment)
I think my biggest concern with this proposal (and #265, #266, #267) is that it doesn't allow for any other predicates to be made for models in the future. It can only sign things, but cannot convey any additional information with that signature (things like what types of data this model was trained on, etc...). It sounds like the primary reason for this approach is that Sigstore cannot support other hash types in Rekor. To me that seems like the more important problem to solve (but I don't understand the technical problems with doing so). |
Right, these are just for signing, though we can extend the payload to contain more information, I think |
This converts model serialization manifests that record every model file hash into an in-toto payload that can then be passed to Sigstore's `sign_intoto` for signing to generate a Sigstore `Bundle` (if using Sigstore). To identify the models, we compute a hash of all hashes of the files and use that as the subject. The individual file hashes are used as the payload and we would have the verifier check them as part of the verification process. Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
8875185
to
69e6d59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring to Laurent's review, I did confirm Mihai's latest force-push was a rebase on top of main.
|
||
predicate_type: Final[str] = "https://model_signing/DigestOfDigests/v0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious if it mattered that this wasn't resolvable. The answer is apparently no, but should we namespace this any more to avoid collisions?
SHOULD resolve to a human-readable description, but MAY be unresolvable. SHOULD include a version number to allow for revisions.
TypeURIs are not registered. The natural namespacing of URIs is sufficient to prevent collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely yes. Right now all these constants are just for testing. When we standardize, we will pick sensible normal values for them.
Thank you for flagging and the review!
Summary
Note: This is an experiment serialization, one of the 4 in a series of PRs (#264, #265, #266, #267). Before a stable release of the library, we would standardize on an ergonomic format, with as little corner cases / dangerous corners as possible.
This converts model serialization manifests that record every model file hash into an in-toto payload that can then be passed to Sigstore's
sign_intoto
for signing to generate a SigstoreBundle
(if using Sigstore).To identify the models, we compute a hash of all hashes of the files and use that as the subject. The individual file hashes are used as the payload and we would have the verifier check them as part of the verification process.
CC @susperius for converting manifest to in-toto. This should cover #111, #224, and #248 (first part of the machinery). CC @laurentsimon and (optionally) @TomHennen to make sure I did not mishandle in-toto.
Note: This builds on #263. I decided to split every feature into its own PR to make it easier to review what changes (should be only the last commit) and to be able to merge partial work and continue from there.
Release Note
NONE
Documentation
NONE