Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in-toto format for items with hash of hashes #264

Merged

Conversation

mihaimaruseac
Copy link
Collaborator

@mihaimaruseac mihaimaruseac commented Jul 27, 2024

Summary

Note: This is an experiment serialization, one of the 4 in a series of PRs (#264, #265, #266, #267). Before a stable release of the library, we would standardize on an ergonomic format, with as little corner cases / dangerous corners as possible.

This converts model serialization manifests that record every model file hash into an in-toto payload that can then be passed to Sigstore's sign_intoto for signing to generate a Sigstore Bundle (if using Sigstore).

To identify the models, we compute a hash of all hashes of the files and use that as the subject. The individual file hashes are used as the payload and we would have the verifier check them as part of the verification process.

CC @susperius for converting manifest to in-toto. This should cover #111, #224, and #248 (first part of the machinery). CC @laurentsimon and (optionally) @TomHennen to make sure I did not mishandle in-toto.

Note: This builds on #263. I decided to split every feature into its own PR to make it easier to review what changes (should be only the last commit) and to be able to merge partial work and continue from there.

Release Note

NONE

Documentation

NONE

@mihaimaruseac mihaimaruseac requested review from a team as code owners July 27, 2024 16:41
@mihaimaruseac mihaimaruseac force-pushed the in-toto-digests-of-digests branch 2 times, most recently from e411db8 to a92b6cc Compare July 27, 2024 17:19
@mihaimaruseac mihaimaruseac added this to the V1 release milestone Jul 27, 2024
@mihaimaruseac mihaimaruseac force-pushed the in-toto-digests-of-digests branch from a92b6cc to 8875185 Compare July 29, 2024 23:26
Comment on lines +145 to +160
for descriptor in manifest.resource_descriptors():
hasher.update(descriptor.digest.digest_value)
subjects.append({
"name": descriptor.identifier,
"digest": descriptor.digest.digest_hex,
"algorithm": descriptor.digest.algorithm,
})

digest = {"sha256": hasher.compute().digest_hex}
descriptor = statement.ResourceDescriptor(digest=digest).pb

return statement.Statement(
subjects=[descriptor],
predicate_type=predicate_type,
predicate={predicate_top_level_name: subjects},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the subject be a list of the file hashes?

It feels a bit like misusing the Statement format when the digests are put into the predicate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that in #266. But in #111 there was a discussion that the semantics of verification on subjects is not working properly for models, so we put as subject a single digest for the entire model and then the actual hashes in payload to reconstruct as needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the verification argument. It applies to every format you roll with, if there is another implementation there is always the risk of doing it wrong/different.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just implemented all manifest ideas discussed in #111 so we can experiment with them and see which one is more ergonomic, better suited for our needs, less prone to failures

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this specific PR is what's being discussed in #248 (comment)

@TomHennen
Copy link

I think my biggest concern with this proposal (and #265, #266, #267) is that it doesn't allow for any other predicates to be made for models in the future. It can only sign things, but cannot convey any additional information with that signature (things like what types of data this model was trained on, etc...).

It sounds like the primary reason for this approach is that Sigstore cannot support other hash types in Rekor. To me that seems like the more important problem to solve (but I don't understand the technical problems with doing so).

@mihaimaruseac
Copy link
Collaborator Author

Right, these are just for signing, though we can extend the payload to contain more information, I think

laurentsimon
laurentsimon previously approved these changes Jul 31, 2024
This converts model serialization manifests that record every model file
hash into an in-toto payload that can then be passed to Sigstore's
`sign_intoto` for signing to generate a Sigstore `Bundle` (if using Sigstore).

To identify the models, we compute a hash of all hashes of the files and
use that as the subject. The individual file hashes are used as the
payload and we would have the verifier check them as part of the
verification process.

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
Copy link
Contributor

@spencerschrock spencerschrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring to Laurent's review, I did confirm Mihai's latest force-push was a rebase on top of main.

Comment on lines +218 to +219

predicate_type: Final[str] = "https://model_signing/DigestOfDigests/v0.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious if it mattered that this wasn't resolvable. The answer is apparently no, but should we namespace this any more to avoid collisions?

SHOULD resolve to a human-readable description, but MAY be unresolvable. SHOULD include a version number to allow for revisions.

TypeURIs are not registered. The natural namespacing of URIs is sufficient to prevent collisions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely yes. Right now all these constants are just for testing. When we standardize, we will pick sensible normal values for them.

Thank you for flagging and the review!

@mihaimaruseac mihaimaruseac merged commit fce6087 into sigstore:main Aug 1, 2024
20 checks passed
@mihaimaruseac mihaimaruseac deleted the in-toto-digests-of-digests branch August 1, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants