Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for hashing files with header. #194

Closed
wants to merge 1 commit into from

Conversation

mihaimaruseac
Copy link
Collaborator

Summary

Missed this in #188, but found out I need it when working on #190. The serialize_v0/serialize_v1 methods all had headers in front of the files, so we need to do that too. Will update usage of header on #190 shortly.

As a benefit, we can simulate hashing a file with a header for the first portion of the file and a sharded hasher for the remainder of the file.

Release Note

NONE

Documentation

NONE

Missed this in sigstore#188, but found out I need it when working on sigstore#190. The
`serialize_v0`/`serialize_v1` methods all had headers in front of the
files, so we need to do that too. Will update usage of header on sigstore#190
shortly.

As a benefit, we can simulate hashing a file with a header for the first
portion of the file and a sharded hasher for the remainder of the file.

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
@mihaimaruseac mihaimaruseac requested a review from a team as a code owner June 3, 2024 23:55
@mihaimaruseac mihaimaruseac added this to the V1 release milestone Jun 3, 2024
Copy link
Collaborator

@laurentsimon laurentsimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit dangerous that:
"header" + "file_content" will return the same hash as "head" + "erfile_content"

What's the use case?

@mihaimaruseac
Copy link
Collaborator Author

Actually, I don't need it.

In the old serialization we had

        h = hashlib.sha256(header)
        with open(path, "rb") as f:
            if chunk == 0:
                all_data = f.read()
                h.update(all_data)
            else:
                # Compute the hash by reading chunk bytes at a time.
                while True:
                    chunk_data = f.read(chunk)
                    if not chunk_data:
                        break
                    h.update(chunk_data)
        return h.digest()

So we needed a header before hashing the file. But we actually have 2 different hashers here, so this is not needed.

@mihaimaruseac mihaimaruseac deleted the hashing-header branch June 4, 2024 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants