Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v6] Make hash usage extensible to enable concurrent support of multiple hash types #1362

Open
wants to merge 19 commits into
base: v6-sha256
Choose a base branch
from

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Jan 6, 2025

Refactor the codebase so that it stops relying on plumbing.Hash's implementation details, and instead rely on exported methods.
This is a stepping stone to enable support of multiple hash types, which will be dealt with as a follow-up PR.

The hash package now contains all code regarding hash, with exception to Hasher related logic, which is spread between plumbing and hasher. plumbing.Hash now redirects to hash.SHA1Hash, which may (or may not) be dropped at a later stage.

The previous sha256 specific tests that relied on build tags were removed as they are now redundant.

Relates to #706 #899.
Supersedes to #920.

@pjbgf pjbgf added the breaking label Jan 6, 2025
@pjbgf pjbgf added this to the v6.0.0 milestone Jan 6, 2025
@pjbgf pjbgf requested a review from aymanbagabas January 6, 2025 00:10
Comment on lines 10 to 17
const (
// CryptoType defines what hash algorithm is being used.
CryptoType = crypto.SHA1
// Size defines the amount of bytes the hash yields.
Size = 20
Size = SHA1Size
// HexSize defines the strings size of the hash when represented in hexadecimal.
HexSize = 40
HexSize = SHA1HexSize
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still being used, instead of replacing all the refs to SHA1Size, to then have to change them again later, I think we are better off leaving as is and replace as the specific packages get multi-hash support.

Comment on lines 3 to 14
func Zero() SHA1Hash {
return SHA1Hash{}
}

func ZeroSHA256() SHA256Hash {
return SHA256Hash{}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace these with var zero hash.SHA1Hash and var zero256 hash.SHA256Hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for a func was to avoid the awkard position were API users are able to change the meaning of zero, using a single var will revert to that. However, I agree with the change for the time being, but it is likely this code will change again before it being ready for merging back to v6-exp.

Comment on lines 16 to 19

func (ih SHA256Hash) IsZero() bool {
var empty SHA256Hash
return ih == empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (ih SHA256Hash) IsZero() bool {
var empty SHA256Hash
return ih == empty
var empty256 SHA256Hash
func (ih SHA256Hash) IsZero() bool {
return ih == empty256

Comment on lines 26 to 29

func (ih SHA1Hash) IsZero() bool {
var empty SHA1Hash
return ih == empty
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (ih SHA1Hash) IsZero() bool {
var empty SHA1Hash
return ih == empty
}
var empty SHA1Hash
func (ih SHA1Hash) IsZero() bool {
return ih == empty
}

Comment on lines 104 to 126
// ZeroFromHash returns a zeroed hash based on the given hash.Hash.
//
// Defaults to SHA1-sized hash if the provided hash is not supported.
func ZeroFromHash(h hash.Hash) ObjectID {
switch h.Size() {
case SHA256Size:
return SHA256Hash{}
default:
return SHA1Hash{}
}
}

// ZeroFromHash returns a zeroed hash based on the given ObjectFormat.
//
// Defaults to SHA1-sized hash if the provided format is not supported.
func ZeroFromObjectFormat(f format.ObjectFormat) ObjectID {
switch f {
case format.SHA256:
return SHA256Hash{}
default:
return SHA1Hash{}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use these funcs now. We can add them later when the need arises

import "io"

// ObjectID represents a calculated hash, which is immutable by nature.
type ObjectID interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this to Hash to keep naming consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am holding back on making this change while plumbing.Hash exists, to avoid additional confusion on having multiple "Hash" types. At present plumbing.Hash has more than one use which is leading me to think that potentially it could be split into two different types. Either way, let's make that decision together with the final design for the API (as per discord thread) - which will be in a follow-up PR.


// LazyObjectID represents an object hash which may not be known at the time
// the object is created. Or an objectID which changes during its lifetime.
type LazyObjectID interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd rename this to LazyHash

plumbing/hasher/hasher.go Outdated Show resolved Hide resolved
pjbgf and others added 19 commits January 6, 2025 22:21
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The introduction of both new types are a stepping stone to enable
SHA256 support concurrently - without the need for a build tag.

ObjectID provides a way to represent varied sized hashes (for SHA1
or SHA256) with a single type, while keeping similar performance of the
existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what
they do. ObjectID is the result of a hash operation and can't be
changed once calculated. ObjectHasher, adds the Git object header before
a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower,
however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Improves plumbing.Hash extensibility, which is part of the sha256 support.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Removes redundant logic, and keep all hash specific code inside its own package.
The refactoring of Hasher will be done separately.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
The initial implementation required build tags in order to support SHA256.
This will no longer be the case, and therefore this code is redundant.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
@pjbgf
Copy link
Member Author

pjbgf commented Jan 6, 2025

@aymanbagabas changes made and code rebased. PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants