Skip to content

hash: add Clone #69521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
FiloSottile opened this issue Sep 18, 2024 · 51 comments
Open

hash: add Clone #69521

FiloSottile opened this issue Sep 18, 2024 · 51 comments
Labels
Proposal Proposal-Crypto Proposal related to crypto packages or other security issues Proposal-FinalCommentPeriod
Milestone

Comments

@FiloSottile
Copy link
Contributor

There isn't a general way to clone the state of a hash.Hash, but #20573 introduced the concept of hash.Hash implementations also implementing encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, and the hash.Hash docs commit our implementations to doing that.

Hash implementations in the standard library (e.g. hash/crc32 and crypto/sha256) implement the encoding.BinaryMarshaler and encoding.BinaryUnmarshaler interfaces.

That allows cloning the hash state without recomputing it, as done in HMAC.

go/src/crypto/hmac/hmac.go

Lines 96 to 103 in db40d1a

marshalableInner, innerOK := h.inner.(marshalable)
if !innerOK {
return
}
marshalableOuter, outerOK := h.outer.(marshalable)
if !outerOK {
return
}

However, it's obscure and pretty clunky to use.

I propose we add a hash.Clone helper function.

package hash

// Clone returns a separate Hash instance with the same state as h.
//
// h must implement encoding.BinaryMarshaler and encoding.BinaryUnmarshaler,
// or be provided by the Go standard library. Otherwise, Clone returns an error.
func Clone(h Hash) (Hash, error)

In practice, we should only fallback to BinaryMarshaler + BinaryUnmarshaler for the general case, while for standard library implementations we can do an undocumented interface upgrade to interface { Clone() Hash }. In that sense, hash.Clone is a way to hide the interface upgrade as a more discoverable and easier to use function.

(Yet another example of why we should be returning concrete types everywhere rather than interfaces.)

CloneXOF

If #69518 is accepted, I propose we also add hash.CloneXOF.

package hash

// CloneXOF returns a separate XOF instance with the same state as h.
//
// h must implement encoding.BinaryMarshaler and encoding.BinaryUnmarshaler,
// or be provided by the Go standard library or by the golang.org/x/crypto module
// (starting at version v0.x.y). Otherwise, Clone returns an error.
func CloneXOF(h XOF) (XOF, error)

None of our XOFs actually implement BinaryMarshaler + BinaryUnmarshaler, but they have their own interface methods Clone() ShakeHash and Clone() XOF that each return an interface. I can't really think of a way to use them from CloneXOF, so instead we can add hidden methods CloneXOF() hash.XOF and interface upgrade to them.

As we look at moving packages from x/crypto to the standard library (#65269) we should switch x/crypto/sha3 and x/crypto/blake2[bs] from returning interfaces to returning concrete types, at least for XOFs. Then they can have a Clone() method that returns a concrete type, and a CloneXOF() method that returns a hash.XOF interface and enables hash.CloneXOF.

(If anyone has better ideas for how to make this less redundant, I would welcome them. I considered and rejected using reflect to call the existing Clone methods because hash is a pretty core package. This sort of interface-method-that-needs-to-return-a-value-implementing-said-interface scenarios are always annoying.)

/cc @golang/security @cpu @qmuntal (who filed something similar in #69293, as I found while searching refs for this)

@FiloSottile FiloSottile added Proposal Proposal-Crypto Proposal related to crypto packages or other security issues labels Sep 18, 2024
@FiloSottile FiloSottile added this to the Proposal milestone Sep 18, 2024
@gabyhelp
Copy link

@magical
Copy link
Contributor

magical commented Sep 18, 2024

How do you intend to implement the fallback path in Clone? MarshalBinary + UnmarshalBinary only gives you the ability to save and restore a Hash's state, not construct a new instance of it. You might be able to do it with reflect but it sounds like we're trying to avoid reflect.

@magical
Copy link
Contributor

magical commented Sep 19, 2024

I'll also point out that, when using Clone to optimize repeated hashes with the same prefix, you really want to pair it with a Set method which copies the hash state from one existing instance to another.

e.g.

h := newHash()
h.Write(prefix)
h0 := h.Clone() // allocates
for _, message := range messages {
   h.Set(h0) // doesn't allocate
   h.Write(message)
   fmt.Println(h.Sum(nil))
}

Otherwise, without Set, you end up creating unavoidable garbage on every message.

h0 := newHash()
h0.Write(prefix)
for _, message := range messages {
   h := h0.Clone() // allocates
   h.Write(message)
   fmt.Println(h.Sum(nil))
}

My initial stab at the hmac optimization (before hashes implemented BinaryMarshaler and BinaryUnmarshaler) combined Clone and Set into a single method:

type HashCloner interface {
    hash.Hash

    // Clone returns a copy of its reciever, reusing the provided Hash if possible
    Clone(hash.Hash) hash.Hash
}

It looks a little odd but ends up being fairly ergonomic. One advantage this definition has over separate Clone and Set methods is that there is a nice answer for what to do when the argument and receiver types do not match: just allocate a new value. Whereas Set would have to panic.

// Example implementation for sha1.digest
func (d0 *digest) Clone(h hash.Hash) hash.Hash {
	d, ok := h.(*digest)
	if !ok {
		d = new(digest)
	}
	*d = *d0
	return d
}
h0 := newHash()
h0.Write(prefix)
var h hash.Hash
for _, message := range messages {
   h = h0.Clone(h)  // allocates on first iteration, reuses h on subsequent iterations
   h.Write(message)
   fmt.Println(h.Sum(nil))
}

@qmuntal
Copy link
Member

qmuntal commented Sep 19, 2024

This proposal tackles the same problem that made me start drafting #69293: there is no clean way to clone hashes.

About the proposed hash.Clone function, I had the same concerns as @magical. How do you instantiate a new hash before calling UnmarshalBinary? This I why in #69293 I proposed to define a hash.Cloner interface.

My motivation to have a common way to clone hash objects is to improve the compatibility of several hash implementations that I've implemented using CNG and OpenSSL with those libraries that are currently using MarshalBinary + UnmarshalBinary to clone a hash. The issue is that CNG/OpenSSL don't provide an API to serialize the internal state of a given hash, but they do provide APIs to clone a hash object.

@FiloSottile
Copy link
Contributor Author

FiloSottile commented Oct 30, 2024

How do you intend to implement the fallback path in Clone? MarshalBinary + UnmarshalBinary only gives you the ability to save and restore a Hash's state, not construct a new instance of it.

Doh. Yes, that doesn't make sense, thank you.

We discussed this with @rsc and it it would make sense to follow the io/fs pattern:

package hash

// Clone returns a separate Hash instance with the same state as h.
//
// h must implement CloneHash, as all Hash implementations in the Go standard library do.
// Otherwise, Clone returns an error.
func Clone(h Hash) (Hash, error)

type CloneHash interface {
    hash.Hash
    Clone() hash.Hash
}

The question is whether we can make it not allocate by a combination of devirtualization and inlining. I think the answer is yes if either the type of the Hash passed to Clone devirtualizes or if it's a concrete type. Notably, I think the crypto/hmac use where a Hash is saved in the struct matches neither case. It would be useful if someone else could go through common cloning use cases

This isn't really urgent for Go 1.24. I don't want to add Clone methods to the new crypto/sha3 types (#69982) until we decide this, because there's a risk we'll make them not implement the interface, but crypto/sha3 can start with just MarshalBinary/UnmarshalBinary like every other stdlib Hash.

@mateusz834
Copy link
Member

The question is whether we can make it not allocate by a combination of devirtualization and inlining. I think the answer is yes if either the type of the Hash passed to Clone devirtualizes or if it's a concrete type

Currently it would not work, see #64824.

@mateusz834
Copy link
Member

mateusz834 commented Oct 30, 2024

@FiloSottile the io/fs pattern always includes the base interface, so:

type CloneHash interface {
    hash.Hash
    Clone() hash.Hash
}

What do you think?

EDIT: or even:

type CloneHash interface {
    hash.Hash
    Clone() hash.CloneHash
}

@FiloSottile
Copy link
Contributor Author

Ah yes, that's what I meant to write. Not sure about returning a hash.CloneHash, I think it depends on whether we want users to pass the extended version around or keep it an "implementation detail" of hash.Clone.

@mateusz834
Copy link
Member

Not sure about returning a hash.CloneHash, I think it depends on whether we want users to pass the extended version around or keep it an "implementation detail" of hash.Clone.

This way we also make sure that the returned (cloned) hash implements the hash.CloneHash, not doing so would let returning a non-cloneable hash.

@ericlagergren
Copy link
Contributor

Has a generic hash.Clone been ruled out?

@mateusz834
Copy link
Member

mateusz834 commented Oct 31, 2024

@FiloSottile Do we plan to implement this new interface by the crypto/hmac, this might influence the API, Clone() hash.Hash or Clone() hash.CloneHash must be allowed to return nil, because the provided hash constructor passed to hmac.New might return an hash.Hash that does not implement the interface (probably hash.Clone should treat that case as an error).

@mateusz834
Copy link
Member

Or instead we can check whether the hash implements the new interface in hmac.New and return a different implementation that does not implement hmac.CloneHash in that case.

@FiloSottile
Copy link
Contributor Author

Good point on crypto/hmac. I think I like the option of choosing the implementation in hmac.New the best.

@mateusz834
Copy link
Member

mateusz834 commented Nov 5, 2024

Good point on crypto/hmac. I think I like the option of choosing the implementation in hmac.New the best.

This might cause issues in the future if we would like to make hmac.New alloc-free and devirtualizable. Also wouldn't this be problematic for the fips module (crypto/internal/fips/hmac)? It returns a concrete type now. But considering that crypto/hmac does not yet implement the encoding.BinaryMarshaler and Unmarshaler interfaces i assume that we don't need to make it clonable now.

// Note that unlike other hash implementations in the standard library,
// the returned Hash does not implement [encoding.BinaryMarshaler]
// or [encoding.BinaryUnmarshaler].
func New(h func() hash.Hash, key []byte) hash.Hash {

@aclements aclements moved this from Incoming to Active in Proposals Nov 6, 2024
@aclements
Copy link
Member

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.

@rsc
Copy link
Contributor

rsc commented Nov 12, 2024

Talked to @FiloSottile about this and we agreed to leave this for Go 1.25.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2024

It sounds like we agree on:

package hash

// Clone returns a separate Hash instance with the same state as h.
//
// h must implement CloneHash, as all Hash implementations in the Go standard library do.
// Otherwise, Clone returns an error.
func Clone(h Hash) (Hash, error)

type CloneHash interface {
    hash.Hash
    Clone() hash.Hash
}

Do I have that right?

(I'm assuming we ignore CloneXOF until XOF is accepted, and it can be part of the XOF proposal.)

@mateusz834
Copy link
Member

@rsc #69521 (comment)

Personally i think that we should treat nil from CloneHash.Clone as an error condition in hash.Clone and just return an error.

@aclements
Copy link
Member

Given that we're trying to move toward concrete hash types, it would be nice if you could clone a concrete hash type and get back the same concrete hash type. However, I can't quite figure out how to make this work because CloneHash would have to be parameterized and then I'm not sure how a non-parameterized func Clone(h Hash) (Hash, error) would actually work.

General question: Why do people need to clone hash functions? I think understanding this would help us in evaluating this proposal. @magical mentioned using this to "optimize repeated hashes with the same prefix", but I don't actually know why people would want to do that either.

@aclements
Copy link
Member

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— aclements for the proposal review group

The proposal is to add the following to the hash package:

package hash

// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
type Cloner interface {
    hash.Hash
    Clone() hash.Hash
}

And to implement the Clone method on all standard library hashes (most, but not all of which, already implement MashalBinary/UnmarshalBinary).

As a follow-on, ideally the compiler would implement optimizatoins such that, if the concrete type of the hash is fixed, it's possible to write an allocation-free Clone method.

@aclements aclements changed the title proposal: hash: add Clone hash: add Clone Jan 28, 2025
@aclements aclements modified the milestones: Proposal, Backlog Jan 28, 2025
@magical
Copy link
Contributor

magical commented Feb 3, 2025

I don't really see the point of Clone as accepted. In the long run i think we want to move towards having the hash packages return concrete types (as is done in crypto/internal/fips140). In that world, it would make more sense for Clone to return the concrete type as well (obviating any need for devirtualization). In fact, most hashes will not even need a Clone method because a shallow copy is enough. You only really need the Cloner interface for cases like crypto/hmac where the underlying hash is not known at compile time, and all the compiler optimization in the world will not help you avoid the allocations in that case. (Unless it is paired with a Set method, as i described in my earlier comment.)

@rsc
Copy link
Contributor

rsc commented Feb 5, 2025

It seems perfectly reasonable to write code that is not specific to a single hash and yet still wants to Clone the hash.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/649195 mentions this issue: cmd/compile: devirtualize interface calls with type assertions

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/652036 mentions this issue: cmd/compile/internal/devirtualize: improve concrete type analysis

@qmuntal
Copy link
Member

qmuntal commented May 2, 2025

@FiloSottile @aclements would be nice if this proposal could land in Go 1.25. There are some CLs already submitted ready for review implementing it. Is there anything else missing?

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/670178 mentions this issue: internal/testhash: move cryptotest.TestHash to shared package

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/670179 mentions this issue: hash: use testhash.TestHash in all hash functions

@aclements
Copy link
Member

Two API questions have come up again during implementation:

  1. Should Clone return Hash or Cloner? There was some discussion of this starting here, but I don't think we actually resolved this.
  2. How should hmac.Clone work, given that it wraps another hash.Hash and doesn't know statically if that hash implements Cloner? There was some discussion of this starting here.

For 1, I don't feel strongly, but lean slightly toward changing this to return Cloner simply based on the principle that T.Clone() should return T.

For 2, the options are a) you just can't clone an hmac, b) hmac checks at construction time whether the underlying hash is clonable and returns one of two different implementation types, or c) hmac always implements Clone, but we have to tweak the API so Clone can fail if the underlying hash doesn't implement Clone (or it does but that Clone fails).

Option b is nice, but not very scalable. If we were to add another extension/optional/upgrade interface in the future, we'd need four hmac types.

I'm leaning toward option c, and specifying that Clone can return errors.ErrUnsupported, which was in part intended for exactly this purpose. This was added relatively recently, and I don't think we actually use it in any public APIs yet, but this seems like the perfect situation for it: we have a type wrapping another type, an upgrade interface, and no fallback. (We almost used this in encoding.AppendText, but realized that had a fallback.)

Together, this would look like:

package hash

// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
type Cloner interface {
	Hash

	// Clone returns a copy of the Hash with identical and
	// independent internal state.
	//
	// If the hash cannot be cloned, it returns an error that
	// wraps [errors.ErrUnsupported].
	Clone() (Cloner, error)
}

@mateusz834
Copy link
Member

For 1, I don't feel strongly, but lean slightly toward changing this to return Cloner simply based on the principle that T.Clone() should return T.

I also feel that way, i also would assume that some people would like to pass a hash.Cloner in arguments/struct fields, but then the name does not make sense for me, as it should be a ClonableHash (because it has all the functionality of a hash.Hash)

// If the hash cannot be cloned, it returns an error that
// wraps [errors.ErrUnsupported].

That is nice, but having an error just for one state does not feel great, it could be a bool or we could also allow a nil return to represent that a hash is not cloneable instead.

But we should also note that with the addition of an additional error, it might require us to add a global helper Clone func (as before, see: #69521 (comment)), because now to clone a hash you have to make an type assertion and also error check (bunch of code just to clone a hash).

@rolandshoemaker
Copy link
Member

rolandshoemaker commented May 7, 2025

Re the failure case where a hash doesn't implement the interface, is this essentially only going to be the case for hashes implemented outside of the standard library? (Since we explicitly document All hashes in the standard library implement this interface).

I'm not really sure how common this is. There are plausibly hashes we don't implement, but I'm not sure I've ever actually seen anyone do this. Of course I'm not saying it doesn't happen, but I'm wondering how much complexity we should incur to handle this case, especially if it's extremely rare.

@aclements aclements moved this from Accepted to Active in Proposals May 8, 2025
@aclements
Copy link
Member

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

@qiulaidongfeng
Copy link
Member

qiulaidongfeng commented May 8, 2025

From CL 644275 , Regarding crypto/hmac, there is another option. If it cannot be cloned, it should panic, similar to the panic of reflect.Value.Seq over values that cannot be iterated.

gopherbot pushed a commit that referenced this issue May 8, 2025
This test helper can test any hash.Hash, not just crypto hashes. Move
it to $GOROOT/src/internal/testhash so both crypto and hash can use
it.

For #69521

Change-Id: Iac086cca513d5c03936e35d1ab55b8636f4652f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/670178
Reviewed-by: qiu laidongfeng2 <2645477756@qq.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue May 8, 2025
For #69521

Change-Id: I4e056253f94ad421fcef12d21edaaaf2517b64c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/670179
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: qiu laidongfeng2 <2645477756@qq.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
@aclements
Copy link
Member

It's okay for reflect.Value.Seq to panic because the caller can (and should) check whether it's a sequence before calling that. The caller either knows a priori that it's a sequence, or had to query what kind of value it was before doing much of anything anyway.

For Clone, there's currently no way to check, meaning every generic Clone call would have to defensively handle the panic. If we were to allow Clone to panic, we'd need to add a CanClone() to check, but that's not a good approach either because it's an extra step that also very easy to forget.

@aclements
Copy link
Member

The proposal committee discussed this and decided that, while there isn't a perfect choice, returning ErrUnsupported seems like the best option. This is literally the exact situation that's meant for. Panicking isn't ideal because it requires defensively handling the panic, and it would be easy to forget. Returning nil isn't a great option because it would also be very easy to forget to check for and would likely immediately lead to a more confusing panic. It could return a bool, but that's not very self-descriptive or standard.

@aclements aclements moved this from Active to Likely Accept in Proposals May 13, 2025
@aclements
Copy link
Member

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group

The proposal is to add the following to the hash package:

package hash

// A Cloner is a hash function whose state can be cloned.
// All hashes in the standard library implement this interface.
// If a hash can only determine at runtime if it can be cloned,
// (e.g., if it wraps another hash), it may return [errors.ErrUnsupported].
type Cloner interface {
    Hash
    Clone() (Cloner, error)
}

And to implement the Clone method on all standard library hashes (most, but not all of which, already implement MashalBinary/UnmarshalBinary).

As a follow-on, ideally the compiler would implement optimizations such that, if the concrete type of the hash is fixed, it's possible to write an allocation-free Clone method.

@mateusz834
Copy link
Member

As a follow-on, ideally the compiler would implement optimizations such that, if the concrete type of the hash is fixed, it's possible to write an allocation-free Clone method.

I already mailed CL 652036 for this. I added a test case for the new API and it still works. Not sure whether we want this for 1.25 though or whether we should defer that to 1.26.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Crypto Proposal related to crypto packages or other security issues Proposal-FinalCommentPeriod
Projects
Status: Likely Accept
Development

No branches or pull requests