crypto/sha256: provide a way to efficiently hash multiple byte slices #21948
Comments
Unfortunately we cannot make that change, as it would violate the Go 1 compatibility guarantee (while existing calls would still compile, other uses of the function might not). Closing.
Unfortunately, I can't adapt the function Sum256 myself because it uses unexported functions and structs. I would have done it if it were that simple. Could you explain what would break by changing the data argument to a vararg? I honestly don't know. Edit: oh, I see. By changing the function signature, the type of a function variable would change.
Oh, sorry, I failed to notice that. I will reopen. We can't do what you suggest, but perhaps we can do something.
Given this snippet:

```
var b1, b2, b3 []byte
var h [sha256.Size]byte
d := sha256.New()
d.Write(b1)
d.Write(b2)
d.Write(b3)
d.Sum(h[:0])
```

Even though nothing here needs to escape to the heap, the compiler currently can't prove that, because New returns the hash.Hash interface, so the written slices are treated as escaping.
Thank you for reopening the issue. Another, simpler solution would be to add a new function accepting variadic arguments. This would be perfectly backward compatible, and it would be easy to understand what it does.
As @ianlancetaylor already pointed out, this is not backwards compatible, as it changes the signature of the function.
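A minimal sketch of the breakage being described (the variable name is illustrative): calls keep compiling, but function values typed against the current signature do not.

```go
package main

import "crypto/sha256"

// This assignment compiles today because Sum256 has type
// func([]byte) [32]byte. If data became variadic, Sum256's type would
// change to func(...[]byte) [32]byte, and this line would no longer
// compile, even though ordinary calls like sha256.Sum256(b) still would.
var f func([]byte) [sha256.Size]byte = sha256.Sum256

func main() {
	_ = f([]byte("x"))
}
```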
@bradfitz #21070 is about Sum, but it addresses a different issue, which I agree with. AppendSum would make it clear that it uses the append pattern; it took me some time to figure that out. As for #19361, I don't see the link with this issue. A compiler optimization would fix the problem for all hashes in one hit, but I have no idea how much effort it would require. After thinking about it, I agree that adding new functions would not be a good idea after all. PS: For the secure cookie, I think I can get around the problem by concatenating the slices to hash. This should be possible by arranging the data differently, with a minimum of copying. So from my point of view, the ability to hash multiple slices while avoiding a heap allocation is now "good to have" instead of "must have". Besides, hashing one contiguous block is faster than hashing multiple block fragments, so I hope the concatenation cost will be compensated by that benefit.
@chmike You could allocate the hasher once, keep it around, and use Hash.Reset. This should avoid the allocations as far as I can see. Nevertheless, Go could still do the compiler optimization, but for this specific use case I don't think it's required.
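A minimal sketch of that suggestion (the variable names are illustrative): allocate the hasher once outside the hot path, then Reset it before each message.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// Allocate the hasher once, outside the hot path.
	d := sha256.New()
	var sum [sha256.Size]byte

	for _, msg := range [][]byte{[]byte("first"), []byte("second")} {
		d.Reset()              // reuse the existing state instead of calling sha256.New again
		d.Write([]byte("key")) // hash several slices incrementally
		d.Write(msg)
		d.Sum(sum[:0]) // append the digest into the fixed-size array
		fmt.Printf("%x\n", sum)
	}
}
```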
Thanks for the suggestion. This is what I did. Code is available at github.com/chmike/securecookie. My secure cookie get and set need only 3 allocations and they are all outside of the value encoding and decoding. As a comparison, the gorilla secure cookie get/set require respectively 37 and 39 allocations. While there is a factor 10 difference in number of allocations, the speed difference is only around 3. Of course the gorilla secure cookie code is much more readable because it use the New functions and don't use arrays to avoid heap allocations. |
The problem here is that sha256.New returns a hash.Hash. If it returned a *digest, then the compiler would have an easier time inlining the whole thing, seeing that nothing escapes, and removing all the allocations. There was a recent CL that does the first step of devirtualizing this kind of thing. I think we should probably let the compiler optimizations continue to the point where the obvious code just doesn't allocate. That's better than adding new API that (hopefully) won't be necessary in a few releases. It's fine of course if you want to fork sha256 for your use.
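A small benchmark sketch of the difference being described (names are illustrative; the exact allocation counts depend on the compiler version): with -benchmem or b.ReportAllocs, the interface-based version reports a heap allocation per operation while the one-shot function reports none.

```go
package hashbench

import (
	"crypto/sha256"
	"testing"
)

var buf = make([]byte, 64)

// Sum256 works on the concrete digest type internally, so escape
// analysis can keep the state on the stack.
func BenchmarkSum256(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = sha256.Sum256(buf)
	}
}

// New returns the hash.Hash interface; the compiler cannot prove the
// digest does not escape, so each iteration heap-allocates it.
func BenchmarkNewWriteSum(b *testing.B) {
	b.ReportAllocs()
	var out [sha256.Size]byte
	for i := 0; i < b.N; i++ {
		d := sha256.New()
		d.Write(buf)
		d.Sum(out[:0])
	}
}
```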
Counterintuitively, it is much more memory efficient to build up a byte slice of data to hash and call sha512.Sum512 on it than to incrementally Write the data to the hasher. This is a limitation of the Go compiler's escape analysis. Since sha512.New returns a hash.Hash interface, the Go compiler is unable to reason about whether byte slices written to hash.Hash.Write escape, so it forces each of these values to be heap-allocated.

This change uses sha512.Sum512, which uses the unexported, concrete *sha512.digest type. The Go compiler is able to infer that the *digest does not escape Sum512. The data slice provided to Sum512 requires a single allocation if we compute its size ahead of time. See golang/go#21948.

This change was motivated by seeing that chainkd.XPub.Child was in the top 10 functions for number of allocations in a benchmark.

```
name                    old time/op    new time/op    delta
XPrvChildNonHardened-4    73.7µs ± 5%    71.4µs ± 4%      ~     (p=0.310 n=5+5)
XPrvChildHardened-4        688ns ±10%     558ns ±10%   -18.93%  (p=0.002 n=6+6)
XPubChild-4               85.5µs ± 4%    90.6µs ±14%      ~     (p=0.247 n=5+6)
XPrvSign-4                 143µs ± 7%     147µs ± 7%      ~     (p=0.485 n=6+6)
XPubVerify-4               186µs ± 3%     198µs ± 8%      ~     (p=0.065 n=6+6)

name                    old alloc/op   new alloc/op   delta
XPrvChildNonHardened-4      400B ± 0%       80B ± 0%   -80.00%  (p=0.002 n=6+6)
XPrvChildHardened-4         368B ± 0%       80B ± 0%   -78.26%  (p=0.002 n=6+6)
XPubChild-4                 848B ± 0%      560B ± 0%   -33.96%  (p=0.002 n=6+6)
XPrvSign-4                  833B ± 0%      624B ± 0%   -25.09%  (p=0.002 n=6+6)
XPubVerify-4                352B ± 0%      352B ± 0%      ~     (all equal)

name                    old allocs/op  new allocs/op  delta
XPrvChildNonHardened-4      6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.002 n=6+6)
XPrvChildHardened-4         5.00 ± 0%      1.00 ± 0%   -80.00%  (p=0.002 n=6+6)
XPubChild-4                 8.00 ± 0%      4.00 ± 0%   -50.00%  (p=0.002 n=6+6)
XPrvSign-4                  10.0 ± 0%       8.0 ± 0%   -20.00%  (p=0.002 n=6+6)
XPubVerify-4                3.00 ± 0%      3.00 ± 0%      ~     (all equal)
```

Closes #3472

Author: Jackson Owens <jackson_owens@alumni.brown.edu>
Date: Wed Apr 25 10:40:51 2018 -0700
upstream: 6f31447593b5a975e92f8c1981e6ecb404479b57
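A sketch of the pattern that commit describes (the helper name is illustrative): size the buffer up front so the concatenation costs a single allocation, then hash it with the one-shot function.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// hashParts concatenates all parts into one pre-sized buffer and hashes
// it with the one-shot function, so the whole operation costs a single
// allocation.
func hashParts(parts ...[]byte) [sha512.Size]byte {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	buf := make([]byte, 0, n) // the only allocation
	for _, p := range parts {
		buf = append(buf, p...)
	}
	return sha512.Sum512(buf)
}

func main() {
	sum := hashParts([]byte("chain"), []byte("kd"))
	fmt.Printf("%x\n", sum)
}
```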
Using go1.9.
I'm trying to implement an optimized version of HMAC-SHA-256 digest computation that avoids allocation.
Current code is available in [this GitHub repository](https://github.com/chmike/hmacsha256).
The function I provide computes the MAC over the list of data slices given as arguments.
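Something along these lines, as a reference for the semantics (the name and signature here are hypothetical, not necessarily the repository's actual API; this version uses crypto/hmac for clarity and still allocates, which is precisely what the optimized code tries to avoid):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// digest computes the HMAC-SHA-256 of all data slices, in order.
// Hypothetical reference version; the real package aims to compute the
// same result without heap allocations.
func digest(key []byte, data ...[]byte) []byte {
	m := hmac.New(sha256.New, key)
	for _, d := range data {
		m.Write(d)
	}
	return m.Sum(nil)
}

func main() {
	fmt.Printf("%x\n", digest([]byte("key"), []byte("part1"), []byte("part2")))
}
```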
Unfortunately, I'm not able to use the hardware implementation of SHA-256, so I'm stuck: "internal/cpu" is not accessible outside the standard library.
I could use the

sha256.Sum256(data []byte) [Size]byte

function, but it accepts only one data slice. With a small backward-compatible change to this signature, it would be possible to compute a SHA-256 digest over multiple slices without the allocation that sha256.New requires.
HMAC requires computing a hash over at least two data blocks.
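What the proposed variadic change would let callers write can be emulated today with the exported API (a sketch; the helper name is illustrative, and unlike the proposal it still pays the sha256.New allocation):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// sum256Multi emulates a variadic Sum256 with today's exported API.
// It cannot avoid the allocation made by sha256.New, because New
// returns an interface the compiler can't see through.
func sum256Multi(data ...[]byte) [sha256.Size]byte {
	d := sha256.New()
	for _, p := range data {
		d.Write(p)
	}
	var sum [sha256.Size]byte
	d.Sum(sum[:0])
	return sum
}

func main() {
	fmt.Printf("%x\n", sum256Multi([]byte("inner"), []byte("outer")))
}
```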
If there is another way to use the hardware- or assembly-optimized SHA-256 block function, I'm all ears.
PS: as explained at the end of the readme in the GitHub repository, my goal is to use this for secure cookies, which have to be verified on nearly every request. Avoiding these basically useless allocations could reduce the pressure on the GC.