-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hash/maphash: add func Comparable #54670
Comments
I'm realizing a major caveat with this is that this can hash pointers. In order for |
Note that in #28322 we discussed and rejected the idea of having the package that became hash/maphash support general types. |
Even today, we have moving GC if you count stacks. The built-in maps avoid this problem by never allocating on the stack objects that map pointer keys refer to. For |
I wanted something like https://github.com/dolthub/maphash My use case is a simple stripe-locked map, where a generic hash function is used to pick the stripe: type Map[K comparable, V any] struct {
stripes [64]map[K]V
locks [64]sync.Mutex
} Looking through the history of this issue, it seems like there are a number of interesting applications for this proposal. @bcmills listed a number of relevant examples in #21195 |
In my application, I'm writing a generic data structure for packet processing, where I only care about hashing byte arrays. As a half-step measure, I wonder if there's value in providing a The downside is that generic data structures (that expose |
For posterity, here are the thorny issues with pointers, interfaces, and channels:
|
I don't think it does. Broadly, there are two solutions to pointer hashing in a moving GC:
Today, option 1 would mean rebuilding all maps. This is possible, but would be extremely difficult to do in a low-latency way. If we were to expose hash values of pointers, this would become essentially untenable. (You could imagine that the hash values we expose are opaque, but that severely limits what you can do with them.) Option 2 is, I believe, fairly reasonable. This is what Java does for the default hashCode implementation. The main downside is that, today, hashing a pointer value requires no indirection–we simply hash the pointer value–and this would require us to perform an object lookup to check for an original hash (not just an indirection because it could be an interior pointer). It might be possible on some architectures to use a bit in the pointer to indicate that the object has moved and we need to do this extra work. It would also require us to have somewhere to store the original hash that's easy to access from the pointer. But if we're moving objects, we'd probably have a different heap layout, too, and this would simply have to be a consideration.
I think maphash.Comparable would have to escape its argument in the current implementation. That's not ideal, but doesn't seem like a big deal to me. |
I think there are a couple of additions to Go that make this highly worth reconsidering:
The lack of
Providing |
Edit 2024-06-05: Tried to clarify table. We were discussing this in the Google compiler/runtime team and had some more thoughts.
Stepping back, the basic property of hashing is that I'm being very careful to say "values of type Values with underlying type Values of interface type are more complicated. Their dynamic type may be It's informative to look at how built-in maps deal with all this today. Built-in maps are known to the compiler's escape analysis and follow very simple rules: OptionsOnce we decouple hashing from data structure operations using that hash, things get more complicated. I see a few options:
1. Change how pointers to the stack are hashedThe hash function for Today, composite types can sometimes be hashed directly with a single large The more concerning thing to me is that this may restrict some future design directions. For example, we're currently exploring dynamic escape analysis, where values can move from stack to the heap at run-time. This would make the simple normalized hash function no longer stable. We could apply the same solutions as a moving GC, but that has a cost for hashing of all pointers. There may be other ways this would hamstring us. 2. Escape
|
Op | Type | Built-in map | User map | Note |
---|---|---|---|---|
Lookup | *T |
No escape | Escapes | Lookup can never succeed for a stack pointer, so this is almost certainly rare. |
Lookup | interface | No escape | Escapes ⚠ | Unfortunate. |
Lookup | string (no special case) |
No escape | Escapes ⚠⚠ | Very unfortunate. |
Lookup | string (w/ special case) |
No escape | No escape | |
Insert | *T |
Escapes | Escapes | Both hashing and storing cause escape. |
Insert | interface | Escapes | Escapes | Same. |
Insert | string (no special case) |
Escapes | Escapes | Same. |
Insert | string (w/ special case) |
Escapes | Escapes | Storing causes escape in user map. |
With this approach, lookups in user-defined maps are slightly disadvantaged relative to the built-in map. For *T
, this probably doesn't matter in practice. For interfaces it's unfortunate because interface values often contain pointers that don't affect the hash result. For example, if the interface's dynamic type is int
, the compiler has to box that value and store it as a pointer to an int
value. This solution would force these to escape.
For string values (and strings embedded in other types), if we treat them naively, they'll always escape, but we can special-case them in escape analysis to close this particular gap.
3. Dynamically escape stack pointers passed to maphash
Dynamic escape analysis allows the compiler's escape analysis to be less conservative by enabling the runtime to move values allocated on the stack to the heap only when necessary. We don't currently have this ability, but we're exploring it as a general optimization when we have hot/cold path information from PGO and the compiler can determine that all escapes of a value happen on cold paths. We could apply it to maphash as well.
4. Track the data flow of hash values
I include this here for completeness, but I'm not sure how plausible it is. The idea is that, if we can bound the lifetime of a hash value, it doesn't need to escape the hashed valued. This may be possible if we could represent hash values as a distinct type, but in general data structures need to use the numerical value of a hash, e.g., to select as hash bucket, and I think at that point it would be very easy to lose track of the data flow.
The easiest implementation would be to accept that Comparable's arguments escape. We see how to do that very easily. My instinct is that it probably is still useful in that case, but maybe I'm wrong. Is it still useful if arguments escape? If arguments can't escape then it will take a lot longer to move forward with this. |
This proposal has been added to the active column of the proposals project |
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@dsnet I'd love to hear your take on whether Comparable escaping its arguments is still a useful API. (Let's assume the compiler transparently rewrites maphash.Comparable[string] -> maphash.String so the escape question is not important in that specific case.) Also it seems odd to have the global func but no way to update a *Hash with a comparable. The choices are to have a method taking an any, or else to have a global func like func WriteComparable[T comparable](h *Hash, x T) I assume these routines panic if you use something like |
Overall, this seems worth doing even for the simple implementation. That said, when re-reading, I’m not always 100% certain when some folks say in some spots above something like "always escape k" if they are using that as shorthand for "always escape k in some cases" (such as if k is a pointer, interface, chan, or other variations). For clarity, I think the key would not escape if it is for example an integer type or struct of integers, even under the simple implementation?
Would it be reasonable to also avoid escaping types like the following (either in the first implementation, or with modest follow-up effort)? type t1 struct { k1, k2 string }
type t2 struct { k1, k2 string; k3 int }
type t3 struct { k1 t1; k2 t2 } (That seems plausible based on a perhaps naive guess at an implementation). In any event, even the "simple" implementation seems pretty useful, and hard cases could come later or never. |
Can we use this: //go:nosplit
func noEscapePtr[T any](p *T) *T {
x := uintptr(unsafe.Pointer(p))
return (*T)(unsafe.Pointer(x ^ 0))
}
func main() {
...
h := maphash.Comparable(*noEscapePtr(&myStruct)) To trick |
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This PR adds a hashtriemap implementation to the project. This is mostly a direct copy from unique package [^1] internals [^2] with some adjustments to make it work with our project. Once `maphash.Comparable` lands we can remove dependency on Go internals [^3]. Michael Knyszek says he plans to use this implementation as a general idea for the future generic sync/v2.Map, but nothing is in stone yet. [^1]: golang/go#62483 [^2]: https://go-review.googlesource.com/c/go/+/573956 [^3]: golang/go#54670 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
|
This comment was marked as resolved.
This comment was marked as resolved.
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add:
There were some questions about implementation but I believe we have worked them out and are confident we can implement it efficiently enough to be useful. |
No change in consensus, so accepted. 🎉 The proposal is to add:
There were some questions about implementation but I believe we have worked them out and are confident we can implement it efficiently enough to be useful. |
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
Change https://go.dev/cl/609761 mentions this issue: |
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
By default, runtime.memhash is used. When purego is used, reflect is used to generate the same []byte with the same value, and then hash the []byte. Fixes golang#54670 Change-Id: Ibd0538a7dfb3d831c5145970cac7c910692bca69
Circling back here with an issue related to collision avoidance. I think we should strive to avoid collisions between two values of the same type, but not between two values of different types. The reason is that I think it would be hard for an attacker to populate a single map with many different key types. There just aren't that many paths for attackers to force a program to generate lots of differently-typed keys for a single map. Normally, this package will be used with just one type for all the calls to Comparable (with the same seed and whose collisions matter, at least). We do want to defend against different values of the same type having identical hashes. So the case So consider this a clarifying mini-proposal. Let me know if anyone has a different thought in this area. |
The question is how would the current map hashing code handle those cases? The original intent of runtime/maphash is to expose the runtime internal hashing for user defined containers needing hashes benefitting from any improvements made to runtime internal hashing. The current implementation is not type variant as every type gets its own algorithm generated at compile time. Value variance must be handled already in existing hashing code as one can use the above mentioned types as keys in a map. I believe the hash collisions created by identical byte sequences in a data structure is also ok currently in map hashes of existing maps. Strings might include the length in the hash, so that might lead to different hashes in arrays for the simple cases. So I would suggest to not over engineer a solution that is only supposed to expose an existing hashing and it's constraints. |
I'm trying to implement my own map-like data structure. I'm using generics where I have a
comparable
type on hand, but I don't have a way to hash it. To workaround it, I'm adding aHash() uint64
constraint, but that requires every key type to have a customHash
method, which is quite unfortunate, given that the Go runtime knows how to hashcomparable
types.I propose we add:
Under the hood, this function is a thin wrapper over
mapType.hasher
.There are some caveats with the operation of this function, but they're no different than the caveats of using
==
.The text was updated successfully, but these errors were encountered: