-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: maps,cmd/compile: special case and/or add function for getting an arbitrary entry from a map #66361
Comments
Change https://go.dev/cl/572119 mentions this issue: |
We could add:
And happen to implement that with a compiler optimization that recognizes the way users often write it today. That way existing users benefit, and future users have a more convenient and reliable spelling. |
it appears to be somewhat common https://github.com/search?q=language%3AGo+%2Ffor.*%3D.*range.*%5Cn%5Cs*break%24%2F&type=code |
An alternate, more general, approach would be to expose some way to get the nth map entry. That may be too general but it's worth exploring, if just for contrast. Let's say there's package unsafe
func Entry(m MapType, n IntegerType) (ArbitraryType, ArbitraryType) It does not provide a fixed order over the map, but as long as you do not modify or iterate over the map the same With that selecting an arbitrary element is Of course if you want an arbitrary element a random one will do and you could avoid exposing it by just having a func in |
As exposed to users today, map entries are unordered, so there's no unique "nth map entry." We could change that and define some order, but that seems like a substantially broader change than proposed here, and I'm skeptical that's something we want to pursue without use cases. |
A function would be weird because it becomes |
@bradfitz I'm not convinced a fast path would improve anything here, scanning the empty slots seems like a slow thing to do. package a
import "fmt"
import "testing"
var doNotOptimizeK, doNotOptimizeV uint
func benchmark(b *testing.B, cap, len uint) {
b.Run(fmt.Sprintf("%d/%d", cap, len), func(b *testing.B) {
m := make(map[uint]uint, cap)
for i := range len {
m[i] = i
}
b.ResetTimer()
for range b.N {
for doNotOptimizeK, doNotOptimizeV = range m {
break
}
}
})
}
func BenchmarkMap(b *testing.B) {
for cap := uint(1); cap < 1000000; cap *= 7 {
for len := uint(1); len <= cap; len *= 7 {
benchmark(b, cap, len)
}
}
}
It easily do |
@Jorropo, it's a very heavily loaded server. I did the optimization based on data, not on a whim or for fun. |
The patch you sent make sense. |
I think it is generally unobvious, that it's not an even distribution by taking the first element of iteration (https://go.dev/play/p/n50sJuL1O6s and for other lens it's still not even). Do we actually want to encourage "getting random element directly from a map"? This clearly adds a confusion about the distribution |
@pkositsyn I assume that the intention is to get an 'arbitrary' element (which could be the same each time) instead of a 'random' element. The specification leaves the order undefined. So, it is theoretically possible that another Go implementation (or this one in the future, albeit unlikely) implements an ordered map. |
Anecdata: I spent many hours tracking down odd behaviour in a gossip system due to the skewed distribution of this pattern. |
If it is about getting "any" element I suggest renaming the issue to "getting arbitrary element entry in map" |
This replaces a map used as a set with a slice. We were using a surprising amount of CPU in this code, making mapiters to pull out a random element of the map. Instead, just rand.IntN to pick a random element of the slice. It also adds a benchmark: │ before │ after │ │ sec/op │ sec/op vs base │ ConnRequestSet-8 1818.0n ± 0% 452.4n ± 0% -75.12% (p=0.000 n=10) (whether random is a good policy is a bigger question, but this optimizes the current policy without changing behavior) Updates #66361 Change-Id: I3d456a819cc720c2d18e1befffd2657e5f50f1e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/572119 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org>
Pedantically, "random" does not imply "uniform distribution", but I appreciate that many users conflate the two. I agree "any", "some", or "arbitrary" seem like better words to use in user-facing naming and documentation to minimize confusion if we don't guarantee a uniform distribution. (I would think we could guarantee range always starts at a uniform random map entry; but that seems like a separate proposal, which I expect has already been discussed and rejected too.) I don't think it's necessary to retitle the issue though. |
I think it's worth pointing out that the more generic patterns in which a single entry is processed inside the loop seems to be much more common than the simple pattern where the loop body contains only // delete one element from a non-empty map
for k := range m {
delete(m, k)
break
}
// process a single element in a non-empty map
for k, v := range m {
// some statement operating on m, k, and/or v - e.g. one of the following:
// - foo(m, k, v)
// - m[k] = ...
// - foo(v); delete(m, k)
// - if some_condition { ... } (as long as control flow does not diverge)
break
} |
I don't see any practical way to guarantee uniform distribution in ω(n) time. At least, given multiple iterations from the same map. (Uniform distribution among multiple distinct maps with the same contents, or multiple binary invocations might be easier.) I think "arbitrary" would be the only way forward here. |
We can use the number of iterations from the initial random guess until we found a valid value to measure the bias towards that value and use a second random call to pick (or continue iterating) this value based on the bias. The worst case is gonna be scanning the whole map (like currently) tho. |
Well, the bias is not just the number empty slots until you find a valid value, but the number of empty slots since the previous valid value. So you'd have to at least walk in the reverse direction some distance. And then do the bias calculation and compare to len(m)/size(m), and use that to determine whether to continue. But maybe that would be ω(n) overall... |
Right I did left out all the tricky details. |
Some mathematical background that might be helpful: The problem of uniformly sampling from a collection is reservoir sampling, and it can be done optimally in O(log n) time for a single sample. The basic idea is to assign each valid item a uniform random number and choose the one with the largest value. The more clever idea is to draw distances between successive largest values from the Gumbel distribution, which requires two floating point logarithms per sample (plus an exponential if drawing more than one sample). It's been a while since I've looked at Go's maps in particular, but for a basic hash map, I would assume that accounting for empty slots probably still requires O(n) rather than ω(n) iterations. Probably the best improvement you get is exiting early when your next slot position exceeds the size of the map, which can still be implemented in terms of a naïve iterator. |
Most often when I use package maps
// Sole returns the sole entry of map m. It panics if len(m) != 1.
func Sole[K comparable, V any](m map[K]V) (K, V) {
if n := len(m); n > 1 {
panic(fmt.Sprintf("map contains %d entries", n))
}
for k, v := range m { return k, v }
panic("empty map")
} |
This replaces a map used as a set with a slice. We were using a surprising amount of CPU in this code, making mapiters to pull out a random element of the map. Instead, just rand.IntN to pick a random element of the slice. It also adds a benchmark: │ before │ after │ │ sec/op │ sec/op vs base │ ConnRequestSet-8 1818.0n ± 0% 452.4n ± 0% -75.12% (p=0.000 n=10) (whether random is a good policy is a bigger question, but this optimizes the current policy without changing behavior) Updates golang#66361 Change-Id: I3d456a819cc720c2d18e1befffd2657e5f50f1e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/572119 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org>
I don't think we need to worry about giving every element exactly the same chance for the start of an iteration. Special-casing this:
to be faster seems reasonable and does not require a proposal. Is anyone interested? |
Proposal Details
Sometimes people try to get a random key/value out of a map, writing:
One example is/was in
database/sql
. Unfortunately, it uses a lot of CPU:So I rewrote it in https://go-review.googlesource.com/c/go/+/572119
An alternative is the compiler could recognize that pattern: a range with a body containing only a
break
.If so, the compiler could insert a call to a runtime func that takes 2 pointers (either/both of which may be nil) and updates them to a random map value if the map has non-zero length. If it's empty, the pointers are unmodified.
/cc @golang/compiler @maisem @raggi
The text was updated successfully, but these errors were encountered: