math/rand: rand.Float64 is not purely uniform #12290
There is a comment in the source code:

// A clearer, simpler implementation would be:
//	return float64(r.Int63n(1<<53)) / (1<<53)
// However, Go 1 shipped with
//	return float64(r.Int63()) / (1 << 63)
// and we want to preserve that value stream.
//
// There is one bug in the value stream: r.Int63() may be so close
// to 1<<63 that the division rounds up to 1.0, and we've guaranteed
// that the result is always less than 1.0. To fix that, we treat the
// range as cyclic and map 1 back to 0. This is justified by observing
// that while some of the values rounded down to 0, nothing was
// rounding up to 0, so 0 was underrepresented in the results.
// Mapping 1 back to zero restores some balance.
// (The balance is not perfect because the implementation
// returns denormalized numbers for very small r.Int63(),
// and those steal from what would normally be 0 results.)
// The remapping only happens 1/2⁵³ of the time, so most clients
// will not observe it anyway.
That's an interesting comment, but the consequence of those choices is a non-uniform distribution, where 0 and numbers near 0 are over-represented.
A quick test shows a non-uniform distribution of the LSB -- it is 25% 1, instead of 50% 1.
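A minimal sketch of such an LSB check (my own construction, not the original test: it counts how often the least-significant mantissa bit is set, extracted with math.Float64bits):

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

func main() {
	r := rand.New(rand.NewSource(1))
	const n = 10000000
	ones := 0
	for i := 0; i < n; i++ {
		// The low bit of Float64bits is the least-significant mantissa bit.
		if math.Float64bits(r.Float64())&1 == 1 {
			ones++
		}
	}
	fmt.Printf("LSB set: %.1f%%\n", 100*float64(ones)/n)
}
```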
Isn't it wrong to look at the LSB for floating-point numbers?
As I understand it, starting with a 53-bit integer and then dividing it by 2^53 means that there are floating-point numbers that will occur with probability zero. Any "random" value less than 1/2 (2^52/2^53) will not have its LSB set, for example. I think this is also a flaw in the original algorithm, but for a smaller set of FP numbers. I don't actually know what is officially "right" here because I am not an expert, but right now neither of these algorithms is doing much for me.

I recall from some earlier work that one way to do this is to take a 64-bit random number, split it into 12- and 52-bit parts, stuff the 52 bits into the mantissa, and then count leading zeroes on the 12-bit part to derive your exponent. If the 12-bit part is all zero, pull more 64-bit random numbers till you get one that is not zero, or till you hit (about, up to fenceposts) 1024 of them, in which case you call it zero, and use the number of leading zeroes to derive your exponent. Since you'd only do the second RNG call 1 time out of 4096, this could be plenty efficient, since it involves no FP ops at all (though we might hope that multiplication and division by powers of two would be very efficient). I am somewhat creeped out that both algorithms are guaranteed not to hit certain FP values.
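A sketch of that leading-zeros construction (the code and names are illustrative, not from any library; it assumes r.Uint64() as the 64-bit source and uses the subnormal floor rather than an exact 1024-word cutoff):

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
	"math/rand"
)

// float64Geometric draws one 64-bit word: the low 52 bits become the
// mantissa, and a run of z leading zeros in the rest puts the result in
// the binade [2^-(z+1), 2^-z). If the top 12 bits are all zero, fresh
// words extend the zero run.
func float64Geometric(r *rand.Rand) float64 {
	u := r.Uint64()
	mant := 1<<52 | u&(1<<52-1) // implicit leading 1 plus 52 random bits
	var lz int
	if top := u >> 52; top != 0 {
		lz = bits.LeadingZeros64(u) // a 1 sits in the top 12 bits, so lz < 12
	} else {
		lz = 12
		for {
			w := r.Uint64() // extend the zero run with fresh words
			if w != 0 {
				lz += bits.LeadingZeros64(w)
				break
			}
			lz += 64
			if lz > 1074 { // even the smallest subnormal is out of reach
				return 0
			}
		}
	}
	return math.Ldexp(float64(mant), -53-lz)
}

func main() {
	r := rand.New(rand.NewSource(1))
	fmt.Println(float64Geometric(r))
}
```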
I would not worry about not hitting certain FP values, especially since FPs are not uniformly distributed by nature (much denser towards 0 than towards 1). What's important is that the FP values generated have a uniform distribution. If you restrict yourself to the 53 bits, then the FP numbers are uniformly spaced in (0,1), so it's easy to grasp that the FP generator has to be uniform if the underlying integer generator is uniform. Yes, you lose the small numbers around 0 (less than machine epsilon), but there is no way to be uniform and keep those. Maybe it's not a very good argument, but I would trust Matsumoto and the official Mersenne Twister implementation, as he is a very prominent researcher in the area.
Maybe this paper could help in understanding the idea of uniform random floating-point numbers:
Anybody doing Monte Carlo simulations will want the Matsumoto-like behavior. I don't really see any other use case for a Float64 function than Monte Carlo simulations.
Any particular reason not to instead use
Provenance of the funny constant: https://play.golang.org/p/FSkOkffu_2. This gets you uniformity, and hits more floating-point numbers.
Look at the comment from stemar94: 0 is underrepresented under this algo. You just can't hit more random points and have a uniform distribution appropriate for simulation.
The value stream argument seems to be winning at this end (i.e., I floated the possibility of an RNG that differed by 1 eps in many instances, and that was judged to be not worth the trouble), but I agree that the wrap-back from 1 overrepresents 0, by, I think, a factor of 1<<9 for Float64 and 1<<38 for Float32 -- I may have a fencepost error in my exponents. The derivation is that the highest two values for Float64 are 1 and 1-1/(1<<53); there are 1024 points between them in the integer random stream, and each value gets half of them, versus a single point each for 1/(1<<63) and 0. Wrapping 1 to 0 maps 512 points from the integer stream to 0, which is a substantial overrepresentation, and egregiously so for Float32.

To (mostly) preserve the value stream, it appears that the least-awful method is to replace the zero substitution with try-another-number, as sketched below. It benchmarked the same, though of course this was just a microbenchmark.

I don't understand the claim that zero is overrepresented. The Mersenne Twister RNG will generate 1/(1<<53) and 0 both at p=1/(1<<53); this algorithm generates 1/(1<<63) and 0 both at p=1/(1<<63). In both cases the probability of zero and its nearest neighbor are the same (and p(0) is lower in this algorithm, because all of the points between 0 and 1/(1<<54) get hits, where in Mersenne Twister they instead round to zero).
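In sketch form, the try-another-number replacement reads roughly as follows (this mirrors the described behavior, not necessarily the exact code that landed):

```go
package main

import (
	"fmt"
	"math/rand"
)

// float64Resample: instead of remapping a rounded-up 1.0 to 0, throw the
// draw away and try again. The retry is taken only when Int63 lands
// within 512 of 2^63, i.e. about once per 2^54 draws.
func float64Resample(r *rand.Rand) float64 {
	for {
		f := float64(r.Int63()) / (1 << 63)
		if f < 1 {
			return f
		}
	}
}

func main() {
	r := rand.New(rand.NewSource(1))
	fmt.Println(float64Resample(r))
}
```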
At this point I don't believe we should change the value stream generated by (*Rand).Float64 in any significant way. The split of rand.Source (the source of non-determinism) from rand.Rand (the algorithms on top) means that people expect that only rand.Source might differ from release to release. If people use the same rand.Source output as input to rand.Rand, they expect to get the same outputs. rand.Rand should not itself be a source of non-determinism, not even from release to release, if at all possible.

When the implementation of (*Rand).Float64 returned 1, that was obviously a mistake and in contradiction to the docs, so it merited a breaking change.

The implementation of (*Rand).Float64 is not perfect. My apologies. But I believe it's not awful enough to merit a breaking change. We may be able to loop and grab another value, but I'm not even convinced that's necessary at this point.

More generally, the specified API for (*Rand).Float64 is incomplete. All it says is "Float64 returns, as a float64, a pseudo-random number in [0.0,1.0) from the default Source." It says nothing about uniformity, which is good, because floating-point numbers themselves are not uniformly distributed in [0, 1). In practice, if you care deeply about the properties of your random floating-point numbers, you should probably write your own wrapper around r.Int63() instead of depending on unspecified details of a library implementation.
Thanks rsc, this makes perfect sense, and this is what I actually ended up doing. The only thing I am left wondering is: what are the use cases for this rand.Float64 method?
CL https://golang.org/cl/17670 mentions this issue. |
The change does not really improve uniformity that much. Again, as there are more floating-point numbers near zero, the outcome will be skewed towards zero with probability 1/2^11.
@fabienlefloch I disagree. Yes, there are more floating-point numbers near zero, but they are selected less often, so it works out. By "uniform" here we mean that after x = rand.Float64(), as much as possible, x < k with probability k, for all k in [0, 1]. I spent a while calculating Pr(x < k) analytically for the various fixes, and the current implementation seems to me quite close, certainly far closer than one in 2^11. I'd be happy to see an analysis disproving that though.
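A quick empirical spot check of that property (my own Monte Carlo, not rsc's analytic calculation): estimate Pr(x < k) for a few thresholds and compare against k itself.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	r := rand.New(rand.NewSource(1))
	const n = 10000000
	ks := []float64{1e-6, 1e-3, 0.25, 0.5, 0.9}
	counts := make([]int, len(ks))
	for i := 0; i < n; i++ {
		x := r.Float64()
		for j, k := range ks {
			if x < k {
				counts[j]++
			}
		}
	}
	// Each estimate should land within sampling error of k itself.
	for j, k := range ks {
		fmt.Printf("k=%-8v Pr(x<k) ~ %v\n", k, float64(counts[j])/n)
	}
}
```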
@rsc Why would they be selected less often near 0?
Here are the graphs I created in November: https://swtch.com/randdist.html. Make your browser wide enough to see four in each row.

The first row is the original behavior, that could return 1 sometimes. The Pr(rand.Float64() < 1) in that system was not 1.0: it could return 1 sometimes, and the red line does not get to the top of the rightmost graph.

The second row is the "corrected" behavior, which returns 0 too often. The Pr(rand.Float64() < 1) in that system is 1.0, but overall Pr(rand.Float64() < x) was much larger than x for small values of x. You can see this most clearly in the second graph on the row, where the red line is nowhere near the blue line. In general quantization means it can't be a perfect match, but you'd want to see it dipping down to the blue line on occasion, or even crossing.

The third row is the new resampling behavior, which I argue does not return 0 nor values close to 0 too often. Again there is quantization, but the red y=Pr(rand.Float64() < x) tracks the blue line y=x quite nicely. To me, the graphs show that the systematic bias of the first row and the different systematic bias of the second row are both gone.

Russ
Would this test program perhaps be helpful? It computes 100 billion random float32s, and bins them by negative power of two (i.e., values in (0.5, 1] end up in bin 1) for bins 1-64.
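A scaled-down sketch of such a binning test (my reconstruction, not the original program, with far fewer samples and a half-open endpoint convention):

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Bin b holds values in [2^-b, 2^-(b-1)); bin 0 catches zero and anything
// below bin 64. Each bin's count should be roughly half the previous one.
func main() {
	const n = 100000000 // far fewer than the 100 billion used above
	var bins [65]int64
	r := rand.New(rand.NewSource(1))
	for i := 0; i < n; i++ {
		f := float64(r.Float32())
		b := 0
		if f > 0 {
			_, exp := math.Frexp(f) // f = frac * 2^exp, frac in [0.5, 1)
			b = 1 - exp
			if b < 1 || b > 64 {
				b = 0
			}
		}
		bins[b]++
	}
	for b := 1; b <= 16; b++ {
		fmt.Printf("bin %2d: %d\n", b, bins[b])
	}
}
```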
I ran this for over 100 minutes; the last printed line was:

The reason points near zero are selected less often is by construction. If we start with points uniform on [0, 2^63-1], full-precision division by 2^63 preserves uniformity, each point with probability P0 = 1/2^63. If we then round those to a 53-bit mantissa, the probabilities of [0, 2^53)/2^63 remain unchanged (= P0); they are exactly representable and there is no rounding.

Consider the mapping of points 2^53+k for smallish k. For even k the point is exactly representable and the value appears in the FP64 stream, but for odd k the value must be rounded: 1->0, 2=2, 3->4, 4=4, 5->4, 6=6, 7->8, 8=8, 9->8, 10=10, etc. So on the one hand, I see an interesting glitch in uniformity, in that for k = 2, 6, 10, ... p(2^53+k) = P0, but for k = 4, 8, 12, ... p(2^53+k) = 3*P0, because of round-to-nearest-even-on-tie. (I modified the program above to test this and it appeared right away. Similarly, the old unfixed code starts dumping hits into the last bucket right away for rand.Float32.) For larger powers of two, the difference in even and odd probabilities diminishes. For p(2^54+k), two bits must be dropped and the winners will be k = 0, 4, 8, 12, 16. Rounding to the respective values of k:
To your original worry/complaint: notice that in general, each time half the points drop out, the probability of the remaining points being chosen increases by a factor of two on average. In the data above, you'd expect to see a glitch, if any, occurring between bucket 10 and bucket 11, and there is none (I checked). It's a bit of a shame we can't do this in RTZ (round-toward-zero) FP mode; it would, however, change the stream of bits from the RNG and perhaps cause some sadness.
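The odd-k mapping is easy to confirm directly. In this tiny check (mine, not from the thread), all of the rounding happens in the int-to-float conversion, since the later division by 2^63 is exact:

```go
package main

import "fmt"

// Converting 2^53+k to float64 squeezes a 54-bit value into a 53-bit
// mantissa, so odd k must round, with ties going to an even mantissa.
func main() {
	for k := int64(0); k <= 10; k++ {
		f := float64(1<<53 + k)
		fmt.Printf("%2d -> %d\n", k, int64(f)-1<<53)
	}
	// Prints 1 -> 0, 3 -> 4, 5 -> 4, 7 -> 8, 9 -> 8, matching the text.
}
```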
I did not have time yet to produce a simple example to illustrate uniformity. I don't fully understand why golang wants to reinvent the wheel on this subject, while every single researcher on the topic uses the same techniques to produce random float64s from integers (see Matsumoto, L'Ecuyer, etc.).
The difficulty is that the wheel was reinvented differently, and then there is a strong desire to retain exact compatibility with older releases. There were clear bugs in the original that had to be fixed, so they were fixed. If you can demonstrate a bug, that greatly increases the odds that it will be fixed.
The current implementation (as of Go 1.5) of rand.Float64 is:
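Reconstructed from the Go 1.5 source (the long comment quoted at the top of the thread is omitted here), the implementation reads essentially:

```go
func (r *Rand) Float64() float64 {
	// (long comment quoted earlier in the thread elided)
	f := float64(r.Int63()) / (1 << 63)
	if f == 1 {
		f = 0 // map the rounded-up 1.0 back to 0
	}
	return f
}
```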
As the mantissa of a float64 is 52 bits, there are a lot of numbers (2^11?) that are initially mapped to 1 and will effectively become 0. The Float64() method therefore does not preserve the uniformity of the underlying Int63() method.
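A quick check of the rounding-up behavior (my own; the cutoff at 2^63 - 512 is consistent with the 512-point count derived elsewhere in the thread):

```go
package main

import "fmt"

func main() {
	var n uint64 = 1<<63 - 1 // largest possible Int63 output
	fmt.Println(float64(n)/(1<<63) == 1.0) // true: rounds up to exactly 1.0

	n = 1<<63 - 513 // just below the rounding cutoff at 2^63 - 512
	fmt.Println(float64(n)/(1<<63) == 1.0) // false: yields 1 - 2^-53
}
```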
This is related to issues #4965 and #6721.
A better alternative, even if it implies trading numbers below machine epsilon for uniformity, comes from the prototypical code of Mersenne Twister 64 at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c:
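In Go terms, a sketch of that conversion (genrand64_real2 in mt19937-64.c computes (x >> 11) * (1.0/9007199254740992.0), keeping the top 53 of 64 bits; my adaptation shifts the 63-bit Int63 value by 10 instead):

```go
package main

import (
	"fmt"
	"math/rand"
)

// float64MT keeps the top 53 bits of the integer draw, yielding values
// uniformly spaced on the 2^-53 grid in [0, 1).
func float64MT(r *rand.Rand) float64 {
	return float64(r.Int63()>>10) / (1 << 53)
}

func main() {
	r := rand.New(rand.NewSource(1))
	fmt.Println(float64MT(r))
}
```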
An interesting consequence is that this preserves symmetry around 0.5. It's also quite a bit faster on my machine, as the if statement of the original algorithm deteriorates performance significantly (I'm not entirely sure why).