Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random doesn't generate uniform numbers (very high correlation) when using seed from time #8589

Closed
timotheecour opened this issue Aug 9, 2018 · 9 comments · Fixed by #17468

Comments

@timotheecour
Copy link
Member

#[
nim c --nimcache:/tmp/nim//nimcache/ -o:/tmp/nim//app -r tests/nim/stdlib/t02_random_seed.nim

/tmp/nim//app
0.7110418528567215
/tmp/nim//app
0.7110418342302638
etc => BUG: they're all very close to each other
]#

import times
import random

proc test()=
  let seed = now().toTime.toUnix
  echo
  seed.randomize
  echo rand(1.0)
  # for i in 0..10: echo rand(1.0) # this would be ok

test()
rdmd --eval=' auto seed = Clock.currTime().toUnixTime; auto rnd = Random(seed.to!int); uniform(0.0, 1.0, rnd).writeln'
0.22832

rdmd --eval=' auto seed = Clock.currTime().toUnixTime; auto rnd = Random(seed.to!int); uniform(0.0, 1.0, rnd).writeln'
0.0909191
=> D works
@demotomohiro
Copy link
Contributor

xoroshiro128+ (and most of pseudorandom number generator) generates random numbers by repeatedly updating internal state by calling next() function.
It is not supposed to generate random numbers by repeatedly setting seed.

Here is C code of xoroshiro128+
http://xoshiro.di.unimi.it/xoroshiro128plus.c

Following C code call next function in above c file.
It set seed in the same way as initRand*() in lib/pure/random.nim, and update internal state only once before printing like your code.

#include <inttypes.h>
#include <stdio.h>

int main() {
    int i;
    for(i=0; i<20; ++i) {
        //Today's unix timestamp
        uint64_t x = i+1533876749;
        s[0] = x >> 16;
        s[1] = x & 0xffff;
        next();
        uint64_t n = next();
        printf("%" PRIu64 "\n", n);
    }
}

output doesn't looks like random:

2300569911837024
2300982228894051
2300844789875042
2304555643388285
2304418204369276
2304830521426303
2304693082407294
2304005887312249
2303868448293240
2304280765350267
2304143326331258
2303456131236213
2303318692217204
2303731009274231
2303593570255222
2302906375160177
2302768936141168
2303181253198195
2303043814179186
2297958570475853

If you want to generate random number in such a way, use hash function or Counter-based random number generator.

By the way,
in lib/pure/random.nim

proc rand*(r: var Rand; max: range[0.0 .. high(float)]): float {.benign.} =
    ...
    let u = (0x3FFu64 shl 52u64) or (x shr 12u64)
    result = (cast[float](u) - 1.0) * max

This code works only when float is IEEE 754 64bit float.
Is type float always IEEE 754 64bit float?

According to Nim manual:

float
the generic floating point type; its size is platform dependent (the compiler chooses the processor's fastest floating point type). This type should be used in general.

@cheatfate
Copy link
Member

@timotheecour this is exactly not a bug, you seeding random number generator with ascending sequence of numbers, and this is not how PRNG (pseudo random number generators) must work. You need to seed it only once.
But if you still want to reseed your every iteration PRNG, you can use OS specific CSPRNG for seed generation.

@Araq
Copy link
Member

Araq commented Aug 10, 2018

Is type float always IEEE 754 64bit float?

Yeah, it is, I updated the spec.

@Araq
Copy link
Member

Araq commented Aug 10, 2018

Yeah, no bug here.

@Araq Araq closed this as completed Aug 10, 2018
Araq added a commit that referenced this issue Aug 10, 2018
@timotheecour
Copy link
Member Author

timotheecour commented Aug 10, 2018

You need to seed it only once.

I know, I was simplifying. Issue remains when binary is run multiple times (each time it seeds once)

I improved randomness by using nanoseconds instead of seconds as done in random.nim (which I just noticed now)

  proc randomize*() {.benign.} =
    let now = times.getTime()
    randomize(convert(Seconds, Nanoseconds, now.toUnix) + now.nanosecond)

However I'd like to keep this issue open: D's version produces random looking results even when initialized from unix time in seconds:

import std.stdio;
import std.random;
import std.datetime;
import std.conv;

void main(){
  auto seed = Clock.currTime().toUnixTime;
  auto seed2 = seed.to!int;
  auto rnd = Random(seed2);
  //auto x = uniform(0.0, 1.0, rnd);
  auto x = uniform(0, int.max, rnd);
  writefln("%10s %10s", seed2, x);
}

rdmd main.d
1533886940 1240308021
rdmd main.d
1533886941 796131160

which shows it has better randomness than nim's version, so there's room for improvement, eg using https://en.wikipedia.org/wiki/Mersenne_Twister with a period of 2^^ 19937

@Araq
Copy link
Member

Araq commented Aug 10, 2018

https://cs.stackexchange.com/questions/50059/why-is-the-mersenne-twister-regarded-as-good

"MT was regarded as good for some years, until it was found out to be pretty bad with the more advanced TestU01 BigCrush tests and better PRNGs.

The table at pcg-random.org e.g. gives a good overview of features of some of the most used PRNGs, where the only "good" feature of the Mersenne Twister is the huge period, 2219937 and the possibility to use a seed (Reproducible Results), it passes the simple and fast TestU01 SmallCrush tests, but it fails some of the newer statistical quality tests, esp. TestU01's LinearComp Test and the TestU01's Crush and BigCrush Batteries."

@demotomohiro
Copy link
Contributor

If you use random module at the same time in multiple threads/processes, you should seed them with same seed and call skipRandomNumbers so that each rand generate non-overlapping streams.
If you want more simple way (but there is a risk of using overlapping streams in different thread/process), multiply large odd value to a seed so that most bits of a seed are randomly changed even if more than half high bits of each seed are same.
For example:

import times
import random

let t = now().toTime.toUnix
for i in 0..<20:
    # Multiplier from 
    # https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use
    let seed = int64(uint64(t+i) * 6364136223846793005u64)
#    let seed = (t+i)
    var r = initRand(seed)
    var rf = r
    echo r.next(), "\t", rf.rand(1.0)

Applying good hash function to seed would also work but bit slower.

You cannot say D's Random is better than nim xoroshiro128plus only from your code.
I don't know much about D, but it seems D's Random is MersenneTwister.
https://dlang.org/library/std/random/random.html

MT(Mersenne Twister) is already in lib/pure/mersenne.nim
A long period does not imply high quality:
http://xoshiro.di.unimi.it/#remarks
MT fails Empirical Statistical Tests:
https://www.johndcook.com/blog/2017/08/14/testing-rngs-with-practrand/
http://www.pcg-random.org/statistical-tests.html
MT require large memory (2.5k bytes).
I think no one needs soooo vast period even if you use worlds fastest supercomputer.

If you still doubt of a quality of xoroshiro128+, xoshiro256** and xoshiro256+ are bigger version of it.
http://xoshiro.di.unimi.it/
Also, PCG might be a good competitor.
http://www.pcg-random.org/

@timotheecour
Copy link
Member Author

timotheecour commented Aug 10, 2018

thanks for the notes. regarding multithreading:
what if the state: Rand was thread local, and randomize would take (pid, tid, time) to generate seed? that way, rand() from different threads would not be correlated

@timotheecour
Copy link
Member Author

timotheecour commented May 11, 2021

fixed by #17468

import std/[times, random]
proc test()=
  let seed = now().toTime.toUnix
  seed.randomize
  echo rand(1.0)
test()

for i in {1..10}; do nim r --hints:off $timn_D/tests/nim/all/t12262.nim; sleep 1; done

after PR:

0.5795982367884893
0.2043700138087881
0.2527687822420892
0.4845061810712352
0.8156170470855539
0.3533471493803952
0.7261053651555247
0.4779878967729234
0.8582831885896214
0.6085394564568243

before PR:

0.3010194302173124
0.3010194339426018
0.3010194562943453
0.3010194451184771
0.3010196462841976
0.301019650009487
0.3010196425589082
0.3010196611853624
0.3010196537347836
0.301019657460073

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants