xoroshiro128+ seeding problems, was: The first call to `theft_random_choice(t, 4)` returns 2 most of the time. #39

jeberger · 2018-03-28T07:53:28Z

How to reproduce

Run the following code:

#include <theft.h>

static enum theft_trial_res prop_arg_should_not_be_2 (
      struct theft *t, void *arg)
{
   return (*(int*)arg == 2) ? THEFT_TRIAL_FAIL : THEFT_TRIAL_PASS;
}

static enum theft_alloc_res alloc_random_int (
      struct theft* t,
      void*         env,
      void**        instance)
{
   *instance        = malloc (sizeof (int));
   *(int*)*instance = theft_random_choice (t, 4);
   return THEFT_ALLOC_OK;
}

int main (void)
{
   int  status = 0;

   struct theft_type_info test_info = {
      .alloc = alloc_random_int,
      .free  = free,
   };

   struct theft_run_config config  = {
      .trials    = 1000,
      .name      = "theft-random",
      .prop1     = prop_arg_should_not_be_2,
      .type_info = { &test_info },
      .seed      = theft_seed_of_time(),
   };
   if (theft_run (&config) != THEFT_RUN_PASS) status = 1;

   return status;
}

Expected results

1/4 of the tests should fail and 3/4 should succeed. Some might be reported as duplicates (I don't know how theft determines duplicates exactly).

Actual results

Most of the time, theft reports 1 failure and 999 duplicates. Sometimes (rarely) there is 1 success. Example run:

== PROP 'theft-random': 1000 trials, seed 0x5634148fc7854230


 -- Counter-Example: theft-random
    Trial 0, Seed 0x5634148fc7854230
Fdddddddddd(DUP x 10)ddddddddd(DUP x 100)ddddddddd
== FAIL 'theft-random': pass 0, fail 1, skip 0, dup 999

Notes

I get the expected result for `theft_random_choice (t, 5)`:

== PROP 'theft-random-05': 100 trials, seed 0x95aa7c8e00f2a039
..

 -- Counter-Example: theft-random-05
    Trial 2, Seed 0xc4dc214c16c339c8
F.......

skip several failure reports

 -- Counter-Example: theft-random-05
    Trial 86, Seed 0xf3275c78f72fba13
F.d..........d
== FAIL 'theft-random-05': pass 75, fail 17, skip 0, dup 8

A quick check with values from 2 to 20 shows funny results for all powers of two (see attached source):

theft-test.zip

> ./theft-test | grep '^== \(PASS\|FAIL\)'
== PASS 'theft-random-02': pass 1, fail 0, skip 0, dup 999
== FAIL 'theft-random-03': pass 317, fail 60, skip 0, dup 623
== PASS 'theft-random-04': pass 2, fail 0, skip 0, dup 998
== FAIL 'theft-random-05': pass 362, fail 32, skip 0, dup 606
== FAIL 'theft-random-06': pass 319, fail 62, skip 0, dup 619
== FAIL 'theft-random-07': pass 391, fail 27, skip 0, dup 582
== PASS 'theft-random-08': pass 2, fail 0, skip 0, dup 998
== FAIL 'theft-random-09': pass 385, fail 35, skip 0, dup 580
== FAIL 'theft-random-10': pass 375, fail 38, skip 0, dup 587
== PASS 'theft-random-11': pass 446, fail 0, skip 0, dup 554
== PASS 'theft-random-12': pass 434, fail 0, skip 0, dup 566
== PASS 'theft-random-13': pass 436, fail 0, skip 0, dup 564
== PASS 'theft-random-14': pass 443, fail 0, skip 0, dup 557
== PASS 'theft-random-15': pass 429, fail 0, skip 0, dup 571
== PASS 'theft-random-16': pass 2, fail 0, skip 0, dup 998
== FAIL 'theft-random-17': pass 384, fail 26, skip 0, dup 590
== FAIL 'theft-random-18': pass 380, fail 23, skip 0, dup 597
== FAIL 'theft-random-19': pass 378, fail 26, skip 0, dup 596

Version information

Running on Linux with gcc version 4.8.2:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.8/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-i386/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-i386 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-i386 --with-arch-directory=i386 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-targets=all --enable-multiarch --disable-werror --with-arch-32=i686 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

Current theft develop head: 0619c2a

The text was updated successfully, but these errors were encountered:

silentbicycle · 2018-03-28T12:36:44Z

Thanks! This is a very helpful bug report.

I expect to fix this for the v0.5.0 release (which has been taking quite a while, due to the multicore stuff in #16). I have a pretty good idea what the root cause is.

silentbicycle · 2018-03-29T11:24:53Z

If you disable the logic here https://github.com/silentbicycle/theft/blob/develop/src/theft_random.c#L112-L120 by changing it to if (0 && ...), can you confirm that the anomaly goes away? It appears to be related to that special case, which is good, because it means I don't need to bump out the faster PRNG from the next release.

Further investigation suggests that it is specific to the new PRNG, when the limit is a power of 2. The PRNG theft used until a recent change on develop (mt64) doesn't exhibit this problem. What I don't understand yet is why my existing test for bias in the distribution didn't catch the issue. Thanks for reporting this.

jeberger · 2018-03-29T14:27:25Z

Further testing shows that the problem only manifests for the first few values. If I discard the first 30 calls to random_choice (t, 4) and only use the 31st, then I get the expected result.

I suspect that the PRNG bit pool is not reset properly between runs.

From what I've seen of your tests, this wouldn't show up because you only test for consistency within a single run, and the first values would be statistically insignificant against the whole run.

silentbicycle · 2018-03-29T15:55:31Z

I suspect the reason your test caught this issue and my existing one didn't is that yours was using a freshly seeded PRNG each time, whereas mine was seeding once and continuing to draw -- I think the xoroshiro128+ PRNG added in #29 has less variety in the lower bits, particularly immediately after seeding. This is less of a problem when continuing to draw, because theft will buffer the fully generated uint64_t, rather than returning the lower bits and discarding the rest, but if the PRNG is re-seeded each time it will always get the lowest N bits from the PRNG.

(Sorry: I came to the same conclusion, but forgot to send the above comment because of an appointment.)

The way theft's shrinking is implemented, it needs to re-seed the PRNG for each new trial, so having too much bias in the starting context is a problem. Discarding the first 32 bits when re-seeding xoroshiro128+ appears to work, but that's kind of a hack, and I'm going to think about whether I want to depend on it. Either way -- thanks for bringing this to my attention, I'll get it resolved before 0.5.0.

silentbicycle · 2018-03-29T21:03:37Z

Based on this article by Daniel Lemire and some local testing, I think his stateless splitmix64 may be a better choice for use in theft. It appears to be roughly on par with xoroshiro128+ (both take roughly half as long to finish theft's prng test suite as MT64), but it does significantly better with its tests for statistical bias in early draws, and it's also in the public domain / CC0 1.0.

jeberger · 2018-03-30T06:34:37Z

Note that when seeding a PRNG with N bits of state, it is usually considered good practice to discard at least the first N bits of output in order to avoid bias introduced by the seeding procedure.

I would probably avoid stateless generators, I suspect that they have much poorer statistical qualities (in particular in terms of generating independent sequences by using different seeds). "Stateless" is a poor choice of term to describe them btw: they're just generators with a small state that is externally managed.

silentbicycle · 2018-03-30T11:49:57Z

Right, it's only "stateless" in that it is using a passed in state rather than mutating a global or static state.

I'm not decided whether I should revert to MT64 (its only downside for my purposes is that it's a bit slower) or switch to splitmix64. I'll continue testing both. I don't expect the next release to be done for a bit still because of a massive restructuring for the multi-core support.

jeberger · 2018-04-10T07:08:20Z

I've run some tests using the dieharder test suite on 4 random number generators: Mersenne Twister, Xoroshiro128+, Splitmix64 and Combined Tausworthe. I've run the following tests:

straight: give the output of the RNG straight into dieharder -g 200 -a. This is mostly a sanity check, and all four generators passed.
reseeding: each time a random number is needed, draw a number, use it to reseed the generator, then draw a new number and return that. This is intended to check what happens between successive theft runs when the RNG is reseeded. As expected Xoroshiro fails this test. All the other RNGs pass.
reseeding+3: same as the previous test, except that after reseeding the RNG I discard the first 3 values drawn from the RNG and return the 4th. All four generators passed.
interleaving: seed two generators and draw alternatively from each. This test intends to check the independence between runs using different seeds. Splitmix fails this test (not surprising since its period is much shorter than the other generators). The three other generators passed.
reversing: draws a number and reverses the order of the bits. Theft processes the random values as a stream of bits and some RNG are known to be weak when considering only the LSB, so I wanted to be sure there wouldn't be any problems when calling theft_random_choice repeatedly with small values. All four generators passed this test.

As a conclusion, I'd say avoid Splitmix64. If the slower speed is acceptable, Mersenne Twister is a possible choice. Otherwise, Combined Tausworthe is good. Xoroshiro128 is also acceptable provided that you draw and discard the first few values after reseeding.

silentbicycle · 2018-09-10T14:53:10Z

Thanks again for this analysis. I'm still working on the changes for multicore support, but this will factor into the next release.

silentbicycle · 2018-10-06T14:26:08Z

I've read quite a bit more about PRNGs, and feel like I'm almost approaching an informed decision now.

Blackman and Vigna have deprecated Xoroshiro128+ in favor of Xoshiro256**. While Splitmix64 doesn't seem like a great fit in general, it seems perfect for initializing Xoshiro256**'s internal state. That said, I'm also strongly considering a PRNG from the PCG family instead.

The entire PRNG setup will be somewhat different in 0.5.0 -- rather than exposing the seed to the caller as an opaque token for reproducing trials, it may instead expose the full PRNG state, to avoid frequent reseeding. That's not worth the trouble with MT64, but PCG and Xoshiro256** have a much smaller internal state.

silentbicycle added the bug label Mar 28, 2018

silentbicycle self-assigned this Mar 28, 2018

silentbicycle added this to the v0.5.0 milestone Mar 28, 2018

silentbicycle modified the milestones: v0.5.0, v0.4.4 Mar 29, 2018

silentbicycle modified the milestones: v0.4.4, v0.5.0 Mar 29, 2018

silentbicycle changed the title ~~The first call to theft_random_choice(t, 4) returns 2 most of the time.~~ xoroshiro128+ seeding problems, was: The first call to theft_random_choice(t, 4) returns 2 most of the time. Oct 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xoroshiro128+ seeding problems, was: The first call to `theft_random_choice(t, 4)` returns 2 most of the time. #39

xoroshiro128+ seeding problems, was: The first call to `theft_random_choice(t, 4)` returns 2 most of the time. #39

jeberger commented Mar 28, 2018

silentbicycle commented Mar 28, 2018

silentbicycle commented Mar 29, 2018 •

edited

Loading

jeberger commented Mar 29, 2018

silentbicycle commented Mar 29, 2018 •

edited

Loading

silentbicycle commented Mar 29, 2018 •

edited

Loading

jeberger commented Mar 30, 2018

silentbicycle commented Mar 30, 2018

jeberger commented Apr 10, 2018

silentbicycle commented Sep 10, 2018

silentbicycle commented Oct 6, 2018

xoroshiro128+ seeding problems, was: The first call to theft_random_choice(t, 4) returns 2 most of the time. #39

xoroshiro128+ seeding problems, was: The first call to theft_random_choice(t, 4) returns 2 most of the time. #39

Comments

jeberger commented Mar 28, 2018

How to reproduce

Expected results

Actual results

Notes

I get the expected result for theft_random_choice (t, 5):

A quick check with values from 2 to 20 shows funny results for all powers of two (see attached source):

Version information

silentbicycle commented Mar 28, 2018

silentbicycle commented Mar 29, 2018 • edited Loading

jeberger commented Mar 29, 2018

silentbicycle commented Mar 29, 2018 • edited Loading

silentbicycle commented Mar 29, 2018 • edited Loading

jeberger commented Mar 30, 2018

silentbicycle commented Mar 30, 2018

jeberger commented Apr 10, 2018

silentbicycle commented Sep 10, 2018

silentbicycle commented Oct 6, 2018

xoroshiro128+ seeding problems, was: The first call to `theft_random_choice(t, 4)` returns 2 most of the time. #39

xoroshiro128+ seeding problems, was: The first call to `theft_random_choice(t, 4)` returns 2 most of the time. #39

I get the expected result for `theft_random_choice (t, 5)`:

silentbicycle commented Mar 29, 2018 •

edited

Loading

silentbicycle commented Mar 29, 2018 •

edited

Loading

silentbicycle commented Mar 29, 2018 •

edited

Loading