jl_rng_split: better to do PCG output after mixing

The key issue before this change was that a linear relationship between task pedigrees caused a linear relationship between Xoshiro task states. The output function of PCG-RXS-M-XS-64 is specifically designed to mask linearity. By first mixing the LCG state with the Xoshiro state and only then applying the PCG output function, any linearity is better masked.
JuliaLang · Feb 27, 2024 · eb6055f
1 parent 7647422
Showing 1 changed file with 30 additions and 27 deletions.
diff --git a/src/task.c b/src/task.c
@@ -864,18 +864,17 @@ usage invertibility is actually a benefit (as is explained below) and adding as
 little additional memory overhead to each task object as possible is preferred.
 
 The goal of jl_rng_split is to perturb the state of each child task's RNG in
-such a way each that for an entire tree of tasks spawned starting with a given
-state in a root task, no two tasks have the same RNG state. Moreover, we want to
-do this in a way that is deterministic and repeatable based on (1) the root
-task's seed, (2) how many random numbers are generated, and (3) the task tree
-structure. The RNG state of a parent task is allowed to affect the initial RNG
-state of a child task, but the mere fact that a child was spawned should not
-alter the RNG output of the parent. This second requirement rules out using the
-main RNG to seed children: if we use the main RNG, we either advance it, which
-affects the parent's RNG stream or, if we don't advance it, then every child
-would have an identical RNG stream. Therefore some separate state must be
-maintained and changed upon forking a child task while leaving the main RNG
-state unchanged.
+such a way that for an entire tree of tasks spawned starting with a given root
+task state, no two tasks have the same RNG state. Moreover, we want to do this
+in a way that is deterministic and repeatable based on (1) the root task's seed,
+(2) how many random numbers are generated, and (3) the task tree structure. The
+RNG state of a parent task is allowed to affect the initial RNG state of a child
+task, but the mere fact that a child was spawned should not alter the RNG output
+of the parent. This second requirement rules out using the main RNG to seed
+children: if we use the main RNG, we either advance it, which affects the
+parent's RNG stream or, if we don't advance it, then every child would have an
+identical RNG stream. Therefore some separate state must be maintained and
+changed upon forking a child task while leaving the main RNG state unchanged.
 
 The basic approach is that used by the DotMix [2] and SplitMix [3] RNG systems:
 each task is uniquely identified by a sequence of "pedigree" numbers, indicating
@@ -1030,14 +1029,14 @@ cannot have hash collisions. What about parent colliding with child? That can
 only happen if all four main RNG registers are perturbed by exactly zero. This
 seems unlikely, but could it occur? Consider the core of the output function:
 
-    p ^= p >> ((p >> 59) + 5);
-    p *= m[i];
-    p ^= p >> 43
+    w ^= w >> ((w >> 59) + 5);
+    w *= m[i];
+    w ^= w >> 43;
 
 It's easy to check that this maps zero to zero. An unchanged parent RNG can only
-happen if all four `p` values are zero at the end of this, which implies that
+happen if all four `w` values are zero at the end of this, which implies that
 they were all zero at the beginning. However, that is impossible since the four
-`p` values differ from `x` by different additive constants, so they cannot all
+`w` values differ from `x` by different additive constants, so they cannot all
 be zero. Stated more generally, this non-collision property: assuming the main
 RNG isn't used between task forks, sibling and parent tasks cannot have RNG
 collisions. If the task tree structure is more deeply nested or if there are
@@ -1060,27 +1059,31 @@ void jl_rng_split(uint64_t dst[JL_RNG_SIZE], uint64_t src[JL_RNG_SIZE]) JL_NOTSA
     src[4] = dst[4] = x * 0xd1342543de82ef95 + 1;
     // high spectrum multiplier from https://arxiv.org/abs/2001.05304
 
+    // random xor constants
     static const uint64_t a[4] = {
-        0x214c146c88e47cb7, // random additive offsets...
+        0x214c146c88e47cb7,
         0xa66d8cc21285aafa,
         0x68c7ef2d7b1a54d4,
         0xb053a7d7aa238c61
     };
+    // random odd multipliers
     static const uint64_t m[4] = {
         0xaef17502108ef2d9, // standard PCG multiplier
-        0xf34026eeb86766af, // random odd multipliers...
-        0x38fd70ad58dd9fbb,
-        0x6677f9b93ab0c04d
+        0x5329a060d41b0fe3,
+        0x1028b28b062ae5b9,
+        0x6095c81c297fdbc5
     };
 
     // PCG-RXS-M-XS-64 output with four variants
     for (int i = 0; i < 4; i++) {
-        uint64_t s = bswap_64(src[i]);
-        uint64_t w = x + a[i];
-        w ^= w >> ((w >> 59) + 5);
-        w *= m[i];
-        w ^= w >> 43;
-        dst[i] = 2*s*w + s + w; // (2s+1)(2w+1) ÷ 2
+        uint64_t s = src[i];
+        uint64_t w = x ^ a[i];
+        s += w*(2*s + 1); // s = (2s+1)(2w+1)÷2 % 2^64
+        s ^= s >> ((s >> 59) + 5);
+        s *= m[i];
+        s ^= s >> 43;
+        // mix key into the state
+        dst[i] = s;
     }
 }
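
The following is a minimal standalone sketch of the per-register step in the new loop body, not code from src/task.c. It isolates the two stages (mix the key `w` into the state `s`, then apply the PCG-RXS-M-XS-64 output function) and checks two properties discussed above: the mixing step is invertible in `s` for a fixed `w`, and the output function maps zero to zero. The helper names `mix`, `unmix`, `inv_odd`, `pcg_output`, and `split_mix_roundtrips` are hypothetical, and the test state values are arbitrary; only the constants `a[0]`, `m[0]`, and the LCG multiplier are taken from the diff.

```c
#include <stdint.h>

// Mixing step from the new loop body: s' = s + w*(2s+1) mod 2^64,
// which is the same as (2s'+1) = (2s+1)(2w+1) with the low bit dropped.
static uint64_t mix(uint64_t s, uint64_t w)
{
    return s + w * (2 * s + 1);
}

// Inverse of an odd number modulo 2^64 by Newton iteration.
static uint64_t inv_odd(uint64_t a)
{
    uint64_t x = a; // a*a == 1 (mod 8), so x is correct to 3 bits
    for (int i = 0; i < 5; i++)
        x *= 2 - a * x; // each step doubles the number of correct bits
    return x;
}

// Undo mix(): s' = s*(2w+1) + w, so s = (s' - w) * (2w+1)^-1 mod 2^64.
static uint64_t unmix(uint64_t t, uint64_t w)
{
    return (t - w) * inv_odd(2 * w + 1);
}

// Core of the PCG-RXS-M-XS-64 output function with multiplier m.
static uint64_t pcg_output(uint64_t s, uint64_t m)
{
    s ^= s >> ((s >> 59) + 5);
    s *= m;
    s ^= s >> 43;
    return s;
}

// Returns 1 if mix() round-trips on sample inputs and pcg_output(0) == 0.
static int split_mix_roundtrips(void)
{
    const uint64_t a0 = 0x214c146c88e47cb7ULL; // a[0] from the diff
    const uint64_t m0 = 0xaef17502108ef2d9ULL; // m[0] from the diff
    uint64_t x = 1; // arbitrary stand-in for the LCG state
    for (int i = 0; i < 1000; i++) {
        x = x * 0xd1342543de82ef95ULL + 1; // LCG step from the diff
        uint64_t w = x ^ a0;               // key for register 0
        uint64_t s = ~x * 0x9e3779b97f4a7c15ULL; // arbitrary test state
        if (unmix(mix(s, w), w) != s)
            return 0;
    }
    return pcg_output(0, m0) == 0;
}
```

Calling `split_mix_roundtrips()` from a small `main` should return 1. Because the mixing step is a bijection on `s` for any fixed `w`, distinct parent register values stay distinct after mixing, and the output function is then applied to the mixed value rather than to the key alone.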