Add functions for randomly generating bigints #118

rotu · 2022-10-07T05:30:26Z

No description provided.

src/bigints/random.nim

doc comments change some procs to funcs add trandom to test task

dlesnoff · 2022-10-08T10:55:34Z

@narimiran Do you need @konsumlamm or @pietroppeter approval for PR to be merged ?
I have not heard any news from @konsumlamm recently.

pietroppeter · 2022-10-11T15:35:11Z

I would assume he does not need my approval but anyway I gave a look and it looks a nice improvement to me. :)

the trick on having the bound on last two limbs seems very smart but it is not evidently correct to me. will we be able to reach all possible random numbers in the interval (and in a uniform way)? I guess the answer is positive. maybe it is possible to add some kind of test or other type of assurances about this?

as a follow up later (not in any way a blocker for this) we might add a randomNbits(bits: int) in which we have a random number guaranteed to use exactly a specified number of bits.

rotu · 2022-10-11T16:08:28Z

Maybe my mental picture will make it more evidently correct. Think of the random number like a slot machine where the digits are each randomly and independently generated.

First you pull the lever and all the numbers start spinning. Then the top digit falls into place. If it is too high, you pull the lever again (and due to the assumption of independence on the RNG, you don’t need to wait for all the other digits to settle). If it is equal to the high digit of the spread, you need to wait for more digits fall into place to see if the whole number is in range (this is exceedingly rare anyway - the odds you need to look at more than the first two digits depend on the spread but are always less than 1:UINT32MAX).

The weak part of this PR is definitely the tests. I chose the numbers in there pretty arbitrarily, and don’t check any number-theoretic properties - just rough uniformity.

pietroppeter · 2022-10-12T11:16:11Z

my mental picture will make it more evidently correct. Think of the random number like a slot machine where the digits are each randomly and independently generated.
...

this indeed helped me understand it better, indeed I can see as essentially working with two "digits".
I add my additional elaborations that describe the algorithm that (as far as I understand) has been implemented:

instead of generating a random number $x \in [0, n]$, with $n$ the difference between our input lower and upper bound (called spread in the implementation), ...
...we split $n = d_1*b + d_2$ where $b$ is an appropriate base and $d_1$ and $d_2$ are two "digits". Then:
- we draw two random numbers $x_1 \in [0, d_1]$ and $x_2 \in [0, b)$ (uniformly and independently)
- if $x_1*b + x_2 <= n$ then we stop, otherwise we sample again
the reason why we do this is because we do not have a primitive to sample random numbers in arbitrarily high n, we only can sample over 64 bits
we choose the base as being $2^{32}*\max(L - 2, 0)$ where $L$ is the number of limbs (of size 32 bits) of $n$
in this way we are sure that $d_1 >= 2^{32}$ and a necessary condition for resampling is $x_1 = d_1$ (so that is pretty rare, $1 /2^{32}$)
to convince myself that the above algorithm is again uniform on the interval $[0, n]$ I considered the specific case where $n = 24$ and $b=10$. I am just checking that $P(X \in [20, 24]) = 1/5$ (where $X$ is the random variable representing the algorithm's stochastic process). By running calculations (1/3 of the times $x_1 = d_1$ at first try and in that case 1/2 time I need to resample, ...) I can see that the probability is $1/6 + (1/6)^2 + ... = 1/5$
and I think that with a bit more formalism we could prove that it is indeed uniform, and I do need feel the need to reassure myself anymore :)
and I am also convinced this algorithm is actually very well designed, congrats and thanks!

dlesnoff · 2022-10-13T09:01:47Z

as a follow up later (not in any way a blocker for this) we might add a randomNbits(bits: int) in which we have a random number guaranteed to use exactly a specified number of bits.

@pietroppeter That's exactly what I have done in #112. We decided this to not be part of the public API though, but as part of utilities/testing functions for the library development.

we choose the base as being $2^{32}*max(L-2, 0)$
$b$ is not really a base here. It depends on $n$. It is $max(L-1, 0)$ in the code (Is L d. If this is the base, then we can not generate randomly $x_2 \in [0, b)$ since $b$ is potentially larger than $ 2^{64}$. This does not make sense to me.

Bigints can be seen as a sequence of digits in the base $2^{32}$ or more precisely as the specialisation of the polynomial $f(X) = d_0X^L + \ \ldots{}\ + d_{L-1} X + d_L$ in $2^{32}$, i.e. $n=f(2^{32})$, where $d_i$'s are the limbs. This looks more formal/mathematically correct to me.

A priori we only have to generate randomly all $d_i$'s but we must ensure the highest limb is still in range. I am still unsure why you consider the top two limbs instead of only the first ? You changed it after testing, I guessed that was for the special case where spread is a power of two.

var limbs = newSeqWith(nFullLimbs, r.rand(uint32.low..uint32.high))

@rotu This does not seem to correspond to your slot machine story. I am pretty sure you generate randomly all the lowest digits first and not the top ones, i.e. this should generate $d_2, ..., d_L$.

I realized after the PR that we could actually leverage my random implementation to generate in a specific slice, since what I have done generate a BigInt of a specific limb size, but could be used to generate a random number in $[0,n]$ (by first selecting the number of limbs randomly and then randomly select a bigint).

Since this is what you end up doing for the slice, I could have tweaked my implementation. But I really like yours which is simpler and a more direct approach to the Slice random generation.

Notice it is not that hard to do a doWhile construct with a template, but I am not sure it is a good idea to add to the library.

template doWhile(cond, loop: untyped): untyped =
  loop
  while cond:
    loop

rotu · 2022-10-13T16:52:31Z

A priori we only have to generate randomly all 's but we must ensure the highest limb is still in range. I am still unsure why you consider the top two limbs instead of only the first ? You changed it after testing, I guessed that was for the special case where spread is a power of two.

Exactly. I wanted to guarantee that the odds of having to reroll everything were low (at worst 1:2^32). If the spread were a power of 2^32, then only generating the top limb "to order" would result in rerolling everything half the time (when the top limb is generated as 1, and the second limb is anything other than 0).

I originally considered generating the most significant 32 bits separately regardless of alignment. This led to less elegant code.

I like that this approach does not involve any bit masking nor any knowledge of the internal structure of BigInt. Even if BigInt changed its limb size to u16 or u128, this would continue to work, as it only assumes that there exists a constructor taking seq[uint32].

Notice it is not that hard to do a doWhile construct with a template, but I am not sure it is a good idea to add to the library.

You're right, but that's hella confusing! The do...while syntax in C looks like:

do {
  stuff();
} while (condition)

Yours looks like:

doWhile(condition):
  stuff()

If I saw the latter, I would still assume the condition got executed before, not after, the loop body!

random bigint

a1ac90f

dlesnoff reviewed Oct 7, 2022

View reviewed changes

src/bigints/random.nim Outdated Show resolved Hide resolved

rotu added 2 commits October 7, 2022 13:00

fixups

cef9ff5

doc comments change some procs to funcs add trandom to test task

fix for 1.4 compatibility

537d0fe

dlesnoff approved these changes Oct 8, 2022

View reviewed changes

pietroppeter approved these changes Oct 12, 2022

View reviewed changes

narimiran merged commit 8c389d3 into nim-lang:master Oct 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add functions for randomly generating bigints #118

Add functions for randomly generating bigints #118

rotu commented Oct 7, 2022

dlesnoff commented Oct 8, 2022

pietroppeter commented Oct 11, 2022

rotu commented Oct 11, 2022

pietroppeter commented Oct 12, 2022

dlesnoff commented Oct 13, 2022

rotu commented Oct 13, 2022

Add functions for randomly generating bigints #118

Add functions for randomly generating bigints #118

Conversation

rotu commented Oct 7, 2022

dlesnoff commented Oct 8, 2022

pietroppeter commented Oct 11, 2022

rotu commented Oct 11, 2022

pietroppeter commented Oct 12, 2022

dlesnoff commented Oct 13, 2022

rotu commented Oct 13, 2022