Add and test initRandomBigInt for issue #101 #112

dlesnoff · 2022-08-10T16:43:52Z

I added random as std dependency.
It is now possible to generate a BigInt which is exactly nbits long.
I also added a file for tests specific for probabilistic tests.

tests/trandom.nim

src/bigints.nim

… a fixed number of limbs

… add tests for gcd

dlesnoff · 2022-08-12T12:37:01Z

I added an enum type. You may find a less generic name than the one I chose.
Another option would be to do two procedures for the initialization of random Bigints :
initRandomBigintLimb and initRandomBigintBits
I prefer to use an enum type, that's much less to type.
I kept the init prefix to differentiate with a proc that would change an already initialized variable. I don't know if that would be useful, I have not implemented that. If we don't do that in the end, we may drop the init prefix -> randomBigint is shorter and much easier to memorize.

All these changes will be really useful for another benchmarking PR I am working on. It also closes issue #101.

konsumlamm · 2022-08-15T08:58:50Z

src/bigints.nim

+  SizeDescriptor* = enum
+    Limbs, Bits
+
+proc initRandomBigInt*(number: Natural, unit: SizeDescriptor = Limbs): BigInt =


What's unit supposed to mean here? That doesn't seem to fit at all. Maybe mode: RandomMode?

src/bigints.nim

konsumlamm · 2022-08-15T09:03:50Z

src/bigints.nim

+  ## Initializes a standalone BigInt whose value is chosen randomly with exactly
+  ## `number` bits or limbs, depending on the value of `unit`. By default, the 
+  ## BigInt is chosen with `number` limbs chosen randomly.
+  if unit == Limbs:


This should rather be an exhaustive case, in case a new variant is added.

konsumlamm · 2022-08-15T09:04:55Z

src/bigints.nim

+  ## BigInt is chosen with `number` limbs chosen randomly.
+  if unit == Limbs:
+    if number == 0:
+      raise newException(ValueError, "A Bigint must have at least one limb !")


Actually, it doesn't, no limbs means the value is 0, although that may change in the future ig?

This is actually a weird and problematic edge case here :

with 0 limbs, there is only one value possible, @[]. It is not really "random".

It means that zeros will appear with more probability than other values if we try to select a random bigint amongst output of random bigints with fixed size limb. Imagine you try to generate a random bigint with r limbs, with r from zero up to 10.
Since there are multiple definitions of zeros. @[], @[0] (isNegative=true), and @[0] (isNegative=false), and two of them are potentially generated, ...

An empty seq @[] does not permit type inference on the type inside the seq, while @[0] does. In fact, this argument is limited it is not clear, if it is an int, a Natural or a Bigint. At least we can guess it is a SomeInt type.

The message is incorrect, I will change it for now, I do not think we should try to generate an empty seq anyway.

konsumlamm · 2022-08-15T09:10:17Z

src/bigints.nim

+      remainder = number mod 32
+      n_limbs = (if remainder == 0: number shr 5 else: number shr 5 + 1)


Suggested change

remainder = number mod 32

n_limbs = (if remainder == 0: number shr 5 else: number shr 5 + 1)

remainder = number shr 5

n_limbs = (if remainder == 0: remainder else: remainder + 1)

I may add a variable for number shr 5 called quotient.
number mod 32 and number shr 5 does not do the same thing!
You changed the remainder definition but not the equality assertion with remainder.
I think you misunderstood these lines, I didn't get your changes.

Oh right, number shr 5 is number div 32. But number mod 32 can be turned into number and 31 (which may or may not help, depending on if that optimization is already done).

src/bigints.nim

konsumlamm · 2022-08-15T09:13:26Z

src/bigints.nim

@@ -1198,3 +1242,4 @@ func powmod*(base, exponent, modulus: BigInt): BigInt =
        result = (result * basePow) mod modulus
      basePow = (basePow * basePow) mod modulus
      exponent = exponent shr 1
+


Suggested change

There already is a trailing newline, isn't there?

No there is not!
I added one, It makes the code fits better in the buffer.

tests/trandom.nim

konsumlamm · 2022-08-15T09:36:21Z

Let me start this by saying that I don't think that this should be part of the public API.

The proposed function seems to be designed specifically for testing. For this purpose, it looks good to me and I'd be happy to use it for some randomized tests.

However, there are a number of problems for the general use case:

It uses the std/random RNG, so users can't simply use their favourite RNG (afaik there is no standard RNG interface for Nim, even std/sysrand and std/mersenne have different APIs).
The implementation uses rand, which uses the default RNG and is not thread-safe.
It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.
The current API has no support for any special distributions.
Limb sizes are kind of an implementation detail and I doubt that most users care about them.

It also closes issue #101.

A small note: You need to write something like "closes #101" (in the description or a comment, not the title) so that GitHub actually closes the issue when merging the PR.

Remove an intermediate variable Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>

Changed the description of initRandomBigInt Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>

dlesnoff · 2022-08-15T18:44:11Z

Thank you for this review !
Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The implementation uses rand, which uses the default RNG and is not thread-safe.

That's important to keep in mind for the release of Nim v2, which is promised to be threads on by default.

It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.

I totally agree with you on that. Actually, I think this can be implemented on top of this current random function and some rejection sampling.

I do not think I should close the issue now, as more random functions for testing have to be added. mratsim proposed two other test functions.

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>

konsumlamm · 2022-08-16T01:39:32Z

Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The easiest solution is to simply move the random stuff out of bigints.nim (it doesn't need access to anything private, you can construct a BigInt from a seq[uint32] via initBigInt). This would ofc mean that it can't be used in the bigints module itself, but perhaps it would be better to put a future public random API into its own module anyway?

dlesnoff · 2022-08-16T09:36:19Z

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint).
The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks.
I am okay with it for now.

I think the code needs refactoring anyway: everything is stored in a single monolithic file.
We could very well split bigints.nim into multiple nim files and include them all in bigints.nim.
We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl
and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd
With the include system, it acts just like the same as the current single file.
All the splitted files can go in a subdirectory, like the already existing src/bigints
If we want to do a file with helper functions that doesn't go into the public API, how should I call the file ? helper.nim ?
Are there any other functions that would go into it ?

konsumlamm · 2022-08-16T11:29:43Z

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

It can just be a separate module though (e.g. in tests/ or a new folder).

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint). The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks. I am okay with it for now.

It shouldn't copy the seq, at least with ARC/ORC, since it is a sink parameter.

I think the code needs refactoring anyway: everything is stored in a single monolithic file. We could very well split bigints.nim into multiple nim files and include them all in bigints.nim. We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd With the include system, it acts just like the same as the current single file. All the splitted files can go in a subdirectory, like the already existing src/bigints

Agreed.

dlesnoff · 2022-09-04T17:22:39Z

@konsumlamm I have moved away the new function initRandomBigInt in a new file in bigints folder called utilities.nim, as requested.
Can you validate the changes ?
I have not checked whether it copies in place, but it should.

demotomohiro · 2022-09-22T22:31:09Z

How about to add rnd: var Rand parameter to initRandomBigInt and randomizeBigInt procs, and generate random numbers using it?
Doing so makes these procedures thread safe.

rotu · 2022-10-04T18:49:34Z

I did my own implementation (below), before I realized you already wrote one.

The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.
I think the rand(lo..hi) is more in line with existing APIs and more intuitive.

For optimization purposes, if it's undesirable to construct ancillary BigInts to specify limits, it might make sense to have e.g. specialized types for arbitrary powers of two which just store an exponent and a sign (so you can do myBigInt -= fastPow2(900) as well as rand(0'bi..fastPow2(1000)).

import std/random
import bigints
import std/sugar
import std/options

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let nbits = spread.fastLog2
  while true:
    let fulllimbs = nbits div 32
    var digits = collect:
      for i in 0 ..< fulllimbs:
        r.rand(uint32.low..uint32.high)
    let maxHighDigit = (spread shr (fulllimbs*32)).toInt[:uint32].get()
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

dlesnoff · 2022-10-05T22:19:07Z

@rotu

I did my own implementation (below), before I realized you already wrote one.

No problem here, it is always good to have a comparison.

The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.

You missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

I think the rand(lo..hi) is more in line with existing APIs and more intuitive.
This second way looked more difficult to generate in an uniformly distributed manner. Otherwise, it is indeed more untuitive (slices doesn't support steps though).

Quick comments on the code:
Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit.
We don't want it, at least in the library, maybe for helper functions.

Why do you move out the conditional from the while loop ?

while true:
  ...
  if result <= spread:
    break

I still have to look closely, but the code looks simpler than mine, and I have carefully tried many problematic edge cases.

@demotomohiro I forgot! Glad to learn about thread safety, I have an applied mathematics degree and don't know much yet about thread safety. Sorry, but I have very few time to spare on the library.

Finally, I think that the function should have at least some test for the shape of sample's distribution (to check if its uniform).
One way to do this would be to take the remainder of each generated bigint by a set of small integers (less than a limb, or just a dozen) and to look if this is uniform.

rotu · 2022-10-06T03:31:51Z

@dlesnoff

You terribly missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

Aha! Yes, I did miss your intention :-).

Quick comments on the code: Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit. We don't want it, at least in the library, maybe for helper functions.

I've since discovered newSeqWith.

Why do you move out the conditional from the while loop ?

I'm not sure I understand the question. I would use a do-while loop, but Nim doesn't support them.

The loop body generates a uniformly distributed random number in the range 0..limit, where limit is a number slightly above spread (in particular, limit has the same first limb as spread and all the other limbs equal to uint32.high). Then I postselect to ensure the chosen value is actually less than spread without breaking uniformity.

import bigints
import std/sequtils
import std/options
import std/random

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let
    nbits = spread.fastLog2
    nFullLimbs = nbits div 32
    maxHighDigit = (spread shr (nFullLimbs*32)).toInt[:uint32].get()
  while true:
    var digits = newSeqWith(nFullLimbs, r.rand(uint32.low..uint32.high))
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

dlesnoff · 2022-10-06T21:25:42Z

Oh, I see. First, excuse me, my message was rude.

I think your code would actually be a great addition to the lib. I thought the slice would be harder, but like you said, you just use rejection sampling.

If you do another PR with a bit of testing and make your function public, I'll review and approve it.

Add and test initRandomBigInt

49a7817