Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and test initRandomBigInt for issue #101 #112

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

dlesnoff
Copy link
Contributor

I added random as std dependency.
It is now possible to generate a BigInt which is exactly nbits long.
I also added a file for tests specific for probabilistic tests.

tests/trandom.nim Outdated Show resolved Hide resolved
tests/trandom.nim Outdated Show resolved Hide resolved
src/bigints.nim Outdated Show resolved Hide resolved
@dlesnoff
Copy link
Contributor Author

I added an enum type. You may find a less generic name than the one I chose.
Another option would be to do two procedures for the initialization of random Bigints :
initRandomBigintLimb and initRandomBigintBits
I prefer to use an enum type, that's much less to type.
I kept the init prefix to differentiate with a proc that would change an already initialized variable. I don't know if that would be useful, I have not implemented that. If we don't do that in the end, we may drop the init prefix -> randomBigint is shorter and much easier to memorize.

All these changes will be really useful for another benchmarking PR I am working on. It also closes issue #101.

@dlesnoff dlesnoff changed the title Add and test initRandomBigInt Add and test initRandomBigInt, closes #101 Aug 12, 2022
src/bigints.nim Outdated
SizeDescriptor* = enum
Limbs, Bits

proc initRandomBigInt*(number: Natural, unit: SizeDescriptor = Limbs): BigInt =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's unit supposed to mean here? That doesn't seem to fit at all. Maybe mode: RandomMode?

src/bigints.nim Outdated Show resolved Hide resolved
src/bigints.nim Outdated
## Initializes a standalone BigInt whose value is chosen randomly with exactly
## `number` bits or limbs, depending on the value of `unit`. By default, the
## BigInt is chosen with `number` limbs chosen randomly.
if unit == Limbs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should rather be an exhaustive case, in case a new variant is added.

src/bigints.nim Outdated
## BigInt is chosen with `number` limbs chosen randomly.
if unit == Limbs:
if number == 0:
raise newException(ValueError, "A Bigint must have at least one limb !")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't, no limbs means the value is 0, although that may change in the future ig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a weird and problematic edge case here :

  • with 0 limbs, there is only one value possible, @[]. It is not really "random".
  • It means that zeros will appear with more probability than other values if we try to select a random bigint amongst output of random bigints with fixed size limb. Imagine you try to generate a random bigint with r limbs, with r from zero up to 10.
    Since there are multiple definitions of zeros. @[], @[0] (isNegative=true), and @[0] (isNegative=false), and two of them are potentially generated, ...
  • An empty seq @[] does not permit type inference on the type inside the seq, while @[0] does. In fact, this argument is limited it is not clear, if it is an int, a Natural or a Bigint. At least we can guess it is a SomeInt type.

The message is incorrect, I will change it for now, I do not think we should try to generate an empty seq anyway.

src/bigints.nim Outdated
Comment on lines 95 to 96
remainder = number mod 32
n_limbs = (if remainder == 0: number shr 5 else: number shr 5 + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
remainder = number mod 32
n_limbs = (if remainder == 0: number shr 5 else: number shr 5 + 1)
remainder = number shr 5
n_limbs = (if remainder == 0: remainder else: remainder + 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may add a variable for number shr 5 called quotient.
number mod 32 and number shr 5 does not do the same thing!
You changed the remainder definition but not the equality assertion with remainder.
I think you misunderstood these lines, I didn't get your changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, number shr 5 is number div 32. But number mod 32 can be turned into number and 31 (which may or may not help, depending on if that optimization is already done).

src/bigints.nim Outdated Show resolved Hide resolved
src/bigints.nim Outdated Show resolved Hide resolved
@@ -1198,3 +1242,4 @@ func powmod*(base, exponent, modulus: BigInt): BigInt =
result = (result * basePow) mod modulus
basePow = (basePow * basePow) mod modulus
exponent = exponent shr 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

There already is a trailing newline, isn't there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is not!
I added one, It makes the code fits better in the buffer.

tests/trandom.nim Outdated Show resolved Hide resolved
tests/trandom.nim Outdated Show resolved Hide resolved
@konsumlamm
Copy link
Contributor

Let me start this by saying that I don't think that this should be part of the public API.

The proposed function seems to be designed specifically for testing. For this purpose, it looks good to me and I'd be happy to use it for some randomized tests.

However, there are a number of problems for the general use case:

  • It uses the std/random RNG, so users can't simply use their favourite RNG (afaik there is no standard RNG interface for Nim, even std/sysrand and std/mersenne have different APIs).
  • The implementation uses rand, which uses the default RNG and is not thread-safe.
  • It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.
  • The current API has no support for any special distributions.
  • Limb sizes are kind of an implementation detail and I doubt that most users care about them.

It also closes issue #101.

A small note: You need to write something like "closes #101" (in the description or a comment, not the title) so that GitHub actually closes the issue when merging the PR.

dlesnoff and others added 4 commits August 15, 2022 20:27
Remove an intermediate variable

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
Changed the description of initRandomBigInt

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
@dlesnoff
Copy link
Contributor Author

dlesnoff commented Aug 15, 2022

Thank you for this review !
Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The implementation uses rand, which uses the default RNG and is not thread-safe.

That's important to keep in mind for the release of Nim v2, which is promised to be threads on by default.

It only supports the creation of BigInts of a specific bit/limb size. I imagine most users would want to generate a BigInt between a and b instead.

I totally agree with you on that. Actually, I think this can be implemented on top of this current random function and some rejection sampling.

I do not think I should close the issue now, as more random functions for testing have to be added. mratsim proposed two other test functions.

dlesnoff and others added 2 commits August 15, 2022 20:44
Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
@dlesnoff dlesnoff changed the title Add and test initRandomBigInt, closes #101 Add and test initRandomBigInt for issue #101 Aug 15, 2022
@konsumlamm
Copy link
Contributor

Indeed, this should be removed from the public API, but the actual export system does not enable to export non-star procs to other modules in the library. We would not be able to use the proc outside of bigints.nim.

The easiest solution is to simply move the random stuff out of bigints.nim (it doesn't need access to anything private, you can construct a BigInt from a seq[uint32] via initBigInt). This would ofc mean that it can't be used in the bigints module itself, but perhaps it would be better to put a future public random API into its own module anyway?

@dlesnoff
Copy link
Contributor Author

dlesnoff commented Aug 16, 2022

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint).
The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks.
I am okay with it for now.

I think the code needs refactoring anyway: everything is stored in a single monolithic file.
We could very well split bigints.nim into multiple nim files and include them all in bigints.nim.
We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl
and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd
With the include system, it acts just like the same as the current single file.
All the splitted files can go in a subdirectory, like the already existing src/bigints
If we want to do a file with helper functions that doesn't go into the public API, how should I call the file ? helper.nim ?
Are there any other functions that would go into it ?

@konsumlamm
Copy link
Contributor

I need the function both in the benchmarks and in the tests which are in two separated folders. I will need to export it anyway.

It can just be a separate module though (e.g. in tests/ or a new folder).

it doesn't need access to anything private

Currently, it does. I am not sure if I can avoid private membership access.

We make the function slower by proceding in two steps, and takes twice as much memory (since the same seq is created twice, one for the generation of the random number and one for the initialisation of bigint). The fact that the random generation time is negligible compared to the multiplication/division of two integers is crucial for the benchmarks. I am okay with it for now.

It shouldn't copy the seq, at least with ARC/ORC, since it is a sink parameter.

I think the code needs refactoring anyway: everything is stored in a single monolithic file. We could very well split bigints.nim into multiple nim files and include them all in bigints.nim. We should separate bigint initialisation, representation (toString), basic operations on it: +, -, *, /, shr, shl and advanced operations (operations that rely on simpler bigint operations): pow, powmod, gcd With the include system, it acts just like the same as the current single file. All the splitted files can go in a subdirectory, like the already existing src/bigints

Agreed.

@dlesnoff
Copy link
Contributor Author

dlesnoff commented Sep 4, 2022

@konsumlamm I have moved away the new function initRandomBigInt in a new file in bigints folder called utilities.nim, as requested.
Can you validate the changes ?
I have not checked whether it copies in place, but it should.

@demotomohiro
Copy link

How about to add rnd: var Rand parameter to initRandomBigInt and randomizeBigInt procs, and generate random numbers using it?
Doing so makes these procedures thread safe.

@rotu
Copy link
Contributor

rotu commented Oct 4, 2022

I did my own implementation (below), before I realized you already wrote one.

  1. The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.
  2. I think the rand(lo..hi) is more in line with existing APIs and more intuitive.

For optimization purposes, if it's undesirable to construct ancillary BigInts to specify limits, it might make sense to have e.g. specialized types for arbitrary powers of two which just store an exponent and a sign (so you can do myBigInt -= fastPow2(900) as well as rand(0'bi..fastPow2(1000)).

import std/random
import bigints
import std/sugar
import std/options

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let nbits = spread.fastLog2
  while true:
    let fulllimbs = nbits div 32
    var digits = collect:
      for i in 0 ..< fulllimbs:
        r.rand(uint32.low..uint32.high)
    let maxHighDigit = (spread shr (fulllimbs*32)).toInt[:uint32].get()
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

@dlesnoff
Copy link
Contributor Author

dlesnoff commented Oct 5, 2022

@rotu

I did my own implementation (below), before I realized you already wrote one.

No problem here, it is always good to have a comparison.

The size of a limb is an implementation detail, not a feature. RandomMode = Limbs should not be exposed.

You missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

I think the rand(lo..hi) is more in line with existing APIs and more intuitive.
This second way looked more difficult to generate in an uniformly distributed manner. Otherwise, it is indeed more untuitive (slices doesn't support steps though).

Quick comments on the code:
Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit.
We don't want it, at least in the library, maybe for helper functions.

Why do you move out the conditional from the while loop ?

while true:
  ...
  if result <= spread:
    break

I still have to look closely, but the code looks simpler than mine, and I have carefully tried many problematic edge cases.

@demotomohiro I forgot! Glad to learn about thread safety, I have an applied mathematics degree and don't know much yet about thread safety. Sorry, but I have very few time to spare on the library.

Finally, I think that the function should have at least some test for the shape of sample's distribution (to check if its uniform).
One way to do this would be to take the remainder of each generated bigint by a set of small integers (less than a limb, or just a dozen) and to look if this is uniform.

@rotu
Copy link
Contributor

rotu commented Oct 6, 2022

@dlesnoff

You terribly missed my code motivation here. I wanted to have a helper function for the library development and unit testing first and foremost. I need to generate both, there are slight variations in the generation between the two, and I don't expose it in the public API (unless there is an error).

Aha! Yes, I did miss your intention :-).

Quick comments on the code: Don't use std/sugar. It is always possible to do without. Just importing it, not even using it, slows down the compilation quite a bit. We don't want it, at least in the library, maybe for helper functions.

I've since discovered newSeqWith.

Why do you move out the conditional from the while loop ?

I'm not sure I understand the question. I would use a do-while loop, but Nim doesn't support them.

The loop body generates a uniformly distributed random number in the range 0..limit, where limit is a number slightly above spread (in particular, limit has the same first limb as spread and all the other limbs equal to uint32.high). Then I postselect to ensure the chosen value is actually less than spread without breaking uniformity.

import bigints
import std/sequtils
import std/options
import std/random

proc rand(r: var Rand, x: Slice[BigInt]): BigInt =
  assert(x.a <= x.b, "empty slice")
  let spread = x.b - x.a
  if spread == 0'bi:
    return x.a
  let
    nbits = spread.fastLog2
    nFullLimbs = nbits div 32
    maxHighDigit = (spread shr (nFullLimbs*32)).toInt[:uint32].get()
  while true:
    var digits = newSeqWith(nFullLimbs, r.rand(uint32.low..uint32.high))
    digits.add(r.rand(uint32.low..maxHighDigit))
    result = initBigInt(digits)
    if result <= spread:
      break
  result += x.a

proc rand(x: Slice[BigInt]): BigInt = rand(randState(), x)

@dlesnoff
Copy link
Contributor Author

dlesnoff commented Oct 6, 2022

Oh, I see. First, excuse me, my message was rude.

I think your code would actually be a great addition to the lib. I thought the slice would be harder, but like you said, you just use rejection sampling.

If you do another PR with a bit of testing and make your function public, I'll review and approve it.

@dlesnoff dlesnoff marked this pull request as draft May 18, 2023 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants