
libstd: Implement BigInt and BigUint. Issue #37 #4198

Merged: 5 commits merged into rust-lang:incoming on Jan 8, 2013
Conversation

gifnksm
Contributor

@gifnksm gifnksm commented Dec 15, 2012

Implement BigInt (arbitrary precision integer) type.

@brson
Contributor

brson commented Dec 16, 2012

I'm excited about this. Looks cool.

@huonw
Member

huonw commented Dec 16, 2012

How does this compare to the rust-gmp bindings? Presumably the bindings are faster, but this has the advantage of being pure Rust.

@kud1ing

kud1ing commented Dec 16, 2012

@graydon
Contributor

graydon commented Dec 17, 2012

Awesome. I figured we'd bind to acme bignum (it's in rt/) but this looks like a pretty direct reimplementation in plain rust. I'm into it!

@bstrie bstrie mentioned this pull request Dec 17, 2012
@thestinger
Contributor

@Lenny222: The only real problem with using it from Rust is the licensing one (it can't be in the standard library), and it's great to have pure Rust libraries for things like this to push the performance of the language as a whole forward.

@dbaupp: gmp is very fast because it uses different algorithms for a task like multiplication depending on the size of the input, so it has much better asymptotic performance than most other libraries (covered here). It's also full of hand-rolled assembly and years of optimization, so it would be quite hard to compete with it from a performance standpoint. I think it would be a better idea to compare against the implementations that other languages use (Go, Haskell, Python, etc.) and work towards beating all of those 😉.

@kud1ing

kud1ing commented Dec 18, 2012

Some of the missing Rust Shootout benchmarks need big numbers.
Those benchmarks could be used to test and benchmark the new code.

See #2776

@ahmadsalim

I had also done a big integer implementation in parallel (https://github.com/ahmadsalim/rust-bigint), if it can be of any help.

@brson
Contributor

brson commented Dec 21, 2012

@ahmadsalim Thanks!

The two major differences I see: the @gifnksm implementation has both BigInt and BigUint while @ahmadsalim's only has BigInt, and @ahmadsalim uses ~ while @gifnksm uses @.

Let's consider this carefully.

Some observations:

  • we do need to avoid @, or else big ints will be second class compared to primitive ints (a rough sketch of the owned-storage layout follows below).
  • having both signed and unsigned versions seems appropriate, but maybe they should both go in the same std::bigint module.
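
For modern readers: ~[T] corresponds roughly to today's Vec<T>, a uniquely owned heap buffer, while @ denoted a shared, task-local managed box. A rough sketch of the owned-storage layout being discussed (the field and type names here are illustrative, not the PR's actual definitions):

// Owned-vector storage: no shared @-boxes involved.
pub struct BigUint {
    data: Vec<u32>, // limbs, least-significant first
}

pub enum Sign { Minus, Zero, Plus }

// A signed wrapper around the unsigned magnitude.
pub struct BigInt {
    sign: Sign,
    magnitude: BigUint,
}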

@thestinger
Contributor

I think using ~ is definitely the way to go, and I don't think it really implies more copies. It just requires the calling code to be a bit smarter in certain cases where unnecessary allocation can be avoided. There are idioms used by the APIs of C big integer libraries to do minimal copying, but it doesn't map well to operator overloading.

gmp has function definitions like this:

mpz_neg(output, input)
mpz_add(output, input1, input2)

The output can be the same variable as one of the inputs, or a different one. It will (for some operations) reuse the memory allocated for the output variable, whether or not the operation is actually in-place.

Equivalents in rust:

// gmp: mpz_add(x, y, z)
x = y + z // to initialize x
x.set_add(y, z) // to re-use memory allocated to x
// gmpxx uses an operator= overload using template hackery to do this for simple cases

An operation like i.set_add(x * y, z) still has an implicit temporary value from the multiplication, but in a loop you could store it in a variable and use .set_mul(x, y) instead of reallocating each time.

// gmp: mpz_add(y, y, z)
y.set_add(y, z)
y += z // once rust can overload += separately

So basically, you just need a mutable in-place API and then a pure API implemented with that (set_neg in addition to neg, etc.).
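
A minimal sketch of this pattern in modern Rust syntax; the type and method names (Big, set_add, add) are illustrative, not from this PR:

#[derive(Default)]
pub struct Big {
    limbs: Vec<u32>, // least-significant limb first
}

impl Big {
    // In-place API: write a + b into self, reusing self's allocation.
    pub fn set_add(&mut self, a: &Big, b: &Big) {
        self.limbs.clear(); // keeps the Vec's capacity, so no reallocation if the result fits
        let (long, short) = if a.limbs.len() >= b.limbs.len() { (a, b) } else { (b, a) };
        let mut carry = 0u64;
        for (i, &x) in long.limbs.iter().enumerate() {
            let y = *short.limbs.get(i).unwrap_or(&0);
            let sum = x as u64 + y as u64 + carry;
            self.limbs.push(sum as u32);
            carry = sum >> 32;
        }
        if carry != 0 {
            self.limbs.push(carry as u32);
        }
    }

    // Pure API layered on top of the in-place one: allocates a fresh result.
    pub fn add(a: &Big, b: &Big) -> Big {
        let mut out = Big::default();
        out.set_add(a, b);
        out
    }
}

fn main() {
    let y = Big { limbs: vec![u32::MAX, 7] };
    let z = Big { limbs: vec![1] };
    let mut x = Big::add(&y, &z); // x = y + z, fresh allocation
    x.set_add(&z, &z);            // recompute into x, reusing its buffer
    assert_eq!(x.limbs, vec![2]);
}

One wrinkle: the aliasing call y.set_add(y, z) above would be rejected by today's borrow checker (it needs &mut y and &y at once), so the y += z case is better served by a separate add_assign(&mut self, other: &Big) method, i.e. the += overload mentioned in the comment.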

@gifnksm
Contributor Author

gifnksm commented Dec 22, 2012

Thank you for the many comments.

I'll try to reimplement std::bigint using ~ vectors.
After that, I'll compare the performance of std::bigint against big integer libraries in other languages (Perl, Python, Ruby, Haskell, GMP) with some simple benchmarks.

If possible, I would also like to implement the in-place calculation pointed out by @thestinger.

@ahmadsalim

@gifnksm That seems to be the best solution, since you already have an implementation of the optimized algorithms.
Different strategies could still be used depending on the size of the input, as GMP does.

@gifnksm
Contributor Author

gifnksm commented Dec 23, 2012

I ran some simple benchmark tests. The results are at https://gist.github.com/4360131#file-benchmark-result-txt.

This benchmark has two tests for each language. The first test, fib(n), displays the nth Fibonacci number. The second test, factorial(n), displays the factorial of n. These tests measure the time it takes to compute the value and display it (converting to decimal).
In Haskell, because I don't know how to disable lazy evaluation, display time and computation time are not clearly separated. Since the test only works on Python 2, the Python test fails on Arch Linux.
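
For reference, the two tests have roughly this shape; the sketch below uses the present-day num-bigint crate rather than the 2012 std::bigint, and the function names are illustrative:

use num_bigint::BigUint;

fn fib(n: u32) -> BigUint {
    let (mut a, mut b) = (BigUint::from(0u32), BigUint::from(1u32));
    for _ in 0..n {
        let next = &a + &b;
        a = b;
        b = next;
    }
    a
}

fn factorial(n: u32) -> BigUint {
    (1..=n).fold(BigUint::from(1u32), |acc, i| acc * BigUint::from(i))
}

fn main() {
    // Both the computation and the decimal conversion (Display) count toward the
    // measured time; the conversion is what turned out to dominate below.
    println!("{}", fib(100_000));
    println!("{}", factorial(10_000));
}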

My first impression is that my implementation is very slow.
Addition and multiplication times are not so bad (but still slow; I think this is not a problem with the algorithms, just a matter of how the code is written).
Display time is terribly slow. I think this is caused by the following two points:

  • Radix conversion is implemented by simple repeated division.
  • The division itself is implemented slowly.

For now, I'll try to implement Karatsuba radix conversion.
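
The idea behind Karatsuba (divide-and-conquer) radix conversion: instead of peeling off digits with repeated small divisions, split the number around 10^(d/2) so the expensive divisions operate on operands of roughly half the size, and recurse. A rough sketch using the present-day num-bigint crate (function names are illustrative):

use num_bigint::BigUint;
use num_traits::{pow, Zero};

// Upper bound on the number of decimal digits of n (using log10(2) < 0.30103).
fn decimal_digit_bound(n: &BigUint) -> usize {
    (n.bits() as usize * 30103) / 100000 + 1
}

// Convert n to decimal, where `digits` is any upper bound on its digit count.
fn to_decimal(n: &BigUint, digits: usize) -> String {
    if digits <= 9 {
        return n.to_string(); // small enough: plain conversion
    }
    let half = digits / 2;
    let divisor = pow(BigUint::from(10u32), half);
    let (high, low) = (n / &divisor, n % &divisor);
    if high.is_zero() {
        return to_decimal(&low, half); // n fits in the low half; padding is the caller's job
    }
    let low_str = to_decimal(&low, half);
    // The low half must be zero-padded to exactly `half` digits.
    format!(
        "{}{}{}",
        to_decimal(&high, digits - half),
        "0".repeat(half - low_str.len()),
        low_str
    )
}

fn main() {
    let n = pow(BigUint::from(3u32), 1000);
    assert_eq!(to_decimal(&n, decimal_digit_bound(&n)), n.to_string());
}

The asymptotic win only materializes once division itself is subquadratic; with schoolbook division both approaches stay quadratic, which is consistent with the two bullet points above.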

@ahmadsalim

@gifnksm The reason Haskell does not strictly evaluate your functions is that the let-bindings themselves are lazy.
This can be solved in two ways. The first is to force the let-bound variable (in the factorial case):

...
let fac = factorial n
return $! fac
... 

The second is to use bang patterns to strictly evaluate the variable at binding time:

{-# LANGUAGE BangPatterns #-}
...
let !fac = factorial n
...

Hopefully this answer will help you to do a more accurate comparison.

@gifnksm
Contributor Author

gifnksm commented Dec 29, 2012

@ahmadsalim
I tried the first way, and the program now seems to be measured accurately.
Thank you!

Now I'm eliminating extra memory allocation operations.
Some calculations are now 10 times faster than before!

@nikomatsakis
Contributor

This looks like a good start! r+ from me. I'm sure we can optimize and improve over time, but the interface seems minimal and reasonable.

@brson brson merged commit 68c689f into rust-lang:incoming Jan 8, 2013
@brson
Contributor

brson commented Jan 8, 2013

Merged. Thanks for the thorough review @ahmadsalim, @gifnksm, @thestinger and everyone.

@brson
Contributor

brson commented Jan 9, 2013

@gifnksm I had to disable some of the tests because they fail on x86: #4393. Please give them a look. You can test by configuring with --host-triple=i686-unknown-linux-gnu (or similar).

@gifnksm gifnksm deleted the bigint branch January 9, 2013 10:18
@gifnksm
Contributor Author

gifnksm commented Jan 9, 2013

@brson Thank you for merging! I'll try to fix #4393.
