Implement floating point conversion #110

rotu · 2022-07-26T22:58:27Z

Fixes Feature request: toBiggestInt #34

Fixes Feature request: toBiggestInt nim-lang#34

konsumlamm

Tbh, I'm not sure this feature is really worth implementing. Imo we shouldn't just implement every feature request without first thinking about it. Do you have any concrete use case? Is there any precedent in some standard bigint implementation out there?

konsumlamm · 2022-07-27T11:37:01Z

src/bigints.nim

@@ -1140,6 +1140,11 @@ func fastLog2*(a: BigInt): int =
 return -1
 bitops.fastLog2(a.limbs[^1]) + 32*(a.limbs.high)

+func toBiggestFloat*(x: BigInt): BiggestFloat =


Suggested change

func toBiggestFloat*(x: BigInt): BiggestFloat =

func toFloat*[T: SomeFloat](x: BigInt): Option[T] =

IMO there's no reason to special case BiggestFloat (which is float64 currently). The implementation should work the same with 32 bit floats. I also think returning an Option makes more sense, since not every BigInt can be represented as a float (unless we just round).

+1 for the Option (see recently closed PR for BigInt -> integer type conversion).
For the examples, you want to test the saturation values and the one above.
See : https://en.wikipedia.org/wiki/Double-precision_floating-point_format#Double-precision_examples for some values.

We might want to provide +Inf, -Inf, subnormal values instead of an Option though. I prefer the Option solution, since we lose (almost) all information by converting. The option may warn the user of the conversion problem.

The reason to special case BiggestFloat is because it's modeled after https://nim-lang.org/docs/system.html#toBiggestFloat%2CBiggestInt

The API for toFloat is ugly since there is no way to deduce the type parameter. I'm avoiding a generic interface since there's no clear guidance on when to use generics versus typedesc parameters. [RFC] guidelines for when to use typedesc vs generics RFCs#40

BigInt to integer is a very, very different case. Floating point math is, by nature, approximate. It is expected to be a lossy conversion, so rounding is the right thing to do here. I added some tests for the Infinite cases.

BTW, in looking into this, I also discovered this bug nim-lang/Nim#20102

The reason to special case BiggestFloat is because it's modeled after https://nim-lang.org/docs/system.html#toBiggestFloat%2CBiggestInt

I don't see that as a good reason to repeat this here.

The API for toFloat is ugly since there is no way to deduce the type parameter. I'm avoiding a generic interface since there's no clear guidance on when to use generics versus typedesc parameters. [RFC] guidelines for when to use typedesc vs generics RFCs#40

Wether that's ugly or not is quite subjective. I don't see a problem with toFloat[float64](...) (it isn't much more to type than toBiggestFloat). I don't see why we should special case BiggestFloat (aka float64).

BigInt to integer is a very, very different case. Floating point math is, by nature, approximate. It is expected to be a lossy conversion, so rounding is the right thing to do here. I added some tests for the Infinite cases.

That's a fair point.

Okay. So why proc toFloat[T:SomeFloat](x: BigInt):T and not proc toFloat(x: BigInt, T: type SomeFloat):T or even proc to(x:BigInt, T:type SomeFloat):T?

Because imo you should use generics for passing type parameters. I know that not everyone agrees with this though. If we ever get better type inference (which I really hope we do), the generic parameter here could even be inferred, based on subsequent uses of the result.

I switched to func to*(x: BigInt, T:type SomeFloat): T

This has the benefit of not repeating "float" in calls (x.toFloat[float32]()).

I didn't use generics for the following reasons:

Generic specialization does not work. (Error: overloaded 'to' leads to ambiguous calls if I try to use generics and add a to[SomeInteger]).

In method call syntax, it's ambiguous with an indexer (x.to[:float32]() leads to an unintuitive error)

FWIW I would actually support to implement toBiggestFloat instead of conversion for all types of float.

The reason to special case BiggestFloat is because it's modeled after https://nim- lang.org/docs/system.html#toBiggestFloat%2CBiggestInt

I don't see that as a good reason to repeat this here.

For me that is a good reason. It means having a consistent API across nim ecosystem (and indeed bigints api is modeled after system api). It would not be a good reason if toBiggestFloat in system was originally a bad api (so we should not copy it), but I do not see why that would be the case.

I don't see why we should special case BiggestFloat (aka float64).

Because the use case call for converting to a float and it make sense to convert to biggest float you have.
I do not think we should support conversion to multiple floats, since that would mean that the user would have to think: "which float should I use? why would I want to convert to a smaller float?"

src/bigints.nim

rotu · 2022-07-27T18:06:06Z

Tbh, I'm not sure this feature is really worth implementing. Imo we shouldn't just implement every feature request without first thinking about it. Do you have any concrete use case? Is there any precedent in some standard bigint implementation out there?

I've modeled it off of https://nim-lang.org/docs/system.html#toBiggestFloat%2CBiggestInt
The major use cases are:

for convenient display
for ballpark estimates when the actual value may or may not be needed with full precision

konsumlamm · 2022-07-27T21:44:03Z

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

rotu · 2022-07-28T00:51:19Z

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

Instead of reimplementing display formatting, float conversion allows you to leverage existing tools: https://nim-lang.org/docs/strformat.html#formatting-floats

Also, floating point allows you to do operations that may not be an integer, such as square roots. While no floating point can represent irrational numbers, an estimate may be good enough.

pietroppeter · 2022-07-29T10:04:34Z

I added a note above on why I actually think toBiggestFloat (the original proposal, changed in latest commit) is a better api. I think it could be interesting to hear the opinion of @narimiran on this matter.

demotomohiro · 2022-07-31T08:56:25Z

for convenient display

for ballpark estimates when the actual value may or may not be needed with full precision

Can you elaborate? I don't see how conversion to float is needed for that.

Instead of reimplementing display formatting, float conversion allows you to leverage existing tools: https://nim-lang.org/docs/strformat.html#formatting-floats

Also, floating point allows you to do operations that may not be an integer, such as square roots. While no floating point can represent irrational numbers, an estimate may be good enough.

Large BigInt values cannot be converted to float64 or they just become inf.
Here is example code that returns an estimate as mantissa and exponent part and approximated square root:

import std/[math, strformat, options]
import bigints

proc approximate10(x: BigInt; precision: range[1..18]): tuple[mantissa, exponent: int] =
  # int.high.float.log10 ≒ 18.96
  let
    # https://en.wikipedia.org/wiki/Logarithm#Change_of_base
    log10 = (x.fastLog2.float / 3.321928).int  # log2(10) = 3.321928
    exp = max(0, log10 - precision)

  ((x div 10.initBigInt.pow(exp)).toInt[:int].get, exp)

proc test =
  block:
    let
      a = 10.initBigInt.pow(40)
      (m, e) = a.approximate10(5)

    echo a
    echo &"≒ {m} * 10^{e}"

  block:
    let
      a = 1234567890.initBigInt
      (m, e) = a.approximate10(3)

    echo a
    echo &"≒ {m} * 10^{e}"

test()

proc sqrt(x: BigInt; maxIter: int): BigInt =
  # Quick approximated value
  #result = 1.initBigInt shl (x.fastLog2 div 2)
  #result = x shr (x.fastLog2 div 2 + 1)
  let
    log2 = x.fastLog2
    b2 = (x shr (log2 - 1)).toInt[:int].get and 1
  result = (1.initBigInt shl (log2 div 2)) or ((log2 mod 2).initBigInt shl (log2 div 2 - 2 + b2))

  # Refine value with Newton's method
  # https://en.wikipedia.org/wiki/Newton's_method
  for i in 1..maxIter:
    result = (result * result + x) div (result * 2.initBigInt)

proc testSqrt =
  block:
    var a = (256 + 128).initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 4000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 6000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 8000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

  block:
    var a = 9000000.initBigInt
    echo a
    echo a.sqrt(0)
    echo a.sqrt(1)
    echo a.sqrt(2)
    echo a.sqrt(3)

testSqrt()

Example output:

10000000000000000000000000000000000000000
≒ 1000000 * 10^34
1234567890
≒ 1234 * 10^6
384
16
20
19
19
4000000
1536
2070
2001
2000
6000000
2048
2488
2449
2449
8000000
2048
2977
2832
2828
9000000
2560
3037
3000
3000

This is quick implementation and not much tested.
But these algorithm would works for any BigInt value.

rotu · 2022-07-31T22:25:22Z

@demotomohiro,

I think you were trying to demonstrate this PR is unnecessary, but have shown that the opposite.

Note that fastLog2 returns a crude (truncated integer) estimate of the base-2 logarithm. You would have gotten correct formatting had you used a conversion with the float formatting functions.

Also, your Newton iteration converges on the wrong result due to quantization error. sqrt(384) is closer to 20 than 19. Plus I'm willing to bet your hardware has accelerated floating point operations that outperform this by several orders of magnitude.

dlesnoff · 2022-08-01T08:53:02Z

@konsumlamm

Is there any precedent in some standard bigint implementation out there?

Yes. There is. In GMP : https://gmplib.org/manual/Converting-Integers

Function: double mpz_get_d_2exp (signed long int *exp, const mpz_t op)
Convert op to a double, truncating if necessary (i.e. rounding towards zero), and returning the exponent separately.
The return value is in the range 0.5<=abs(d)<1 and the exponent is stored to *exp. d * 2^exp is the (truncated) op value. If op is zero, the return is 0.0 and 0 is stored to *exp.
This is similar to the standard C frexp function (see Normalization Functions in The GNU C Library Reference Manual).
Notice that they fixed the type of output to double and the rounding mode towards zero is not necessarily the one used in the rest of the program/by the computer.

Sorry, that is not a bigint implementation of the standard library of a language.

demotomohiro · 2022-08-02T07:31:19Z

@demotomohiro,

I think you were trying to demonstrate this PR is unnecessary, but have shown that the opposite.

Note that fastLog2 returns a crude (truncated integer) estimate of the base-2 logarithm. You would have gotten correct formatting had you used a conversion with the float formatting functions.

Also, your Newton iteration converges on the wrong result due to quantization error. sqrt(384) is closer to 20 than 19. Plus I'm willing to bet your hardware has accelerated floating point operations that outperform this by several orders of magnitude.

I just used fastLog2 because it seems you don't need accurate result and want to display it quickly.
Even if float formatting functions in standard library works correctly, not all BigInt can be converted to float.
Even if it is small enough to be converted to float, converting to float loses precision.
When you need accurate correct formatting, you need to do that without converting to float32/64.

When you need an accurate squre root of BitInt, you cannot convert it to float32/64 as it lose precision.
Even if sqrtsd/sqrtpd instruction in SSE runs fast, converting BigInt to float takes time and lose precision.
So unless you do more float operations after converting to float, converting to float would not much faster.
But people actually do many float operations after converting to float?

I'm not sure this feature is really needed or not.
But if you just want to display it nicely or get squre root, I think implementing it on bigints is better than converting it to float.
Because that should works on any values and you can get accurate result if you want.

Highest exponent of float32/64 is 2^127 / 2^1023.
When you convert BitInt to float32, it must be less than 2^128.
But when 128bits int is large enough for you, I think you would better to use fixed size int128 (https://github.com/status-im/nim-stint or https://github.com/nim-lang/Nim/blob/devel/compiler/int128.nim) as it is stack based and efficient than BigInt.
In case you need BigInt as 256bits int is too small but want to convert it to float64, the value must be smaller than 2^1024.

I think this PR is necessary if there are use cases like, you need a large int more than 256bits but final result is smaller than 2^1023, and you need to write it in specific format that have only float64 or pass it to libraries or API that takes only float64.
But I'm not sure if such an use case actually exists.

rotu · 2022-08-02T18:40:47Z

@demotomohiro
Yes, as soon as you put a value into a float, you are conceding exactness. If you know the result is needed with full precision and that every intermediate value is an integer, then yes, an integer is more appropriate than a floating point number. If integers work for your use-case, use them!

Optimization is premature here, but conversion to a float is a CONSTANT time operation --- it's faster than a bigint addition, and certainly faster than your Newton iteration.

@dlesnoff
Another precedent is Python. The int type in Python is arbitrary precision and it can be explicitly converted to float, which is a 64-bit floating point number. It is also implicitly converted when using non-integer-valued operations like /.

demotomohiro · 2022-08-11T14:18:18Z

@rotu
This approximated BigInt sqrt calculates square root by taking highest 62 bits, convert it to float and use math.sqrt.
It is constant time as it reads only highest 62 bits.
And it works for values larger than 2^1023.
https://gist.github.com/demotomohiro/704e6110919cf163de18ab36b8bbef4c

rotu · 2022-08-11T21:35:49Z

@rotu This approximated BigInt sqrt calculates square root by taking highest 62 bits, convert it to float and use math.sqrt. It is constant time as it reads only highest 62 bits. And it works for values larger than 2^1023. https://gist.github.com/demotomohiro/704e6110919cf163de18ab36b8bbef4c

Do you really want to write and check this sort of code for every non-arithmetic function you might use?

Furthermore, this implementation betrays its inaccuracy. You have slight accuracy issues when the square root requires looking at more digits than are in the significand. And returning the result as a bigint falsely implies the result is precise down to the unit.

The correct approach for an "exact" square root of a bigint x should be (IMO) a pair (r, d) such that r^2 <= x, x < (r+1)^2, and r^2+d =x. (this is by analogy with the correct approach to integer division: a quotient remainder pair)

At any rate, while I personally have written code that requires values greater than 2^64, I've never had to use values larger than 2^1023!

My PR is only geared toward the approach where you need an approximate answer - that's why I make no attempt to introduce a floating-point to bigint conversion.

demotomohiro · 2022-08-14T19:27:22Z

Furthermore, this implementation betrays its inaccuracy. You have slight accuracy issues when the square root requires looking at more digits than are in the significand. And returning the result as a bigint falsely implies the result is precise down to the unit.

IEEE_754 64bit float has 53bit significand bits.
When the input value is smaller than 2^(53*2), my implementation can lose precision as it convert the result to integer.
But when the input value is larger than that, it should be as accurate as converting BigInt to float and call math.sqrt.
It takes highest 62 bits that is larger than 64bit floats significand bits.

The correct approach for an "exact" square root of a bigint x should be (IMO) a pair (r, d) such that r^2 <= x, x < (r+1)^2, and r^2+d =x. (this is by analogy with the correct approach to integer division: a quotient remainder pair)

GMP has mpz_sqrt and mpz_sqrtrem.
mpz_sqrt returns the truncated integer part of the square root of the input.
mpz_sqrtrem returns the square root of the input and remainder.
https://gmplib.org/manual/Integer-Roots

At any rate, while I personally have written code that requires values greater than 2^64, I've never had to use values larger than 2^1023!

If you don't use values larger than 2^1023, fixed size int types like https://github.com/status-im/nim-stint might be better than bigints.
BigInt type uses 4 * 8 bytes in addition to space that hold the int value in order to it can have variable precision integer.
seq uses 3 * 8 bytes for len, p, and cap.
As BigInt has isNegative, that adds 1 bytes + 7 bytes of padding.
Fixed size int uses all bits for the int value like Nim's int type.
For example, when BigInt has value 2^511 - 1, it uses 512 + 4 * 8 * 8 = 768 bits.
It is 1.5 times larger than Stint[512].
When BigInt has value 2^1023 - 1, it uses 1024 + 256 = 1280 bits.
It is 1.25 times larger than Stint[1024].

rotu · 2022-08-14T21:07:22Z

But when the input value is larger than that, it should be as accurate as converting BigInt to float and call math.sqrt.

Yes. This is NOT necessarily correct down to the unit. Returning the value as a BigInt incorrectly suggests it is! Such errors are expected when you're knowingly working with floating point numbers, which are approximate by nature.

mpz_sqrtrem returns the square root of the input and remainder.

I love when I have the same idea as smarter people than myself!

If you don't use values larger than 2^1023, fixed size int types like https://github.com/status-im/nim-stint might be better than bigints.

True. But this is the 'bigints' repo. I'll defer to the judgement of package maintainers whether this functionality is appropriate here.

Implement toBiggestFloat

b38ff2b

Fixes Feature request: toBiggestInt nim-lang#34

konsumlamm reviewed Jul 27, 2022

View reviewed changes

don't use BiggestInt in implementation of toBiggestFloat

6c73172

Add tests for infinity

b08bbbe

support for other floating point types

7fe47ad

rotu changed the title ~~Implement toBiggestFloat~~ Implement floating point conversion Jul 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement floating point conversion #110

Implement floating point conversion #110

rotu commented Jul 26, 2022

konsumlamm left a comment

konsumlamm Jul 27, 2022

dlesnoff Jul 27, 2022

dlesnoff Jul 27, 2022 •

edited

Loading

rotu Jul 27, 2022

konsumlamm Jul 27, 2022

rotu Jul 27, 2022

konsumlamm Jul 28, 2022

rotu Jul 28, 2022

pietroppeter Jul 29, 2022

rotu commented Jul 27, 2022

konsumlamm commented Jul 27, 2022

rotu commented Jul 28, 2022

pietroppeter commented Jul 29, 2022

demotomohiro commented Jul 31, 2022

rotu commented Jul 31, 2022 •

edited

Loading

dlesnoff commented Aug 1, 2022 •

edited

Loading

demotomohiro commented Aug 2, 2022

rotu commented Aug 2, 2022

demotomohiro commented Aug 11, 2022

rotu commented Aug 11, 2022

demotomohiro commented Aug 14, 2022

rotu commented Aug 14, 2022

	func toBiggestFloat*(x: BigInt): BiggestFloat =
	func toFloat*[T: SomeFloat](x: BigInt): Option[T] =

Implement floating point conversion #110

Are you sure you want to change the base?

Implement floating point conversion #110

Conversation

rotu commented Jul 26, 2022

konsumlamm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dlesnoff Jul 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rotu commented Jul 27, 2022

konsumlamm commented Jul 27, 2022

rotu commented Jul 28, 2022

pietroppeter commented Jul 29, 2022

demotomohiro commented Jul 31, 2022

rotu commented Jul 31, 2022 • edited Loading

dlesnoff commented Aug 1, 2022 • edited Loading

demotomohiro commented Aug 2, 2022

rotu commented Aug 2, 2022

demotomohiro commented Aug 11, 2022

rotu commented Aug 11, 2022

demotomohiro commented Aug 14, 2022

rotu commented Aug 14, 2022

dlesnoff Jul 27, 2022 •

edited

Loading

rotu commented Jul 31, 2022 •

edited

Loading

dlesnoff commented Aug 1, 2022 •

edited

Loading