Max encodable BigInt is smaller than max encodable Number? #109

Closed
achingbrain opened this issue Feb 10, 2024 · 1 comment

Comments

@achingbrain
Contributor

I just came across this weird edge case:

import assert from 'node:assert' // assuming Node's built-in assert and the cborg package exports used below
import { encode, decode } from 'cborg'

it('encodes big numbers', () => {
  const obj = {
    number: Number.MAX_VALUE
  }
  const buf = encode(obj)
  const out = decode(buf)

  assert.deepEqual(out, obj)
})
// passes
it('encodes big bigints', () => {
  const obj = {
    bigint: BigInt(Number.MAX_VALUE)
  }
  const buf = encode(obj) // <-- throws: CBOR decode error: encountered BigInt larger than allowable range
  const out = decode(buf)

  assert.deepEqual(out, obj)
})

The error is thrown during encoding (not during decoding, despite what the message says), and it seems the encodable Number range is bigger than the encodable BigInt range?

@rvagg
Owner

rvagg commented Feb 12, 2024

Not quite. There is a space in which BigInt is useful but not mandatory: the gap between Number.MAX_SAFE_INTEGER and the maximum 64-bit unsigned integer, which we can't (safely) represent as a plain JavaScript Number. We'll still accept any Number you give us, but it maxes out at the same place as BigInt. When we get above the 32-bit range, we switch to converting whatever you give us to a BigInt and checking whether it's bigger than 64 bits before encoding:

const buint = BigInt(uint)
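
For illustration, here's a minimal sketch of that check, not cborg's actual implementation; the function name is made up, and the thrown message is the one quoted in the issue above:

const UINT64_MAX = BigInt('18446744073709551615') // 2n ** 64n - 1n

function assertEncodableUint (uint) {
  const buint = BigInt(uint) // promote the Number (or BigInt) once we're above the 32-bit range
  if (buint > UINT64_MAX) {
    // this is surfaced from the encode path even though the message says "decode"
    throw new Error('CBOR decode error: encountered BigInt larger than allowable range')
  }
  return buint
}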

Interesting theme though! Let me prattle on a bit about this, I hope it's informative:

In CBOR we use the unsigned integer major (0) and the negative (unsigned) integer major (1) to represent integers. This only gives us 64 bits, although the negative major gives us one extra value because it's essentially a "uint minus 1". So we get a maximum inclusive range of [BigInt('-18446744073709551616'), BigInt('18446744073709551615')], represented by filling all the bits in those fields: 3bffffffffffffffff (negint) and 1bffffffffffffffff (uint).
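
A quick way to see those boundary values, assuming encode() accepts a bare BigInt at the top level (which is how I'd expect it to behave from the description above):

import { encode } from 'cborg'

const toHex = (bytes) => Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')

console.log(toHex(encode(BigInt('18446744073709551615'))))  // expect 1bffffffffffffffff (uint major, all bits set)
console.log(toHex(encode(BigInt('-18446744073709551616')))) // expect 3bffffffffffffffff (negint major, all bits set)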

So we end up with two problems for JavaScript:

  1. The safe range: because numbers are all just IEEE 754 floats, the safe range is the 53 bits of mantissa where you're guaranteed to represent the whole-number portion exactly (beyond that all bets are off and you're into crazy land where you have to approximate using the 11-bit exponent portion, very unsafe but it lets you ~represent big numbers).
  2. BigInt has become normalised and easy(ish) to use, and it can go bigger than 64 bits, so JavaScript effectively doesn't care about this whole arbitrary-sized integer problem the way most languages do; sort of (see the snippet just below).
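
A quick illustration of both points, runnable in any modern JS engine:

console.log(Number.MAX_SAFE_INTEGER)               // 9007199254740991, i.e. 2^53 - 1
console.log(9007199254740992 === 9007199254740993) // true: integers just above the safe range collapse together
console.log(2n ** 64n)                             // 18446744073709551616n: BigInt happily goes past 64 bits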

So cborg will do the following iirc:

  1. Accept any Number and just deal with it, so if you're above MAX_SAFE_INTEGER then that's on you: we'll encode what you give us and you'll get back an approximation of it in a round trip.
  2. Accept any BigInt, regardless of how small, but max out at the bounds of what we can fit in the 64 bits we have available - the error you're getting.
  3. On decode we split at MAX_SAFE_INTEGER because we've lost the information about what form you gave it to us in. So if the value is below that you'll get a Number, and if it's above that you'll get a BigInt (sketched after this list). Which is really annoying tbh, because you end up with type surprises, and Number and BigInt don't play nicely together if you're making assumptions. This has been a challenge, but the three alternatives seem to be:
  • Always give a Number but end up giving the wrong number much of the time above MAX_SAFE_INTEGER
  • Always give a BigInt but be user-hostile and force everyone to convert, it would also be very breaking over what we've always done with DAG-CBOR.
  • Error if it decodes a number above MAX_SAFE_INTEGER.
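
To make point 3 concrete, a sketch of the type switch as described, assuming default cborg options and that top-level BigInts are accepted:

import { encode, decode } from 'cborg'

console.log(typeof decode(encode(9007199254740991n))) // 'number': values at or below MAX_SAFE_INTEGER come back as Number
console.log(typeof decode(encode(9007199254740993n))) // 'bigint': values above it come back as BigInt, the type surprise described above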

Number.MAX_VALUE is a little bit crazy: represented as an integer in pure bytes you'd be at ~1024 bits (Number.MAX_VALUE.toString(2).length). So you're into territory where we'd need to do something more novel.
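
Quick check of those numbers:

console.log(Number.MAX_VALUE.toString(2).length)   // 1024 binary digits
console.log(BigInt(Number.MAX_VALUE) > 2n ** 64n)  // true: far beyond anything the uint/negint majors can hold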

On the topic of bigger integers, it is theoretically possible to encode big ints with CBOR, there's a standard for it and a registered tag, which we could have (and still could, maybe) opted to use for DAG-CBOR. Then we'd get all the BigInts, but we'd still have to make a decision about the representation, both into and out of JavaScript, for the <=64-bit ones (do you still squish 64-bit ints into the uint majors, or just 53-bit ones? what do you do with that range on decode?).

To date, the answer has always been: use bytes to represent these and have a standard for it (essentially an ADL). This has grown up in Filecoin and it's the one that's generally recommended for use. Unfortunately it's been standardised around Go's comfort zone, but fortunately it's not too non-standard, since it's really just the byte representation of the integer in big endian, with some special-case handling for negatives.
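
For a feel of what that looks like in JS, here's a rough sketch of converting between a non-negative BigInt and its big-endian byte representation; the special-case handling for negatives that the Filecoin spec defines (see the linked Go code below) is deliberately left out:

// non-negative values only; negative handling per the Filecoin convention is omitted
function bigintToBytes (n) {
  if (n === 0n) return new Uint8Array(0)
  let hex = n.toString(16)
  if (hex.length % 2 === 1) hex = '0' + hex
  const bytes = new Uint8Array(hex.length / 2)
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16)
  }
  return bytes
}

function bytesToBigint (bytes) {
  let n = 0n
  for (const b of bytes) {
    n = (n << 8n) | BigInt(b)
  }
  return n
}

console.log(bytesToBigint(bigintToBytes(2n ** 100n)) === 2n ** 100n) // true: round-trips cleanly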

Go has big.Int#Bytes() and big.Int#SetBytes(), which make this really easy: https://github.com/filecoin-project/go-state-types/blob/f4366fdb96de135e00911166c1b461f0c2bdd52e/big/int.go#L278-L338

Thankfully Rust also has easy handling of big-endian conversions between bytes and big ints, so it can deal with the same thing: https://github.com/filecoin-project/ref-fvm/blob/fce783ab982e914abdecc82a4b9ea07fff98d7d5/shared/src/bigint/bigint_ser.rs#L24-L70

In JS I never had reason to write one of these, but Glif has to deal with them; here's the encoder: https://github.com/glifio/modules/blob/1a1ba285434fd3d37592f1955a9225a53287ecc7/packages/filecoin-number/src/FilecoinNumber.ts#L183-L188. I can't find a decoder though, which is a bit surprising; maybe it's elsewhere. They also use bignumber.js, and it would be nice to just be able to use BigInt for this.
