Initial sketch 160-bit (32+128) RawVal #570

Closed
wants to merge 1 commit into from

graydon
Contributor

graydon commented Nov 11, 2022

This is a not-totally-sketch sketch of a fairly major (though not especially user-visible) change to RawVal. I'm opening it here for discussion, not as a definite proposal but because it's at the point where a decent number of tests are actually passing and it's worth discussing before either forging ahead or abandoning.

Background

RawVal is the "polymorphic" type used to carry values that can be one-of-many-types -- numbers, object handles, symbols, booleans, errors -- back and forth between the (native) host and the (WASM) guest. It's used for passing arguments to contracts, as well as storing values in our host-side polymorphic containers (maps and vecs). It's currently bit-packed into a 64-bit value, because .. WASM only really knows about values that are 32 or 64 bits. Everything else you have to tell it about fairly manually.

Users don't directly see RawVal very often; they usually see a wrapper type around it that imbues it with a bit more knowledge of its content. But RawVals are fairly ubiquitous under the covers of the system, and a lot of the WASM bytecode we generate is concerned with packing, unpacking, tag-testing and converting them.

Summary of changes:

  • RawVal is changed to a 160-bit value: a pair of u32 + u128

    • The u32 is a control word, which holds the type-tag and any other metadata
    • The u128 is the payload word, which holds the non-metadata content of the value
  • A pile of new code is added to call and return plumbing to "explode" and "implode" RawVals to and from multiple-argument sets -- sequences of u32 and u64 values that WASM supports -- and do returns via caller-allocated return-pointers, and similar messiness because there's no stable and widely-supported ABI in WASM that we can rely on for passing multi-word values between the host and guest yet. We're essentially hand-implementing an ABI.

  • Some of this code is in the SDK, and there's a companion branch for it that's required for this to work.

  • The "wrapper" types around RawVal -- used when we know the subtype of value carried in a RawVal, such as when we have a u32 or Object handle -- are instead turned into "subset" types that carry only the bits relevant to them and no longer carry a whole RawVal at all (though they can reconstruct one on demand if needed).

    • Object for example is Object(u64), Status is Status(u64), Symbol is Symbol(u128), and so on.
    • This allows passing and returning such "subset" types without engaging the big-expensive-RawVal ABI
    • Actually quite a lot of host functions take Object and u32 args, not RawVal. Only polymorphic functions like vec_get which can return "anything" (because vector-contents are polymorphic) need to return RawVal.
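The "explode"/"implode" plumbing and the "subset" idea can be sketched in Rust as below. This is only an illustration under assumptions: the names RawVal160, TAG_OBJECT and write_ret, the tag value, the split of the payload into low/high u64 argument words, and the little-endian 20-byte return area are all hypothetical choices, not the layout actually used in this PR.

```rust
/// Hypothetical 160-bit RawVal: a 32-bit control word (type tag plus any
/// metadata) and a 128-bit payload word.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct RawVal160 {
    pub control: u32,
    pub payload: u128,
}

impl RawVal160 {
    /// "Explode" into WASM-native argument words: the control word, then the
    /// low and high 64-bit halves of the payload.
    pub fn explode(self) -> (u32, u64, u64) {
        let lo = self.payload as u64; // `as` keeps the low 64 bits
        let hi = (self.payload >> 64) as u64;
        (self.control, lo, hi)
    }

    /// "Implode" a value back from its three WASM-native argument words.
    pub fn implode(control: u32, lo: u64, hi: u64) -> Self {
        RawVal160 {
            control,
            payload: ((hi as u128) << 64) | lo as u128,
        }
    }

    /// Return through a caller-allocated 20-byte area, standing in for a
    /// return pointer into guest linear memory: little-endian control word in
    /// bytes 0..4, little-endian payload in bytes 4..20.
    pub fn write_ret(self, out: &mut [u8; 20]) {
        out[..4].copy_from_slice(&self.control.to_le_bytes());
        out[4..].copy_from_slice(&self.payload.to_le_bytes());
    }
}

/// Hypothetical tag value marking an object handle.
pub const TAG_OBJECT: u32 = 1;

/// A "subset" type: an object handle carries only its 64 bits and rebuilds a
/// full RawVal160 only when one is actually needed.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Object(pub u64);

impl Object {
    pub fn to_raw(self) -> RawVal160 {
        RawVal160 { control: TAG_OBJECT, payload: self.0 as u128 }
    }

    pub fn from_raw(v: RawVal160) -> Option<Object> {
        if v.control == TAG_OBJECT && v.payload <= u64::MAX as u128 {
            Some(Object(v.payload as u64))
        } else {
            None
        }
    }
}
```

In this sketch a host function that only takes an Object needs a single u64 argument, while anything typed as a full RawVal160 costs three argument words on the way in and a return area on the way out.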

Rationale

Why consider this? A few reasons:

  1. It lets us raise the size of Symbol from 10 characters to 21 characters. Users are chafing a bit at the 10-character limit.
  2. It eliminates the "weird" number situation in the existing encoding, where "u63" fits (but what's that?) and u64 and i64 have to be boxed as Objects. All normal Rust scalar types fit unboxed into this experiment's large RawVals, including i128 and u128.
  3. It supports standardizing on an unboxed i128 as a ubiquitous fixed-point arithmetic type for asset values. Currently we are somewhat on the fence about how people are likely to be representing asset-amounts. It's possible that u63 and/or boxed i64 values will be the norm (possibly using the Stellar-native scale factor of 7 digits -- it's quite a decent range) but it's also fairly likely that people coming from Ethereum or other ecosystems will be expecting bigger "standard" number types for asset-amounts, and will wind up using BigInt everywhere (or rolling their own on top of Bytes) if we don't provide something standard.
  4. Given the luxuriously-sized i128, it might support elimination of BigInt from the object repertoire entirely, which is somewhat overkill for use-cases that would be ok with i128, and a bit tricky to instrument safely / correctly for gas-metering.
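To see where the 10- and 21-character limits come from, here is a minimal sketch assuming a 6-bit-per-character encoding over the alphabet `_`, `0-9`, `A-Z`, `a-z` (63 symbols, with code 0 reserved for "no character"). The alphabet, the code assignment and the function names are assumptions for illustration, not taken from this PR: 10 characters use 60 bits, leaving room for tag bits in a 64-bit word, while 21 characters use 126 bits of a 128-bit payload.

```rust
/// Hypothetical 6-bit character codes; 0 is reserved for "no character".
fn char_code(c: u8) -> Option<u128> {
    match c {
        b'_' => Some(1),
        b'0'..=b'9' => Some(2 + (c - b'0') as u128),
        b'A'..=b'Z' => Some(12 + (c - b'A') as u128),
        b'a'..=b'z' => Some(38 + (c - b'a') as u128),
        _ => None,
    }
}

/// 21 characters * 6 bits = 126 bits, which fits in a 128-bit payload.
pub const MAX_CHARS: usize = 21;

/// Pack a short name into a u128, 6 bits per character, with the first
/// character in the most significant occupied position. Returns None if the
/// name is too long or uses a character outside the alphabet.
pub fn pack_symbol(s: &str) -> Option<u128> {
    if s.len() > MAX_CHARS {
        return None;
    }
    let mut acc: u128 = 0;
    for &b in s.as_bytes() {
        acc = (acc << 6) | char_code(b)?;
    }
    Some(acc)
}

/// Inverse of pack_symbol; returns None if a 0 code appears inside the value.
pub fn unpack_symbol(mut bits: u128) -> Option<String> {
    let mut out = Vec::new();
    while bits != 0 {
        let code = (bits & 0x3f) as u8;
        out.push(match code {
            1 => b'_',
            2..=11 => b'0' + (code - 2),
            12..=37 => b'A' + (code - 12),
            38..=63 => b'a' + (code - 38),
            _ => return None, // code 0 inside the value
        });
        bits >>= 6;
    }
    out.reverse();
    String::from_utf8(out).ok()
}
```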

Impacts

  • Broadly speaking, it seems to work. There remain some bugs.
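  • Code size mostly goes up, but not horribly. The worst case in the table below grows by about 70% (soroban_timelock_contract.wasm, 4655 to 7872), average cases that are heavy on host functions with mostly "subset"-typed arguments are more like 10-40% overhead, and a few contracts (e.g. soroban_deployer_contract.wasm) actually shrink.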
| before (bytes) | after (bytes) | contract |
|---:|---:|---|
| 6522 | 8840 | soroban_auth_advanced_contract.wasm |
| 996 | 1534 | soroban_auth_contract.wasm |
| 456 | 412 | soroban_cross_contract_a_contract.wasm |
| 903 | 665 | soroban_cross_contract_b_contract.wasm |
| 1015 | 1558 | soroban_custom_types_contract.wasm |
| 963 | 638 | soroban_deployer_contract.wasm |
| 424 | 539 | soroban_deployer_test_contract.wasm |
| 509 | 616 | soroban_errors_contract.wasm |
| 566 | 730 | soroban_events_contract.wasm |
| 409 | 461 | soroban_hello_world_contract.wasm |
| 425 | 510 | soroban_increment_contract.wasm |
| 7478 | 10614 | soroban_liquidity_pool_contract.wasm |
| 21638 | 31565 | soroban_liquidity_pool_router_contract.wasm |
| 283 | 262 | soroban_logging_contract.wasm |
| 11897 | 17271 | soroban_single_offer_contract.wasm |
| 11497 | 16377 | soroban_single_offer_contract_xfer_from.wasm |
| 21451 | 31040 | soroban_single_offer_router_contract.wasm |
| 4655 | 7872 | soroban_timelock_contract.wasm |
| 31481 | 31481 | soroban_token_contract.wasm |
| 11405 | 17020 | soroban_wallet_contract.wasm |
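A small sketch that recomputes the relative change per contract from the size table, to make the spread easier to see. The rows are copied verbatim from the table; pct_change and worst_growth are illustrative names.

```rust
/// (contract, size before, size after), copied from the size table.
pub const ROWS: &[(&str, u64, u64)] = &[
    ("soroban_auth_advanced_contract.wasm", 6522, 8840),
    ("soroban_auth_contract.wasm", 996, 1534),
    ("soroban_cross_contract_a_contract.wasm", 456, 412),
    ("soroban_cross_contract_b_contract.wasm", 903, 665),
    ("soroban_custom_types_contract.wasm", 1015, 1558),
    ("soroban_deployer_contract.wasm", 963, 638),
    ("soroban_deployer_test_contract.wasm", 424, 539),
    ("soroban_errors_contract.wasm", 509, 616),
    ("soroban_events_contract.wasm", 566, 730),
    ("soroban_hello_world_contract.wasm", 409, 461),
    ("soroban_increment_contract.wasm", 425, 510),
    ("soroban_liquidity_pool_contract.wasm", 7478, 10614),
    ("soroban_liquidity_pool_router_contract.wasm", 21638, 31565),
    ("soroban_logging_contract.wasm", 283, 262),
    ("soroban_single_offer_contract.wasm", 11897, 17271),
    ("soroban_single_offer_contract_xfer_from.wasm", 11497, 16377),
    ("soroban_single_offer_router_contract.wasm", 21451, 31040),
    ("soroban_timelock_contract.wasm", 4655, 7872),
    ("soroban_token_contract.wasm", 31481, 31481),
    ("soroban_wallet_contract.wasm", 11405, 17020),
];

/// Relative change in percent, rounded to the nearest whole percent.
pub fn pct_change(before: u64, after: u64) -> i64 {
    let diff = after as f64 - before as f64;
    (diff / before as f64 * 100.0).round() as i64
}

/// The contract with the largest relative growth, if any rows are given.
pub fn worst_growth(rows: &[(&'static str, u64, u64)]) -> Option<(&'static str, i64)> {
    rows.iter()
        .map(|&(name, before, after)| (name, pct_change(before, after)))
        .max_by_key(|&(_, pct)| pct)
}
```

Over all rows, the contracts that grow do so by roughly 13% to 69%, four contracts shrink, and soroban_token_contract.wasm is unchanged.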

Discussion

I do not know exactly what to make of this. Knowing it's possible is interesting, but it's also fairly costly and the benefits might not justify it. I am interested in hearing input from others, especially around the question of number types.

The way I see it we have 3-and-a-half options:

  1. Stick with current, encourage use of u63 for asset-amounts (which are almost always positive), assume scale=7 is good enough. It seems to have been basically ok for the classic Stellar protocol, though we occasionally need support routines that minimize intermediate rounding, like a 3-arg A*B/C operation conducted in 128-bit precision. We can absolutely build that sort of thing into the SDK and/or host functions though.
    1.1. Possibly encourage using BigInt for asset-amounts, and possibly shift BigInt to a type with a menu of fixed-size-but-big types, like u128, u256, u512 and u1024, such as those supported by the crypto_bigint library. This has a more predictable cost model and range (e.g. one can know that if you are working in u256, your type will convert back to an Ethereum value too), though it also lacks a few functions in our existing BigInt, such as pow.
  2. Move to this experiment, wire in (say) scale=18, which is the norm in ERC-20 (and was at one time suggested as a mandatory value), and certainly plenty for any real use-cases. We get more "breathing room" in our value repertoire, and a slightly simpler mental model for users, at the expense of code size (and thus performance).
  3. Move to an even-larger version of this experiment, say with RawVal payload being 256 bits. This is a bit of an appealing target as well, in some ways, since it's both "ridiculously huge for fixed-point math", and "interoperates exactly with Ethereum values", and also "is able to store SHA256 outputs and Ed25519 points as unboxed values". But since those latter two tend to be opaque constants rather than number-like values with lots of temporaries created and forgotten through arithmetic expressions, the value of keeping them unboxed is not totally clear to me.
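The 3-arg A*B/C helper mentioned in option 1 can be sketched by widening to u128, which is always wide enough for the product of two 64-bit values. This is a minimal sketch assuming unsigned amounts and floor rounding; mul_div_u64 is a hypothetical name, not an existing SDK or host function.

```rust
/// Compute floor(a * b / c) for 64-bit amounts without intermediate overflow
/// by widening to u128. Returns None if c is zero or the result exceeds u64.
pub fn mul_div_u64(a: u64, b: u64, c: u64) -> Option<u64> {
    if c == 0 {
        return None;
    }
    // u64::MAX * u64::MAX < u128::MAX, so this product cannot overflow.
    let product = a as u128 * b as u128;
    u64::try_from(product / c as u128).ok()
}
```

Computing `a * b` directly in u64 would overflow for amounts as small as 10^13, whereas the widened version only fails when the final quotient itself does not fit.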

One thing to recognize is that no matter what we put in as "standard" types (i.e. with pre-defined tags in the XDR, standard helper routines for printing and converting, standard operations as host functions), users can always "ship their own" in a contract. They can include fixed-point arithmetic or unboxed u256 values or whatever. It'll just be a bit janky -- slower than native-supported, non-interoperable, harder to debug, hurt their codesize, etc. -- but they can do it. So we don't need to cover all corner cases. We need to do something good-enough that most contracts have something familiar to reach for.

Personally I am .. somewhat disappointed in the cost and complexity of this experiment and am leaning towards options 1 or 1.1 above -- stay where we are and encourage either u63/i64 or a specific size of Object-handle based BigInt for normal asset values, with a menu of BigInt options for interop -- but I would love to hear others' opinions. Especially those working on "standard token contract interfaces" -- we probably want to smooth down those interfaces, make the type repertoire support their needs directly.

@tsachiherman
Copy link

Would it make sense to make RawVal a variable-length value (i.e. a pointer)?
I'm suggesting it because it could reduce code size dramatically and provide the scalability we need to support BigInt, all in one. Note that the above isn't great for memory alignment on 64-bit machines.
One "trick", if we were to use a pointer-based approach, is to "predefine" pseudo-pointers for common values (e.g. -1, 0, 1) and use these "baked" values in the control word.

Comment on lines 57 to +58
```diff
 #[derive(Copy, Clone)]
-pub struct Symbol(RawVal);
+pub struct Symbol(u128);
```
Member

We should be able to derive Eq on Symbol now, which will mean we can remove some hackery to get around not being able to use them in match statements.
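To illustrate the match-statement point: with a plain-data payload, Symbol can derive PartialEq and Eq, and constants of type Symbol can then be used directly as match patterns. A minimal sketch; the constants TRANSFER and MINT and their payload values are hypothetical placeholders, not real packed symbols.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Symbol(u128);

// Hypothetical pre-packed symbol constants; the payload values are arbitrary.
pub const TRANSFER: Symbol = Symbol(1);
pub const MINT: Symbol = Symbol(2);

pub fn describe(s: Symbol) -> &'static str {
    // Matching on these constants works because Symbol derives PartialEq and Eq.
    match s {
        TRANSFER => "transfer",
        MINT => "mint",
        _ => "unknown",
    }
}
```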

@jayz22
Contributor

jayz22 commented Nov 11, 2022

A few general thoughts (haven't finished looking through the details yet):

  • It may be worthwhile to continue experimenting to the point where we can do a full calibration on the majority of the host (e.g. what we are doing in this PR), which would give us a better idea of the cost/benefit in terms of CPU and memory. As you mentioned previously, 1. the contract size itself may not be a perfect proxy for performance (especially since the sizes are pretty close), and 2. there may be room for improvement on the SDK side related to UDTs, so the size numbers here may not be final. I can help with the calibration part when it's ready.
  • I like the fact we can do native 128 bit bigint with this change. But we need an actual example (something that's arithmetic heavy) that uses u128 vs bigint to tell what the differences are. I can also help with this part once it's ready to test.
  • What would be the additional cost (both in terms of dev work needed and the resulting contract size) of going to 256 bits? Since you are already passing the minimal-sized value through the host/VM interface (the "polymorphic" optimization you mentioned), can we assume it won't be a significant jump in contract size going from a 128-bit to a 256-bit word size? The interop with Ethereum's values does sound appealing.

@leighmcculloch
Member

Overall I think this is good. There are a few things woven in here that I like. Some of these things seem unrelated to the V160 approach.

  • The biggest being that Symbols are 21 characters. This seems like a little thing but it really does make a difference to the developer experience.

  • Using primitive values like u32 in the host interface instead of RawVal's containing u32s seems like a good move. We don't need them wrapped. The wrapping only creates additional work on the guest side with little benefit. And it's nice to see in this PR that many of our optimizations for things like comparing a RawVal with a 0 value are no longer needed.

  • Being able to have u64 and i64 with the same costs is good. As far as I can tell, 64-bit values would now be as cheap to use as 32-bit values. It worries me that today using those types is more expensive, and using a negative i64 is also more expensive. While the optimization is good, I worry people will chase optimizations in ways that make contracts hard to make sense of.

  • Changing types like Symbol so they embed the raw value rather than a RawVal is also helpful. It makes it possible to derive Eq on the types, which means they can be used in matching.

  • Use of 128-bit types that are already available in Rust also feels like we're leaning into that native Rust experience.

In response to the discussion about what type to use for amounts.

There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export.

I agree 128-bit is compelling because surely it will address the vast majority of cases.

In regards to EVM interop, other chains seem to be doing fine with using values less than u256:

The thing that concerns me about picking 64-bit or 128-bit is that if we do have tokens/use cases that show up requiring a BigInt, or 256-bit, the interoperability story is unclear. E.g. if tokens use a common interface with u63 amounts, and then a bunch of tokens use u128, how do we make it so that contracts written for one are interoperable with the other? BigInt seems like the easy solve for this. I imagine there are some other possible solutions, such as allowing up-casts from 64-bit to 128-bit values, but that doesn't fix the problem for return values, and it makes things unpredictable.

@sisuresh
Contributor

Nice work! The contract size increases are unfortunate, but I'm also curious what the runtime performance differences are. That would be good to know before making a decision.

For asset balances, I was leaning towards u128 over u63 to cover more use cases, and assumed that the only reason to go above that would be to support EVM assets, but as Leigh pointed out, other chains have been fine using less than 256 bits. I do think u63 can cover most use cases (as we've seen on Stellar classic), so if the cost to use u128 is significant, u63 should be fine. BigInt sounds like the easiest thing to use with the most flexibility, but I'm worried about the performance issues, and edge cases with an unbounded number. We should also look at the performance differences between BigInt and u63 for the common case (balances less than u63). Also something interesting that I found - Elrond (another blockchain that uses wasm) uses BigUInt for its balances, and it actually has an implementation in the protocol.

@leighmcculloch You mention that "There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export". Is this true? The same overflow, limit, and liability checks need to be done regardless of which type we use.

@graydon
Contributor Author

graydon commented Nov 14, 2022

Thanks for the feedback so far! Some replies:

@tsachiherman:

Would it make sense to make RawVal a variable-length value (i.e. a pointer)?

No, or rather, it already is: what this change covers is basically "the size of a tagged pointer" where certain tags indicate it's not-a-pointer but rather some other type of scalar (a small symbol packed into a pointer-sized thing, a number, etc.) The "pointer" case of a RawVal also isn't exactly a pointer, it's an Object(handle), and that handle "points" into the index-space of objects allocated on the host. It's not really meaningful to talk about pointers as a conceptual class in this context, anything pointer-like is either an index into something held host-side or a linear-memory address in the guest. We can't really make all RawVals into linear-memory addresses in the guest since the host has no ability to allocate in that space, it's entirely under guest control.

@jayz22:

I like the fact we can do native 128 bit bigint with this change. But we need an actual example (something that's arithmetic heavy) that uses u128 vs bigint to tell what the differences are

Mhm. I think .. possibly the most interesting question to answer here in terms of performance is where the size threshold lies beyond which it makes sense to do arithmetic on the host rather than in the guest. Like suppose we had a boxed u128 object, held on the host. We could define a host function u128_add(object,object)->object or we could define a host function u128_to_guest(object,ptr:u32) which writes the u128 to a guest-chosen linear memory address on the guest shadow stack, and then the guest does all the arithmetic it wants and then calls u128_from_guest(ptr:u32)->object to convert back to an object. The first interface -- host-side arithmetic -- makes more sense the larger the number is: for a 4096-bit BigInt it's probably worth doing on the host. But for 128 or 256, I'm less certain; guest-side arithmetic is probably cheaper overall (and doesn't produce a mess of temporary objects). We don't (for example) provide a u64_add(object,object)->object host function, for this reason: we provide obj_to_u64(object)->u64 and let the user do their u64 arithmetic on the guest. I think exploring these costs in detail might actually go a long way to answering what to do in this bug, and I am thinking I'll explore it today on a different branch and report back.

What would be the additional cost (both in terms of dev work needed and the resulting contract size) of going to 256 bits?

In terms of contract size: I think it would be less than another doubling, but probably a bit more cost. A lot of the cost here has to do with all the bouncing back and forth between the value stack, locals and linear memory, and that's closer to a fixed overhead than a variable one.

In terms of implementation cost: probably only a few days, it's fairly easy to generalize what I did here to wider words.

@leighmcculloch :

The biggest being that Symbols are 21 characters

Is that big enough to justify all the rest? Like if we don't find a clear win on any other axes, would you consider it important enough to raise symbol length that we should pay this cost?

Using primitive values like u32 in the host interface instead of RawVal's containing u32s seems like a good move. We don't need them wrapped. The wrapping only creates additional work on the guest side with little benefit. And it's nice to see in this PR that many of our optimizations for things like comparing a RawVal with a 0 value are no longer needed

We can keep some of this regardless. The u32-linear-memory-addresses and carriers-of-single-bytes in particular I think it's fine to pass as u64 and just range-restrict (out-of-bounds accesses at the u32 level are already an error). There's no major win size or performance-wise to using u32 vs. u64 values anywhere in wasm as far as I can tell.
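A minimal sketch of the "pass as u64 and range-restrict" approach, assuming the host simply rejects out-of-range values with an error. address_arg, byte_arg and ArgError are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ArgError {
    OutOfRange,
}

/// Accept a linear-memory address passed as u64 and range-restrict it to u32.
pub fn address_arg(raw: u64) -> Result<u32, ArgError> {
    u32::try_from(raw).map_err(|_| ArgError::OutOfRange)
}

/// Same idea for an argument that carries a single byte.
pub fn byte_arg(raw: u64) -> Result<u8, ArgError> {
    u8::try_from(raw).map_err(|_| ArgError::OutOfRange)
}
```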

Changing types like Symbol so they embed the raw value rather than a RawVal is also helpful

We can just special-case all of these regardless (i.e. not derive it but impl it). Any non-Object RawVal wrapper/extract type should be Eq-able.

There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export

I think we should discuss this part in detail because if it's true it's a strong argument! I think so also but am not certain.

@sisuresh :

For asset balances, I was leaning towards u128 over u63 to cover more use cases

Ok. Interesting! This might work well if we do the menu-of-BigInt-sizes and make u128 the smallest such size, with guest-side support for arithmetic.

Also something interesting that I found - Elrond (another blockchain that uses wasm) uses BigUInt for its balances

Yes, and .. as far as I can tell (it's a little hard to follow in their code) I think it's actually implemented by traversing another FFI in the host and winding up in Go code, delegating to https://pkg.go.dev/math/big -- see https://github.com/ElrondNetwork/wasm-vm/blob/master/arwen/elrondapi/bigIntOps.go where (I think) it bottoms out. We will definitely wind up going faster than that!

@leighmcculloch
Member

leighmcculloch commented Nov 14, 2022

The biggest being that Symbols are 21 characters

Is that big enough to justify all the rest? Like if we don't find a clear win on any other axes, would you consider it important enough to raise symbol length that we should pay this cost?

No. This isn't important enough to increase overall contract size, and from what I can tell, increase the complexity of several layers.

Using primitive values like u32 in the host interface instead of RawVal's containing u32s seems like a good move. We don't need them wrapped. The wrapping only creates additional work on the guest side with little benefit. And it's nice to see in this PR that many of our optimizations for things like comparing a RawVal with a 0 value are no longer needed

We can keep some of this regardless. The u32-linear-memory-addresses and carriers-of-single-bytes in particular I think it's fine to pass as u64 and just range-restrict (out-of-bounds accesses at the u32 level are already an error). There's no major win size or performance-wise to using u32 vs. u64 values anywhere in wasm as far as I can tell.

Agreed, let's try and move to this regardless. And +1 we don't specifically need u32 vs u64, but the saving is on not having to run any instructions to convert RawVal containing u32 into a u32, since it can just be a u32. Maybe that's not enough of a saving to worry about.

Changing types like Symbol so they embed the raw value rather than a RawVal is also helpful

We can just special-case all of these regardless (i.e. not derive it but impl it). Any non-Object RawVal wrapper/extract type should be Eq-able.

Agreed, let's do this regardless of V160.

@jayz22
Contributor

jayz22 commented Nov 14, 2022

But for 128 or 256, I'm less certain; guest-side arithmetic is probably cheaper overall

How does the guest perform 256-bit arithmetic without implementing or importing its own or a 3rd-party "bigint" library in the contract? I thought one of the main benefits is to provide standard big-number arithmetic (including fixed-point arithmetic, up to the word size) natively in the host?

@leighmcculloch
Member

@leighmcculloch You mention that "There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export". Is this true? The same overflow, limit, and liability checks need to be done regardless of which type we use.

Ref: @sisuresh #570 (comment)

There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export

I think we should discuss this part in detail because if it's true it's a strong argument! I think so also but am not certain.

Ref: @graydon #570 (comment)

@sisuresh I think you're right, this doesn't make every operation seamlessly succeed without concerns of overflow or limits. If we implement auto import/export in the way that balances are 100% stored in trust lines where balances are i64, then having amounts transmitted in any type with more than 63-bits of space seems confusing from an interop point-of-view because no value greater than 63-bits would ever succeed at being stored.

If the stored balance were limited to i64 / 63 bits, do you think there would be a reason to pass around amounts in a larger bit size?

@graydon
Contributor Author

graydon commented Nov 14, 2022

@jayz22 :

How does the guest perform 256-bit arithmetic without implementing or importing its own or a 3rd-party "bigint" library in the contract? I thought one of the main benefits is to provide standard big-number arithmetic (including fixed-point arithmetic, up to the word size) natively in the host?

For u128 it's already supported by the rust language, so it would be fairly natural to surface an SDK method that unpacks an object to a rust u128 that users can use normal rust u128 arithmetic on.

For u256 we could fairly easily include an "officially supported" guest-side u256 library in the SDK -- crypto_bigint or such. Same with medium-sized fixed-point: we could commit to fpdec or fixed if we wanted to have a "standard" type with a mix of host-side storage and guest-side arithmetic.

(Or, of course, we could do it all host side. I'll try to produce some numbers to make it clear which side is more efficient!)

@graydon
Contributor Author

graydon commented Nov 15, 2022

Some followup and (mostly) a conclusion: I'm going to abandon the experiment and close this bug.

Why:

  1. We can get "larger symbols" by not limiting ourselves to the use of the Symbol type everywhere we need the name of something. We can just allow Bytes / some future putative Text type for names-of-things and treat Symbol as an optimized case. So the symbol argument is moot.
  2. We can fairly easily add u128 and i128 as boxed types (the same way u64 and i64 are treated) with functions to inject or extract the high and low 64-bit words composing them. This lets us do u128 arithmetic on the guest (if we want! this is still a bit TBD) and store u128 values the same way we would be doing it with "big RawVal" without incurring the size hit on all RawVals.
  3. The SDK can/will gloss over the difference between "weird" representations of number types already -- the contract signature just sees normal rust types like u64 or u128.
  4. In consultation with others here it seems like we are all comfortable removing BigInt from the host entirely and making i128/u128 our largest number type at least for the foreseeable future. We might add-back a BigInt type in some unforeseen future -- either arbitrary-size or fixed-but-very-large size -- but we can't think of any real use-cases for it at the moment, and we're more likely to be adding host functions that operate on unique elliptic-curve coordinates represented by Bytes buffers rather than BigInts anyways.
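A sketch of the "inject or extract the high and low 64-bit words" idea for boxed u128 and i128. It assumes, hypothetically, that the signed high word is carried as i64 (so the sign lives there) and the low word as u64; the function names are illustrative, not the eventual host interface.

```rust
/// Split a u128 into its (hi, lo) 64-bit words.
pub fn u128_to_parts(v: u128) -> (u64, u64) {
    ((v >> 64) as u64, v as u64)
}

/// Rebuild a u128 from its (hi, lo) 64-bit words.
pub fn u128_from_parts(hi: u64, lo: u64) -> u128 {
    ((hi as u128) << 64) | lo as u128
}

/// Split an i128: the arithmetic shift keeps the sign in the high word,
/// and the low word is the raw low 64 bits.
pub fn i128_to_parts(v: i128) -> (i64, u64) {
    ((v >> 64) as i64, v as u64)
}

/// Rebuild an i128: `hi as i128` sign-extends, `lo as i128` zero-extends.
pub fn i128_from_parts(hi: i64, lo: u64) -> i128 {
    ((hi as i128) << 64) | lo as i128
}
```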

There are some followup issues coming out of this I'll file bugs for:

  • Add i128 and u128 to the XDR, environment interface, host and SDK
  • Remove BigInt from the XDR, environment interface, host and SDK
  • Relax Symbol restriction on function names and possibly other types in the host, environment and SDK (eg. UDT enum constructors, struct fields), allow Bytes (or Text) as well.
  • Decide whether to do any larger-number arithmetic on the host, or just push it all to the guest.
  • Define either in the SDK (guest-side) or the environment (host-side) utility functions to do fixed-point arithmetic at a given precision, rounding modes, and fused / non-overflowing composite operations like a = b * c / d
  • Decide on the number-type that the standard token contract interface uses: either i64, u64, i128, u128, plus or minus a mechanism for denoting scale factor / "decimals".
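A minimal sketch of a fused a = b * c / d with an explicit rounding mode, plus a fixed-point multiply at a given number of decimals built on top of it. It assumes unsigned u128 amounts and detects overflow of the intermediate product (returning None) rather than avoiding it, which would need a 256-bit intermediate. The rounding modes and function names are hypothetical choices.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Rounding {
    /// Discard any remainder (round toward zero).
    Down,
    /// Round up whenever there is a remainder.
    Up,
    /// Round to nearest; exact halves round up.
    HalfUp,
}

/// Fused a = b * c / d. Returns None if d == 0 or if b * c (or the rounded
/// result) overflows u128.
pub fn mul_div(b: u128, c: u128, d: u128, mode: Rounding) -> Option<u128> {
    if d == 0 {
        return None;
    }
    let product = b.checked_mul(c)?;
    let (q, r) = (product / d, product % d);
    let round_up = match mode {
        Rounding::Down => false,
        Rounding::Up => r != 0,
        // 2r >= d, written as r >= d - r so it cannot overflow (r < d).
        Rounding::HalfUp => r >= d - r,
    };
    if round_up { q.checked_add(1) } else { Some(q) }
}

/// Fixed-point multiply of two values that are both scaled by 10^decimals.
pub fn fixed_mul(a: u128, b: u128, decimals: u32, mode: Rounding) -> Option<u128> {
    let scale = 10u128.checked_pow(decimals)?;
    mul_div(a, b, scale, mode)
}
```

At 7 decimals (the Stellar-native scale), 1.5 * 2.25 is `fixed_mul(15_000_000, 22_500_000, 7, Rounding::Down)`, which yields 33_750_000, i.e. 3.375.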

Thanks for the help navigating this topic, everyone!

graydon closed this Nov 15, 2022
@jayz22
Contributor

jayz22 commented Nov 15, 2022

This lets us do u128 arithmetic on the guest (if we want! this is still a bit TBD) and store u128 values the same way we would be doing it with "big RawVal"

Another advantage of doing the u128 arithmetic on the guest is that it fits more seamlessly with metering. All the costs boil down to wasm instructions, which are accounted for automatically; there's no need for the "calibration + linear component" per arithmetic op on the host side.

removing BigInt from the host entirely and making i128/u128 our largest number type at least for the foreseeable future

I'm still a little unsure about u128 being the largest number. One not-so-unforeseeable future use case is Speedex, where the l2 norm of an asset's demand (u128) needs to be computed for the tatonnement process (see CAP-45). Similarly I could see other contracts needing to perform some numerical optimization process that depends on larger numbers.
I guess they can implement their own / import 3rd party bignum libraries in their contract. But would it make more sense for us to provide a menu-of-options containing 128, 256 and 512 bit large numbers? (If we are going to provide 128 anyway).
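To make the width concern concrete: squaring a u128 can need up to 256 bits, so even the squared l2 norm of a u128 demand vector does not fit in u128. A contract could carry such values itself as a pair of u128 words, as in this sketch of a schoolbook widening multiply and a 256-bit sum of squares. The (hi, lo) tuple representation and the function names are assumptions; this illustrates only the width problem, not the actual Speedex computation in CAP-45.

```rust
/// Full 256-bit product of two u128 values as (hi, lo) u128 words,
/// computed schoolbook-style from 64-bit limbs.
pub fn widening_mul_u128(a: u128, b: u128) -> (u128, u128) {
    const MASK: u128 = u64::MAX as u128;
    let (a_lo, a_hi) = (a & MASK, a >> 64);
    let (b_lo, b_hi) = (b & MASK, b >> 64);

    // Each product of two 64-bit limbs fits in a u128.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column: high half of ll plus low halves of lh and hl (< 2^66).
    let mid = (ll >> 64) + (lh & MASK) + (hl & MASK);
    let lo = (ll & MASK) | (mid << 64); // shift drops mid's carry bits
    let hi = hh + (lh >> 64) + (hl >> 64) + (mid >> 64); // carry lands here
    (hi, lo)
}

/// 256-bit addition of (hi, lo) pairs; None if the sum exceeds 256 bits.
pub fn add_u256(a: (u128, u128), b: (u128, u128)) -> Option<(u128, u128)> {
    let (lo, carry) = a.1.overflowing_add(b.1);
    let hi = a.0.checked_add(b.0)?.checked_add(carry as u128)?;
    Some((hi, lo))
}

/// Sum of squares (the squared l2 norm) of u128 values, kept in 256 bits.
pub fn sum_of_squares(xs: &[u128]) -> Option<(u128, u128)> {
    xs.iter()
        .try_fold((0u128, 0u128), |acc, &x| add_u256(acc, widening_mul_u128(x, x)))
}
```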

@graydon
Contributor Author

graydon commented Nov 15, 2022

Another advantage of doing the u128 arithmetic on the guest is that it fits more seamlessly with metering. All the costs boil down to wasm instructions, which are accounted for automatically; there's no need for the "calibration + linear component" per arithmetic op on the host side.

Yes, good point! Though for u128 the cost of a host-side operation will be extremely stable. I think we can experiment more on this front before concluding where it's best to situate the operations. Another thing to consider is that if we ever JIT the code, guest-side will get faster still and guest-to-host transitions will represent an even larger proportional overhead.

I'm still a little unsure about u128 being the largest number. One not-so-unforeseeable future use case is Speedex, where the l2 norm of an asset's demand (u128) needs to be computed for the tatonnement process (see CAP-45). Similarly I could see other contracts needing to perform some numerical optimization process that depends on larger numbers. I guess they can implement their own / import 3rd party bignum libraries in their contract. But would it make more sense for us to provide a menu-of-options containing 128, 256 and 512 bit large numbers? (If we are going to provide 128 anyway).

Yeah, I ... think this is something we can probably delegate to "contract ships its own code in the guest". I hear what you're saying but I think it's enough of a corner case not to burden the rest of the system with it. The number repertoire of RawVal / SCVal has more to do with function inputs and outputs -- values we expect to see stored and exchanged, not just as temporaries. Though of course if there's a sufficiently-ubiquitous internal / temporary type that occurs in lots of contracts, we should probably standardize-and-ship (either as a RawVal / SCVal type or a library-supported temporary) that too. But there will always be a few contracts that want more elaborate local types than are built-in to the protocol. Even for exchange of such types, there's always Bytes as an escape hatch.

@jayz22
Contributor

jayz22 commented Nov 15, 2022

I hear what you're saying but I think it's enough of a corner case not to burden the rest of the system with it.

That's probably true. I guess my main point is, going from BigInt (an arbitrary-precision big number) to u128 (the smallest option of big integer) seems like a large jump, and I wanted to make sure the intermediate options were properly being considered.

I also haven't done enough research to know if we won't need u256 for interop with EVM. So far the argument against is "a couple other chains did fine without it". I hope we don't, but curious what the ecosystem feedback on it will be.

@graydon
Contributor Author

graydon commented Nov 16, 2022

I hear what you're saying but I think it's enough of a corner case not to burden the rest of the system with it.

That's probably true. I guess my main point is, going from BigInt (an arbitrary-precision big number) to u128 (the smallest option of big integer) seems like a large jump, and I wanted to make sure the intermediate options were properly being considered.

Well, I don't think a u256 host object (SCObject) type (referenced by a 64-bit RawVal) is off the table. The primary purpose of this PR was to explore making RawVal wider, and I think that question is answered: both a 128-bit and a 256-bit RawVal are off the table. But RawVal is not SCObject.

Some secondary questions explored in this discussion were:

  1. What the final set of numeric host objects (SCObjects / HostObjects) should be
  2. For any of those types that are of fixed size, whether to encourage guest-side or host-side arithmetic

I think we have good reason to believe that u128 is worth adding to the host-object repertoire (SCObject / HostObject), and good evidence that guest-side arithmetic is sufficiently cheap for u128 that we can encourage its use that way. Also since u128 and i128 are literally built-in to the Rust language as normal types, there's really nothing at all we can do to discourage their use in the guest. It's just a matter of giving the SCObject / HostObject repertoire a place to store such values polymorphically.

I think we have a sort of "awaiting further evidence" conclusion here when it comes to other fixed-size BigInt types larger than u128. It might be that fixed-size u256 or u512 turn out to be useful. We can add them as additional SCObjects; if so they won't be based on the variable-sized BigInt code. They'll be based on crypto_bigint or perhaps something even smaller like ParityTech's uint crate. One nice thing about such types -- fixed size, not variable -- is that they're easy to start using in guest code, and then once we see their utility, move into the host in a subsequent version of the host interface.

I think we have heard fairly clear silence / an absence of definite use-cases for specifically variable size BigInt in the host. We can keep it, of course. But nothing seems to be demanding it.

Variable-sized BigInt does have a few natural advantages:

  • It subsumes all other cases
  • It's fairly future-proof

But it also has a fair number of disadvantages:

  • It's the least-flexible option (arithmetic has to happen on the host, can't on the guest)
  • It's the least-interoperable (no other systems seem likely to produce or consume it, possibly excepting Elrond)
  • It's probably the slowest (even host-side it's going to involve heap allocations and variable-length loops)
  • It carries the biggest implementation challenges (unstable implementation semantics, difficult metering)

So .. I think I'm fairly comfortable removing it. Not thrilled of course, it's closing a door that has benefits, but I think given the options and weight of concerns, the balance seems to lean that way.

5 participants