-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial sketch 160-bit (32+128) RawVal #570
Conversation
Would it make sense to make the RawVal into a variable length var ? ( i.e. pointer ). |
#[derive(Copy, Clone)] | ||
pub struct Symbol(RawVal); | ||
pub struct Symbol(u128); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to derive Eq
on Symbol
now, which will mean we can remove some hackery to get around not being able to use them in match
statements.
A few general thoughts (haven't finished looking through the details yet):
|
Overall I think this is good. There's a few things weaved in here that I like. Some of these things seem unrelated to the V160 approach.
In response to the discussion about what type to use for amounts. There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export. I agree 128-bit is compelling because surely it will address the vast majority of cases. In regards to EVM interop, other chains seem to be doing fine with using values less than u256:
The thing that concerns me about picking 64-bit or 128-bit is that if we do have tokens/use cases that show up requiring a BigInt, or 256-bit, the interoperability story is unclear. E.g. If tokens use a common interface that are u63 amounts, and then a bunch of tokens use u128, how do we make it so that contracts written for one are interoperable with the other? |
Nice work! The contract size increases are unfortunate, but I'm also curious what the runtime performance differences are. That would be good to know before making a decision. For asset balances, I was leaning towards u128 over u63 to cover more use cases, and assumed that the only reason to go above that would be to support EVM assets, but as Leigh pointed out, other chains have been fine using less than 256 bits. I do think u63 can cover most uses cases (as we've seen on Stellar classic), so if the cost to use u128 is significant, u63 should be fine. BigInt sounds like the easiest thing to use with the most flexibility, but I'm worried about the performance issues, and edge cases with an unbounded number. We should also look at the performance differences between BigInt and u63 for the common case (balances less than u63). Also something interesting that I found - Elrond (another blockchain that uses wasm) uses BigUInt for it's balances, and it actually has an implementation in the protocol. @leighmcculloch You mention that "There's an angle with choosing i64 that we should acknowledge: greater interop with Stellar assets whose trust lines are already signed 64-bit. It would simplify implementations of auto import/export". Is this true? The same overflow, limit, and liability checks need to be done regardless of which type we use. |
Thanks for the feedback so far! Some replies:
No, or rather, it already is: what this change covers is basically "the size of a tagged pointer" where certain tags indicate it's not-a-pointer but rather some other type of scalar (a small symbol packed into a pointer-sized thing, a number, etc.) The "pointer" case of a
Mhm. I think .. possibly the most interesting question to answer here in terms of performance is where the size threshold is where it makes sense to do arithmetic on-the-host vs. in-the-guest. Like suppose we had a boxed
In terms of contract size: I think it would be less than another doubling, but probably a bit more cost. A lot of the cost here has to do with all the bouncing back and forth between the value stack, locals and linear memory, and that's closer to a fixed overhead than a variable one. In terms of implementation cost: probably only a few days, it's fairly easy to generalize what I did here to wider words.
Is that big enough to justify all the rest? Like if we don't find a clear win on any other axes, would you consider it important enough to raise symbol length that we should pay this cost?
We can keep some of this regardless. The u32-linear-memory-addresses and carriers-of-single-bytes in particular I think it's fine to pass as u64 and just range-restrict (out-of-bounds accesses at the u32 level are already an error). There's no major win size or performance-wise to using u32 vs. u64 values anywhere in wasm as far as I can tell.
We can just special-case all of these regardless (i.e. not derive it but impl it). Any non-Object RawVal wrapper/extract type should be Eq-able.
I think we should discuss this part in detail because if it's true it's a strong argument! I think so also but am not certain.
Ok. Interesting! This might work well if we do the menu-of-BigInt-sizes and make u128 the smallest such size, with guest-side support for arithmetic.
Yes, and .. as far as I can tell (it's a little hard to follow in their code) I think it's actually implemented by traversing another FFI in the host and winding up in Go code, delegating to https://pkg.go.dev/math/big -- see https://github.com/ElrondNetwork/wasm-vm/blob/master/arwen/elrondapi/bigIntOps.go where (I think) it bottoms out. We will definitely wind up going faster than that! |
No. This isn't important enough to increase overall contract size, and from what I can tell, increase the complexity of several layers.
Agreed, let's try and move to this regardless. And +1 we don't specifically need u32 vs u64, but the saving is on not having to run any instructions to convert RawVal containing u32 into a u32, since it can just be a u32. Maybe that's not enough of a saving to worry about.
Agreed, let's do this regardless of V160. |
How does guest perform 256-bit arithmetics without implementing/importing their own/3rd-party "bigint" library in their contracts? I thought one of the main benefits is to provide standard big number arithmetic (including fixed-point arith, up to the word size) natively in the host? |
Ref: @sisuresh #570 (comment)
Ref: @graydon #570 (comment) @sisuresh I think you're right, this doesn't make every operation seamlessly succeed without concerns of overflow or limits. If we implement auto import/export in the way that balances are 100% stored in trust lines where balances are i64, then having amounts transmitted in any type with more than 63-bits of space seems confusing from an interop point-of-view because no value greater than 63-bits would ever succeed at being stored. Do you think if the balance stored was limited to i64 / 63-bits, would there be a reason to pass around amounts in a larger bitsize? |
@jayz22 :
For u128 it's already supported by the rust language, so it would be fairly natural to surface an SDK method that unpacks an object to a rust u128 that users can use normal rust u128 arithmetic on. For u256 we could fairly easily include an "officially supported" guest-side u256 library in the SDK -- (Or, of course, we could do it all host side. I'll try to produce some numbers to make it clear which side is more efficient!) |
Some followup and (mostly) a conclusion: I'm going to abandon the experiment and close this bug. Why:
There are some followup issues coming out of this I'll file bugs for:
Thanks for the help navigating this topic, everyone! |
Another advantage of doing the u128 arithmetic on the guest is it fits more seamlessly with metering. All the cost are boiled down to wasm instructions which are accounted for automatically, no need for the "calibration + linear component" per arithmetic op on the host side.
I'm still a little unsure about u128 being the largest number. One not-so-unforeseeable future use case is Speedex, where the l2 norm of an asset's demand (u128) needs to be computed for the tatonnement process (see CAP-45). Similarly I could see other contracts needing to perform some numerical optimization process that depends on larger numbers. |
Yes, good point! Though for u128 the cost of a host-side operation will be extremely stable. I think we can experiment more on this front before concluding where it's best to situate the operations. Another thing to consider is that if we ever JIT the code, guest-side will get faster still and guest-to-host transitions will represent an even larger proportional overhead.
Yeah, I ... think this is something we can probably delegate to "contract ships its own code in the guest". I hear what you're saying but I think it's enough of a corner case not to burden the rest of the system with it. The number repertoire of |
That's probably true. I guess my main point is, going from BigInt (an arbitrarily precision big number) to u128 (the smallest option of big integer) seems like a large jump and I wanted to make sure the intermediate options were properly being considered. I also haven't done enough research to know if we won't need u256 for interop with EVM. So far the argument against is "a couple other chains did fine without it". I hope we don't, but curious what the ecosystem feedback on it will be. |
Well, I don't think a u256 host object ( Some secondary questions explored in this discussion were:
I think we have good reason to believe that u128 is worth adding to the host-object repertoire (SCObject / HostObject), and good evidence that guest-side arithmetic is sufficiently cheap for u128 that we can encourage its use that way. Also since u128 and i128 are literally built-in to the Rust language as normal types, there's really nothing at all we can do to discourage their use in the guest. It's just a matter of giving the SCObject / HostObject repertoire a place to store such values polymorphically. I think we have a sort of "awaiting further evidence" conclusion here when it comes to other fixed-size I think we have heard fairly clear silence / an absence of definite use-cases for specifically variable size Variable-sized
But it also has a fair number of disadvantages:
So .. I think I'm fairly comfortable removing it. Not thrilled of course, it's closing a door that has benefits, but I think given the options and weight of concerns, the balance seems to lean that way. |
This is a not-totally-sketch sketch of a fairly major (though not especially user-visible) change to
RawVal
. I'm opening it here for discussion, not as a definite proposal but because it's at the point where a decent number of tests are actually passing and it's worth discussing before either forging ahead or abandoning.Background
RawVal
is the "polymorphic" type used to carry values that can be one-of-many-types -- numbers, object handles, symbols, booleans, errors -- back and forth between the (native) host and the (WASM) guest. It's used for passing arguments to contracts, as well as storing values in our host-side polymorphic containers (maps and vecs). It's currently bit-packed into a 64-bit value, because .. WASM only really knows about values that are 32 or 64 bits. Everything else you have to tell it about fairly manually.Users don't directly see
RawVal
very often, they usually see a wrapper type around it that imbues it with a bit more knowledge of its content. ButRawVal
s are fairly ubiquitous under the covers of the system, and a lot of the WASM bytecode we generate is concerned with packing, unpacking, tag-testing and converting them.Summary of changes:
RawVal
is changed to a 160-bit value: a pair ofu32
+u128
u32
is a control word, which holds the type-tag and any other metadatau128
is the payload word, which holds the non-metadata content of the valueA pile of new code is added to call and return plumbing to "explode" and "implode"
RawVals
to and from multiple-argument sets -- sequences of u32 and u64 values that WASM supports -- and do returns via caller-allocated return-pointers, and similar messiness because there's no stable and widely-supported ABI in WASM that we can rely on for passing multi-word values between the host and guest yet. We're essentially hand-implementing an ABI.Some of this code is in the SDK, and there's a companion branch for it that's required for this to work.
The "wrapper" types around
RawVal
-- used when we know the subtype of value carried in aRawVal
, such as when we have au32
orObject
handle -- are instead turned into "subset" types that only carry the bits relevant to them, don't carry a wholeRawVal
at all anymore (though they can reconstruct one on demand if needed).Object(u64)
, Status isStatus(u64)
, Symbol isSymbol(u128)
, and so on.RawVal
ABIObject
andu32
args, notRawVal
. Only polymorphic functions likevec_get
which can return "anything" (because vector-contents are polymorphic) need to returnRawVal
.Rationale
Why consider this? A few reasons:
Symbol
from 10 characters to 21 characters. Users are chafing with 10 chars a bit.u64
andi64
have to be boxed asObject
s. All normal Rust scalar types fit unboxed into this experiment's largeRawVal
s, includingi128
andu128
.i128
as a ubiquitous fixed-point arithmetic type for asset values. Currently we are somewhat on the fence about how people are likely to be representing asset-amounts. It's possible thatu63
and/or boxedi64
values will be the norm (possibly using the Stellar-native scale factor of 7 digits -- it's quite a decent range) but it's also fairly likely that people coming from Ethereum or other ecosystems will be expecting bigger "standard" number types for asset-amounts, and will wind up usingBigInt
everywhere (or rolling their own on top ofBytes
) if we don't provide something standard.i128
, it might support elimination ofBigInt
from the object repertoire entirely, which is somewhat overkill for use-cases that would be ok withi128
, and a bit tricky to instrument safely / correctly for gas-metering.Impacts
Discussion
I do not know exactly what to make of this. Knowing it's possible is interesting, but it's also fairly costly and the benefits might not justify it. I am interested in hearing input from others, especially around the question of number types.
The way I see it we have 3-and-a-half options:
1.1. Possiby encourage using
BigInt
for asset-amounts, and possibly shiftBigInt
to a type with a menu of of fixed-size-but-big types, likeu128
,u256
,u512
andu1024
, such as supported by the crypto_bigint library. This has a more predictable cost model and range (eg. one can know that if you are working inu256
that your type will convert back to an Ethereum value too). Though it also lacks a few functions in our existingBigInt
such aspow
.RawVal
payload being 256 bits. This is a bit of an appealing target as well, in some ways, since it's both "ridiculously huge for fixed-point math", and "interoperates exactly with Ethereum values", and also "is able to store SHA256 outputs and Ed25519 points as unboxed values". But since those latter two tend to be opaque constants rather than number-like values with lots of temporaries created and forgotten through arithmetic expressions, the value of keeping them unboxed is not totally clear to me.One thing to recognize is that no matter what we put in as "standard" types (i.e. with pre-defined tags in the XDR, standard helper routines for printing and converting, standard operations as host functions), users can always "ship their own" in a contract. They can include fixed-point arithmetic or unboxed u256 values or whatever. It'll just be a bit janky -- slower than native-supported, non-interoperable, harder to debug, hurt their codesize, etc. -- but they can do it. So we don't need to cover all corner cases. We need to do something good-enough that most contracts have something familiar to reach for.
Personally I am .. somewhat disappointed in the cost and complexity of this experiment and am leaning towards options 1 or 1.1 above -- stay where we are and encourage either
u63
/i64
or a specific size ofObject
-handle basedBigInt
for normal asset values, with a menu ofBigInt
options for interop -- but I would love to hear others' opinions. Especially those working on "standard token contract interfaces" -- we probably want to smooth down those interfaces, make the type repertoire support their needs directly.