-
-
Notifications
You must be signed in to change notification settings - Fork 670
Avoid masking/sign extension between function calls #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
But, if you call this from JS now, with two valid integers in 16-bit range, the result seen on the JS-side wouldn't be in 16-bit range if it overflows, wouldn't it? Btw., in the sources this is where |
Mmh, that's a good point. Weird how this is completely ignored by LLVM at the moment. I think exports / imports probably require masking their arguments / return values. Everything inside of WebAssembly however doesn't need any masking (except for certain operations). |
Another thing that comes to my mind here is that this sequence i32.const 16
i32.shl
i32.const 16
i32.shr_s is going to replaced with a sign extension operation in the future. That's just the way it needs to be done at this point in time. The body would then look like get_local $p0
get_local $p1
i32.add
i32.extend16_s
return if I understood this correctly. |
I'm not sure if it's possible, but I think you'd need to be able to determine the ABI of a function. So if you know a function is being exported to JS, you want to do the sign / zero extension (vice versa for the imports) and for everything internal you want to skip it. Interestingly our discussion here caused this bug to be found in Rust. So for Rust the external functions are all marked with the C ABI and will do the sign / zero extension, while all internal functions (so the Rust ABI) won't. I'm not sure if you could do something similar. An automatic analysis may not be possible, as function pointers may be ambiguous otherwise. For function pointers you could however always assume it's the "assemblyscript" abi and generate an assemblyscript abi wrapper for all the c abi functions (i.e. the exported ones). You may need a C ABI anyway in some shape or form, as it for example also defines that all structs are supposed to be passed as pointers (at least atm), which I think is true for AssemblyScript atm as well, but could possibly change in the future. So atm this: function add16_internal(x: i16, y: i16): i16 {
return x + y;
}
export function add16_cabi(x: i16, y: i16): i16 {
return add16_internal(x, y);
} compiles to: (func $main/add16_internal (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
get_local $p0
get_local $p1
i32.add
i32.const 16
i32.shl
i32.const 16
i32.shr_s
return)
(func $main/add16_cabi (export "add16_cabi") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
get_local $p0
get_local $p1
call $main/add16_internal
return) and long term you probably want: (func $main/add16_internal (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
get_local $p0
get_local $p1
i32.add
return)
(func $main/add16_cabi (export "add16_cabi") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
get_local $p0
get_local $p1
call $main/add16_internal
i32.const 16
i32.shl
i32.const 16
i32.shr_s
return) |
I agree, so this could be summarized by avoiding masking/sign extension between function calls. Shouldn't be too hard to implement for direct calls. Indirect calls should be doable as well, as long as restrictions on function signatures prohibit interchanging return and parameter types, which it currently looks like. I also like the idea of ABI wrappers at the outside, so exported functions used internally would still benefit from no masking. |
I think this also related to current issue. Implicitly casting from export function foo(a: i32): i32 {
return a > 0;
} Out main part (optimized):
But with explicit: export function foo(a: i32): i32 {
return <i32>(a > 0);
} Output main part (optimized):
|
Hmm, this export function foo(a: i32): i32 {
return a > 0;
} compiles to (func $foo (; 0 ;) (type $ii) (param $0 i32) (result i32)
(return
(i32.gt_s
(get_local $0)
(i32.const 0)
)
)
) for me without optimizations. |
I didn't perform any optimizations to see what the compiler itself is generating. Optimizations are a Binaryen-only thing. Maybe I missed something to reproduce this properly? |
Sorry, I publish slightly different version. Actually I mean this: export function foo(a: i32): i32 {
return (a > 0) + 1;
} After optimization: (i32.and
(i32.add
(i32.gt_s
(get_local $0)
(i32.const 0)
)
(i32.const 1)
)
(i32.const 1)
) vs (i32.add
(i32.gt_s
(get_local $0)
(i32.const 0)
)
(i32.const 1)
) |
I believe it's not looking up the contextual type there, so it doesn't know that whatever value comes from |
Currently thinking about giving this a shot. These are the places where I'd expect masking/sign-extension to occur:
Everything else can remain unwrapped (?), including assigning to locals and (non-boundary-crossing) globals. At some places, masking/sign-extension can be avoided statically, like where the input is already guaranteed to be in a valid range, i.e. a constant the compiler knows the value of. That correct so far? Did I miss something? As always, that's the first time I'll be implementing something like this. 😃 |
That sounds good so far. |
Just hit this one: WebAssembly/binaryen#1521 |
This should be handled in #87 now. Might still be missing a few obvious or not-so-obvious cases in canOverflow - if anyone would like to take a look :) Closing for now, but feel free to reopen if there are significant issues. |
Booleans, u8, i8, u16 and i16 are optimized poorly it seems. There is a lot of masking going on at the moment that is not necessary.
Consider the following two functions:
They get compiled down to:
As you can see, the add16 function has a lot of masking operations, while the add32 is just a pure addition. Compare this to the following Rust code:
Which gets compiled down to the following code:
As you can see, they are the same, as adding two i16 values is the same as adding two i32 values, because the upper 16-bits of the 32-bit wide "register" is getting masked away anyway when storing the 16-bit wide value in memory with a i32.store16_u instruction. So there's only very few instructions where actual masking is necessary (like dividing).
The text was updated successfully, but these errors were encountered: