Skip to content

Avoid masking/sign extension between function calls #38

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CryZe opened this issue Mar 11, 2018 · 14 comments
Closed

Avoid masking/sign extension between function calls #38

CryZe opened this issue Mar 11, 2018 · 14 comments

Comments

@CryZe
Copy link

CryZe commented Mar 11, 2018

Booleans, u8, i8, u16 and i16 are optimized poorly it seems. There is a lot of masking going on at the moment that is not necessary.

Consider the following two functions:

export function add16(x: i16, y: i16): i16 {
  return x + y;
}

export function add32(x: i32, y: i32): i32 {
  return x + y;
}

They get compiled down to:

  (func $main/add16 (export "add16") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    i32.add
    i32.const 16
    i32.shl
    i32.const 16
    i32.shr_s
    return)
  (func $main/add32 (export "add32") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    i32.add
    return)

As you can see, the add16 function has a lot of masking operations, while the add32 is just a pure addition. Compare this to the following Rust code:

#[no_mangle]
pub extern fn add16(x: i16, y: i16) -> i16 {
  x + y
}

#[no_mangle]
pub extern fn add32(x: i32, y: i32) -> i32 {
  x + y
}

Which gets compiled down to the following code:

  (func $add16 (export "add16") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p1
    get_local $p0
    i32.add)
  (func $add32 (export "add32") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p1
    get_local $p0
    i32.add)

As you can see, they are the same, as adding two i16 values is the same as adding two i32 values, because the upper 16-bits of the 32-bit wide "register" is getting masked away anyway when storing the 16-bit wide value in memory with a i32.store16_u instruction. So there's only very few instructions where actual masking is necessary (like dividing).

@dcodeIO
Copy link
Member

dcodeIO commented Mar 11, 2018

But, if you call this from JS now, with two valid integers in 16-bit range, the result seen on the JS-side wouldn't be in 16-bit range if it overflows, wouldn't it?

Btw., in the sources this is where wrapSmallIntegers/possiblyOverflows is used, if you'd like to take a look for yourself. Not everything is masked/sign-extended actually.

@CryZe
Copy link
Author

CryZe commented Mar 11, 2018

Mmh, that's a good point. Weird how this is completely ignored by LLVM at the moment. I think exports / imports probably require masking their arguments / return values. Everything inside of WebAssembly however doesn't need any masking (except for certain operations).

@dcodeIO
Copy link
Member

dcodeIO commented Mar 14, 2018

Another thing that comes to my mind here is that this sequence

    i32.const 16
    i32.shl
    i32.const 16
    i32.shr_s

is going to replaced with a sign extension operation in the future. That's just the way it needs to be done at this point in time. The body would then look like

    get_local $p0
    get_local $p1
    i32.add
    i32.extend16_s
    return

if I understood this correctly.

@CryZe
Copy link
Author

CryZe commented Mar 15, 2018

I'm not sure if it's possible, but I think you'd need to be able to determine the ABI of a function. So if you know a function is being exported to JS, you want to do the sign / zero extension (vice versa for the imports) and for everything internal you want to skip it. Interestingly our discussion here caused this bug to be found in Rust. So for Rust the external functions are all marked with the C ABI and will do the sign / zero extension, while all internal functions (so the Rust ABI) won't. I'm not sure if you could do something similar. An automatic analysis may not be possible, as function pointers may be ambiguous otherwise. For function pointers you could however always assume it's the "assemblyscript" abi and generate an assemblyscript abi wrapper for all the c abi functions (i.e. the exported ones).

You may need a C ABI anyway in some shape or form, as it for example also defines that all structs are supposed to be passed as pointers (at least atm), which I think is true for AssemblyScript atm as well, but could possibly change in the future.

So atm this:

function add16_internal(x: i16, y: i16): i16 {
  return x + y;
}

export function add16_cabi(x: i16, y: i16): i16 {
  return add16_internal(x, y);
}

compiles to:

  (func $main/add16_internal (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    i32.add
    i32.const 16
    i32.shl
    i32.const 16
    i32.shr_s
    return)

  (func $main/add16_cabi (export "add16_cabi") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    call $main/add16_internal
    return)

and long term you probably want:

  (func $main/add16_internal (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    i32.add
    return)

  (func $main/add16_cabi (export "add16_cabi") (type $t0) (param $p0 i32) (param $p1 i32) (result i32)
    get_local $p0
    get_local $p1
    call $main/add16_internal
    i32.const 16
    i32.shl
    i32.const 16
    i32.shr_s
    return)

@dcodeIO
Copy link
Member

dcodeIO commented Mar 15, 2018

I agree, so this could be summarized by avoiding masking/sign extension between function calls. Shouldn't be too hard to implement for direct calls. Indirect calls should be doable as well, as long as restrictions on function signatures prohibit interchanging return and parameter types, which it currently looks like. I also like the idea of ABI wrappers at the outside, so exported functions used internally would still benefit from no masking.

@CryZe CryZe changed the title Small types are optimized poorly Avoid masking/sign extension between function calls Mar 15, 2018
@MaxGraey
Copy link
Member

MaxGraey commented Mar 21, 2018

I think this also related to current issue. Implicitly casting from bool to i32 produce extra code. For example:

export function foo(a: i32): i32 {
  return a > 0;
}

Out main part (optimized):

...
get_local $p0
i32.const 0
i32.gt_s
i32.const 1
i32.add
i32.const 1
i32.and

But with explicit:

export function foo(a: i32): i32 {
  return <i32>(a > 0);
}

Output main part (optimized):

...
get_local $p0
i32.const 0
i32.gt_s

@dcodeIO
Copy link
Member

dcodeIO commented Mar 21, 2018

Hmm, this

export function foo(a: i32): i32 {
  return a > 0;
}

compiles to

 (func $foo (; 0 ;) (type $ii) (param $0 i32) (result i32)
  (return
   (i32.gt_s
    (get_local $0)
    (i32.const 0)
   )
  )
 )

for me without optimizations.

@dcodeIO
Copy link
Member

dcodeIO commented Mar 21, 2018

I didn't perform any optimizations to see what the compiler itself is generating. Optimizations are a Binaryen-only thing. Maybe I missed something to reproduce this properly?

@MaxGraey
Copy link
Member

MaxGraey commented Mar 21, 2018

Sorry, I publish slightly different version. Actually I mean this:

export function foo(a: i32): i32 {
  return (a > 0) + 1;
}

After optimization:
(implicit)

(i32.and
  (i32.add
   (i32.gt_s
    (get_local $0)
    (i32.const 0)
   )
   (i32.const 1)
  )
  (i32.const 1)
 )

vs
(explicit)

(i32.add
  (i32.gt_s
    (get_local $0)
    (i32.const 0)
  )
  (i32.const 1)
 )

@dcodeIO
Copy link
Member

dcodeIO commented Mar 21, 2018

I believe it's not looking up the contextual type there, so it doesn't know that whatever value comes from bool + i32 doesn't have to be wrapped. Most likely not the only thing that needs to be improved when wrapping (or better: not wrapping).

@dcodeIO
Copy link
Member

dcodeIO commented Apr 27, 2018

Currently thinking about giving this a shot. These are the places where I'd expect masking/sign-extension to occur:

  • Small int LHS/RHS input to i32.div_s, i32.div_u
  • Small int arguments to imported functions
  • Small int return values of exported functions
  • Assigning to a small int global that is exported (when mutable-global lands)
  • Conversions from a small int type to a larger int type (this somewhat bothers me)
  • Small int LHS/RHS of comparisons

Everything else can remain unwrapped (?), including assigning to locals and (non-boundary-crossing) globals. At some places, masking/sign-extension can be avoided statically, like where the input is already guaranteed to be in a valid range, i.e. a constant the compiler knows the value of.

That correct so far? Did I miss something? As always, that's the first time I'll be implementing something like this. 😃

@CryZe
Copy link
Author

CryZe commented Apr 27, 2018

That sounds good so far.

@dcodeIO
Copy link
Member

dcodeIO commented Apr 28, 2018

Just hit this one: WebAssembly/binaryen#1521

@dcodeIO
Copy link
Member

dcodeIO commented May 5, 2018

This should be handled in #87 now. Might still be missing a few obvious or not-so-obvious cases in canOverflow - if anyone would like to take a look :)

Closing for now, but feel free to reopen if there are significant issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants