Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For floating point operations, allow inputs to be arbitrary, including SNaNs. #883

Merged
merged 6 commits into from
Oct 25, 2019

Conversation

nlewycky
Copy link
Contributor

@nlewycky nlewycky commented Oct 17, 2019

Description

For floating point operations, allow inputs to be arbitrary, including SNaNs.

Instead of ensuring inputs are canonical NaNs on every operation, we tag outputs as pending such a canonicalization check, so that a sequence of computations can have a single canonicalization step at the end.

There's an extra wriggle for SIMD. The Wasm type system only indicates them as V128, so it's possible that we might do computations as F32x4Add, I8x16Add, F64x2Add in a row with no other computations in between. Since a canonicalization may change the bit patterns in a way that transforms one non-NaN to another non-NaN in the next subsequent instructions interpretation, most SIMD functions apply pending canonicalizations to their inputs, even the integer SIMD operations.

Review

  • Add a short description of the the change to the CHANGELOG.md file

…g SNaNs.

Instead of ensuring outputs are arithmetic NaNs on every function, we tag them as pending such a check, so that a sequence of computation can have a single canonicalization step at the end.

There's an extra wriggle for SIMD. The Wasm type system only indicates them as V128, so it's possible that we might do computations as F32x4Add, I8x16Add, F64x2Add in a row with no other computations in between. Thus, most SIMD functions apply pending canonicalizations to their inputs, even integer SIMD operations.
@nlewycky nlewycky requested a review from losfair as a code owner October 17, 2019 19:21
@nlewycky
Copy link
Contributor Author

@syrusakbary I'd appreciate if you could benchmark with this PR, please!

@nlewycky
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 17, 2019
@bors
Copy link
Contributor

bors bot commented Oct 17, 2019

try

Build succeeded

  • wasmerio.wasmer

@nlewycky
Copy link
Contributor Author

nlewycky commented Oct 22, 2019

I ran some benchmarks with particle-repel:

master
simd: 36.56s 36.69s 36.43s
non-simd: 100.65s 100.66s, 100.62s

feature/llvm-nan-but-fast
simd: 48.69s 48.70s 48.74s
non-simd: 62.48s 62.63s 62.40s

The speedup with non-SIMD is what I expected. I know we have to handle SIMD differently, and that it can put canonicalizations where we didn't use to have them sometimes, but I was not expecting a slowdown compared to our current approach.

It looks like wasi-sdk-5 doesn't emit a SIMD FDiv operation, but instead emits individual FDivs. At master, we got lucky with the LLVM autovectorizer reassembling it into a SIMD operation, but with this branch, we got unlucky and one of the fdivs moved into a different basic block. Overall we emit fewer instructions, even for the SIMD case, with this patch. That usually translates into better performance, but in this particular benchmark, the fdiv performance dominates.

@syrusakbary
Copy link
Member

Good to merge once we have the changes integrated in the Changelog

@nlewycky
Copy link
Contributor Author

bors r+

bors bot added a commit that referenced this pull request Oct 25, 2019
883: For floating point operations, allow inputs to be arbitrary, including SNaNs. r=nlewycky a=nlewycky

# Description
For floating point operations, allow inputs to be arbitrary, including SNaNs.

Instead of ensuring inputs are canonical NaNs on every operation, we tag outputs as pending such a canonicalization check, so that a sequence of computations can have a single canonicalization step at the end.

There's an extra wriggle for SIMD. The Wasm type system only indicates them as V128, so it's possible that we might do computations as F32x4Add, I8x16Add, F64x2Add in a row with no other computations in between. Since a canonicalization may change the bit patterns in a way that transforms one non-NaN to another non-NaN in the next subsequent instructions interpretation, most SIMD functions apply pending canonicalizations to their inputs, even the integer SIMD operations.

# Review

- [x] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Nick Lewycky <nick@wasmer.io>
Co-authored-by: nlewycky <nick@wasmer.io>
Co-authored-by: Syrus Akbary <me@syrusakbary.com>
@bors
Copy link
Contributor

bors bot commented Oct 25, 2019

Build succeeded

  • wasmerio.wasmer

@bors bors bot merged commit dae9949 into master Oct 25, 2019
@bors bors bot deleted the feature/llvm-nan-but-fast branch October 25, 2019 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants