Update and revamp wasm32 SIMD intrinsics #874

alexcrichton · 2020-07-13T22:08:41Z

Lots of time and lots of things have happened since the simd128 support
was first added to this crate. Things are starting to settle down now so
this commit syncs the Rust intrinsic definitions with the current
specification (https://github.com/WebAssembly/simd). Unfortuantely not
everything can be enabled just yet but everything is in the pipeline for
getting enabled soon.

This commit also applies a major revamp to how intrinsics are tested.
The intention is that the setup should be much more lightweight and/or
easy to work with after this commit.

At a high-level, the changes here are:

Testing with node.js and #[wasm_bindgen] has been removed. Instead
intrinsics are tested with Wasmtime which has a nearly complete
implementation of the SIMD spec (and soon fully complete!)
Testing is switched to wasm32-wasi to make idiomatic Rust bits a bit
easier to work with (e.g. panic!)
Testing of this crate's simd128 feature for wasm is re-enabled. This
will run on CI and both compile and execute intrinsics. This should
bring wasm intrinsics to the same level of parity as x86 intrinsics,
for example.
New wasm intrinsics have been added:
- iNNxMM_loadAxA_{s,u}
- vNNxMM_load_splat
- v8x16_swizzle
- v128_andnot
- iNNxMM_abs
- iNNxMM_narrow_*_{u,s}
- iNNxMM_bitmask - commented out until LLVM is updated to LLVM 11
- iNNxMM_widen_*_{u,s} - commented out until
  Implement SIMD widening instructions for x86 bytecodealliance/wasmtime#1994 lands
- iNNxMM_{max,min}_{u,s}
- iNNxMM_avgr_u
Some wasm intrinsics have been removed:
- i64x2_trunc_*
- f64x2_convert_*
- i8x16_mul
The v8x16.shuffle instruction is exposed. This is done through a
macro (not macro_rules!, but macro). This is intended to be
somewhat experimental and unstable until we decide otherwise. This
instruction has 16 immediate-mode expressions and is as a result
unsuited to the existing constify_* logic of this crate. I'm hoping
that we can game out over time what a macro might look like and/or
look for better solutions. For now, though, what's implemented is the
first of its kind in this crate (an architecture-specific macro), so
some extra scrutiny looking at it would be appreciated.
Lots of assert_instr annotations have been fixed for wasm.
All wasm simd128 tests are uncommented and passing now.

This is still missing tests for new intrinsics and it's also missing
tests for various corner cases. I hope to get to those later as the
upstream spec itself gets closer to stabilization.

In the meantime, however, I went ahead and updated the hex.rs example
with a wasm implementation using intrinsics. With it I got some very
impressive speedups using Wasmtime:

test benches::large_default  ... bench:     213,961 ns/iter (+/- 5,108) = 4900 MB/s
test benches::large_fallback ... bench:   3,108,434 ns/iter (+/- 75,730) = 337 MB/s
test benches::small_default  ... bench:          52 ns/iter (+/- 0) = 2250 MB/s
test benches::small_fallback ... bench:         358 ns/iter (+/- 0) = 326 MB/s

or otherwise using Wasmtime hex encoding using SIMD is 15x faster on 1MB
chunks or 7x faster on small <128byte chunks.

All of these intrinsics are still unstable and will continue to be so
presumably until the simd proposal in wasm itself progresses to a later
stage. Additionaly we'll still want to sync with clang on intrinsic
names (or decide not to) at some point in the future.

rust-highfive · 2020-07-13T22:08:44Z

r? @gnzlbg

(rust_highfive has picked a reviewer for you, use r? to override)

Lots of time and lots of things have happened since the simd128 support was first added to this crate. Things are starting to settle down now so this commit syncs the Rust intrinsic definitions with the current specification (https://github.com/WebAssembly/simd). Unfortuantely not everything can be enabled just yet but everything is in the pipeline for getting enabled soon. This commit also applies a major revamp to how intrinsics are tested. The intention is that the setup should be much more lightweight and/or easy to work with after this commit. At a high-level, the changes here are: * Testing with node.js and `#[wasm_bindgen]` has been removed. Instead intrinsics are tested with Wasmtime which has a nearly complete implementation of the SIMD spec (and soon fully complete!) * Testing is switched to `wasm32-wasi` to make idiomatic Rust bits a bit easier to work with (e.g. `panic!)` * Testing of this crate's simd128 feature for wasm is re-enabled. This will run on CI and both compile and execute intrinsics. This should bring wasm intrinsics to the same level of parity as x86 intrinsics, for example. * New wasm intrinsics have been added: * `iNNxMM_loadAxA_{s,u}` * `vNNxMM_load_splat` * `v8x16_swizzle` * `v128_andnot` * `iNNxMM_abs` * `iNNxMM_narrow_*_{u,s}` * `iNNxMM_bitmask` - commented out until LLVM is updated to LLVM 11 * `iNNxMM_widen_*_{u,s}` - commented out until bytecodealliance/wasmtime#1994 lands * `iNNxMM_{max,min}_{u,s}` * `iNNxMM_avgr_u` * Some wasm intrinsics have been removed: * `i64x2_trunc_*` * `f64x2_convert_*` * `i8x16_mul` * The `v8x16.shuffle` instruction is exposed. This is done through a `macro` (not `macro_rules!`, but `macro`). This is intended to be somewhat experimental and unstable until we decide otherwise. This instruction has 16 immediate-mode expressions and is as a result unsuited to the existing `constify_*` logic of this crate. I'm hoping that we can game out over time what a macro might look like and/or look for better solutions. For now, though, what's implemented is the first of its kind in this crate (an architecture-specific macro), so some extra scrutiny looking at it would be appreciated. * Lots of `assert_instr` annotations have been fixed for wasm. * All wasm simd128 tests are uncommented and passing now. This is still missing tests for new intrinsics and it's also missing tests for various corner cases. I hope to get to those later as the upstream spec itself gets closer to stabilization. In the meantime, however, I went ahead and updated the `hex.rs` example with a wasm implementation using intrinsics. With it I got some very impressive speedups using Wasmtime: test benches::large_default ... bench: 213,961 ns/iter (+/- 5,108) = 4900 MB/s test benches::large_fallback ... bench: 3,108,434 ns/iter (+/- 75,730) = 337 MB/s test benches::small_default ... bench: 52 ns/iter (+/- 0) = 2250 MB/s test benches::small_fallback ... bench: 358 ns/iter (+/- 0) = 326 MB/s or otherwise using Wasmtime hex encoding using SIMD is 15x faster on 1MB chunks or 7x faster on small <128byte chunks. All of these intrinsics are still unstable and will continue to be so presumably until the simd proposal in wasm itself progresses to a later stage. Additionaly we'll still want to sync with clang on intrinsic names (or decide not to) at some point in the future.

alexcrichton · 2020-07-13T22:21:56Z

crates/core_arch/src/wasm32/simd128.rs

+///
+/// All indexes `$i*` must have the type `u32`.
+#[allow_internal_unstable(platform_intrinsics, rustc_attrs)]
+pub macro v8x16_shuffle(


It's probably worth expanding a bit on what I'm thinking here as well. My hope with this is that we'll enable a platform-specific macro which is not available in the prelude bit instead must be imported:

use std::arch::wasm32::v8x16_shuffle;

This will have the macro clearly scoped to wasm32, and using a macro (as can be seen below) allows us to still control the implementation in such a way that's opaque to the caller, but also in a way where we can satisfy the compiler's codegen requirements. Namely all the immediate indices here can be required as constant by the compiler itself, ensuring that we can actually codegen the instruction to LLVM and other backends and such.

I believe that we have all the pieces necessary to have unstable internals that aren't actually exposed to the user. We can also get function-like type checking by carefully crafting the macro to ensure that the input expressions all have an expected type.

Like I mentioned in the PR description though this is, AFAIK, a new thing to do. I don't believe we've pioneered this elsewhere so I don't know if there are hidden downsides (such as using macro instead of macro_rules!). I'm pretty certain we're not altering the current stability story of libcore, and I think we can continue to figure out the best way to expose an intrinsic like this over time.

FWIW the clang implementation uses a macro that calls a __builtin_* function, as does many x86_64 intrinsics with constant arguments where they're all required to be macros in C.

I believe that the proper way to expose this kind of functionality in the long term is to use const generics. I hope that we will eventually be able to use where clauses to constrain the const values to be within a certain range, which is needed to avoid LLVM-level ICEs due to bad intrinsic inputs.

Since this is all unstable I'm happy to merge it as it is, but this is something that we should definitely revisit prior to stabilization.

Oh nice! I hadn't thought about the use of const generics here (haven't had a chance to use them myself yet!)

I tested this out and it works well for v8x16_shuffle implementation. It turns out that simd_shuffle16 in the compiler actually protects us against out-of-bounds indices (they're compile time errors), so I think that happens either with the macro or with the const-generic function. (although this would be best with where clauses rather than relying on the just-happens-to-be-caught-during-codegen behavior)

One thing I had difficult though was implementing a const-generic v16x8_shuffle in terms of v8x16_shuffle. I'm sure it's possible to implement natively but this would mean that we probably would want to include v16x8_shuffle as an "intrinsic" since it wouldn't be easily implementable externally.

Actually, I've gone ahead and pushed up a version that uses const generics instead. I went ahead and propagated this to everything else using #[rustc_args_required_const] in the simd module as well. Given the trajectory of const generics it looks like this may work well for wasm simd!

This commit unconditionally exposes SIMD functions from the `wasm32` module. This is done in such a way that the standard library does not need to be recompiled to access SIMD intrinsics and use them. This, hopefully, is the long-term story for SIMD in WebAssembly in Rust. It's unlikely that all WebAssembly runtimes will end up implementing SIMD so the standard library is unlikely to use SIMD any time soon, but we want to make sure it's easily available to folks! This commit enables all this by ensuring that SIMD is available to the standard library, regardless of compilation flags. This'll come with the same caveats as x86 support, where it doesn't make sense to call these functions unless you're enabling simd support one way or another locally. Additionally, as with x86, if you don't call these functions then the instructions won't show up in your binary. While I was here I went ahead and expanded the WebAssembly-specific documentation for the wasm32 module as well, ensuring that the current state of SIMD/Atomics are documented.

This sync the names of the intrinsics with the current spec, renamed in January.

alexcrichton · 2020-07-14T12:38:22Z

Ok I've pushed up two more commits, one is to avoid #[cfg] on the simd128 module, meaning it will now always be available in the standard library. This brings it even more in line with x86 to make sure it's easier to use simd intrinsics.

The second is a sync of the names of the atomic memory intrinsics, which I'll need to update the usage of in the standard library when the submodule is updated (which I can do after this lands)

alexcrichton · 2020-07-14T12:38:46Z

@Amanieu would you be up for reviewing this? Or should I try to rope in some other wasm folks?

Amanieu · 2020-07-14T12:46:26Z

I'm happy to review this but I won't be able to get around to it before the end of this week.

alexcrichton · 2020-07-14T12:47:00Z

Ok cool, thanks! And no rush of course, just figured it was time for me to get around to updating the wasm simd support :)

Support for them at runtime was added to Wasmtime

This commit switches all wasm simd intrinsics to using const generics instead of the funky `#[rustc_args_required_const]` attribute. This is ideally a bit more future-proof and more readily expresses the constraints of these instructions!

alexcrichton · 2020-07-16T19:38:40Z

FWIW I think const generics works a lot better for SIMD intrinsics than #[rustc_args_required_const], and if that's used going forward (at least for non-x86 platforms) then it largely just leaves a question of what to do with previously stable items (which I think is just memory_size/memory_grow for wasm). I'm not really sure how we'll address that but we could probably think of a new name and just deprecate the old versions.

Amanieu · 2020-07-16T19:57:16Z

Since #[rustc_args_required_const] is unstable and only used in stdarch I'm considering turning it into syntax sugar for const generics. Calls to a function with #[rustc_args_required_const] will automatically be desugared to a call to a const generic function.

crates/core_arch/src/wasm32/simd128.rs

alexcrichton · 2020-07-17T14:23:37Z

Ok I switched const indices to usize, switched to normal parameters for *_const functions, and I went ahead and removed v128_const since we're clearly deviating from the "spec" anyway with vector creation intrinsics, so there's no real need to have an exact v128.const equivalent.

alexcrichton · 2020-07-17T14:25:42Z

One thing that may be pretty reasonable to say though is that the text format for SIMD has special forms of v128.const, namely:

v128.const i8x16 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8 i8
v128.const i16x8 i16 i16 i16 i16 i16 i16 i16 i16
v128.const i32x4 i32 i32 i32 i32
v128.const i64x2 i64 i64
v128.const f32x4 f32 f32 f32 f32
v128.const f64x2 f64 f64

so we're arguably just matching that form where you can create a vector with any of the member types without having to go through all the gymnastics of byte transmutes yourself.

alexcrichton · 2020-07-18T15:42:37Z

Thanks again @Amanieu for your help reviewing this!

This commit updates the src/stdarch submodule primarily to include rust-lang/stdarch#874 which updated and revamped WebAssembly SIMD intrinsics and renamed WebAssembly atomics intrinsics. This is all unstable surface area of the standard library so the changes should be ok here. The SIMD updates also enable SIMD intrinsics to be used by any program any any time, yay! cc rust-lang#74372, a tracking issue I've opened for the stabilization of SIMD intrinsics

…kruppe Update stdarch submodule This commit updates the src/stdarch submodule primarily to include rust-lang/stdarch#874 which updated and revamped WebAssembly SIMD intrinsics and renamed WebAssembly atomics intrinsics. This is all unstable surface area of the standard library so the changes should be ok here. The SIMD updates also enable SIMD intrinsics to be used by any program any any time, yay! cc rust-lang#74372, a tracking issue I've opened for the stabilization of SIMD intrinsics

rust-highfive assigned gnzlbg Jul 13, 2020

alexcrichton force-pushed the wasm-simd-revamp branch from 0b5db66 to 1ef033a Compare July 13, 2020 22:16

alexcrichton force-pushed the wasm-simd-revamp branch from 1ef033a to 47e3245 Compare July 13, 2020 22:19

alexcrichton commented Jul 13, 2020

View reviewed changes

alexcrichton added 2 commits July 14, 2020 05:37

Update names of wasm atomic intrinsics

30d1962

This sync the names of the intrinsics with the current spec, renamed in January.

alexcrichton added 3 commits July 14, 2020 05:57

Fix build script output

11f70bf

Fix issues found when updating libstd's submodule

8d7ad15

Tweak a few items in the docs

e572992

alexcrichton mentioned this pull request Jul 15, 2020

Tracking issue for WebAssembly SIMD support rust-lang/rust#74372

Closed

7 tasks

alexcrichton added 3 commits July 15, 2020 10:01

Update SIMD tracking issue

262dd22

Fix xcrate-macros test

e04182d

Fixup some stability annotations and feature directives

011c9df

alexcrichton force-pushed the wasm-simd-revamp branch from 80a4a8d to 011c9df Compare July 15, 2020 18:07

alexcrichton added 2 commits July 15, 2020 12:21

Enable widen instructions

78a11ba

Support for them at runtime was added to Wasmtime

Amanieu reviewed Jul 16, 2020

View reviewed changes

crates/core_arch/src/wasm32/simd128.rs Outdated Show resolved Hide resolved

Amanieu reviewed Jul 16, 2020

View reviewed changes

crates/core_arch/src/wasm32/simd128.rs Outdated Show resolved Hide resolved

alexcrichton added 4 commits July 17, 2020 07:16

Switch const-level indexing to use usize

bfae7ea

Pin wasmtime to 0.19.0

2505f7c

Switch *_const to non-const-generics

ae88f82

Remove v128_const

6ada7ba

Amanieu merged commit 56ee497 into rust-lang:master Jul 18, 2020

alexcrichton deleted the wasm-simd-revamp branch July 18, 2020 15:42

alexcrichton mentioned this pull request Jul 18, 2020

Update stdarch submodule rust-lang/rust#74482

Merged

Daniel-Liu-c0deb0t mentioned this pull request Feb 14, 2021

Uncomment wasm32 bitmask instruction #1004

Closed

daxpedda mentioned this pull request Jul 11, 2024

Add missing #[target_feature(enable = "simd128")] #1609

Closed

Update and revamp wasm32 SIMD intrinsics #874

Update and revamp wasm32 SIMD intrinsics #874

Uh oh!

Conversation

alexcrichton commented Jul 13, 2020

Uh oh!

rust-highfive commented Jul 13, 2020

Uh oh!

alexcrichton Jul 13, 2020

Choose a reason for hiding this comment

Uh oh!

Amanieu Jul 16, 2020

Choose a reason for hiding this comment

Uh oh!

Amanieu Jul 16, 2020

Choose a reason for hiding this comment

Uh oh!

alexcrichton Jul 16, 2020

Choose a reason for hiding this comment

Uh oh!

alexcrichton Jul 16, 2020

Choose a reason for hiding this comment

Uh oh!

alexcrichton commented Jul 14, 2020

Uh oh!

alexcrichton commented Jul 14, 2020

Uh oh!

Amanieu commented Jul 14, 2020

Uh oh!

alexcrichton commented Jul 14, 2020

Uh oh!

alexcrichton commented Jul 16, 2020

Uh oh!

Amanieu commented Jul 16, 2020

Uh oh!

Uh oh!

Uh oh!

alexcrichton commented Jul 17, 2020

Uh oh!

alexcrichton commented Jul 17, 2020

Uh oh!

alexcrichton commented Jul 18, 2020

Uh oh!

Uh oh!