Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for WebAssembly SIMD support #74372

Closed
4 of 7 tasks
alexcrichton opened this issue Jul 15, 2020 · 44 comments · Fixed by #86204
Closed
4 of 7 tasks

Tracking issue for WebAssembly SIMD support #74372

alexcrichton opened this issue Jul 15, 2020 · 44 comments · Fixed by #86204
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

alexcrichton commented Jul 15, 2020

I'm opening this as a tracking issue for the SIMD intrinsics in the {std,core}::arch::wasm32 module. Eventually we're going to want to stabilize these intrinsics for the WebAssembly target, so I think it's good to have a canonical place to talk about them! I'm also going to update the #![unstable] annotations to point to this issue to direct users here if they want to use these intrinsics.

The WebAssembly simd proposal is currently in "phase 3". I would say that we probably don't want to consider stabilizing these intrinsics until the proposal has at least reached "phase 4" where it's being standardized, because there are still changes to the proposal happening over time (small ones at this point, though). As a brief overview, the WebAssembly simd proposal adds a new type, v128, and a suite of instructions to perform data processing with this type. The intention is that this is readily portable to a lot of architectures so usage of SIMD can be fast in lots of places.

For rust stabilization purposes the code for all these intrinsics lives in the rust-lang/stdarch git repository. All code lives in crates/core_arch/src/wasm32/simd128.rs. I've got a large refactoring and sync queued up for that module, so I'm going to be writing this issue with the assumption that it will land mostly as designed there.

Currently the design principles for the SIMD intrinsics are:

  • Like the existing memory_size, memory_grow and unreachable intrinsics, most intrinsics are named after the instruction that it represents. There is generally a 1:1 mapping with new instructions added to WebAssembly and intrinsics in the module.
  • The type signature of each intrinsic is intended to match the textual description of each intrinsic
  • Each intrinsic has #[target_feature(enable = "simd128")] which forces them all to be unsafe
  • Some gotchas for specific intrinsics are:
    • v128.const is exposed through a suite of const functions, one for each vector type (but not unsigned, just signed integers). Additionally the arguments are not actually required to be constant, so it's expected that the compiler will make the best choice about how to generate a runtime vector.
    • Instructions using lane indices, such as v8x16_shuffle and *_{extract,replace}_lane use const generics to represent constant arguments. This is different from x86_64 which uses the older #[rustc_args_required_const] attribute.
    • Shuffles are provided for v16x8, v32x4, and v64x2 as conveniences instead of only providing v8x16_shuffle. All of them are implemented in terms of the v8x16.shuffle instruction, however.
  • There is a singular v128 type, not a type for each size of vector that intrinsics operate with
  • The extract_lane intrinsics return the value type associated with the intrinsic name, they do not all return i32 unlike the actual WebAssembly instruction. This means that we do not have extract_lane_s and extract_lane_u intrinsics because the compiler will select the appropriate one depending on the context.

It's important to note that clang has an implementation of these intrinsics in the wasm_simd128.h header. The current design of the Rust wasm32 module is different in that:

  • The prefix wasm_* isn't used.
  • Only one datatype, v128, is exposed instead of types for each size/kind of vector
  • Naming can be different depending on the intrinsic. For example clang has wasm_i16x8_load_8x8 and wasm_u16x8_load_8x8 while Rust has i16x8_load8x8_s and i16x8_load8x8_u.

Most of these differences are largely stylistic, but there are some that are conveniences (like other forms of shuffles) which might be nice to expose in Rust as well. All the conveniences still compile down to one instruction, it's just different how users specify in code how the instruction is generated. I believe it should be possible for conveniences to live outside the standard library as well, however.

How SIMD will be used

If the SIMD proposal were to move to stage 4 today I think we're in a really good spot for stabilization. #74320 is a pretty serious bug we will want to fix before full stabilization but I don't believe the fix will be hard to land in LLVM (I've already talked with some folks on that side).

Other than that SIMD-in-wasm is different from other platforms where a binary with SIMD will refuse to run on engines that do not have SIMD support. In that sense there is no runtime feature detection available to SIMD consumers. (at least not natively)

After rust-lang/stdarch#874 lands programs will simply use #[target_feature(enable = "...")] or RUSTFLAGS and everything should work. The SIMD intrinsics will always be exposed from the standard library (but the standard library itself will not use them) and available to users. If programs don't use the intrinsics then SIMD won't get emitted, otherwise when used the binary will use v128.

Open Questions

A set of things we'll need to settle on before stabilizing (and this will likely expand over time) is:

  • Handle the difference between Clang and Rust. This could come in a number of forms such as accepting the difference or trying to unify the two. Either way the standard itself, unlike for x86, does not nor do I think will it provide a standard convention of how to expose these instructions in languages.
  • Audit and confirm the types of pointers in various *_load_* and *_store_* instructions. Primarily the instructions that load 64 bits (8x8, 16x4, ...) I'm unsure of on the types of their pointer arguments.
  • Figure out if the usage of const generics is ok for v8x16_shuffle and lane managment instructions.
  • Confirm the deviation of not having i8x16_extract_lane_s is ok (e.g. having i8x16_extract_lane returning i8 is all we need), same for i16x8.
  • Consider relaxing #[target_feature] "requires unsafe" rules for these WebAssembly intrinsics. Intrinsic like f32x4_splat have no fundamental reason they need to be unsafe. The only reason they're unsafe is because #[target_feature] is used on them to ensure that SIMD instructions are generated in LLVM.
  • Consider switching *_{any,all}_true to returning a bool
  • A general audit of intrinsic names and signatures to ensure they match the specification.
@alexcrichton alexcrichton added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Jul 15, 2020
@jonas-schievink jonas-schievink added the A-SIMD Area: SIMD (Single Instruction Multiple Data) label Jul 15, 2020
alexcrichton added a commit to alexcrichton/rust that referenced this issue Jul 28, 2020
This commit updates the src/stdarch submodule primarily to include
rust-lang/stdarch#874 which updated and revamped WebAssembly SIMD
intrinsics and renamed WebAssembly atomics intrinsics. This is all
unstable surface area of the standard library so the changes should be
ok here. The SIMD updates also enable SIMD intrinsics to be used by any
program any any time, yay!

cc rust-lang#74372, a tracking issue I've opened for the stabilization of SIMD
intrinsics
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 28, 2020
…kruppe

Update stdarch submodule

This commit updates the src/stdarch submodule primarily to include
rust-lang/stdarch#874 which updated and revamped WebAssembly SIMD
intrinsics and renamed WebAssembly atomics intrinsics. This is all
unstable surface area of the standard library so the changes should be
ok here. The SIMD updates also enable SIMD intrinsics to be used by any
program any any time, yay!

cc rust-lang#74372, a tracking issue I've opened for the stabilization of SIMD
intrinsics
@est31
Copy link
Member

est31 commented Dec 25, 2020

Is there a way to enable wasm SIMD globally so that it can be adopted by the autovectorizer?

Emscripten seems to have such a feature: https://emscripten.org/docs/porting/simd.html

To enable SIMD, pass the -msimd128 flag at compile time. This will also turn on LLVM’s autovectorization passes, so no source modifications are necessary to benefit from SIMD.

@jsheard
Copy link
Contributor

jsheard commented Dec 31, 2020

Is there a way to enable wasm SIMD globally so that it can be adopted by the autovectorizer?

Enabling simd128 via target-feature seems to do the trick, this simple case gets vectorized: https://godbolt.org/z/hcaxxf

@CryZe
Copy link
Contributor

CryZe commented Feb 8, 2021

Consider switching *_{any,all}_true to returning a bool

They really should, it doesn't seem like the compiler can reason about it... though maybe that is because the compiler doesn't even inline any of the intrinsics at all, which seems like a decently big problem:

https://rust.godbolt.org/z/h6Y5PY

@Amanieu
Copy link
Member

Amanieu commented Mar 21, 2021

A lot of the load/store intrinsics feel redundant: the memory access itself is better done in Rust code rather than by the intrinsic itself.

Examples:

  • v128_load/v128_store are literally just *m and *m = a. Also, they are inconsistent with the Clang version which allows unaligned access while the Rust version doesn't.
  • v128_loadN_splat is just iNxM_splat(*x).
  • v128_loadN_lane is just iNxM_replace_lane(a, *val).
  • v128_storeN_lane is just *m = iNxM_extract_lane(v).
  • v128_loadN_zero doesn't have a direct intrinsic equivalent, but probably should have.

I would keep v128_loadNxM_{s,u} though since these do not directly translate to Rust code.

@alexcrichton
Copy link
Member Author

I agree v128_{load,store} are redundant, especially because it's something Rust has to support anyway because you can take the address of any value at any time. I figured it was worth adding them for completeness, but you've got a good point that it may be best to use the ptr::{read,write}_unaligned intrinsics instead of just raw pointer reads/writes.

For v128_{load,store}N_lane as well as v128_loadN_zero I think unfortunately at this time need to expose the intrinsics. LLVM doesn't seem to optimize the equivalent paterns into the relevant instruction, although I'm sure that at some point in the future it will gain the ability to do so. I glanced at the discussions adding some of those intrinsics awhile back and I think the rationale was that compilers like V8 aren't great at pattern-matching multiple instructions to optimize the load/splat for example, and I think that ended up motivating the addition of the instructions. (I'm not 100% sure on this point though). In general though I think the idea is that compilers aren't always good enough to fuse the operations so there's a special instruction for it. (and compilers in this case I guess is both the one compiling wasm to native code but also LLVM compiling source code to wasm).

I've been adding most of the intrinsics so far but unfortunately I haven't had the opportunity to write a compiled-to-wasm thing which actually uses the instructions. In that sense I'm mostly shooting in the dark as to what the APIs should be. So far I've tried to maximize availability (ensuring there's a function-per-instruction, regardless of how silly it is to have) and stick close to the most standardized piece, the spec names/type signatures. I did this to stay in the spirit of the x86 intrinsics which are copied verbatim from the spec and we provide virtually no abstractions to make them nice to use (even in some cases where it would be easy to do so). Overall, though, I'm not sure if this is the best tradeoff for wasm. WebAssembly isn't the same as x86 in this regard, so we may get more bang for our buck by putting more thought and effort into what a usable API would be (at the cost of time for stabilization, of course).

@alexcrichton
Copy link
Member Author

alexcrichton commented Apr 28, 2021

tl;dr; I'd like to ask if a @rust-lang/libs team member would be willing to
propose an FCP on this issue for stabilization.


Ok I think enough things have landed now that I'd like to propose that this
issue is considered for stabilization. I've read over the OP and it's still
quite accurate, the only difference is that the WebAssembly SIMD
proposal
is now at phase 4 which is WebAssembly's "close to
stabilization". Additionally browsers will start shipping this feature ungated
in the future, with Chrome already scheduled to do so and also
implemented in
Firefox
.
Consequently the instruction set that we're targeting is stable at this point
and I think we're in a position where we can stabilize the Rust intrinsics
themselves.

To recap, this tracking proposal is for v128-related functions in the
core::arch::wasm32
module
(also available in
std of course). The intrinsics and v128 type in this module are intended to
expose the functionality of the WebAssembly simd
proposal
which has both an English
description

as well as a proposed formal
specification
.

The v128 type and the new instructions added to WebAssembly can be used very
similarly to existing CPU intrinsics that Rust has for x86 and x86_64.
Unlike x86_64, however, WebAssembly does not have a standard like the Intel
Intrinsics Guide or precedent in other languages of how to bind the SIMD support
in the language itself. Consequently a decision to stabilize these intrinsics in
Rust will be sort of blazing a new trail rather than following existing
precedent.

The design princicples for the SIMD intrinsics in the wasm32 module I'd
propose stabilizing are:

  • One type, v128, is exposed. This mirrors the underlying type added to
    WebAssembly itself. We could alternatively add types like i32x4 instead of
    just a singular v128 type, but this would require extra conversions to also
    be defined and given the precedent of x86_64 SIMD support in Rust the arch
    module's primary goal is "easiest for compilers to implement and the library
    team to stabilize" instead of "easiest for developers to use".

  • Instructions are exposed with an unsafe function that has an appropriate
    #[target_feature] annotation on it. WebAssembly is unlike other targets
    where the unsafe here is not necessary. Unlike x86_64 which isn't
    guaranteed to raise a CPU fault on invalid intrinsics executed, WebAssembly
    won't even begin to execute code unless the entire module validates.
    These intrinsics could be safe if not for the current design of the
    #[target_feature] attributes, and ideally these would be made safe in a
    future Rust release as a non-breaking change.

  • Instruction type signatures closely mirror the instruction's type signature
    but use Rust native types where possible. For example i16x8_all_true returns
    a bool instead of an i32 like the instruction does.

  • The v128.const instruction is exposed as a variety of constructors (see
    below), but none of which require constant arguments. The compiler (LLVM) is
    left to select the most appropriate instruction (or sequence of instructions)
    for creating a vector. This may be v128.const, but it may also not be.

  • Instruction function names and type signatures are inspired by their
    instruction name and instruction signature, but are not required to follow it.
    WebAssembly instruction names follow WebAssembly conventions (e.g. only having
    one i32 type instead of two i32 and u32 types) which aren't necessarily
    conventions for Rust as well. Furthermore WebAssembly exposes types like a
    Rust bool as an i32. This means that each instruction is considered in
    isolation of how to best expose it to Rust.

    Some general principles for naming intrinsics are:

    • Conveniences are exposed when the instruction is extremely low level. For
      example u64x2 is exposed which is typically lowered to v128.const.
    • Suffixes like _s and _u for instructions are baked into the funciton
      names to to indicate the signededness of the vector argument.
    • Rust functions are provided for both signed and unsigned names of vectors
      (with corresponding argument types).

    Unlike x86_64 where there are thousands of intrinsics that can't be
    hand-verified and scrutinized, WebAssembly is exposing ~200 intrinsics which
    is expected to be reasonable enough to bikeshed intrinsics individually and
    find the best names. A table of how instructions are exposed is:

Wasm Instruction Rust Function(s)
v128.load v128_load
v128.load8x8_s i16x8_load_extend_i8x8
v128.load8x8_u i16x8_load_extend_u8x8
v128.load16x4_s i32x4_load_extend_i16x4
v128.load16x4_u i32x4_load_extend_u16x4
v128.load32x2_s i64x2_load_extend_i32x2
v128.load32x2_u i64x2_load_extend_u32x2
v128.load8_splat v128_load8_splat
v128.load16_splat v128_load16_splat
v128.load32_splat v128_load32_splat
v128.load64_splat v128_load64_splat
v128.load32_zero v128_load32_zero
v128.load64_zero v128_load64_zero
v128.store v128_store
v128.load8_lane v128_load8_lane
v128.load16_lane v128_load16_lane
v128.load32_lane v128_load32_lane
v128.load64_lane v128_load64_lane
v128.store8_lane v128_store8_lane
v128.store16_lane v128_store16_lane
v128.store32_lane v128_store32_lane
v128.store64_lane v128_store64_lane
v128.const i8x16, u8x16, i16x8, u16x8, i32x4, u32x4, i64x2, u64x2, f32x4, f64x2
i8x16.shuffle i8x16_shuffle, i16x8_shuffle, i32x4_shuffle, i64x2_shuffle
i8x16.extract_lane_s i8x16_extract_lane
i8x16.extract_lane_u u8x16_extract_lane
i16x8.extract_lane_s i16x8_extract_lane
i16x8.extract_lane_u u16x8_extract_lane
i32x4.extract_lane i32x4_extract_lane, u32x4_extract_lane
i64x2.extract_lane i64x2_extract_lane, u64x2_extract_lane
f32x4.extract_lane f32x4_extract_lane
f64x2.extract_lane f64x2_extract_lane
i8x16.replace_lane i8x16_replace_lane, u8x16_replace_lane
i16x8.replace_lane i16x8_replace_lane, u16x8_replace_lane
i32x4.replace_lane i32x4_replace_lane, u32x4_replace_lane
i64x2.replace_lane i64x2_replace_lane, u64x2_replace_lane
f32x4.replace_lane f32x4_replace_lane
f64x2.replace_lane f64x2_replace_lane
i8x16.swizzle i8x16_swizzle
i8x16.splat i8x16_splat, u8x16_splat
i16x8.splat i16x8_splat, u16x8_splat
i32x4.splat i32x4_splat, u32x4_splat
i64x2.splat i64x2_splat, u64x2_splat
f32x4.splat f32x4_splat
f64x2.splat f64x2_splat
i8x16.eq i8x16_eq
i8x16.ne i8x16_ne
i8x16.lt_s i8x16_lt
i8x16.lt_u u8x16_lt
i8x16.gt_s i8x16_gt
i8x16.gt_u u8x16_gt
i8x16.le_s i8x16_le
i8x16.le_u u8x16_le
i8x16.ge_s i8x16_ge
i8x16.ge_u u8x16_ge
i16x8.eq i16x8_eq
i16x8.ne i16x8_ne
i16x8.lt_s i16x8_lt
i16x8.lt_u u16x8_lt
i16x8.gt_s i16x8_gt
i16x8.gt_u u16x8_gt
i16x8.le_s i16x8_le
i16x8.le_u u16x8_le
i16x8.ge_s i16x8_ge
i16x8.ge_u u16x8_ge
i32x4.eq i32x4_eq
i32x4.ne i32x4_ne
i32x4.lt_s i32x4_lt
i32x4.lt_u u32x4_lt
i32x4.gt_s i32x4_gt
i32x4.gt_u u32x4_gt
i32x4.le_s i32x4_le
i32x4.le_u u32x4_le
i32x4.ge_s i32x4_ge
i32x4.ge_u u32x4_ge
i64x2.eq i64x2_eq
i64x2.ne i64x2_ne
i64x2.lt_s i64x2_lt
i64x2.gt_s i64x2_gt
i64x2.le_s i64x2_le
i64x2.ge_s i64x2_ge
f32x4.eq f32x4_eq
f32x4.ne f32x4_ne
f32x4.lt f32x4_lt
f32x4.gt f32x4_gt
f32x4.le f32x4_le
f32x4.ge f32x4_ge
f64x2.eq f64x2_eq
f64x2.ne f64x2_ne
f64x2.lt f64x2_lt
f64x2.gt f64x2_gt
f64x2.le f64x2_le
f64x2.ge f64x2_ge
v128.not v128_not
v128.and v128_and
v128.andnot v128_andnot
v128.or v128_or
v128.xor v128_xor
v128.bitselect v128_bitselect
v128.any_true v128_any_true
i8x16.abs i8x16_abs
i8x16.neg i8x16_neg
i8x16.popcnt i8x16_popcnt
i8x16.all_true i8x16_all_true
i8x16.bitmask i8x16_bitmask
i8x16.narrow_i16x8_s i8x16_narrow_i16x8
i8x16.narrow_i16x8_u u8x16_narrow_i16x8
i8x16.shl i8x16_shl
i8x16.shr_s i8x16_shr
i8x16.shr_u u8x16_shr
i8x16.add i8x16_add
i8x16.add_sat_s i8x16_add_sat
i8x16.add_sat_u u8x16_add_sat
i8x16.sub i8x16_sub
i8x16.sub_sat_s i8x16_sub_sat
i8x16.sub_sat_u u8x16_sub_sat
i8x16.min_s i8x16_min
i8x16.min_u u8x16_min
i8x16.max_s i8x16_max
i8x16.max_u u8x16_max
i8x16.avgr_u u8x16_avgr
i16x8.extadd_pairwise_i8x16_s i16x8_extadd_pairwise_i8x16
i16x8.extadd_pairwise_i8x16_u i16x8_extadd_pairwise_u8x16
i16x8.abs i16x8_abs
i16x8.neg i16x8_neg
i16x8.qmulr_sat_s i16x8_q15mulr_sat
i16x8.all_true i16x8_all_true
i16x8.bitmask i16x8_bitmask
i16x8.narrow_i32x4_s i16x8_narrow_i32x4
i16x8.narrow_i32x4_u u16x8_narrow_i32x4
i16x8.extend_low_i8x16_s i16x8_extend_low_i8x16
i16x8.extend_high_i8x16_s i16x8_extend_high_i8x16
i16x8.extend_low_i8x16_u i16x8_extend_low_u8x16
i16x8.extend_high_i8x16_u i16x8_extend_high_u8x16
i16x8.shl i16x8_shl
i16x8.shr_s i16x8_shr
i16x8.shr_u u16x8_shr
i16x8.add i16x8_add
i16x8.add_sat_s i16x8_add_sat
i16x8.add_sat_u u16x8_add_sat
i16x8.sub i16x8_sub
i16x8.sub_sat_s i16x8_sub_sat
i16x8.sub_sat_u u16x8_sub_sat
i16x8.mul i16x8_mul
i16x8.min_s i16x8_min
i16x8.min_u u16x8_min
i16x8.max_s i16x8_max
i16x8.max_u u16x8_max
i16x8.avgr_u u16x8_avgr
i16x8.extmul_low_i8x16_s i16x8_extmul_low_i8x16
i16x8.extmul_high_i8x16_s i16x8_extmul_high_i8x16
i16x8.extmul_low_i8x16_u i16x8_extmul_low_u8x16
i16x8.extmul_high_i8x16_u i16x8_extmul_high_u8x16
i32x4.extadd_pairwise_i16x8_s i32x4_extadd_pairwise_i16x8
i32x4.extadd_pairwise_i16x8_u i32x4_extadd_pairwise_u16x8
i32x4.abs i32x4_abs
i32x4.neg i32x4_neg
i32x4.all_true i32x4_all_true
i32x4.bitmask i32x4_bitmask
i32x4.extend_low_i16x8_s i32x4_extend_low_i16x8
i32x4.extend_high_i16x8_s i32x4_extend_high_i16x8
i32x4.extend_low_i16x8_u i32x4_extend_low_u16x8
i32x4.extend_high_i16x8_u i32x4_extend_high_u16x8
i32x4.shl i32x4_shl
i32x4.shr_s i32x4_shr
i32x4.shr_u u32x4_shr
i32x4.add i32x4_add
i32x4.sub i32x4_sub
i32x4.mul i32x4_mul
i32x4.min_s i32x4_min
i32x4.min_u u32x4_min
i32x4.max_s i32x4_max
i32x4.max_u u32x4_max
i32x4.dot_i16x8_s i32x4_dot_i16x8
i32x4.extmul_low_i16x8_s i32x4_extmul_low_i16x8
i32x4.extmul_high_i16x8_s i32x4_extmul_high_i16x8
i32x4.extmul_low_i16x8_u i32x4_extmul_low_u16x8
i32x4.extmul_high_i16x8_u i32x4_extmul_high_u16x8
i64x2.abs i64x2_abs
i64x2.neg i64x2_neg
i64x2.all_true i64x2_all_true
i64x2.bitmask i64x2_bitmask
i64x2.extend_low_i32x4_s i64x2_extend_low_i32x4
i64x2.extend_high_i32x4_s i64x2_extend_high_i32x4
i64x2.extend_low_i32x4_u i64x2_extend_low_u32x4
i64x2.extend_high_i32x4_u i64x2_extend_high_u32x4
i64x2.shl i64x2_shl
i64x2.shr_s i64x2_shr
i64x2.shr_u u64x2_shr
i64x2.add i64x2_add
i64x2.sub i64x2_sub
i64x2.mul i64x2_mul
i64x2.extmul_low_i32x4_s i64x2_extmul_low_i32x4
i64x2.extmul_high_i32x4_s i64x2_extmul_high_i32x4
i64x2.extmul_low_i32x4_u i64x2_extmul_low_u32x4
i64x2.extmul_high_i32x4_u i64x2_extmul_high_u32x4
f32x4.ceil f32x4_ceil
f32x4.floor f32x4_floor
f32x4.trunc f32x4_trunc
f32x4.nearest f32x4_nearest
f32x4.abs f32x4_abs
f32x4.neg f32x4_neg
f32x4.sqrt f32x4_sqrt
f32x4.add f32x4_add
f32x4.sub f32x4_sub
f32x4.mul f32x4_mul
f32x4.div f32x4_div
f32x4.min f32x4_min
f32x4.max f32x4_max
f32x4.pmin f32x4_pmin
f32x4.pmax f32x4_pmax
f64x2.ceil f64x2_ceil
f64x2.floor f64x2_floor
f64x2.trunc f64x2_trunc
f64x2.nearest f64x2_nearest
f64x2.abs f64x2_abs
f64x2.neg f64x2_neg
f64x2.sqrt f64x2_sqrt
f64x2.add f64x2_add
f64x2.sub f64x2_sub
f64x2.mul f64x2_mul
f64x2.div f64x2_div
f64x2.min f64x2_min
f64x2.max f64x2_max
f64x2.pmin f64x2_pmin
f64x2.pmax f64x2_pmax
i32x4.trunc_sat_f32x4_s i32x4_trunc_sat_f32x4
i32x4.trunc_sat_f32x4_u u32x4_trunc_sat_f32x4
f32x4.convert_i32x4_s f32x4_convert_i32x4
f32x4.convert_i32x4_u f32x4_convert_u32x4
i32x4.trunc_sat_f64x2_s_zero i32x4_trunc_sat_f64x2_zero
i32x4.trunc_sat_f64x2_u_zero u32x4_trunc_sat_f64x2_zero
f64x2.convert_low_i32x4_s f64x2_convert_low_i32x4
f64x2.convert_low_i32x4_u f64x2_convert_low_u32x4
f32x4.demote_f64x2_zero f32x4_demote_f64x2_zero
f64x2.promote_low_f32x4 f64x2_promote_low_f32x4

It's worth nothing that Clang has a header file for WebAssembly
intrinsics in the same way it has one for x86_64. The Rust names do not
precisely match the names exposed in C. The first shift is that Rust functions
are not prefixed with wasm_ (since Rust has a wasm32 module unlike C does).
Even accounting for this difference Rust has other naming differences:

  • wasm_i16x8_load8x8 vs i16x8_load_extend_i8x8 (and related)
  • wasm_u16x8_load8x8 vs i16x8_load_extend_u8x8 (and related)
  • wasm_i8x16_make vs i8x16 (and related)
  • wasm_i8x16_const vs i8x16 (and related)
  • C lacks a wasm_u8x16_splat function (only has i8x16)
  • C lacks a wasm_u8x16_replace_lane function (only has i8x16)
  • C lacks a wasm_u32x4_extract_lane function (only has i32x4)
  • wasm_u32x4_extend_low_u16x8 vs i32x4_extend_low_u16x8 (and related)
  • wasm_u16x8_extmul_high_u8x16 vs i16x8_extmul_high_u8x16 (and related)

The intention behind these design decisions is that all the functionality of the
proposal should be exposed in Rust, but at the same time we give ourselves
wiggle room where possible to add conveniences. The conveniences aren't super
principled other than evaluating intrinsics one-by-one. There are other possible
points naturally on the design spectrum for this module, for example providing
many types like u8x16 with first-class methods and such. Functions taking
memory arguments could arguably take safe references instead of raw
pointers as well. As @Amanieu pointed
out

other intrinsics are unlikely to ever get used or can otherwise be trivially
composed from other intrinsics (relying on LLVM to optimize, which it doesn't
always do just yet). My hope is that by getting a few more eyeballs on this we
can figure out a good balance for Rust and what we'd like the intrinsics to be.

Implementation Status

It's worth touching on the implementation status of this proposal currently as
well. At this time the stdarch
repository

has an implementation for all of the above intrinsics. After
#84654 lands LLVM will support all
instructions except i64x2.abs (just a minor LLVM issue) and the final update
to stdarch to match LLVM will be in
rust-lang/stdarch#1131. Note that actually updating the
stdarch submodule in rust-lang/rust happening in
#83278 is blocked. This means that the
current documentation for
wasm
does not match this
proposal, despite this proposal all being implemented.

It's also worth mentioning that this is all a relatively new feature in LLVM.
LLVM 12 an prior do not support the WebAssembly simd proposal as-is. Just before
reaching stage 4 a number of opcodes were renumbered, so LLVM versions 12 and
prior will produce invalid WebAssembly binaries using the simd feature. A number
of backports have been performed onto Rust's LLVM 12 fork so the Rust fork of
LLVM fully supports WebAssembly SIMD, but this means that until LLVM 13 is
released it's unlikely that cross-language-LTO or similar interop will be
supported between C and Rust.

@CryZe
Copy link
Contributor

CryZe commented Apr 28, 2021

i8x16, u8x16, i16x8, u16x8, i32x4, u32x4, i64x2, u64x2, f32x4, f64x2

I'm still somewhat skeptical of those being the names for the v128.const constructor functions as that means no such types could ever live in this module, and idk if we truly want to block off that possibility forever. I agree that for now just a v128 type and these free standing functions is fine, but maybe those constructor functions could be named with a _const suffix like the actual instruction (or e.g. _make like C seems to use).

@alexcrichton
Copy link
Member Author

There was a small bikeshed here about those names where _const was a bit of a misnomer since the arguments aren't required to be constant. Other prefixes like _make or _new seem fine to me!

I don't think, though, that this prevents us from having dedicated types for each lane width because functions go into the value namespace and structs go into the type namespace.

@CryZe
Copy link
Contributor

CryZe commented Apr 29, 2021

Seems like it prevents the type from being a tuple struct though: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40b1e2abd2ade54dd598db826ed594bf

But as long as using a non-tuple struct for those types doesn't cause any problems such as repr(simd) or so not working anymore, then I guess it's fine.

@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 5, 2021
@joshtriplett
Copy link
Member

I think it's appropriate to start checking for consensus here, and we can use concerns to track blockers.

@rfcbot merge

stdarch needs an update to a version matching the proposal:
@rfcbot concern update-stdarch

I'd love to see #[target_feature] made safe on wasm targets, if that's feasible and not excessively difficult, to avoid having to make a transition in the future:
@rfcbot concern wasm-safe-target-feature

@rfcbot
Copy link

rfcbot commented May 5, 2021

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels May 5, 2021
@BurntSushi
Copy link
Member

@alexcrichton Amazing work! I think this all pretty much LGTM but I do have a few questions:

Firstly, it looks like the target feature name is just simd128, but that seems... a bit generic? I guess I would expect it to include wasm in it or something more specific than just simd128.

Secondly, with respect to safe target feature, I see this in the docs today:

Unlike x86_64, however, WebAssembly does not currently have dynamic detection at runtime as to whether SIMD is supported (this is one of the motivators for the conditional sections proposal, but that is still pretty early days).

If it does get dynamic detection, does that change the discussion around a safe target feature?

Thirdly and lastly, have we considered using a naming scheme that matches the WASM spec as closely as possible? e.g., We invent names where necessary, but otherwise do a mechanical translation of things (e.g., . -> _). I only suggest this because I've always like this property of our std::arch::x86_64 module in that you just need to take the pre-existing name, search it and ge the result you want.

@alexcrichton
Copy link
Member Author

Good questions and thanks for reading over this!

The name simd128 is currently chosen because it matches LLVM's feature name for this. Unlike CPU features for x86 WebAssembly primarily has proposal names to go by. The proposal itself is just called "simd", though, which wouldn't really help much here. I think though that it's already somewhat implied that target_feature feature names are not scoped to their own targets, e.g. sse2 doesn't really imply x86_64 except by historical association. That being said I don't think there's an issue renaming it to wasm-simd128 or something like that, but I figured that it wouldn't be necessary due to the lack of scoping for other platforms.

Currently WebAssembly doesn't have any form of dynamic detectino for supported features, and even on the horizon I'm not sure there's really anything viable for adding this in a form that looks like x86. The closest equivalent for WebAssembly is conditional sections but afaik that proposal is sort of dead in the water right now and doesn't have a future. There are possible alternatives of a roughly similar shape, but it's all about selection at compile time instead of runtime. Basically I think it's tough to answer precisely what would happen if wasm gets dynamic detection because it's unclear to me what it means to get dynamic detection. My guess, though, is that nothing will ever be unsafe. The whole point of #[target_feature] and unsafe is that we don't know what happens if a CPU executes code it doesn't understand, but the whole point of WebAssembly is this never happens. For this reason I think it's safe to say that for WebAssembly specifically the unsafe-ty of #[target_feature] is never necessary. (even in to the future)

For naming, that is indeed an alternative! I actually originally implemented that. I ended up coming around to Rust-specific names for a few reasons though:

  • If we match instructions exactly, then i8x16.shuffle is an interesting case. Only providing that is pretty unergonomic when we could easily provide i64x2.shuffle as well (and other variants). Basically some instructions are intended to have multiple kinds of high-level operations map down to them, so I think it makes sense to provide all those high level operations. This means that we would already buy into mulitple-functions-per-instruction. Naming here isn't too too hard, though, we can still roughly match the instruction.
  • Next up there was i8x16.extrace_lane_u. WebAssembly doesn't have a u8 type so the _u and _s instructions are sign or zero-extending the 8-bit lane into a 32-bit integer (as a WebAssembly i32). Binding these instructions could still be done precisely by their instruction name itself, but if we have _u and _s for 8/16-bit integers it's sort of odd that we don't also have it for u32/u64. It makes sense from an instruction set level, but not necessarily from a Rust language level.
  • Overall, the WebAssembly instructions are designed (rightfully so) to be consistent with themselves and the rest of the WebAssembly specification, not high-level languages. This means that instruction names are specifically chosen for their consistency with other instructions, not for the ergonomics or understand of just reading the instruction. For example i8x16.shuffle has nothing really to do with i8x16 as a type, it works with all other lane types. Instructions like i16x8.extadd_pairwise_i8x16_u are somewhat ambiguous as to where the _u applies (whereas in Rust i16x8_extadd_pairwise_u8x16 is more clear). Another good example of this is Names of load instructions WebAssembly/simd#297 where instructions were renamed to be more consistent with themselves, and from my reading that was mostly the only motivation involved in doing so).

I originally thought 1:1 with instructions would be the way to go. After using it a bit, though, and getting other feedback, I've personally come to the conclusion that the best option is to design the names specifically for Rust. Given the size of the instruction set I think this is feasible (unlike x86) and given the downsides of matching wasm exactly I think we have a lot to gain in terms of readability of code itself. What I don't necessarily have a great gauge on is how folks will typically write SIMD-accelerated wasm code. If they start from desired instructions and work backwards I think renaming things could impede this. If they start from the wasm32 module and figure out what's available I think renaming things will help since conveniences will be provided and names are sometimes more understandable.

That's at least my thinking on the names so far. I don't really mind implementing either way, and I don't personally lean super strongly either way. I'm just currently leaning more towards custom names from Rust. (it's worth pointing out the custom Rust names more closely match what C is doing currently in clang as well, which is to diverge from the spec and use nicer prefixes and have conveniences)

@BurntSushi
Copy link
Member

RE simd128: It's a good point about sse2, it isn't namespaced. I think the thing there is that sse2 is kinda weird in and of itself, where simd is just kind of a generic thing. I would personally like wasm-simd128 better, even though I know it's a little longer to type.

RE dynamic detection: I think if you're certain that the intrinsics won't become unsafe to call some day, then I think that assuages the root of my concern there.

RE naming:

and I don't personally lean super strongly either way.

I think I'm with you there, and personally, absent other data, I'm inclined to defer to your experience actually using the routines. I think we're in agreement about the trade offs here, and in particular, that having precisely matching names is a nice property. But perhaps the only reason I appreciate it so much for x86 is indeed because of its size, which doesn't apply here. And if the C folks are devising their own names too, then I guess we might as well too.

One nice thing to do (not a stabilization blocker of course) would be to add the WASM spec name instruction to the docs of each intrinsic and add a doc alias for them. I think that might actually get us the best of both worlds. Then folks could type in the WASM spec name and get back the Rust equivalent.

@workingjubilee
Copy link
Member

workingjubilee commented May 6, 2021

re: wasm-simd128
or wasimd128, webassembly-fixed-width-simd, wasm-v128...
I agree that simd128 seems like it invites a name collision without qualification. Not today, but perhaps in a decade. I would anticipate, since all compiler invocations are implicitly scoped by target, that the concern of simd128 and it having a possible name collision would be on the diagnostic side and reading code, or mixed invocations of a wasm + native project and possibly accidentally setting a value for one that was not intended to be set for the other. So not so much a strict technical issue, only clarity, and only due to an abundance of caution.

re: syntax, name-matching, etc:
Importantly, Rust's wasm intrinsics will be "first to field" for many users. Intrinsic functions offered by Intel or Arm exist to thinly wrap the more rough-hewn APIs of raw assembly, and they have an active and expansive userbase who would expect them to be named the same in other languages if they're supported at all. But here, we are in fact wrapping a "raw assembly" (kinda) API in new functions, and so I believe Rust is entitled to deviate where it likes, because Rust is less in the role of a compiler implementing Intel or Arm's intrinsics word for word, but rather is "Intel" or "Arm".

re: "Is this ready?"
Hm, it's more a compiler-level concern technically, but is this blocking? #80108

@CryZe
Copy link
Contributor

CryZe commented May 26, 2021

Should the bitmask instructions possibly return u8 / u16 / u32 based on the lane count? At the moment they all return i32 which is of course the type on "machine level", but since the intrinsics all take appropriate Rust integer types, it seems like the bitmask instructions don't properly reflect that. I noticed this because actually casting to u16 on a i8x16_bitmask emits an unnecessary mask & 0xFFFF.

Oh, also I've started implementing a bunch of WASM SIMD for various crates tiny-skia, hashbrown, memchr, ... and I've noticed that by default our naming convention uses iNxM for all the intrinsics where the sign doesn't matter. This has been really confusing. So for the bitmask for example you have to use i8x16_bitmask as u8x16_bitmask doesn't exist, even though the sign doesn't actually matter. I feel like there should either also be a u8x16_bitmask that is just an alias (maybe at least a docs alias) or for instructions where there's no real notion of a sign, we should prefer the unsigned naming. (Actually maybe this example isn't great as the bitmask intrinsic cares about the "high bit" of each lane, which you could interpret as the sign bit depending on what you are doing)

@alexcrichton
Copy link
Member Author

Returning a smaller type makes sense to me, I can work on implementing that. You're thinking of u16 for i8x16_bitmask and u8 for all other types presumably, right?

For the naming this was indeed one aspect I was worried about. I was worried that there was no u32x4_add since it was the same as i32x4_add functionally. Personally I would be ok adding u32x4_add as a pub use alias of i32x4_add to fix this, and doing that for all types where the signed/unsigned interpretation is the same. I'd be curious to hear the feedback of others though! In any case this is easy enough to implement backwards compatibly too.

Also thanks for implementing these optimizations! Do you have some PRs to show as examples of ergonomics and such?

alexcrichton added a commit to alexcrichton/rust that referenced this issue May 28, 2021
This commit updates the compiler's handling of the `#[target_feature]`
attribute when applied to functions on WebAssembly-based targets. The
compiler in general requires that any functions with `#[target_feature]`
are marked as `unsafe` as well, but this commit relaxes the restriction
for WebAssembly targets where the attribute can be applied to safe
functions as well.

The reason this is done is that the motivation for this feature of the
compiler is not applicable for WebAssembly targets. In general the
`#[target_feature]` attribute is used to enhance target CPU features
enabled beyond the basic level for the rest of the compilation. If done
improperly this means that your program could execute an instruction
that the CPU you happen to be running on does not understand. This is
considered undefined behavior where it is unknown what will happen (e.g.
it's not a deterministic `SIGILL`).

For WebAssembly, however, the target is different. It is not possible
for a running WebAssembly program to execute an instruction that the
engine does not understand. If this were the case then the program would
not have validated in the first place and would not run at all. Even if
this were allowed in some hypothetical future where engines have some
form of runtime feature detection (which they do not right now) any
implementation of such a feature would generate a trap if a module
attempts to execute an instruction the module does not understand. This
deterministic trap behavior would still not fall into the category of
undefined behavior because the trap is deterministic.

For these reasons the `#[target_feature]` attribute is now allowed on
safe functions, but only for WebAssembly targets. This notably enables
the wasm-SIMD intrinsics proposed for stabilization in rust-lang#74372 to be
marked as safe generally instead of today where they're all `unsafe` due
to the historical implementation of `#[target_feature]` in the compiler.
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label May 30, 2021
@rfcbot
Copy link

rfcbot commented May 30, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 30, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Jun 3, 2021
…sm, r=joshtriplett

rustc: Allow safe #[target_feature] on wasm

This commit updates the compiler's handling of the `#[target_feature]`
attribute when applied to functions on WebAssembly-based targets. The
compiler in general requires that any functions with `#[target_feature]`
are marked as `unsafe` as well, but this commit relaxes the restriction
for WebAssembly targets where the attribute can be applied to safe
functions as well.

The reason this is done is that the motivation for this feature of the
compiler is not applicable for WebAssembly targets. In general the
`#[target_feature]` attribute is used to enhance target CPU features
enabled beyond the basic level for the rest of the compilation. If done
improperly this means that your program could execute an instruction
that the CPU you happen to be running on does not understand. This is
considered undefined behavior where it is unknown what will happen (e.g.
it's not a deterministic `SIGILL`).

For WebAssembly, however, the target is different. It is not possible
for a running WebAssembly program to execute an instruction that the
engine does not understand. If this were the case then the program would
not have validated in the first place and would not run at all. Even if
this were allowed in some hypothetical future where engines have some
form of runtime feature detection (which they do not right now) any
implementation of such a feature would generate a trap if a module
attempts to execute an instruction the module does not understand. This
deterministic trap behavior would still not fall into the category of
undefined behavior because the trap is deterministic.

For these reasons the `#[target_feature]` attribute is now allowed on
safe functions, but only for WebAssembly targets. This notably enables
the wasm-SIMD intrinsics proposed for stabilization in rust-lang#74372 to be
marked as safe generally instead of today where they're all `unsafe` due
to the historical implementation of `#[target_feature]` in the compiler.
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 3, 2021
…, r=joshtriplett

rustc: Allow safe #[target_feature] on wasm

This commit updates the compiler's handling of the `#[target_feature]`
attribute when applied to functions on WebAssembly-based targets. The
compiler in general requires that any functions with `#[target_feature]`
are marked as `unsafe` as well, but this commit relaxes the restriction
for WebAssembly targets where the attribute can be applied to safe
functions as well.

The reason this is done is that the motivation for this feature of the
compiler is not applicable for WebAssembly targets. In general the
`#[target_feature]` attribute is used to enhance target CPU features
enabled beyond the basic level for the rest of the compilation. If done
improperly this means that your program could execute an instruction
that the CPU you happen to be running on does not understand. This is
considered undefined behavior where it is unknown what will happen (e.g.
it's not a deterministic `SIGILL`).

For WebAssembly, however, the target is different. It is not possible
for a running WebAssembly program to execute an instruction that the
engine does not understand. If this were the case then the program would
not have validated in the first place and would not run at all. Even if
this were allowed in some hypothetical future where engines have some
form of runtime feature detection (which they do not right now) any
implementation of such a feature would generate a trap if a module
attempts to execute an instruction the module does not understand. This
deterministic trap behavior would still not fall into the category of
undefined behavior because the trap is deterministic.

For these reasons the `#[target_feature]` attribute is now allowed on
safe functions, but only for WebAssembly targets. This notably enables
the wasm-SIMD intrinsics proposed for stabilization in rust-lang#74372 to be
marked as safe generally instead of today where they're all `unsafe` due
to the historical implementation of `#[target_feature]` in the compiler.
alexcrichton added a commit to alexcrichton/stdarch that referenced this issue Jun 10, 2021
This is a follow-up from rust-lang/rust#74372 which has finished FCP for
the stabilization of wasm intrinsics. This marks them all stable, as-is
and additionally marks the functions which create integer vectors as
`const`-stable as well. The only remaining unstable bits are that
`f32x4` and `f64x2` are `const`-unstable. Mostly just because I couldn't
figure out how to make them `const`-stable.
alexcrichton added a commit to alexcrichton/stdarch that referenced this issue Jun 10, 2021
This is a follow-up from rust-lang/rust#74372 which has finished FCP for
the stabilization of wasm intrinsics. This marks them all stable, as-is
and additionally marks the functions which create integer vectors as
`const`-stable as well. The only remaining unstable bits are that
`f32x4` and `f64x2` are `const`-unstable. Mostly just because I couldn't
figure out how to make them `const`-stable.
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jun 10, 2021
@alexcrichton
Copy link
Member Author

I've posted the final PR for stabilization to #86204

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 11, 2021
std: Stabilize wasm simd intrinsics

This commit performs two changes to stabilize Rust support for
WebAssembly simd intrinsics:

* The stdarch submodule is updated to pull in rust-lang/stdarch#1179.
* The `wasm_target_feature` feature gate requirement for the `simd128`
  feature has been removed, stabilizing the name `simd128`.

This should conclude the FCP started on rust-lang#74372 and...

Closes rust-lang#74372
@bors bors closed this as completed in e05bb26 Jun 11, 2021
@newpavlov
Copy link
Contributor

What will happen if code dependent on SIMD intrinsics will be compiled without the necessary target feature being enabled? Will compiler generate respective SIMD instructions no questions asked (well, it will not inline them, but it's not important for the question)? Shouldn't it generate a compilation error? Otherwise generated WASM may silently become unusable in runtimes without SIMD support (imagine an incorrect change somewhere deep in a project's dependency tree).

@alexcrichton
Copy link
Member Author

As the documentation indicates:

This means to generate a binary without SIMD you’ll need to avoid both options above plus calling into any intrinsics in this module.

where the "both options" are using -Ctarget-feature=+simd128 or #[target_feature(enable = "simd128")].

The compiler will always generate simd instructions if you call intrinsics in std::arch::wasm32, regardless of your compilation settings for the rest of the codegen unit. This is the same as for all other platform intrinsics, for example, in the x86 and x86_64 modules.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

This is the same as for all other platform intrinsics, for example, in the x86 and x86_64 modules.

Yes, but on other platforms those intrinsics are unsafe, so there is a higher barrier for making mistake. Also having a respective instruction in a generated binary does not cause CPU to reject that binary, i.e. on x86 it's fine to have unchecked SIMD instructions in dead branches. In a certain sense using WASM SIMD intrinsic effectively changes target. So i wonder if the current handling of them is too lax.

Also #[target_feature(enable = "simd128")] is useless in my understanding. Since there is no runtime detection (and by your words there probably will not be one), WASM target features currently should be enabled for a whole project. I don't see a use case for enabling it only for selected functions.

@est31
Copy link
Member

est31 commented Jul 29, 2021

Yes, but on other platforms those intrinsics are unsafe, so there is a higher barrier for making mistake.

Isn't this a violation of RFC 2045? It seems to me that RFC 2396's requirement to make calling functions with target features on functions that don't have target features unsafe has not been implemented?

Since there is no runtime detection (and by your words there probably will not be one)

@newpavlov the docs you linked mention runtime detection proposals.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

the docs you linked mention runtime detection proposals.

Yes, I know and this is why I asked this question on reddit. But it looks like that @alexcrichton has acted under assumption that dynamic detection will not be added to WASM:

Currently WebAssembly doesn't have any form of dynamic detectino for supported features, and even on the horizon I'm not sure there's really anything viable for adding this in a form that looks like x86. The closest equivalent for WebAssembly is conditional sections but afaik that proposal is sort of dead in the water right now and doesn't have a future. There are possible alternatives of a roughly similar shape, but it's all about selection at compile time instead of runtime. Basically I think it's tough to answer precisely what would happen if wasm gets dynamic detection because it's unclear to me what it means to get dynamic detection.

BTW I want to discuss this point from the same comment a bit:

The whole point of #[target_feature] and unsafe is that we don't know what happens if a CPU executes code it doesn't understand, but the whole point of WebAssembly is this never happens.

In my understanding, unsafety of target_feature not caused only by concerns of CPU behavior (after all, outside of some very edge scenarios you usually will get a simple invalid opcode exception). It's probably not even the top reason, see this @gnzlbg's comment referenced in the target_feature RFC. Are we sure that those codegen concerns are not applicable to WASM?

@alexcrichton
Copy link
Member Author

There's some more discussion on #84988 for reference, but my thoughts on this topic are:

  • Wasm engines, forever and all of time, will reject modules they do not understand. This means if you have a simd instruction in their an the wasm engine doesn't understand it, then a correct engine will never run any code and simply reject the module.
  • If dynamic feature detection is added then it will be added in such a way that the same modules can be executed differently on two engines (one for example with simd and without). If you have correctly configured your module and dynamic detection then the module that doesn't support simd will never see the simd code. This means it's never executed. If you incorrectly configured the dynamic feature detection then the engine that doesn't support simd will see simd instructions it does not understand. From the first point this means that the runtime will reject the module.
  • Some concerns on the original target_feature RFC were indeed connected to "assemblers and things can assume weird things". AFAIK those are generally theoretical concerns but still ones we take seriously. Again WebAssembly is different because the concern worried about here is that instructions for the wrong architecture leak into other parts of the program (e.g. an assembler realizes that you're using avx2 things so it assumes it can just keep using those, even though it's on a different path that's not supposed to execute when avx2 is not detected or something like that). As always wasm engines will either run the module or reject it entirely.

Basically there's gotchas and subtelties about generating modules in WebAssembly that do use simd, don't use simd, or try to dynamically detect simd (assuming some sort of future proposal), but these are all build time concerns. Whatever happens the final module will have some defined semantics based on the instructions its using (which are presumably all stable in the upstream wasm spec with clearly-defined semantics). This means that you'll either run the module on an engine understanding all instructions, in which case everything will behave exactly as expected, or you won't, in which case nothing will execute at all.

Overall there is no situation where a unknown wasm instruction is executed. There's situations your build isn't what you expect, but that's not UB that's a configuration issue.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

Wasm engines, forever and all of time, will reject modules they do not understand.

IIUC with the features.supported proposal engines will accept feature blocks which they do not understand, but instead of parsing it, they will replace it as a whole with the unreachable instruction. It's effectively analogue of the ud2 instruction from x86. The question is now: is it allowed for safe Rust code to trigger architecture's "unreachable" instructions? AFAIK in the case of x86 the answer is no. Do we allow it for WASM?

Also I wonder how in the presence of runtime feature detection compiler will handle functions which use SIMD intrinsics, but do not provide a non-SIMD fallback. Will it simply wrap every intrinsic with a features.supported check and use unreachable in fallback branches?

There's situations your build isn't what you expect, but that's not UB that's a configuration issue.

Yes, it's not UB which would cause memory corruption, but it's still a real pitfall. Is it possible to make use of simd128 SIMD intrinsics without globally enabled target feature a compilation error? It would completely solve my concerns (at the very least until some kind dynamic detection will be added, but I really hope that we wil lget something similar to target restriction contexts before that). Also I think it would significantly reduce probability of unpleasant surprises caused by such misconfiguration, especially considering that its source may be deep in a dependency tree.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 30, 2021 via email

@est31
Copy link
Member

est31 commented Jul 30, 2021

Yeah one of the reasons to make target_feature unsafe was because some instruction sets might override the meaning of instructions or do other UB when they encounter instructions they don't support. On wasm, there is a safe guarantee that it'll trap so it's well defined.

coastalwhite pushed a commit to coastalwhite/rust that referenced this issue Aug 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.