Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 #1820

abrown · 2020-06-04T17:11:01Z

This is a follow-on to #1765 and should be merged after that PR.

The most notable change here is the addition of the ISA-specific flags `assert_no_nans` and `assert_in_bounds`, which are disabled by default; when enabled, they allow Cranelift to reduce the cost (in number of instructions) of lowering this and other SIMD instructions.

The NaN semantics of the Wasm SIMD spec do not closely align with x86's NaN semantics, resulting in generated code with extra instructions, e.g. to quiet NaNs. The `assert_no_nans` flag allows users to assert that floating-point operations will not produce NaNs (for SIMD primarily, but this could eventually be used in a scalar context) so that Cranelift can emit fewer instructions and, hopefully, faster code.

github-actions · 2020-06-04T18:32:59Z

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:meta", "cranelift:wasm"

Thus the following users have been cc'd because of the following labels:

bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

bnjbvr

I haven't looked at all the commits, but just looking at the one introducing assert_no_nans, I think we need a broader discussion about whether this is a good idea. That is, were we to introduce this flag, how applicable would it be? Right now, a few things make me think that a function-wide flag for this is not doable. User-controlled inputs could create NaN results that flow into other operators assuming no NaNs, so the only way to correctly support the intent of this flag would be to forbid the other instructions that can create NaNs too. A direct example is a divide by zero, which will create NaNs, if I'm not mistaken? This can also include reinterpreting bits from one type to another, which limits the scope of the optimizations we can really do here.

My proposal:

- split the implementation of the assert_no_nans codegen into a separate PR, so the rest of this work can move forward without being blocked on this
- open an issue to discuss with others the goal, interest and implementation alternatives for assert_no_nans, cc Dan/Julian/Chris/me. (One thought is that it'd be more doable with a flag per instruction, or that a data-flow analysis might or might not be sufficient to figure out where NaNs can flow in, etc.)

Does it make sense?
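To make the concern above concrete, here is a minimal Rust sketch (not Cranelift code; the function name `nan_examples` is hypothetical) of how NaNs can arise from non-NaN inputs. Note that under IEEE 754 a nonzero value divided by zero gives an infinity, while 0/0 gives a NaN, and that reinterpreting integer bits can yield a NaN without any arithmetic at all.

```rust
/// Returns three NaNs, each produced without any NaN input.
/// (Hypothetical helper, for illustration only.)
fn nan_examples() -> [f32; 3] {
    let zero = 0.0_f32;
    [
        // 0/0 is an invalid operation and yields NaN.
        zero / zero,
        // inf - inf is also invalid and yields NaN.
        f32::INFINITY - f32::INFINITY,
        // Reinterpreting the integer bits 0x7fc00000 gives a quiet NaN.
        f32::from_bits(0x7fc0_0000),
    ]
}

/// A nonzero value divided by zero yields an infinity, not a NaN.
fn nonzero_over_zero() -> f32 {
    let zero = 0.0_f32;
    1.0_f32 / zero
}
```

Any of these values could be computed from user-controlled inputs and then flow into an instruction lowered under a no-NaN assumption, which is why a function-wide assertion is hard to uphold.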

abrown · 2020-06-09T22:45:59Z

@bnjbvr, agreed that this needs more discussion; I posted in Zulip a week ago about this: why don't we discuss there?

abrown · 2020-06-09T23:08:04Z

cranelift/codegen/src/isa/x86/enc_tables.rs

+                // allowed in that lane.
+                let ones_constant = pos.func.dfg.constants.insert(vec![0xff; 16].into());
+                let ones = pos.ins().vconst(F32X4, ones_constant);
+                let arg1 = pos.ins().band(arg, ones);


Can be removed per https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/SIMD/near/200302379

sunfishcode · 2020-06-09

Looking at this more, I think the `and` is needed, but the operand of the `and` should be `cmpeqps(arg, arg)`, not all ones.
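The suggestion above can be sketched lane by lane in plain Rust. This is a scalar emulation under stated assumptions, not the Cranelift lowering itself; the helper names `cmpeq_self_mask` and `zero_nan_lanes` are hypothetical. Comparing a lane with itself is false only for NaN, so the comparison yields an all-ones mask for ordinary lanes and an all-zeros mask for NaN lanes; ANDing with that mask zeroes the NaN lanes, whereas ANDing with a constant of all ones changes nothing.

```rust
/// Emulates CMPEQPS(v, v) per lane: all ones where the lane equals itself
/// (i.e. is not NaN), all zeros where it is NaN.
fn cmpeq_self_mask(v: [f32; 4]) -> [u32; 4] {
    // `!x.is_nan()` is equivalent to the self-comparison `x == x`.
    v.map(|x| if !x.is_nan() { u32::MAX } else { 0 })
}

/// Bitwise-ANDs each lane with its self-comparison mask, so NaN lanes
/// become +0.0 and every other lane is left bit-for-bit unchanged.
fn zero_nan_lanes(v: [f32; 4]) -> [f32; 4] {
    let mask = cmpeq_self_mask(v);
    let mut out = [0.0_f32; 4];
    for i in 0..4 {
        out[i] = f32::from_bits(v[i].to_bits() & mask[i]);
    }
    out
}
```

With NaN lanes zeroed first, a later truncating conversion sees 0.0 in those lanes, which matches the result a saturating conversion is expected to give for NaN.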

abrown · 2020-06-12T22:23:48Z

Closed in favor of #1876.

abrown added 10 commits June 3, 2020 16:30

Add x86_vcvtudq2ps instruction

c5b3e06

This instruction converts i32x4 to f32x4 in several AVX512 feature sets.

Add x86_pblendw instruction

474663c

This instruction is necessary for lowering `fcvt_from_uint`.

Add AVX512F flag

b320161

Add x86 legalization for fcvt_from_uint.f32x4

b90e621

This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.

Translate Wasm's f32x4.convert_i32x4_u instruction to Cranelift's fcvt_from_uint

9f0de44

Add 'assert_in_bounds' flag to allow users to skip bounds checks

02d1b93

Add encoding for x86 CVTTPS2DQ

7ea2055

This reuses the `x86_cvtt2si` instruction since the packed and scalar versions seem to group together well.

Legalize fcvt_to_sint_sat.i32x4 on x86

8a20f12

With user assertions, use CVTTPS2DQ directly; otherwise, use a lengthy sequence to quiet NaNs and saturate overflow.

Translate Wasm's i32x4.trunc_sat_f32x4_s instruction to Cranelift's fcvt_to_sint_sat

bd68242
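As a rough illustration of why the legalization commit above needs extra instructions when no user assertions are given, the following Rust sketch contrasts one lane of CVTTPS2DQ with the saturating behavior that Wasm's `i32x4.trunc_sat_f32x4_s` expects. It is a scalar model, not the actual lowering, and the function names are hypothetical. CVTTPS2DQ returns the "integer indefinite" value 0x80000000 for NaN and for out-of-range inputs, while the saturating conversion maps NaN to 0 and clamps to the `i32` range.

```rust
/// Models one lane of x86 CVTTPS2DQ: truncate toward zero, returning
/// 0x80000000 (i32::MIN) for NaN and for values outside the i32 range.
fn cvttps2dq_lane(x: f32) -> i32 {
    if x.is_nan() || x >= 2147483648.0 || x < -2147483648.0 {
        i32::MIN
    } else {
        x as i32
    }
}

/// Models one lane of a saturating conversion: NaN becomes 0 and
/// out-of-range values clamp to i32::MAX or i32::MIN.
fn trunc_sat_lane(x: f32) -> i32 {
    if x.is_nan() {
        0
    } else if x >= 2147483648.0 {
        i32::MAX
    } else if x <= -2147483648.0 {
        i32::MIN
    } else {
        x as i32
    }
}
```

The two models agree for in-range, non-NaN inputs, which is the case in which the bare instruction can be used directly; they differ for NaN and for positive overflow, which is what the longer sequence has to repair.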

abrown requested a review from bnjbvr June 4, 2020 17:11

This was referenced Jun 4, 2020

Legalize fmin/fmax with NaN quieting semantics #1821

Closed

Implement fcvt_to_uint_sat (f32x4 -> i32x4) for x86 #1822

Closed

abrown marked this pull request as ready for review June 4, 2020 17:23

github-actions bot added the cranelift, cranelift:area:aarch64, cranelift:meta, and cranelift:wasm labels Jun 4, 2020

bnjbvr reviewed Jun 9, 2020

abrown commented Jun 9, 2020

abrown mentioned this pull request Jun 12, 2020

Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 #1876

Merged

abrown closed this Jun 12, 2020

abrown deleted the f32x4-to-i32x4 branch May 17, 2021 18:25

sunfishcode Jun 9, 2020