Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 #1820

Closed
wants to merge 10 commits into from

Conversation

abrown
Copy link
Contributor

@abrown abrown commented Jun 4, 2020

This is a follow-on to #1765 and should be merged after that PR.

The most notable change here is the addition of the ISA-specific flags assert_no_nans and assert_in_bounds which are disabled by default; when enabled they allow Cranelift to reduce the cost (in number of instructions) of lowering this and other SIMD instructions.

abrown added 10 commits June 3, 2020 16:30
This instruction converts i32x4 to f32x4 in several AVX512 feature sets.
This instruction is necessary for lowering `fcvt_from_uint`.
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
The NaN semantics of the Wasm SIMD spec do not closely align to x86's NaN semantics, resulting in generated code with extra instructions, e.g. to quiet NaNs. This flag allows users to assert that floating-point operations will not produce NaNs (for SIMD primarily, but this could be used eventually in a scalar context) so that Cranelift can emit fewer instructions--hopefully faster code.
This reuses the `x86_cvtt2si` instruction since the packed and scalar versions seem to group together well.
With user assertions, use CVTTPS2DQ directly; otherwise, use a lengthy sequence to quiet NaNs and saturate overflow.
@abrown abrown requested a review from bnjbvr June 4, 2020 17:11
@abrown abrown marked this pull request as ready for review June 4, 2020 17:23
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:meta Everything related to the meta-language. cranelift:wasm labels Jun 4, 2020
@github-actions
Copy link

github-actions bot commented Jun 4, 2020

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:meta", "cranelift:wasm"

Thus the following users have been cc'd because of the following labels:

  • bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at all the commits, but just looking at the one introducing assert_no_nans, I think we need a broader discussion, to consider if it's a good idea to do so. That is, would we to introduce this flag, how applicable would it be? Right now, there are a few things that make me think that it's not doable to have a function-wide flag for this. User-controlled inputs could create NaN results flowing in into other operators assuming no NaNs, so the only way to correctly support the intent of this flag would be to forbid these other instructions which can create NaNs too. A direct example is a divide by zero, which will create NaNs, if I'm not mistaken? This can also include reinterpreting bits from one type to another, which kind of limits the scope of optimizations we can really do here.

My proposal:

  • split the implementation of these assert_no_nans codegen into a separate PR, so the rest of this work can move forward without being blocked on this
  • open an issue to discuss with others the goal, interest and implementation alternatives for assert_no_nans, cc Dan/Julian/Chris/me. (One thought is that it'd be more doable with a flag per instruction, or that a data-flow analysis might or might not be sufficient to figure out where NaNs can flow in, etc.)

Does it make sense?

@abrown
Copy link
Contributor Author

abrown commented Jun 9, 2020

@bnjbvr, agreed that this needs more discussion; I posted in Zulip a week ago about this: why don't we discuss there?

// allowed in that lane.
let ones_constant = pos.func.dfg.constants.insert(vec![0xff; 16].into());
let ones = pos.ins().vconst(F32X4, ones_constant);
let arg1 = pos.ins().band(arg, ones);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this more, I think the and is needed, but the operand of the and should be cmpeqps(arg, arg), not all ones.

@abrown
Copy link
Contributor Author

abrown commented Jun 12, 2020

Closed in favor of #1876.

@abrown abrown closed this Jun 12, 2020
@abrown abrown deleted the f32x4-to-i32x4 branch May 17, 2021 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:meta Everything related to the meta-language. cranelift:wasm cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants