-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 #1820
Conversation
This instruction converts i32x4 to f32x4 in several AVX512 feature sets.
This instruction is necessary for lowering `fcvt_from_uint`.
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
The NaN semantics of the Wasm SIMD spec do not closely align to x86's NaN semantics, resulting in generated code with extra instructions, e.g. to quiet NaNs. This flag allows users to assert that floating-point operations will not produce NaNs (for SIMD primarily, but this could be used eventually in a scalar context) so that Cranelift can emit fewer instructions--hopefully faster code.
This reuses the `x86_cvtt2si` instruction since the packed and scalar versions seem to group together well.
With user assertions, use CVTTPS2DQ directly; otherwise, use a lengthy sequence to quiet NaNs and saturate overflow.
Subscribe to Label Actioncc @bnjbvr
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:meta", "cranelift:wasm"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at all the commits, but just looking at the one introducing assert_no_nans
, I think we need a broader discussion, to consider if it's a good idea to do so. That is, would we to introduce this flag, how applicable would it be? Right now, there are a few things that make me think that it's not doable to have a function-wide flag for this. User-controlled inputs could create NaN results flowing in into other operators assuming no NaNs, so the only way to correctly support the intent of this flag would be to forbid these other instructions which can create NaNs too. A direct example is a divide by zero, which will create NaNs, if I'm not mistaken? This can also include reinterpreting bits from one type to another, which kind of limits the scope of optimizations we can really do here.
My proposal:
- split the implementation of these assert_no_nans codegen into a separate PR, so the rest of this work can move forward without being blocked on this
- open an issue to discuss with others the goal, interest and implementation alternatives for assert_no_nans, cc Dan/Julian/Chris/me. (One thought is that it'd be more doable with a flag per instruction, or that a data-flow analysis might or might not be sufficient to figure out where NaNs can flow in, etc.)
Does it make sense?
// allowed in that lane. | ||
let ones_constant = pos.func.dfg.constants.insert(vec![0xff; 16].into()); | ||
let ones = pos.ins().vconst(F32X4, ones_constant); | ||
let arg1 = pos.ins().band(arg, ones); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this more, I think the and is needed, but the operand of the and should be cmpeqps(arg, arg), not all ones.
Closed in favor of #1876. |
This is a follow-on to #1765 and should be merged after that PR.
The most notable change here is the addition of the ISA-specific flags
assert_no_nans
andassert_in_bounds
which are disabled by default; when enabled they allow Cranelift to reduce the cost (in number of instructions) of lowering this and other SIMD instructions.