-
Notifications
You must be signed in to change notification settings - Fork 292
Add vcvtq_u32_f32 and vcvtq_s32_f32 #902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @Amanieu (rust_highfive has picked a reviewer for you, use r? to override) |
These intrinsics are implemented differently for aarch64 and arm in clang. i.e. aarch64 uses the llvm.aarch64.neon.fcvtzs.v4i32.v4f32 intrinsic. However, there didn't seem to be any advantage to using that intrinsic instead of just sharing code.
Can you add tests for out-of-range values? For example passing a negative value to the unsigned conversion should produce a result of 0. |
It looks like the out of range values will throw a floating point exception: https://developer.arm.com/docs/ddi0596/h/shared-pseudocode-functions/shared-functionsfloat-pseudocode#impl-shared.FPToFixed.5 |
That just sets an exception bit in the FP status register and returns a NaN. I'm more worried about LLVM optimizing it to |
It does seem like there's some badness going on with llvm converting to undef. This shows the difference between the aarch64 and ARM approaches: It feels like the ARM behaviour is not really what you'd when using this intrinsic. |
I checked the behavior with GCC and with the ACLE spec and it seems that the AArch64 behavior is the correct one. It seems that this is a bug in Clang. I don't think LLVM currently exposes an intrinsic for this on ARM, but you should definitely open a bug report for Clang on https://bugs.llvm.org/. In the meantime for this PR you can just keep Clang's behavior but with a clear comment explaining that the behavior isn't 100% correct. |
Can you report the Clang bug on bugs.llvm.org and add a link to the bug report in a comment? |
It looks like CI is broken because of a missing apple nightly |
It's being worked on, should be fixed for tomorrow's nightlies. |
CI is now fixed but there are errors. |
1472efa
to
08e4e5e
Compare
The ARM implementation uses fptoi that has undefined behaviour for out of range data. Clang has the same problem: https://llvm.org/PR47510
These intrinsics are implemented differently for aarch64 and arm
in clang. i.e. aarch64 uses the llvm.aarch64.neon.fcvtzs.v4i32.v4f32
intrinsic. However, there didn't seem to be any advantage to using
that intrinsic instead of just sharing code.