Aligning std::simd and Rust on Arm v7 Neon float behavior #439
Comments
This seems like basically the same issue as rust-lang/rust#129880, but it might be worth tracking in this repo as well. I guess stdarch is also affected, but arguably there it is okay to expose the underlying hardware behavior... that is, assuming we don't get unsoundness due to llvm/llvm-project#89885.
@RalfJung It has particular considerations for our API design, yes.
I don’t think it makes sense to expect vector operations to have defined subnormal behavior. There is too much hardware where perfect IEEE conformance is either impossible or requires software support code. Making flushing subnormals to zero a permissible behavior is the only approach that allows for predictable runtime performance and predictable lowering to target-specific assembly.
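To make the disagreement concrete, here is a minimal sketch of what "flushing subnormals" changes. `flush_to_zero` and `ftz_mul` are hypothetical helpers written for illustration only; they model in software a unit that flushes both inputs and results, which is an assumption about the hardware rather than a statement of how any particular target behaves.

```rust
/// Hypothetical helper: models flush-to-zero by turning a subnormal into a
/// zero of the same sign. Real hardware would do this inside the FPU.
fn flush_to_zero(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Hypothetical model of a multiply that flushes its inputs and its result.
fn ftz_mul(a: f32, b: f32) -> f32 {
    flush_to_zero(flush_to_zero(a) * flush_to_zero(b))
}

fn main() {
    // Half of the smallest positive normal f32 is a subnormal under IEEE 754.
    let sub = f32::MIN_POSITIVE / 2.0;
    assert!(sub.is_subnormal());

    // IEEE 754 arithmetic keeps the value, so doubling it recovers the normal.
    let ieee = sub * 2.0;
    assert_eq!(ieee, f32::MIN_POSITIVE);

    // The flush-to-zero model loses the value at the input stage.
    let ftz = ftz_mul(sub, 2.0);
    assert_eq!(ftz, 0.0);

    println!("IEEE: {ieee:e}, flush-to-zero model: {ftz:e}");
}
```

The same source expression yields two different answers depending on which semantics apply, which is the gap between scalar Rust floats and a flushing vector unit.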
Unfortunately LLVM is unsound on hardware that flushes subnormals.
And completely unpredictable runtime behavior. Great.
@DemiMarie easily done, all it needs is a small fix in LLVM IR and SelectionDAG: llvm/llvm-project#30633
Is it unpredictable because of reordering? I don't see what can be accomplished that doesn't make std::simd useless on armv7 or ppc, other than allowing ftz.
It is unpredictable in the sense of giving different results on different targets, and (depending on what semantics LLVM implements once they properly support NEON on 32-bit ARM, which currently they do not) different optimization levels and different ways of writing the same code.
Considering these are old targets I'm not expecting a huge push to fix the backends, but would simply disallowing certain optimizations be sufficient? We do note in the std::simd docs that ftz will happen on some targets. We could e.g. expose a cfg value if necessary.
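As a rough sketch of what "expose a cfg value" could look like from the user's side: no such cfg exists today, so the example below approximates it with the existing `target_arch` cfg. The constant `MAY_FLUSH_SUBNORMALS`, the helper `survives_vector_math`, and the choice of `arm` and `powerpc` as the affected architectures are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for a dedicated cfg: true on architectures where
/// vector float operations are assumed (for this sketch) to flush subnormals.
pub const MAY_FLUSH_SUBNORMALS: bool =
    cfg!(any(target_arch = "arm", target_arch = "powerpc"));

/// Hypothetical helper: reports whether a nonzero input can be relied on to
/// stay nonzero when fed through vector float operations on this target.
pub fn survives_vector_math(x: f32) -> bool {
    x != 0.0 && !(MAY_FLUSH_SUBNORMALS && x.is_subnormal())
}

fn main() {
    let sub = f32::MIN_POSITIVE / 2.0;
    println!("may flush subnormals: {MAY_FLUSH_SUBNORMALS}");
    println!("subnormal survives:   {}", survives_vector_math(sub));
}
```

Code that cares about tiny values could branch on such a flag and, for example, rescale its inputs or fall back to scalar arithmetic.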
I mean we could try to disable the scalar evolution pass and hope that this suffices. But that's far from a robust solution, so it's not really aligned with Rust's values IMO.
Anyway I think portable-simd has a lot of things to resolve before this becomes a pressing question. Right now, not even the core::arch operations are stable on ARM32.
This can be worked around by implementing the relevant intrinsics using LLVM inline assembly instead.
That would not achieve the "predictable runtime performance" part of your goals, as the optimizer would have to treat this like a black box. And behavior would still be unpredictable in the sense of differing across architectures. So IMO it would also be reasonable to say that portable-simd is simply not supported on 32-bit ARM, and only provide core::arch primitives where people are hopefully aware of the semantic pitfalls. But anyway as I said, we're likely years away from this being a high-priority question. First all of the rest of the portable-simd API needs to be worked out...
Is the optimizer actually able to usefully reason about SIMD intrinsics anyway? The optimizer can (IIUC) be informed that the operations don’t access memory and can be elided if their result is not needed. My understanding is that SIMD programmers typically use the compiler as a glorified register allocator and so don’t particularly care about other optimizations. Is this accurate?
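For reference, Rust's `asm!` does have options that express exactly this: `pure` and `nomem` tell the compiler the block has no side effects and touches no memory, so it may be removed when the result is unused. The sketch below uses a scalar x86_64 `addss` purely as a stand-in, since it is easy to build and run; an Arm v7 version would use Neon instructions instead and is not shown. `add_f32` is a hypothetical name.

```rust
/// Hypothetical wrapper: a float add written as inline assembly.
/// `pure` + `nomem` let the optimizer elide or deduplicate the block, but it
/// still cannot see through it, so constant folding does not happen.
#[cfg(target_arch = "x86_64")]
fn add_f32(a: f32, b: f32) -> f32 {
    use std::arch::asm;
    let mut out = a;
    // SAFETY: `addss` only reads and writes the two named SSE registers.
    unsafe {
        asm!(
            "addss {out}, {rhs}",
            out = inout(xmm_reg) out,
            rhs = in(xmm_reg) b,
            options(pure, nomem, nostack),
        );
    }
    out
}

/// Fallback for other architectures so the sketch builds everywhere.
#[cfg(not(target_arch = "x86_64"))]
fn add_f32(a: f32, b: f32) -> f32 {
    a + b
}

fn main() {
    let sum = add_f32(1.5, 2.25);
    assert_eq!(sum, 3.75);
    println!("{sum}");
}
```

This is the trade-off being discussed: the result is whatever the hardware instruction computes, and the compiler is limited to register allocation and dead-code elimination around it.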
The |
I think it would be better to have SIMD that cannot be constant-folded than to not have SIMD at all.
This is going to be a bit grisly: Arm v7 Neon float operations flush subnormals to zero, and Rust has defined float semantics such that flushing subnormals is not a valid behavior. If we want std::simd to align with scalar ops here, we will unfortunately have to more or less chuck the vector ops for non-integer operations.