
Bit operations on float #388

Open

TechPizzaDev opened this issue Feb 9, 2024 · 8 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@TechPizzaDev commented Feb 9, 2024

Is there a reason behind float SIMD not implementing bitwise operations?
I could not find much on this topic other than this comment.

This is severely annoying when it comes to porting algorithms that rely on bit manipulation of float values, like XOR for signs, or AND for masking off bits.

Is it because Simd<f32, N> is modelling the f32 primitive? Even then, I feel this should be reconsidered. The workaround is messy (from_bits(lhs.to_bits() op rhs.to_bits())), and writing generic code that uses bitwise operators is not feasible since you can't implement operators for types from external crates.
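
For concreteness, here is a minimal sketch of that workaround on nightly with the portable_simd feature (the xor_signs helper name is just for illustration):

```rust
#![feature(portable_simd)]
use std::simd::{f32x4, num::SimdFloat, u32x4};

// Hypothetical helper: flip the sign of each lane of `x` where `signs` is
// negative, via the to_bits/from_bits round-trip the workaround requires.
fn xor_signs(x: f32x4, signs: f32x4) -> f32x4 {
    let sign_bits = signs.to_bits() & u32x4::splat(0x8000_0000);
    f32x4::from_bits(x.to_bits() ^ sign_bits)
}
```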

TechPizzaDev added the C-feature-request label on Feb 9, 2024
@TechPizzaDev (Author)
Found another comment mentioning bit-ops for utility math: #109 (comment)

@calebzulawski (Member)

In my opinion, Simd should map to the primitive and therefore use to_bits and from_bits explicitly, but I'm open to discussion as to why Simd might be special in that regard.

It's certainly possible to write a newtype that implements the various bitops as well as Deref<Target = Simd>, but I do understand that could be a bit inconvenient. I'm working on a std::simd extension crate and will at least consider something like that for inclusion (is there an equivalent implementation for scalar f32? I've never looked).
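
A rough sketch of what such a newtype might look like (the BitsF32x4 name and the single implemented operator are illustrative, not a real crate API):

```rust
#![feature(portable_simd)]
use std::ops::{BitXor, Deref};
use std::simd::{f32x4, num::SimdFloat};

// Illustrative newtype: bitwise ops go through the integer view, while
// Deref exposes the underlying float API. The other operators (&, |, !)
// would follow the same pattern.
#[derive(Copy, Clone, Debug)]
struct BitsF32x4(f32x4);

impl BitXor for BitsF32x4 {
    type Output = Self;
    fn bitxor(self, rhs: Self) -> Self {
        Self(f32x4::from_bits(self.0.to_bits() ^ rhs.0.to_bits()))
    }
}

impl Deref for BitsF32x4 {
    type Target = f32x4;
    fn deref(&self) -> &f32x4 {
        &self.0
    }
}
```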

@TechPizzaDev (Author)

It's certainly fair to model Simd after the underlying primitive, but other languages/abstractions usually follow what is broadly supported on hardware. It's just that SIMD is quite a different paradigm from scalar instructions, and there are completely different expectations around branching (or lack thereof), which changes how algorithms deal with floats.

Let's begin with the raw case of C/C++, where we have intrin.h/immintrin.h. These are all x86-specific compiler intrinsics, and the STL has no abstractions; the best you get is third-party libraries, or you build your own.
Whether bitwise operations exist for floats depends on the chosen API/library.

A good example with the best of both worlds is C# (yes, I'm biased), which has good cross-platform intrinsics in the form of Vector128<T>/Vector256<T>/etc., analogous to Simd<T, N>. If the cross-platform types are missing a certain instruction, you can easily pull in arch-specific instructions like PackedSimd for WASM, or AVX512 for x86 as of .NET 8.
Vector128<T> has bitwise operations for floats.

There is also a counter example from Zig, where @Vector(T, N) is the closer analog to Simd<T, N>. The @Vector intrinsic acts as a group of primitives and supports the same built-in operations as the primitive type T. Zig does not currently expose the specific intrinsics of each arch.
@Vector(f32, N) lacks bitwise operations for floats.

The cross-platform intrinsics of C# and Zig compile down to slow scalar fallbacks in the semi-rare case of a missing instruction.

AVX512 also introduces plenty of niche float instructions that lack scalar equivalents; any future API on Simd<T, N> that tries to model AVX512 could become confusing, because a scalar f32 either would not have those operations or would need a slow fallback.

To finish off I would like to mention WebAssembly which should be familiar to Rust users. In WASM the SIMD intrinsics act on the general-purpose non-generic v128 type, which allows bitwise operations for floats since you pick the instruction you want to perform on the register.
v128 has bitwise operations for floats.

As mentioned in previous comments, bitwise operations for floats on Simd<T, N> are possible, but the ergonomics of casting back and forth between the bits type is frustrating and encourages writing bitwise helpers more often than not (which cannot overload operators because Simd<T, N> is in another crate).

@tannergooding
I'm the owner of the SIMD types in .NET, so I'm also a bit biased here ;)

We expose these bitwise operations on our Vector64/128/256/512<T> and Vector<T> types for many of the reasons that TechPizza mentions.

Not only are these fundamental operations that have explicit instructions for both integers and floating-point vectors, but when considering SIMD or otherwise generic code they are core operations required across a wide range of scenarios.

For example, with SIMD you typically want to avoid branching as much as possible (especially because any single scalar branch becomes Count branches for a vector). So you end up with APIs like ConditionalSelect, which may map down to things like (x & mask) | (y & ~mask), where mask is the result of something like CompareEqual. ConditionalSelect may map down to a single instruction like bsel (Arm64) or blendv (Xarch), and those instructions may have specific variants for floating-point (blendvps for float vs pblendvd for int/uint). On some hardware, which instruction you use can actually impact which port is used and whether a delay is introduced for crossing the fp/integer SIMD domain (newer hardware is better about having no delay, but it can still exist).

Given this is perf-critical code and you often want T-in, T-out, it becomes sensible for these operations to exist directly for float and double: it makes user code more readable and less error-prone, puts less strain on the compiler to undo the inserted bitcasts, and more closely matches the actual SIMD specs and instructions used.
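
For reference, the blend pattern described above can already be written against Rust's std::simd through the integer view; a rough sketch (nightly, portable_simd feature), next to the idiomatic select:

```rust
#![feature(portable_simd)]
use std::simd::{f32x4, mask32x4, num::SimdFloat, u32x4};

// Manual (x & mask) | (y & !mask) blend via the integer view.
fn blend_manual(mask: mask32x4, x: f32x4, y: f32x4) -> f32x4 {
    let m: u32x4 = mask.to_int().cast(); // all-ones lanes where true
    f32x4::from_bits((x.to_bits() & m) | (y.to_bits() & !m))
}

// The idiomatic equivalent, which maps to bsel/blendv-style instructions.
fn blend_select(mask: mask32x4, x: f32x4, y: f32x4) -> f32x4 {
    mask.select(x, y)
}
```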

@tannergooding commented Feb 23, 2024

Notably, this does make it different from direct scalar code (given float x, y; you can't do x & y, but you can for Vector128<float> x, y;).

However, for our "generic math" feature we do actually allow this for scalar floats, so given where T : IBinaryNumber<T> you have access to the standard bitwise operators (&, |, ^, ~). Thus for T x, y you can do x & y, and this includes things like primitive integers, Half, float, and double.

This was again done for convenience and efficiency when writing generic code. For example, you frequently need to do bitwise manipulations or conditional checks, but may not know whether an actual bitwise conversion to an associated integer type is efficient, or you may even be in a language without associated types. By allowing generic code access to the operators, you make it friendlier to end users and simpler for them to write functioning code.

@calebzulawski (Member)

std::simd does in fact support operations that might commonly be done with bitwise operations: you may use select, abs, is_infinite, and so on. These common operations are supported in what I would consider a better way than bitwise operations. This leaves only arbitrary bitwise operations, which I still think are better modeled on integers.
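
To illustrate, a minimal sketch using those built-in APIs on nightly (portable_simd feature; the demo function is hypothetical):

```rust
#![feature(portable_simd)]
use std::simd::{cmp::SimdPartialOrd, f32x4, num::SimdFloat};

// Common "bit-trick" operations expressed with the built-in APIs instead.
fn demo(x: f32x4) -> f32x4 {
    let abs = x.abs();                           // instead of AND-ing away the sign bit
    let negative = x.simd_lt(f32x4::splat(0.0)); // lane-wise comparison mask
    negative.select(f32x4::splat(0.0), abs)      // instead of a manual mask-and-blend
}
```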

@programmerjake (Member)

> To finish off I would like to mention WebAssembly which should be familiar to Rust users. In WASM the SIMD intrinsics act on the general-purpose non-generic v128 type, which allows bitwise operations for floats since you pick the instruction you want to perform on the register. v128 has bitwise operations for floats.

I would argue that WebAssembly has bitwise operations for integers, you just happen to also be able to use them on floats since they share the same v128 type, so int<->float bitcasts disappear.

In current Rust Portable SIMD, int<->float bitcasts usually also generate zero instructions, since the underlying SSE/AVX/AVX512/NEON/etc. registers are also identical, so no inter-register bitcasting move is needed.

@programmerjake (Member)

LLVM IR is the same way: it has no bitwise operations on vectors of floats; you have to bitcast to vectors of integers. Likewise, LLVM IR has separate, better operations for the cases where bitwise operations are commonly used on floats: select, abs, copysign, negation, etc.
