From a quick look at the first case only, the problem is that we don't know that *x is (unconditionally) noundef. Vectorization works if you replace the logical and with a frozen bitwise and: https://llvm.godbolt.org/z/56dcW4z6s
Probably LoopVectorize should support logical and/or reductions by inserting a freeze operation (cc @fhahn).
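For readers who want a source-level picture of the difference: the sketch below is only an illustration, not the code behind either Godbolt link. The function names and the `< limit` predicate are made up. The first function uses the short-circuiting `&&`; the second uses `&` on `bool`, which always evaluates both operands. Whether either form vectorizes depends on the rustc/LLVM version and was not checked here.

```rust
/// Hypothetical example: logical (short-circuiting) and-reduction.
/// `&&` only evaluates the comparison while `acc` is still true.
pub fn all_below_logical(xs: &[u32], limit: u32) -> bool {
    xs.iter().fold(true, |acc, x| acc && *x < limit)
}

/// Hypothetical example: bitwise and-reduction.
/// `&` on `bool` evaluates both operands unconditionally.
pub fn all_below_bitwise(xs: &[u32], limit: u32) -> bool {
    xs.iter().fold(true, |acc, x| acc & (*x < limit))
}
```

Both functions return the same result for every input because the comparison has no side effects; only the shape of the reduction handed to LLVM differs.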
the8472 changed the title from "Lack of or poor autovectorization of loops over references into contiguous memory" to "Lack of autovectorization of loops over references into contiguous memory" on Aug 25, 2024
Rust iterators over slices yield references to each item. Even when iterating over references to primitives and applying simple, pure functions, LLVM only manages to unroll the loops but fails to vectorize them.
Case 1 - slice.iter().fold()
https://rust.godbolt.org/z/fn3W8PGE3
The second function from rust-lang/rust#113789 results in this loop body (see the compiler output at the Godbolt link above).
This is unrolled and the values are loaded unconditionally, so it's not an issue of dereferenceability or profitability of loading the data.
Expected behavior: Vectorizes with `vpcmpud`
Case 2 - manually unrolled slice.iter().all()
https://rust.godbolt.org/z/dxEsqj3fx
This is derived from an attempt to implement manual unrolling for `iterator.all()`, which is short-circuiting, to convince LLVM that vectorization would be profitable. Again the unrolling happens and the values are loaded unconditionally, but they're not vectorized.

Expected behavior: Vectorizes with `vpcmpeqd`
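A rough sketch of what "manually unrolled `all()`" can look like, for readers without the Godbolt link at hand. This is not the code from the link. The function name, the equality predicate and the unroll factor of 4 are assumptions chosen for illustration. Each chunk is checked with a short-circuiting `&&` chain and the loop exits early between chunks.

```rust
/// Hypothetical manually unrolled equivalent of
/// `xs.iter().all(|x| *x == needle)` with an assumed unroll factor of 4.
pub fn all_equal_unrolled(xs: &[u32], needle: u32) -> bool {
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        // Short-circuiting comparison of the four elements of this chunk.
        let ok = chunk[0] == needle
            && chunk[1] == needle
            && chunk[2] == needle
            && chunk[3] == needle;
        if !ok {
            return false;
        }
    }
    // Elements that did not fill a whole chunk are checked one by one.
    chunks.remainder().iter().all(|x| *x == needle)
}
```

For example, `all_equal_unrolled(&[7, 7, 7, 7, 7], 7)` checks one full chunk and then a one-element remainder.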
Curiously moving the unrolled short-circuiting comparison into a function - which still gets inlined - and bloating its return type with an additional dummy value causes it to vectorize.
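The shape of that workaround might look roughly like the sketch below. It is an assumption-laden illustration, not the code from the Godbolt link: the helper name `chunk_all_equal`, the `usize` dummy field and the chunk size of 4 are all hypothetical, and whether this particular form vectorizes with a given rustc/LLVM version was not verified.

```rust
/// Hypothetical helper: short-circuiting comparison of one 4-element chunk.
/// The second tuple field is a dummy value that only pads the return type.
#[inline]
fn chunk_all_equal(chunk: &[u32], needle: u32) -> (bool, usize) {
    let ok = chunk[0] == needle
        && chunk[1] == needle
        && chunk[2] == needle
        && chunk[3] == needle;
    (ok, 0)
}

/// Same semantics as `xs.iter().all(|x| *x == needle)`, with the per-chunk
/// comparison moved into the helper above.
pub fn all_equal_via_helper(xs: &[u32], needle: u32) -> bool {
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        // The dummy field is ignored; only the bool decides the early exit.
        let (ok, _dummy) = chunk_all_equal(chunk, needle);
        if !ok {
            return false;
        }
    }
    chunks.remainder().iter().all(|x| *x == needle)
}
```

The helper does not change the result of the function; the only intended difference is in what the optimizer sees after inlining.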
Rust issue similar to this case: rust-lang/rust#105259
Edit: removed the 3rd case, I missed an important step to keep things in vector registers.