Yes, I've wondered whether we could include the VF in the cost functions so that they return the cost of VF * op. But we'd still need to override the znver costs even then, as most Intel architectures only have ctpop on Pipe1 (so I'd expect the vectorization to be a gain on most Intel targets).
We're not ready to use scheduler-model-specific costs in most cases yet. I began researching this back in D46276 and hit so many concerns (often due to the quality of the scheduler models) that I ended up creating the fuzz script for D103695 instead, which has turned into a massive triage investigation into getting the cost tables and scheduler models to roughly agree.
If it's really critical, we could add a hack such as 'TuningFastPOPCNT', but I really don't like that idea.
Example from rust-lang/rust#101060:
This two-iteration loop gets vectorized by opt -loop-vectorize -mcpu=znver2 (https://llvm.godbolt.org/z/M8qTdTbfE), because we assign cost 1 to scalar ctpop and cost 3 to the vector ctpop, so it's nominally "profitable". At least for low iteration count, this is not actually the case.
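To make the "nominally profitable" arithmetic concrete, here is a toy sketch of a per-lane comparison. Only the two costs (1 for scalar ctpop, 3 for vector ctpop) come from the report above. The function name, the single-instruction comparison, and the vectorization factors tried are assumptions for illustration. This is not LLVM's actual cost model, which sums costs over the whole loop, and the VF chosen in the linked example is not stated here.

```python
# Toy model of a per-lane profitability check; NOT LLVM's actual cost model.
SCALAR_CTPOP_COST = 1  # from the issue text
VECTOR_CTPOP_COST = 3  # from the issue text


def nominally_profitable(scalar_cost, vector_cost, vf):
    """Hypothetical helper: True if one vector op is costed below vf scalar ops."""
    return vector_cost < scalar_cost * vf


# Hypothetical vectorization factors, chosen only for illustration.
for vf in (2, 4, 8):
    verdict = nominally_profitable(SCALAR_CTPOP_COST, VECTOR_CTPOP_COST, vf)
    print(f"VF={vf}: nominally profitable = {verdict}")
```

Under these assumptions the comparison only weighs one vector op against VF scalar ops. It does not take the trip count into account, which is the gap the report points at for low iteration counts.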