[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (llvm#99594)

I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and rva22u64_v, and noticed that in a few cases rva22u64_v was considerably slower.

One of them was 519.lbm_r, which has a large loop that was being unprofitably vectorized. The loop contains an if/else that requires a large amount of predication when vectorized. Although the loop vectorizer takes this into account, the vector cost still came out cheaper than the scalar cost.

The reason appears to be that we cost scalar floating point ops as 2, but their vector equivalents as 1 (for LMUL 1). This comes from how we use BasicTTIImpl for scalars, which treats floats as twice as expensive as integers.

This patch doubles the cost of vector floating point arithmetic ops so that they're at least as expensive as their scalar counterparts. This gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60.

Fixes llvm#62576 (the last point there about scalar fsub/fmul)
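A toy model of the LMUL 1 numbers quoted above makes the before/after comparison concrete. This is a sketch only. The names `kScalarFPCost`, `vectorFPArithCostLMUL1` and `vectorIsAtLeastScalar` are hypothetical and do not correspond to functions in RISCVTTIImpl or BasicTTIImpl. It models only the per-instruction costs stated in the message (integer = 1, scalar fp = 2, vector fp at LMUL 1 = 1 before and 2 after), not the loop vectorizer's full profitability calculation.

```cpp
#include <cassert>

// Hypothetical toy model, NOT the real LLVM TTI API.

// Scalar costs: integer ops are the unit cost, and BasicTTIImpl-style
// scalar costing treats a floating point op as twice as expensive.
constexpr int kScalarIntCost = 1;
constexpr int kScalarFPCost = 2 * kScalarIntCost;

// Cost of one vector fp arithmetic op at LMUL 1, as described in the
// commit message: 1 before the patch, doubled to 2 after it.
constexpr int vectorFPArithCostLMUL1(bool withPatch) {
  constexpr int kBeforePatch = 1;
  return withPatch ? 2 * kBeforePatch : kBeforePatch;
}

// The property the patch aims for: a vector fp op is never costed as
// cheaper than its scalar counterpart.
constexpr bool vectorIsAtLeastScalar(bool withPatch) {
  return vectorFPArithCostLMUL1(withPatch) >= kScalarFPCost;
}

// Compile-time checks of the before/after behaviour.
static_assert(!vectorIsAtLeastScalar(false),
              "before: vector fp op is costed cheaper than scalar");
static_assert(vectorIsAtLeastScalar(true),
              "after: vector fp op is at least as expensive as scalar");
```

Before the patch a vector fp op (cost 1) looks half the price of a scalar one (cost 2), which can tip a heavily predicated loop such as the one in 519.lbm_r toward vectorization. After doubling, the two costs are equal.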