Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
arm: Use utxb rN, rM, ror #8 to implement zero_extract on armv6.
Examining the code generated for the following C snippet on a raspberry pi: int popcount_lut8(unsigned *buf, int n) { int cnt=0; unsigned int i; do { i = *buf; cnt += lut[i&255]; cnt += lut[i>>8&255]; cnt += lut[i>>16&255]; cnt += lut[i>>24]; buf++; } while(--n); return cnt; } I was surprised to see following instruction sequence generated by the compiler: mov r5, r2, lsr #8 uxtb r5, r5 This sequence can be performed by a single ARM instruction: uxtb r5, r2, ror #8 The attached patch allows GCC's combine pass to take advantage of ARM's uxtb with rotate functionality to implement the above zero_extract, and likewise to use the sxtb with rotate to implement sign_extract. ARM's uxtb and sxtb can only be used with rotates of 0, 8, 16 and 24, and of these only the 8 and 16 are useful [ror #0 is a nop, and extends with ror #24 can be implemented using regular shifts], so the approach here is to add the six missing but useful instructions as 6 different define_insn in arm.md, rather than try to be clever with new predicates. Later ARM hardware has advanced bit field instructions, and earlier ARM cores didn't support extend-with-rotate, so this appears to only benefit armv6 era CPUs (e.g. the raspberry pi). Patch posted: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01339.html Approved by Kyrill Tkachov: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01881.html 2024-05-12 Roger Sayle <roger@nextmovesoftware.com> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> * config/arm/arm.md (*arm_zeroextractsi2_8_8, *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New. 2024-05-12 Roger Sayle <roger@nextmovesoftware.com> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> * gcc.target/arm/extend-ror.c: New test.
- Loading branch information