rounding shift rights should use rounding halving add #6494

abadams · 2021-12-11T15:43:02Z

On x86 we currently lower cast<uint8_t>((cast<uint16_t>(x) + 8) / 16)
to:

cast<uint8_t>(shift_right(widening_add(x, 8), 4))

This compiles to 8 instructions on x86: widen each half of the input
vector, add 8 to each half-vector, shift each half-vector, then narrow
each half-vector.

First, this should have been a rounding_shift_right. Some patterns were
missing in FindIntrinsics.

Second, rounding_shift_right had suboptimal codegen in the case where
the second arg is a positive const. On archs without a rounding shift
right instruction you can further rewrite this to:

shift_right(rounding_halving_add(x, 7), 3)

which is just two instructions on x86.
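
To make the arithmetic behind that rewrite concrete, here is a minimal scalar sketch in plain C++ (not Halide code). The function names are hypothetical stand-ins, and the model assumes rounding_halving_add(a, b) means (a + b + 1) / 2 computed without overflow, which is what pavgb computes per lane. It compares the widened form against the narrow form for every uint8_t input and every shift amount from 1 to 8.

```cpp
#include <cstdint>

// Hypothetical scalar model of a rounding halving add on uint8_t:
// (a + b + 1) / 2, with the sum formed in a wider type so it cannot overflow.
inline uint8_t rounding_halving_add_u8(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<uint16_t>(a) + b + 1) >> 1);
}

// Reference form: widen, add the rounding term 2^(c-1), shift right by c, narrow.
inline uint8_t rounding_shift_right_wide(uint8_t x, int c) {
    return static_cast<uint8_t>((static_cast<uint16_t>(x) + (1u << (c - 1))) >> c);
}

// Rewritten form: average with 2^(c-1) - 1, then shift right by c - 1.
// Every intermediate value fits in 8 bits.
inline uint8_t rounding_shift_right_narrow(uint8_t x, int c) {
    const uint8_t bias = static_cast<uint8_t>((1u << (c - 1)) - 1);
    return static_cast<uint8_t>(rounding_halving_add_u8(x, bias) >> (c - 1));
}

// Counts the (x, c) pairs where the two forms disagree.
inline int count_mismatches() {
    int bad = 0;
    for (int c = 1; c <= 8; c++) {
        for (int x = 0; x < 256; x++) {
            const uint8_t v = static_cast<uint8_t>(x);
            if (rounding_shift_right_wide(v, c) != rounding_shift_right_narrow(v, c)) {
                bad++;
            }
        }
    }
    return bad;
}
```

With c = 4 the bias is 7 and the remaining shift is 3, which matches the expression above. The two forms agree because halving and then shifting by c - 1 takes the floor of the same quotient as shifting by c in one step.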

abadams · 2021-12-11T15:43:47Z

Dillon, I have a question for you in the IRMatch.h code. I'm a bit confused about the types of the second arg to shifts.

dsharletg · 2021-12-11T21:31:23Z

src/IRMatch.h

+ // Assuming the args have the same type as the intrinsic is incorrect in
+ // general. But for the intrinsics we can fold (just shifts), the LHS
+ // has the same type as the intrinsic, and we can always treat the RHS
+ // as a signed int, because we're using 64 bits for it. (TODO: Dillon,


Yes, I think this is OK. Constants will be simplified to unsigned shifts if needed.

dsharletg · 2021-12-11T21:31:58Z

test/correctness/simd_op_check.cpp

@@ -150,6 +150,13 @@ class SimdOpCheck : public SimdOpCheckTest {
 check("pavgb", 8 * w, u8((u16(u8_1) + u16(u8_2) + 1) >> 1));
 check("pavgw", 4 * w, u16((u32(u16_1) + u32(u16_2) + 1) / 2));
 check("pavgw", 4 * w, u16((u32(u16_1) + u32(u16_2) + 1) >> 1));
+
+ // Rounding right shifts should also use pavg
+ check("pavgb", 8 * w, u8((u16(u8_1) + 15) >> 4));


I think these added checks should be in test/correctness/intrinsics.cpp instead (much lighter weight test than simd_op_check and it is target independent).

It's a test that rounding_shift_right is both caught by pattern matching and also lowers correctly on x86, so I'll move half of it to intrinsics.

Aren't the two tests below this the same test after accounting for the pattern matching part?

Hopefully the latest commit makes it clearer. The first two tests aren't just a rounding shift right. They strength-reduce to an average followed by a shift right. So it tests the new patterns.

The second test checks lower_rounding_shift_right does the right thing.

Oh right, I hadn't noticed the constant isn't a rounding_shift_right. Still, I think the change you just made is good (easier tests to debug if they fail).
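
For readers following this thread, the point about the constant can be checked with a small scalar sketch in plain C++ (not Halide code; all names are hypothetical). In (x + 15) >> 4 the added constant is 15 rather than the rounding term 8, so it is not a rounding shift right, yet it still equals an average with 14 followed by a shift right by 3. The sketch checks the more general identity (x + k) >> c == avg(x, k - 1) >> (c - 1) on uint8_t. Whether Halide's patterns cover every such constant is not stated in this thread.

```cpp
#include <cstdint>

// Hypothetical scalar model of pavgb on one lane: (a + b + 1) >> 1 without overflow.
inline uint8_t avg_round_up_u8(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<uint16_t>(a) + b + 1) >> 1);
}

// Reference form: widen, add the constant k, shift right by c, narrow.
inline uint8_t add_then_shift_wide(uint8_t x, int k, int c) {
    return static_cast<uint8_t>((static_cast<unsigned>(x) + static_cast<unsigned>(k)) >> c);
}

// Strength-reduced form: average with k - 1, then shift right by c - 1.
// Requires 1 <= k <= 256 so that k - 1 fits in a uint8_t, and c >= 1.
inline uint8_t average_then_shift(uint8_t x, int k, int c) {
    return static_cast<uint8_t>(avg_round_up_u8(x, static_cast<uint8_t>(k - 1)) >> (c - 1));
}

// Counts the (x, k, c) triples where the two forms disagree.
inline int count_average_shift_mismatches() {
    int bad = 0;
    for (int c = 1; c <= 8; c++) {
        for (int k = 1; k <= 256; k++) {
            for (int x = 0; x < 256; x++) {
                const uint8_t v = static_cast<uint8_t>(x);
                if (add_then_shift_wide(v, k, c) != average_then_shift(v, k, c)) {
                    bad++;
                }
            }
        }
    }
    return bad;
}
```

The check in the diff corresponds to k = 15 and c = 4. The rounding shift right case is the special case k = 2^(c-1).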

dsharletg · 2021-12-11T21:32:38Z

src/FindIntrinsics.cpp

- rewrite(min(shift_right(widening_mul(x, y), z), upper), mul_shift_right(x, y, cast(unsigned_type, z)), is_x_same_uint && is_uint(z)) ||
- rewrite(min(rounding_shift_right(widening_mul(x, y), z), upper), rounding_mul_shift_right(x, y, cast(unsigned_type, z)), is_x_same_uint && is_uint(z)) ||
+ if (
+ // Saturating patterns


Side comment, the reformatting here makes this change a lot harder to review easily.

+1, may I suggest restoring original format and putting clang-format off back in place? If we want to let these be 'naturally' reformatted then IMHO we should do that in a standalone PR

I will in future, but here I just manually reformatted a few surrounding lines because they were hard to modify. What I'd done happened to agree with clang-format, so I also removed those comments.

In general rewrite rules are exempt from clang-format, but these particular ones use named intrinsics, so they get very long indeed.

abadams requested a review from dsharletg December 11, 2021 15:43

dsharletg approved these changes Dec 11, 2021

Address review comments

357a12a

abadams merged commit e23b6f0 into master Dec 13, 2021
