Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rounding shift rights should use rounding halving add #6494

Merged
merged 2 commits into from
Dec 13, 2021

Conversation

abadams
Copy link
Member

@abadams abadams commented Dec 11, 2021

On x86 currently we lower cast<uint8_t>((cast<uint16_t>(x) + 8) / 16)
to:

cast<uint8_t>(shift_right(widening_add(x, 8), 4))

This compiles to 8 instructions on x86: Widen each half of the input
vector, add 8 to each half-vector, shift each half-vector, then narrow
each half-vector.

First, this should have been a rounding_shift_right. Some patterns were
missing in FindIntrinsics.

Second, rounding_shift_right had suboptimal codegen in the case where
the second arg is a positive const. On archs without a rounding shift
right instruction you can further rewrite this to:

shift_right(rounding_halving_add(x, 7), 3)

which is just two instructions on x86.

On x86 currently we lower cast<uint8_t>((cast<uint16_t>(x) + 8) / 16)
to:

cast<uint8_t>(shift_right(widening_add(x, 8), 4))

This compiles to 8 instructions on x86: Widen each half of the input
vector, add 8 to each half-vector, shift each half-vector, then narrow
each half-vector.

First, this should have been a rounding_shift_right. Some patterns were
missing in FindIntrinsics.

Second, rounding_shift_right had suboptimal codegen in the case where
the second arg is a positive const. On archs without a rounding shift
right instruction you can further rewrite this to:

shift_right(rounding_halving_add(x, 7), 3)

which is just two instructions on x86.
@abadams
Copy link
Member Author

abadams commented Dec 11, 2021

Dillon, I have a question for you in the IRMatch.h code. I'm a bit confused about the types of the second arg to shifts.

src/IRMatch.h Outdated
// Assuming the args have the same type as the intrinsic is incorrect in
// general. But for the intrinsics we can fold (just shifts), the LHS
// has the same type as the intrinsic, and we can always treat the RHS
// as a signed int, because we're using 64 bits for it. (TODO: Dillon,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is OK. Constants will be simplified to unsigned shifts if needed.

@@ -150,6 +150,13 @@ class SimdOpCheck : public SimdOpCheckTest {
check("pavgb", 8 * w, u8((u16(u8_1) + u16(u8_2) + 1) >> 1));
check("pavgw", 4 * w, u16((u32(u16_1) + u32(u16_2) + 1) / 2));
check("pavgw", 4 * w, u16((u32(u16_1) + u32(u16_2) + 1) >> 1));

// Rounding right shifts should also use pavg
check("pavgb", 8 * w, u8((u16(u8_1) + 15) >> 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these added checks should be in test/correctness/intrinsics.cpp instead (much lighter weight test than simd_op_check and it is target independent).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a test that rounding_shift_right is both caught by pattern matching and also lowers correctly on x86, so I'll move half of it to intrinsics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the two tests below this the same test after accounting for the pattern matching part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the latest commit makes it clearer. The first two tests aren't just a rounding shift right. they strength-reduce to an average followed by a shift right. So it tests the new patterns.

The second test checks lower_rounding_shift_right does the right thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I hadn't noticed the constant isn't a rounding_shift_right. Still, I think the change you just made is good (easier tests to debug if they fail).

rewrite(min(shift_right(widening_mul(x, y), z), upper), mul_shift_right(x, y, cast(unsigned_type, z)), is_x_same_uint && is_uint(z)) ||
rewrite(min(rounding_shift_right(widening_mul(x, y), z), upper), rounding_mul_shift_right(x, y, cast(unsigned_type, z)), is_x_same_uint && is_uint(z)) ||
if (
// Saturating patterns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment, the reformatting here makes this change a lot harder to review easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, may I suggest restoring original format and putting clang-format off back in place? If we want to let these be 'naturally' reformatted then IMHO we should do that in a standalone PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will in future, but here I just manually reformatted a few surrounding lines because they were hard to modify. What I'd done happened to agree with clang-format, so I also removed those comments.

Copy link
Member Author

@abadams abadams Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general rewrite rules are exempt from clang-format, but these particular ones use named intrinsics, so they get very long indeed.

@abadams abadams merged commit e23b6f0 into master Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants