Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a fast integer divide that rounds to zero #6455

Merged
merged 6 commits into from
Dec 7, 2021

Conversation

abadams
Copy link
Member

@abadams abadams commented Nov 30, 2021

While working on legacy code I discovered a need for this. Performance test shows a good speed-up over native division for vector code:

signed division rounding to zero:
type            const-divisor speed-up  runtime-divisor speed-up
 Int(32,  1)     2.416                   1.153
 Int(16,  1)     2.552                   1.457
 Int( 8,  1)     1.782                   0.667
 Int(32,  8)     8.592                   5.908
 Int(16, 16)    53.008                  38.505
 Int( 8, 32)    19.480                   8.197

Expr xsign = select(numerator > 0, cast(t, 0), cast(t, -1));

// Multiply-keep-high-half
result = (cast(wide, mul) * numerator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use widening_mul intrinsics, because uses of this are after find_intrinsics. Maybe this whole sequence should be mul_shift_right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this code is only called directly by users, so it's before find_intrinsics. The compiler doesn't ever call this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this as a comment for future readers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we should change it to intrinsics anyways. But since the code is just moved and pre-existing, maybe it should be a separate PR.


// Reference good version
g(x, y) = input(x, y) / cast<T>(y + min_val);
// Reference good version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to the case just above, are they supposed to be identical?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they have different schedules which turn the denominator into a constant in one case but not the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll add a comment)

bool srz_method_0(int den, int sh_post, int bits) {
int64_t min = -(1L << (bits - 1)), max = (1L << (bits - 1)) - 1;
for (int64_t num = min; num <= max; num++) {
// for (int iter = 0; iter < 1000000L; iter++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out? If it's being left in for (eg) debugging purposes, please say so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (deleted)

@abadams
Copy link
Member Author

abadams commented Nov 30, 2021

See also related issue #6456

@abadams
Copy link
Member Author

abadams commented Dec 2, 2021

review ping

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Expr xsign = select(numerator > 0, cast(t, 0), cast(t, -1));

// Multiply-keep-high-half
result = (cast(wide, mul) * numerator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this as a comment for future readers.

@abadams abadams merged commit 7992369 into master Dec 7, 2021
@lordnn
Copy link

lordnn commented Sep 10, 2022

buf(x) = fast_integer_divide_round_to_zero(select(x % 2 == 0, 5, -5), 2);
result is:
2, -3, 2, -3, 2, -3
Not rounded to zero.

@abadams
Copy link
Member Author

abadams commented Sep 10, 2022

Looks like there's a bug in the handling of constant denominators (an early-out path that assumes we're rounding to -infinity). Will fix.

@abadams
Copy link
Member Author

abadams commented Sep 11, 2022

See #7008

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants