Rewrite integer lerp using intrinsics #6426
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -134,17 +134,11 @@ Expr lower_lerp(Expr zero_val, Expr one_val, const Expr &weight) { | |
case 8: | ||
case 16: | ||
case 32: { | ||
Expr zero_expand = Cast::make(UInt(2 * bits, computation_type.lanes()), | ||
zero_val); | ||
Expr one_expand = Cast::make(UInt(2 * bits, one_val.type().lanes()), | ||
one_val); | ||
|
||
Expr rounding = Cast::make(UInt(2 * bits), 1) << Cast::make(UInt(2 * bits), (bits - 1)); | ||
Expr divisor = Cast::make(UInt(2 * bits), 1) << Cast::make(UInt(2 * bits), bits); | ||
|
||
Expr prod_sum = zero_expand * inverse_typed_weight + | ||
one_expand * typed_weight + rounding; | ||
Expr divided = ((prod_sum / divisor) + prod_sum) / divisor; | ||
Expr shift = Cast::make(UInt(2 * bits), bits); | ||
Expr prod_sum = widening_mul(zero_val, inverse_typed_weight) + widening_mul(one_val, typed_weight); | ||
// Computes x / (2 ** N - 1) as (x / 2 ** N + x) / 2 ** N. | ||
// TODO: on x86 it's actually one instruction cheaper to do the division directly. | ||
Expr divided = rounding_shift_right(rounding_shift_right(prod_sum, shift) + prod_sum, shift); | ||
It's a trick to do x/255 as (x/256 + x)/256

OK yeah, division by 2^n-1...

Note that this is a special case of one of our unsigned division lowering methods. I was about to suggest just dividing and relying on that, but that only covers up to 255

Actually I could easily extend that table to include 65535 and 2^32 - 1 ...

btw the nasty thing about this trick is that it fails for the largest uint16_t, so you have to have some way to know that's not possible (as we do here).

On the other hand, So maybe the resolution here is teaching the arm backend this trick for division by 255 Edit: Except that it overflows for the largest uint16. Sigh.

Do any of us have a copy of the book "Hacker's Delight"? IIRC it has quite a long section on this sort of technique, but I last read it at a different employer ~20 years ago...

Here's a useful identity for uint16s:
On ARM we currently lower x/255 as:
but using this identity we get:
Nice. Edit: nevermind, this also overflows in the addition. Dang. Second edit: Fixed, by using an averaging op at the cost of an additional instruction. Now it looks a lot like method 2 of our division methods...

@steven-johnson - very late, I know, but I keep a copy on my desk. If it ever comes up again that we want to consult this book, let me know. |
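For readers following the thread, here is a minimal scalar sketch of the x/255 trick and of the 16-bit overflow mentioned above. It is not Halide code: `rounding_shift_right_wide`, `div255_round`, `div255_round_u16` and `check_div255_over_lerp_range` are hypothetical names, and the rounding shift is assumed to behave as "add half the divisor, then shift" without wrapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a rounding right shift (an assumption, not
// Halide's implementation): add half of 2^shift, then shift. The work is done
// in 32 bits so the addition itself cannot wrap for 16-bit inputs.
inline uint32_t rounding_shift_right_wide(uint32_t x, uint32_t shift) {
    return (x + (1u << (shift - 1))) >> shift;
}

// x / 255, rounded to nearest, written as (x / 256 + x) / 256 with both
// divisions replaced by rounding shifts.
inline uint32_t div255_round(uint32_t x) {
    return rounding_shift_right_wide(rounding_shift_right_wide(x, 8) + x, 8);
}

// The same computation, but with the inner sum truncated to 16 bits to model
// what happens when the whole expression lives in a uint16 type.
inline uint32_t div255_round_u16(uint16_t x) {
    const uint16_t sum =
        static_cast<uint16_t>(rounding_shift_right_wide(x, 8) + x);
    return rounding_shift_right_wide(sum, 8);
}

// Exhaustive check over the range a uint8 lerp can produce: the largest
// possible sum of the two weighted products is 255 * 255.
inline void check_div255_over_lerp_range() {
    for (uint32_t x = 0; x <= 255u * 255u; ++x) {
        const uint32_t expected = (2 * x + 255) / 510;  // nearest integer to x / 255
        assert(div255_round(x) == expected);
        assert(div255_round_u16(static_cast<uint16_t>(x)) == expected);
    }
}
```

In this model the 16-bit variant agrees with the wide one across the lerp range, but for x = 65535 the inner sum wraps and it returns 1 where the wide version returns 257, which is the failure the reviewers describe.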
||
|
||
result = Cast::make(UInt(bits, computation_type.lanes()), divided); | ||
break; | ||
|
This is equivalent to the original code, but I'm not convinced it's a good idea on x86. I bet you can do better with an average-round-up instruction.
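The equivalence claim can be checked with a scalar model of the 8-bit case. The sketch below is an illustration only: `lerp_u8_old`, `lerp_u8_new`, `rsr_model` and `lerp_u8_variants_agree` are hypothetical names, and it assumes `inverse_typed_weight` is `255 - weight`, which is not visible in this hunk.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of a rounding right shift: add half of 2^shift, then shift.
inline uint32_t rsr_model(uint32_t x, uint32_t shift) {
    return (x + (1u << (shift - 1))) >> shift;
}

// Scalar model of the removed code for bits == 8: the rounding term is folded
// into prod_sum and the two divisions by 256 truncate.
inline uint8_t lerp_u8_old(uint8_t zero_val, uint8_t one_val, uint8_t weight) {
    const uint32_t inverse_weight = 255u - weight;  // assumption, see lead-in
    const uint32_t rounding = 1u << 7;
    const uint32_t divisor = 1u << 8;
    const uint32_t prod_sum =
        zero_val * inverse_weight + one_val * weight + rounding;
    return static_cast<uint8_t>(((prod_sum / divisor) + prod_sum) / divisor);
}

// Scalar model of the added code: plain widened products, with the rounding
// moved into the two shifts.
inline uint8_t lerp_u8_new(uint8_t zero_val, uint8_t one_val, uint8_t weight) {
    const uint32_t inverse_weight = 255u - weight;  // assumption, see lead-in
    const uint32_t prod_sum = zero_val * inverse_weight + one_val * weight;
    return static_cast<uint8_t>(rsr_model(rsr_model(prod_sum, 8) + prod_sum, 8));
}

// Compares both models for every (zero_val, one_val, weight) triple.
inline bool lerp_u8_variants_agree() {
    for (uint32_t z = 0; z < 256; ++z)
        for (uint32_t o = 0; o < 256; ++o)
            for (uint32_t w = 0; w < 256; ++w)
                if (lerp_u8_old(z, o, w) != lerp_u8_new(z, o, w)) return false;
    return true;
}
```

Both variants reduce to the same expression, `((p + 128) >> 8) + p + 128) >> 8` with `p` the unrounded product sum, so the rewrite changes how the rounding is expressed and not the result in this model.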
We should document that trick in a comment.