Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsCollinear: detect and handle integer multiplication wrapping (version 2) #834

Merged
merged 2 commits into from
May 11, 2024

Conversation

reunanen
Copy link
Contributor

This is similar to #832, but tries to avoid branching and division.

@here-abarany, what do you think?

@here-abarany
Copy link

Looks good to me.

Do note that the C# and Pascal implementations very likely suffer from the same issues.

@reunanen
Copy link
Contributor Author

Do note that the C# and Pascal implementations very likely suffer from the same issues.

Yes. Hopefully easier to port this than the more complicated implementation.

@reunanen
Copy link
Contributor Author

@AngusJohnson, what do you think of this alternative?

@AngusJohnson
Copy link
Owner

@AngusJohnson, what do you think of this alternative?

Juha, I definitely think we're getting much closer, and I really appreciate the effort you've made.

I was a little concerned that these changes would affect performance, but in my testing so far, it doesn't seem to have done so 👍.
And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little 😜.

@reunanen
Copy link
Contributor Author

And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

Yeah, and as pointed out by @here-abarany, that kind of branching may very well be more detrimental to performance than the additional instructions.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little 😜.

There are also alternative implementations out there, some of which are perhaps a bit clearer. I could try and find you some?

@AngusJohnson
Copy link
Owner

AngusJohnson commented May 11, 2024

I could try and find you some?

I hope that won't be necessary 🤞. I just need a bit of time (and a clear head) to work it out 😁.
(I'm very reluctant to use code that I don't fully understand.)

Edit: Here's my working understanding of CalculateCarry() in pseudocode ...

to test for a * b overflow carry
given that a & b are type Int64
 
let:
  aHi = a & 0xFFFFFFFF00000000;
  aLo = a & 0xFFFFFFFF;
  bHi = b & 0xFFFFFFFF00000000;
  bLo = b & 0xFFFFFFFF;
then:  
  a * b == (aHi + aLo) * (bHi + bLo)
        == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
also:
  (aHi * bHi) must be > 64bits, so carry = ((aHi >> 32) * (bHi >> 32))
but:
  (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  so if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry
  or safely ... if ((aLo * bHi) + (aLo * bLo) > (MAX_INT64 - (aHi * bLo))) ++carry

@reunanen reunanen closed this May 11, 2024
@reunanen
Copy link
Contributor Author

Oops, accidentally closed this... 🙈

@reunanen reunanen reopened this May 11, 2024
@AngusJohnson AngusJohnson merged commit 6e4feb8 into AngusJohnson:main May 11, 2024
14 checks passed
@AngusJohnson
Copy link
Owner

AngusJohnson commented May 11, 2024

I'll probably rewrite CalculateCarry so that I can understand what's happening there😁.
I just hope this doesn't break things again🤞.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}

@reunanen reunanen deleted the fix-831-simplified branch May 11, 2024 10:40
@reunanen
Copy link
Contributor Author

I just hope this doesn't break things again🤞.

For what it's worth: I noticed that there's already some serious test automation coverage – in the sense that whenever I made a mistake doing this, it was flagged by at least some (and often multiple) test cases. Which is great! 💪

Of course doesn't mean one should blindly rely on the tests only, but at least simple mistakes will probably be detected right away.

@reunanen
Copy link
Contributor Author

// do safely ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry

Just a nitpick: that should be MAX_UINT64, right?

@reunanen
Copy link
Contributor Author

And yes, your version is very readable, now I think I understand this too 😅

You use redundant parentheses more than I would, but that's a matter of taste I guess.

@AngusJohnson
Copy link
Owner

You use redundant parentheses more than I would, but that's a matter of taste I guess.

I was deliberately heavy handed with parentheses so they would parallel the commented code logic above.

@AngusJohnson
Copy link
Owner

AngusJohnson commented May 11, 2024

I'll probably rewrite CalculateCarry so that I can understand what's happening there😁. I just hope this doesn't break things again🤞.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}

Actually, looking again more closely, I don't think the code above is correct.
And if it is wrong then I believe I was somewhat mislead by the StackOverflow link provided.

I think the following code is correct (or almost correct), but it's bedtime again so I may not be thinking straight ...

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aLo = a & 0xFFFFFFFF;
    const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    return (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
  }

@reunanen
Copy link
Contributor Author

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

@AngusJohnson
Copy link
Owner

AngusJohnson commented May 11, 2024

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

You just missed my edit. (See above.)
Anyhow, this is why I'm so keen to understand the underlying code.
I doesn't stop me making mistakes, but hopefully it helps me avoid making a lot more.

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    //const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aHiShr32 = a >> 32;
    const uint64_t aLo = a & 0xFFFFFFFF;
    //const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bHiShr32 = b >> 32;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    //carry = (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
    return aHiShr32 * bHiShr32 + ((aHiShr32 * bLo) >> 32) + ((bHiShr32 * aLo) >> 32);
  }

@tomwiel
Copy link

tomwiel commented May 11, 2024

Just in case, this one (Emulate64x64to128) looks quite simple too.

@AngusJohnson
Copy link
Owner

AngusJohnson commented May 11, 2024

Thanks Tom. You've just reminded me of my own Int128Mul function and Int128 class in Clipper1. I'm very confident that Int128Mul there (line 354) performs correctly.

@azrafe7
Copy link

azrafe7 commented May 11, 2024

Hi all, following the discussion and throwing in my two cents...
This one https://stackoverflow.com/a/1815371/1158913 seems to behave correctly for me, returning the expected carry.

I've checked against a naive implementation (carry128()) using uint_128 (BASELINE to ensure correct results - also tested in Python) and an adaptation of the solution in the link (multiply_carry()):

typedef unsigned __int128 uint128_t;
uint64_t carry128(uint64_t a, uint64_t b) {
    uint128_t a_128 = a;
    uint128_t res_128 = a_128 * b;
    uint64_t carry = (uint64_t) (res_128 >> 64);
    
    return carry;
}

uint64_t hi(uint64_t x) {
    return x >> 32;
}

uint64_t lo(uint64_t x) {
    return ((1ULL << 32) - 1) & x;
}

// https://stackoverflow.com/a/1815371/1158913
uint64_t multiply_carry(uint64_t a, uint64_t b) {
    // actually uint32_t would do, but the casting is annoying
    uint64_t s0, s1, s2, s3; 
    
    uint64_t x = lo(a) * lo(b);
    s0 = lo(x);
    
    x = hi(a) * lo(b) + hi(x);
    s1 = lo(x);
    s2 = hi(x);
    
    x = s1 + lo(a) * hi(b);
    s1 = lo(x);
    
    x = s2 + hi(a) * hi(b) + hi(x);
    s2 = lo(x);
    s3 = hi(x);
    
    uint64_t result = s1 << 32 | s0;
    uint64_t carry = s3 << 32 | s2;
    
    return carry;
}

Examples of testing carry returned from multiplication with random numbers:

TEST: 0x51eaed81157de061 * 0x3a271fb2745b6fe9
BASELINE carry128: 0x129bbebdfae0464e
CalculateCarry   : 0x129bbebdd0c2c2b2 WRONG
CalcOverflowCarry: 0x129bbebdfae0464e OK
multiply_carry   : 0x129bbebdfae0464e OK

TEST: 0xc2055706a62883fa * 0x26c78bc79c2322cc
BASELINE carry128: 0x1d640701d192519b
CalculateCarry   : 0x1d6407014210e7ab WRONG
CalcOverflowCarry: 0x1d640701d1925199 WRONG
multiply_carry   : 0x1d640701d192519b OK

More tests...
TEST: 0x874ddae32094b0de * 0x9b1559a06fdf83e0
BASELINE carry128: 0x51f76c49563e5bfe
CalculateCarry   : 0x51f76c490760b8e0 WRONG
CalcOverflowCarry: 0x51f76c49563e5bfd WRONG
multiply_carry   : 0x51f76c49563e5bfe OK
TEST: 0x81fb3ad3636ca900 * 0x239c000a982a8da4
BASELINE carry128: 0x12148e28207b83a3
CalculateCarry   : 0x12148e27c5644c3e WRONG
CalcOverflowCarry: 0x12148e28207b83a2 WRONG
multiply_carry   : 0x12148e28207b83a3 OK
TEST: 0x4be0b4c5d2725c44 * 0x990cd6db34a04c30
BASELINE carry128: 0x2d5d1a4183fd6165
CalculateCarry   : 0x2d5d1a40f6935288 WRONG
CalcOverflowCarry: 0x2d5d1a4183fd6164 WRONG
multiply_carry   : 0x2d5d1a4183fd6165 OK
TEST: 0x978ec0c0433c01f6 * 0x2df03d097966b536
BASELINE carry128: 0x1b3251d91fe272a5
CalculateCarry   : 0x1b3251d8cbf286c1 WRONG
CalcOverflowCarry: 0x1b3251d91fe272a4 WRONG
multiply_carry   : 0x1b3251d91fe272a5 OK
TEST: 0x49c5cbbcfd716344 * 0xc489e3b34b007ad3
BASELINE carry128: 0x38a32c74c8c191a4
CalculateCarry   : 0x38a32c73f0912874 WRONG
CalcOverflowCarry: 0x38a32c74c8c191a3 WRONG
multiply_carry   : 0x38a32c74c8c191a4 OK
TEST: 0xd3361cdbeed655d5 * 0x1240da41e324953a
BASELINE carry128: 0x0f0f4fa11e7e8f2a
CalculateCarry   : 0x0f0f4fa0520fd19c WRONG
CalcOverflowCarry: 0x0f0f4fa11e7e8f28 WRONG
multiply_carry   : 0x0f0f4fa11e7e8f2a OK
TEST: 0x51b854f8e71b0ae0 * 0x6f8d438aae530af5
BASELINE carry128: 0x239c04ee3c8cc248
CalculateCarry   : 0x239c04eda032b5b0 WRONG
CalcOverflowCarry: 0x239c04ee3c8cc247 WRONG
multiply_carry   : 0x239c04ee3c8cc248 OK
TEST: 0xbbecf7dbc6147480 * 0xbb0f73d0f82e2236
BASELINE carry128: 0x895170f4e9a216a7
CalculateCarry   : 0x895170f3a2b5c2f1 WRONG
CalcOverflowCarry: 0x895170f4e9a216a5 WRONG
multiply_carry   : 0x895170f4e9a216a7 OK
WRONG (of 10 tries): 
  CalculateCarry   : 10
  CalcOverflowCarry: 9
  multiply_carry   : 0

Hope it's helpful.

@AngusJohnson
Copy link
Owner

Thanks Giuseppe.
StackOverflow is a fantastic resource, but the code there isn't always reliable (even from contrubutors with very high reputations).
Anyhow, I've just updated clipper.core.h again with a revised CalcOverflowCarry function.

AngusJohnson added a commit that referenced this pull request May 12, 2024
Otherwise a minor code tidy and updated code comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants