Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Study mul256 #574

Merged
merged 18 commits into from
May 20, 2024
Merged

Study mul256 #574

merged 18 commits into from
May 20, 2024

Conversation

ckormanyos
Copy link
Collaborator

@ckormanyos ckormanyos commented May 18, 2024

The purpose of this PR is to simply insert parts of wide-integer's mul-256 into decimal. Let's see how CI shakes out.

This phenomenon was originally reported in #559.

@ckormanyos
Copy link
Collaborator Author

ckormanyos commented May 18, 2024

Hard error but very confusing on Ubuntu, cbrt(128-bit) for argument close to $1.0$ hangs, not fails but hangs indefinitely.

It works just fine on MSVC. Will now investigate why/what's up on LINUX. It also hangs on WSL2 so I'll look there locally first.

@ckormanyos
Copy link
Collaborator Author

Hi Matt (@mborland) I took the opportunity to fully refactor sqrt() and cbrt() and their tests. Interesting point: When I had initially started with mul-256 for some strange reason the cube-root test was hanging. I also found that several tens of Newton iterations were being performed in some cases.

So for sqrt() and cbrt() I used a uniform estimation scheme providing slightly more than $2$ decimal digits of accuracy in the initial guess on the interval $1/10 {\leq} x < 1$. This results excellent results for exactly in $2$, $3$ or $4$ iterations for 32/64/128 bits respectively. I added and strengthened the tests and am now hammering the new multiplication and roots.

When implementing $8{\times}8$ multiplication, I opted to unroll the loop. I could add some specialized optimitations for multiplication $4{\times}4$ or $8{\times}2$, but we can do that later as needed.

@ckormanyos
Copy link
Collaborator Author

ckormanyos commented May 19, 2024

Oh yes and also Matt (@mborland) there was, in fact, a hard error lurking around somewhere in the original $256{\times}256$-bit multiplication. So these changes (or something like them) are seemingly needed.

I can now reduce significantly all tolerances on decimal128-tests. I was tempted to gut out the whole thing and put in a small subset of wide-integer, but we've got no time for that at the moment. So I opted to simply port over a few routines.

Now the test codes in #559 all pass.

@ckormanyos
Copy link
Collaborator Author

... can now reduce significantly all tolerances on decimal128-tests

See also #575.

@ckormanyos
Copy link
Collaborator Author

Drone is having a bit of a tussle with some sort of scripts. Maybe we can sort this out. GHA is finally looking reasonable.

Hi Matt (@mborland) this PR has a bit of depth to it. Redo $256{\times}256$-bit mul, with future optimizations TODO. Also totally rework sqrt() and cbrt().

Please give this thing a look at your convenience. I'm OK with it because $256$-bit math is now correct, or better-said, more correct. If there's any performance hits, we/I can probably deal with them. So I'd favor numerical correctness at this juncture.

@ckormanyos ckormanyos requested a review from mborland May 19, 2024 17:51
include/boost/decimal/detail/cmath/cbrt.hpp Show resolved Hide resolved
include/boost/decimal/detail/cmath/sqrt.hpp Show resolved Hide resolved
include/boost/decimal/detail/emulated256.hpp Show resolved Hide resolved
include/boost/decimal/detail/emulated256.hpp Outdated Show resolved Hide resolved
test/test_lgamma.cpp Show resolved Hide resolved
Copy link

codecov bot commented May 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.7%. Comparing base (04f5974) to head (fe4392e).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #574     +/-   ##
=========================================
+ Coverage     98.7%   98.7%   +0.1%     
=========================================
  Files          195     195             
  Lines        12014   12472    +458     
  Branches      1419    1455     +36     
=========================================
+ Hits         11846   12307    +461     
+ Misses         168     165      -3     
Files Coverage Δ
include/boost/decimal/detail/cmath/cbrt.hpp 100.0% <100.0%> (ø)
include/boost/decimal/detail/cmath/sqrt.hpp 100.0% <100.0%> (+5.3%) ⬆️
include/boost/decimal/detail/emulated128.hpp 96.5% <100.0%> (+0.2%) ⬆️
include/boost/decimal/detail/emulated256.hpp 100.0% <100.0%> (+1.0%) ⬆️
...e/boost/decimal/detail/wide-integer/uintwide_t.hpp 100.0% <100.0%> (ø)
test/test_atan.cpp 100.0% <100.0%> (ø)
test/test_big_uints.cpp 100.0% <100.0%> (ø)
test/test_cbrt.cpp 100.0% <100.0%> (ø)
test/test_erf.cpp 100.0% <ø> (ø)
test/test_exp.cpp 100.0% <100.0%> (ø)
... and 8 more

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04f5974...fe4392e. Read the comment docs.

@ckormanyos
Copy link
Collaborator Author

I was concerned about changing performance. Multiplication stays about the same. I also benchmarked sqrt() which has improved in the mul256 branch compared to the develop branch. The tables are from MSVC 142, with similar trends on Ubuntu in WSL2.

Multiplication mul256 develop
Multiplication <float > 443 558
Multiplication <double > 476 609
Multiplication <decimal32 > 37383 37159
Multiplication <decimal64 > 125178 123256
Multiplication <decimal128> 397336 397468
Multiplication <dec32_fast> 32281 32995
sqrt() mul256 develop
sqrt<float > 1053 1353
sqrt<double > 2908 2884
sqrt<decimal32 > 417289 666491
sqrt<decimal64 > 1604746 2181287
sqrt<decimal128> 15742232 23100960

@ckormanyos
Copy link
Collaborator Author

ckormanyos commented May 20, 2024

Hi Matt (@mborland) before leaving this one, I found another issue in uint128 when performing $128{\times}64$-bit multiplication. I'll fix this here in this branch (although not with the highest possible efficiency).

At some time in the future, I'll take a few morning sessions and make a dedicated, C++14-friendly uint-whatever (like $128$, $256$) for this work. But we can probably push that off until the interfacing and most of the known numerical problems have been eliminated.

@ckormanyos
Copy link
Collaborator Author

This thing fixes a bunch of pesky problems.

  • Repair $128$ and $256$-bit multiplication, but retain whatever performance we have (repair temporary permanent fiy planned this year).
  • Improve sqrt() and cbrt().
  • Eliminate spurious errors on standard function tests via eliminating troublesome ranges not needed for basic tests.
  • Reduce decimal128 tolerances on a bunch of tests.

@ckormanyos ckormanyos merged commit 10089df into develop May 20, 2024
62 checks passed
@ckormanyos ckormanyos deleted the study_mul256 branch May 20, 2024 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants