fix Math.pow_body for huge powers (#53886)

Conversation
Is my understanding correct that the theory behind this is that it moves the first iteration out of the loop? It's somewhat unclear to me why this makes a difference.
Yes, I took the first iteration out of the loop, because I wanted to halve the size of
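The first-iteration peel being discussed can be sketched as follows. This is a Python illustration with a hypothetical function name, not the PR's actual Julia code, and it omits the compensated-rounding work that `pow_body` does:

```python
def pow_by_squaring(x: float, n: int) -> float:
    """Integer power by binary exponentiation, for n >= 0.

    The first iteration is peeled out of the loop: the lowest bit of n
    decides the initial accumulator, so the loop performs one squaring
    per *remaining* bit instead of starting with an extra one.
    """
    acc = x if n & 1 else 1.0  # peeled first iteration
    n >>= 1
    while n:
        x *= x                 # square once per remaining bit of n
        if n & 1:
            acc *= x
        n >>= 1
    return acc
```

For example, `pow_by_squaring(3.0, 13)` walks the bits `1101` of 13: the peeled step seeds the accumulator with `x`, and the loop then needs only three squarings.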
Overall this looks good. I do want to do a little more testing with random giant exponents to make sure this fully fixes the issues.
The previous test run failed for two 32-bit AMD CPUs. Reverted muladd to fma to check whether that was the reason.

That was indeed the reason. So on these architectures, muladd behaves worse than fma!
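The accuracy gap between a fused and an unfused `a*b + c` can be shown with plain doubles. The Python sketch below emulates a single-rounding FMA via exact rational arithmetic (the function name is mine); the relevant point is that Julia's `muladd` is permitted to compile to either the fused or the two-rounding form depending on the target CPU, which is why results can differ across architectures:

```python
from fractions import Fraction

def fma_single_rounding(a: float, b: float, c: float) -> float:
    # Emulate fma: compute a*b + c exactly, then round once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
# Unfused: a*b = 1 - 2^-60 rounds to 1.0 first, losing the residual.
unfused = a * b - 1.0                      # -> 0.0
# Fused: a single rounding keeps the residual -2^-60.
fused = fma_single_rounding(a, b, -1.0)    # -> -2.0**-60
```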
The current state of the PR is a substantial improvement in accuracy for big exponents (only 150 ppm of samples exceed 1 ULP of error), while performance is maintained. (The existing implementation exceeds that in more than 34% of the same sampled argument range.)
The statistics are excellent now: 99.4% of the samples show 0 ULP error, 0.6% show 1 ULP, and no sample shows more!
Output with the version of this PR:
The output of the same commands on v10.2:
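ULP statistics like the ones above can be gathered by comparing each computed power against an exact rational reference. A hedged Python sketch of such a measurement (the PR's actual test harness is not shown in this thread):

```python
import math
from fractions import Fraction

def ulp_error(computed: float, base: float, n: int) -> Fraction:
    """Distance, in ULPs of the computed value, from base^n
    evaluated exactly in rational arithmetic."""
    exact = Fraction(base) ** n
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))
```

For instance, `ulp_error(1024.0, 2.0, 10)` is exactly 0, and a result off by one representable step reports 1 ULP.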
That looks great! It's also really nice to have a second person who actually understands this code well. Can you also post the benchmarks for
In the domain of the old implementation (exponents less than 2^25), the results are comparable in both accuracy and performance: the benchmark evaluates 10^6 powers with a mean of 59 vs. 51 ns. New:
For comparison, the old implementation:
Baseline for the benchmark:
I also wonder whether it would be worth making big powers go to the regular Float64^Float64 path. My benchmarking suggests that around
Do you mean restrict the exponents to -20:20? Here we go:
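A dispatch on exponent magnitude, as discussed above, could look roughly like this Python sketch. The cutoff value and the fallback formula are assumptions for illustration: the real Float64^Float64 path uses an extended-precision log/exp rather than plain `math.exp(n * math.log(x))`:

```python
import math

def pow_dispatch(x: float, n: int, cutoff: int = 2 ** 25) -> float:
    # Huge |n| (hypothetical cutoff): route to the general
    # floating-point power path via exp(n * log|x|).
    if abs(n) > cutoff:
        sign = -1.0 if (x < 0.0 and n & 1) else 1.0
        return sign * math.exp(n * math.log(abs(x)))
    # Small |n|: binary exponentiation on |n|, invert for negative n.
    r, b, m = 1.0, x, abs(n)
    while m:
        if m & 1:
            r *= b
        b *= b
        m >>= 1
    return r if n >= 0 else 1.0 / r
```

The design question raised in the thread is exactly where to place `cutoff` so that the squaring path keeps its speed for common small exponents while the log/exp path absorbs the accuracy-hostile huge ones.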
I will check how performance versus accuracy works out.
@oscardssmith yes, that would be a better way. I wasn't aware that we have a double-precision implementation of log/exp behind the scenes. That has high accuracy for all integer exponents which can be represented as

That is a separate PR, though. If it is done, this PR is no longer required. Nevertheless, it was a challenging task for me to squeeze the last ULP out of the algorithm, and I had much fun! Note that issue #53881 is still open.
This PR is superseded by #53967 and closed.
Fix Math.pow_body(a, n) for huge values of n.

Fixes #53881