improved implementation of hypot(a,b) #31922
Welcome, and thanks for the contribution, and the paper. Overall I think this looks pretty good. Have you done any benchmarks? I was wondering if instead of
if ax > sqrt(floatmax(T)/2)
    sx = ax * floatmin(T)
    sy = ay * floatmin(T)
    return sqrt(muladd(sx,sx,sy*sy)) / floatmin(T)
elseif ay < sqrt(floatmin(T))
    sx = ax / floatmin(T)
    sy = ay / floatmin(T)
    return sqrt(muladd(sx,sx,sy*sy)) * floatmin(T)
else
    return sqrt(muladd(ax,ax,ay*ay))
end
(here I am assuming that the compiler will convert the divisions into multiplications, which it should since |
Any hope to extend this and fix the vararg method? See #27141 |
Looks like it isn't getting |
Is it not correct to return NaN if either argument is NaN since we have no idea what caused the NaN in the first place? Otherwise hypot(0/0,0/0) would give Inf which is not correct. I haven't timed the rescaling using division/multiplication vs. frexp/ldexp but I'll look at that when I get back to the office on Monday. Mostly I didn't worry about that since it is an extremely unusual case. The code clears widely varying operands first which would clear almost anything in actual practice that involved an 'extreme' exponent (i.e. possible overflow/underflow). You would only need to rescale in the very odd event that you had both operands with nearby extreme exponents. All that said, I like the simple elegance of the rescaling you suggest. |
I think, because |
You could make the similar argument that FWIW, MATLAB gives |
One problem with this approach is that it can cause the other argument to go out of range. For example
Here x is too large and would overflow. Since y does not have a widely varying exponent we rescale. Unfortunately, the resulting y would then underflow on squaring. One could use the square root of floatmin but that will be a problem if the exponent for that value is not even (it's even in Float64 but if this is to be fully general then it needs to work everywhere). |
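The failure mode described here can be sketched with concrete numbers. The operands below are hypothetical (the original example is not shown), and the `sqrt(eps/2)` cutoff for "widely varying" operands is an assumption about how the early exit is written.

```julia
# Hypothetical operands: x*x overflows, and y is close enough in
# magnitude to x that it cannot simply be dropped.
x = 2e154
y = 2e150

@show isinf(x * x)                      # squaring x directly overflows
@show y > x * sqrt(eps(Float64) / 2)    # assumed cutoff: y is not negligible

# Rescale both operands by floatmin, as in the suggestion under discussion.
sx = x * floatmin(Float64)
sy = y * floatmin(Float64)

@show issubnormal(sx * sx)   # sx^2 is still a normal number
@show issubnormal(sy * sy)   # sy^2 has dropped into the subnormal range
```

Once `sy * sy` is subnormal it carries fewer significant bits, which is the precision loss the comment is warning about.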
I think I kind of agree with @jebej: julia> hypot(Inf, -1/0)
Inf doesn't look necessarily a reasonable answer. |
I would add that if the motivation behind |
The behaviour is specified in the IEEE754 standard (§9.2.1):
My guess is that the logic is that
|
This reasoning should hold also for addition, shouldn't it?
|
|
Probably I got the idea now: |
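One reading of the rule, offered as an interpretation rather than a quote from the standard: when one argument is infinite, the result is `Inf` for every non-NaN value of the other argument, so the NaN cannot change the answer. Addition has no such constant result. A small check, assuming a Julia version whose `hypot` follows the standard:

```julia
# With an infinite first argument the result is Inf for every non-NaN y,
# so Inf is also returned when the other argument is NaN.
@show all(hypot(Inf, y) == Inf for y in (-Inf, -1.0, 0.0, 1.0, Inf))
@show hypot(Inf, NaN)

# Addition is different: Inf + (-Inf) is already NaN, so there is no
# single answer that holds for all y, and Inf + NaN stays NaN.
@show Inf + (-Inf)
@show Inf + NaN
```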
Good point. We could define a specific constant for this, or use something like |
Can't fight the standard so the inf issue is now compliant. Decided to go with I'll post the updates in a little bit. |
Provides a fast and accurate implementation of hypot() that leverages the fused multiply add where available. The approach is explained and tested in detail in the paper "An Improved Algorithm for hypot(a,b)" by Carlos F. Borges. The article is available online at arXiv: https://arxiv.org/abs/1904.09481
Overall I think this looks good.
@test hypot(floatmax(T),1.0) == floatmax(T) and something to check that we're not losing precision due to gradual underflow? |
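A sketch of what such edge-case tests might look like. The subnormal case is an assumption about what "not losing precision due to gradual underflow" could mean in practice: a 3-4-5 triangle built from the smallest subnormal, where the exact answer is representable.

```julia
using Test

@testset "hypot edge cases (sketch)" for T in (Float32, Float64)
    # The largest finite value must come back unchanged, not overflow.
    @test hypot(floatmax(T), one(T)) == floatmax(T)

    # A 3-4-5 triangle scaled into the subnormal range: the exact
    # result 5u is representable, so nothing should be lost.
    u = nextfloat(zero(T))   # smallest positive subnormal
    @test hypot(3u, 4u) == 5u
end
```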
…cleaned up the comments.
I have noticed one compiler related issue that I'd love to get some input on. Specifically, it appears that although there are three nearly identical calls to |
That's interesting. @vchuravy any idea why this might happen? One alternative that might help is to force them all through the same path, e.g.
scale = one(T)
if ax > sqrt(floatmax(T)/2)
    scale = 1/eps(sqrt(floatmin(T))) # Rescaling constant
    ax = ax/scale
    ay = ay/scale
elseif ay < sqrt(floatmin(T))
    scale = eps(sqrt(floatmin(T))) # Rescaling constant
    ax = ax/scale
    ay = ay/scale
end
return scale*sqrt(muladd(ax,ax,ay*ay))
and add a test for the behaviour? |
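For readers who want to try the single-path idea, here is a minimal self-contained sketch. It is not the merged implementation: the name `hypot_sketch`, the `sqrt(eps(T)/2)` cutoff for widely varying operands, and the exact placement of the `Inf` check are assumptions pieced together from the discussion.

```julia
# Sketch of a rescaled hypot with a single sqrt/muladd path.
function hypot_sketch(x::T, y::T) where {T<:AbstractFloat}
    # An infinite argument wins, even against NaN.
    (isinf(x) || isinf(y)) && return T(Inf)

    # Order the operands so that ax >= ay.
    ax, ay = abs(x), abs(y)
    if ay > ax
        ax, ay = ay, ax
    end

    # Assumed cutoff: ay is too small to affect the rounded result.
    # This exit also keeps the rescaling below from overflowing ax.
    ay <= ax * sqrt(eps(T) / 2) && return ax

    scale = one(T)
    if ax > sqrt(floatmax(T) / 2)
        scale = 1 / eps(sqrt(floatmin(T)))   # large power of two
        ax /= scale
        ay /= scale
    elseif ay < sqrt(floatmin(T))
        scale = eps(sqrt(floatmin(T)))       # small power of two
        ax /= scale
        ay /= scale
    end
    return scale * sqrt(muladd(ax, ax, ay * ay))
end

@show hypot_sketch(3.0, 4.0)
@show hypot_sketch(3e200, 4e200)    # naive sqrt(x*x + y*y) would overflow
@show hypot_sketch(3e-200, 4e-200)  # naive version would underflow to zero
```

Because the rescaling constant is a power of two, dividing and multiplying by it is exact as long as the operands stay in the normal range, so the only rounding comes from the `muladd` and the `sqrt`.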
I did that and it does work; I just hate adding a multiply by one. It rubs me the wrong way. But if you think that's better than going to an |
Co-Authored-By: Simon Byrne <simonbyrne@gmail.com>
… the accuracy. It is now far better than the clib hypot code even without a fused multiply add. If you look at the ArXiv paper there is a plot that shows the stunning performance difference. This comes at the cost of two more multiplies and one more divide (and some adds). Still way cheaper than the clib code.
…e. It is 10 times better than the C math library hypot function in this respect.
Very nice! Should be good to merge once tests finish, unless you have more changes planned. |
No more changes. I believe I've taken it as far as my skills allow. I kept hoping to find a simple form that was always correctly rounded (I didn't want to use any Veltkamp-Dekker type tricks) but this is as close as I could get. You can squeeze a tiny bit more out by adding more branches but that would slow it down for almost no gain so I don't see the point. |
PR #30301 adds a few tests for edge cases; it would probably be useful to make sure they also pass in this PR. |
Thanks! |
Awesome. It's been fun. By the way, I do have a version that always returns the correctly rounded answer but it REQUIRES the fused multiply-add and gives poor results without it (results are similar to the Naive (Unfused) code in the paper if there is no fma). If that is of interest let me know how I might contribute it. |
I'm not sure: the simplest option is to create a simple package. I'm not sure if there are existing packages where it could fit (https://github.com/JuliaIntervals/CRlibm.jl maybe?). @dpsanders might have some suggestions. What's the performance like vs what you currently have? We don't currently have a great way to feature gate (see #9855). There is currently a line in the code: Line 144 in 4a04600
but on my machine (which does have FMA) I get:
so that doesn't really help. |
How does it do in terms of accuracy if the multiply add isn't fused? We have both |
julia> fma(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2))
4.930380657631324e-32 Wrong sign? |
ah ha! |
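The expression above works as a probe because its exact value is tiny but nonzero, and only a fused operation can see it. A short sketch of the arithmetic (variable names are illustrative):

```julia
a = nextfloat(1.0)       # 1 + ε, with ε = eps(1.0)
c = nextfloat(1.0, 2)    # 1 + 2ε

# Fused: a*a - c is computed exactly and rounded once.
# a^2 = 1 + 2ε + ε^2, so the exact difference is ε^2.
fused = fma(a, a, -c)

# Unfused: a*a is rounded to 1 + 2ε first, so the difference vanishes.
unfused = a * a - c

@show fused unfused
@show fused == eps(1.0)^2
```

`muladd` is allowed to pick either behaviour, which is why its result on this input depends on whether the compiler actually emitted a fused instruction.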
As pointed out by @StefanKarpinski in #31922 (comment)
Without the fma it performs exactly like the Naive (Unfused) code from my paper. So one ulp errors about 17% of the time on normally distributed inputs (compare to 13% for the clib hypot). It requires 4 fma calls. |
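Error rates like these can be estimated by comparing against a high-precision reference. The sketch below is not the paper's test harness: the function names, sample size, and seed are hypothetical, and the `BigFloat` reference is only assumed to be correctly rounded (double rounding is possible in principle, though extremely rare at 256 bits).

```julia
using Random

# Fraction of random inputs on which f(x, y) differs from a
# high-precision reference rounded to Float64.
function mismatch_rate(f; n = 10_000, rng = MersenneTwister(1))
    misses = 0
    for _ in 1:n
        x, y = randn(rng), randn(rng)
        ref = Float64(sqrt(big(x)^2 + big(y)^2))   # assumed correctly rounded
        misses += (f(x, y) != ref)
    end
    return misses / n
end

naive(x, y) = sqrt(x * x + y * y)   # unfused textbook formula

@show mismatch_rate(naive)
@show mismatch_rate(hypot)
```

The printed rates depend on the Julia version and hardware, so no expected values are given here.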
CRlibm.jl is just a wrapper around the CRlibm library and I would prefer to leave it as such. But it would be great if we could start writing a CorrectRounding.jl library to replace it! |
@cfborges, I notice that you published this code in ACM TOMS. This is a bit of a concern, since if you assigned the copyright to ACM (the default, I think?), then the code (which you no longer own) is by default licensed under the ACM TOMS license, which is not free/open-source: it is only free for "noncommercial use". However, there is the possibility for the author of the work to request that ACM TOMS release the code under a more liberal license, or possibly at the time of publication the author could request to retain copyright ownership of the work. Did you do this? (The situation is a bit murky to me if you contributed this code to Julia before assigning copyright to ACM, but it seems better to clarify the situation by ensuring that the ACM TOMS code for your article is released under a free/open-source license like MIT or BSD.) |
As a federal employee my work cannot be copyrighted.
Cheers,
Carlos
|
Ah, great, that is covered by the ACM copyright release. It's weird that the ACM TOMS 1014 page of your article does not make this clear when you download the Supplemental Material (which doesn't list any copyright information that I can see?). You might want to email TOMS to request that they add a public domain notice on your Supplemental Material to clarify the copyright status of your code. Otherwise a reader might by default assume it is restricted by the ACM TOMS semi-free license. |
I did. And it’s obvious from my affiliation.
Cheers,
Carlos
|
I edited my post to note that a public-domain notice indeed appears in your article text. But it would be nice for readers if ACM TOMS also showed such a notice on the Supplementary Material too. |
cc: @simonbyrne