improved implementation of hypot(a,b) #31922

cfborges · 2019-05-03T21:17:08Z

Provides a fast and accurate implementation of hypot() that leverages
the fused multiply-add where available. The approach is explained
and tested in detail in the paper
An Improved Algorithm for hypot(a,b) by Carlos F. Borges,
which is available on arXiv at
https://arxiv.org/abs/1904.09481
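For context, here is a minimal Julia illustration of why hypot needs care at all. It uses only Base, and the helper name `naive_hypot` is ours, not the PR's. The textbook formula overflows or underflows in its intermediate squares even when the true result is perfectly representable:

```julia
# Textbook formula: the intermediate squares can overflow or underflow.
naive_hypot(x, y) = sqrt(x*x + y*y)

# 1e200^2 = 1e400 overflows to Inf; (3e-200)^2 = 9e-400 underflows to 0.0.
println(naive_hypot(1e200, 1e200), "  vs  ", hypot(1e200, 1e200))
println(naive_hypot(3e-200, 4e-200), "  vs  ", hypot(3e-200, 4e-200))
```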

cc: @simonbyrne

base/math.jl

simonbyrne · 2019-05-03T21:55:58Z

Welcome, and thanks for the contribution, and the paper. Overall I think this looks pretty good.

Have you done any benchmarks? I was wondering whether, instead of frexp/ldexp, it would be faster to do something like

    # ax, ay are the absolute values of the inputs, with ax >= ay
    if ax > sqrt(floatmax(T)/2)
        # large operands: scale down so the squares cannot overflow
        sx = ax * floatmin(T)
        sy = ay * floatmin(T)
        return sqrt(muladd(sx,sx,sy*sy)) / floatmin(T)
    elseif ay < sqrt(floatmin(T))
        # small operands: scale up so the squares do not underflow
        sx = ax / floatmin(T)
        sy = ay / floatmin(T)
        return sqrt(muladd(sx,sx,sy*sy)) * floatmin(T)
    else
        return sqrt(muladd(ax,ax,ay*ay))
    end

(Here I am assuming that the compiler will convert the divisions into multiplications, which it should, since floatmin(T) is a power of 2.)
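The assumption in the parenthesis can be checked directly. Because floatmin(T) is a power of two with a finite reciprocal, dividing by it and multiplying by its reciprocal are both exact away from overflow and underflow, so the two always agree. That is what allows a compiler to rewrite one as the other. Here is a quick check for Float64, using a few arbitrary sample values below 4 so that x * 2^1022 cannot overflow:

```julia
p = floatmin(Float64)              # 2^-1022
inv_p = 1 / p                      # 2^1022, exactly representable
xs = [0.1, 1.0 / 3, 2.5e-10, 7.0e-5]
# Both operations are exact scalings by a power of two, so they must agree.
@show all(x / p == x * inv_p for x in xs)
@show inv_p == ldexp(1.0, 1022)
```

Whether a particular compiler actually performs the rewrite is a separate question. This check only shows that doing so would not change results.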

giordano · 2019-05-03T23:21:23Z

Is there any hope of extending this to fix the vararg method too? See #27141

simonbyrne · 2019-05-04T05:35:19Z

Looks like it isn't getting hypot(Inf,NaN) == hypot(NaN,Inf) == Inf correct.

cfborges · 2019-05-04T17:59:28Z

Is it not correct to return NaN if either argument is NaN since we have no idea what caused the NaN in the first place? Otherwise hypot(0/0,0/0) would give Inf which is not correct.

I haven't timed rescaling by division/multiplication against frexp/ldexp, but I'll look at that when I get back to the office on Monday. Mostly I didn't worry about it because it is an extremely unusual case. The code clears widely varying operands first, which in practice handles almost every input with an 'extreme' exponent (i.e. possible overflow/underflow). You would only need to rescale in the very odd event that both operands have nearby extreme exponents. All that said, I like the simple elegance of the rescaling you suggest.
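Here is a short argument for why widely varying operands can be cleared immediately. It is our own reconstruction, not taken from the paper, and it uses the threshold y <= x*sqrt(eps(Float64)/2) that appears later in the thread. Write ε = eps(T), let 0 ≤ b ≤ a, and let a be a normal float in [2^e, 2^(e+1)):

$$
b \le a\sqrt{\varepsilon/2}
\;\Longrightarrow\;
a \le \sqrt{a^2+b^2} = a\sqrt{1+(b/a)^2} \le a\sqrt{1+\varepsilon/2} < a\left(1+\frac{\varepsilon}{4}\right).
$$

The gap from a to the next float up is $\operatorname{ulp}(a) = 2^{e}\varepsilon > a\varepsilon/2$. The exact result therefore exceeds a by less than half an ulp and rounds to a itself. No squares need to be formed, so nothing can overflow or underflow.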

KlausC · 2019-05-05T14:38:54Z

I think that because hypot(a, ±Inf) == Inf for all real a, it is reasonable to require hypot(NaN, Inf) == Inf and hypot(Inf, NaN) == Inf. The example hypot(0/0, 0/0) is not in this category; it is hypot(NaN, NaN).

jebej · 2019-05-05T15:55:16Z

You could make a similar argument that hypot(NaN,a) == NaN for any a. It makes sense to me that here the NaN should poison the result. We could also argue that hypot(NaN,Inf) == √(NaN^2+Inf^2) == NaN.

FWIW, MATLAB gives NaN, and Python gives Inf for this operation.

cfborges · 2019-05-05T15:55:25Z

Replying to @simonbyrne's suggestion above (rescaling by floatmin(T) instead of using frexp/ldexp):

One problem with this approach is that it can push the other argument out of range. For example:

    x = 1e154
    x > sqrt(floatmax(Float64)/2)   # true: x*x would overflow
    y = 1e147
    y <= x*sqrt(eps(Float64)/2)     # false: y is not negligible next to x
    y = y*floatmin(Float64)         # rescale
    y < sqrt(floatmin(Float64))     # true: y*y now falls below floatmin

Here x is so large that x*x would overflow. Since y does not have a widely varying exponent, we rescale. Unfortunately, the rescaled y then underflows on squaring.

One could use the square root of floatmin instead, but that will be a problem if the exponent of floatmin is not even (it is even for Float64, but if this is to be fully general then it needs to work for every type).

giordano · 2019-05-05T16:03:29Z

I think I kind of agree with @jebej:

    julia> hypot(Inf, -1/0)
    Inf

doesn't necessarily look like a reasonable answer.

cfborges · 2019-05-05T17:14:06Z

I would add that if the motivation behind hypot(x,y) is that it is just a more stable way of computing sqrt(x*x+y*y), then any NaN should poison the computation, just as it would in sqrt(x*x+y*y).

simonbyrne · 2019-05-05T20:07:31Z

The behaviour is specified in the IEEE 754 standard (§9.2.1):

> For the hypot function, hypot(±0, ±0) is +0, hypot(±∞, qNaN) is +∞, and hypot(qNaN, ±∞) is +∞.

My guess is that the logic is that NaN represents an unknown value, and since hypot(Inf, x) == Inf for all possible other values of x, then it should be for x=NaN as well. This is true for other functions, e.g.:

    julia> NaN^0.0
    1.0
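The ordering this rule implies can be sketched as follows. `hypot_special_cases` is a hypothetical name, and its finite path is just the naive formula standing in for a real implementation. The point is that the infinity test must run before any arithmetic, because ordinary floating-point arithmetic lets the NaN win:

```julia
# Hypothetical helper (not Base.hypot): handle the IEEE 754 §9.2.1 special
# cases before doing any arithmetic, so an infinite argument beats a NaN.
function hypot_special_cases(x::T, y::T) where {T<:AbstractFloat}
    if isinf(x) || isinf(y)
        return T(Inf)          # hypot(±Inf, qNaN) == hypot(qNaN, ±Inf) == +Inf
    end
    return sqrt(x*x + y*y)     # stand-in for the finite path; NaN propagates here
end

println(sqrt(Inf*Inf + NaN*NaN))         # NaN: plain arithmetic is poisoned
println(hypot_special_cases(Inf, NaN))   # Inf
```

Note that hypot(±0, ±0) == +0 falls out of the finite path without a special case, because (-0.0)*(-0.0) is +0.0.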

giordano · 2019-05-05T21:07:41Z

> My guess is that the logic is that NaN represents an unknown value, and since hypot(Inf, x) == Inf for all possible other values of x, then it should be for x=NaN as well.

Shouldn't this reasoning also hold for addition?

    julia> Inf + NaN
    NaN

StefanKarpinski · 2019-05-05T21:12:39Z

Inf + -Inf does not equal Inf.

giordano · 2019-05-05T21:14:39Z

I think I get the idea now: Inf plus any non-negative real value (as is the case in hypot) is always Inf, whatever that value is, including a ("non-negative") NaN.

simonbyrne · 2019-05-05T21:20:22Z

> One could use the square root of floatmin but that will be a problem if the exponent for that value is not even (it's even in Float64 but if this is to be fully general then it needs to work everywhere).

Good point. We could define a specific constant for this, or use something like eps(T)/floatmin(T)?

cfborges · 2019-05-06T16:10:07Z

Can't fight the standard, so the Inf handling is now compliant.

Decided to go with eps(sqrt(floatmin(T))) as the rescaling constant. It is always a power of the base, and as long as the exponent range is roughly symmetric and sufficiently larger than the precision (true of all the standard floating point types), there won't be any problems.
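Here is a quick way to inspect the chosen constant. It is a sketch that uses only Base and leaves Float16 out. For Float32 and Float64 the constant comes out as 2^-86 and 2^-563 respectively, exact powers of two whose reciprocals are finite:

```julia
for T in (Float32, Float64)
    c = eps(sqrt(floatmin(T)))       # the rescaling constant discussed above
    fr, ex = frexp(c)                # c == fr * 2^ex with 0.5 <= fr < 1
    println(T, ": c = 2^", ex - 1,
            ", power of two: ", fr == 0.5,
            ", 1/c finite: ", isfinite(inv(c)))
end
```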

I'll post the updates in a little bit.


simonbyrne · 2019-05-20T04:18:08Z

Overall I think this looks good.

hypot is somewhat poorly tested: can you add some tests for large/small values (the math-related tests are currently in test/math.jl)? e.g.

    @test hypot(floatmax(T), one(T)) == floatmax(T)

and something to check that we're not losing precision due to gradual underflow?
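Here is one possible shape for such tests, sketched with the `Test` stdlib. The cases are illustrative choices and not necessarily the tests that ended up in test/math.jl. They cover a 3-4-5 triangle scaled by powers of two near the top and bottom of the exponent range, where the exact answer is representable. They also cover subnormal inputs whose squares the naive formula would flush to zero:

```julia
using Test

@testset "hypot extremes ($T)" for T in (Float32, Float64)
    three, four, five = T(3), T(4), T(5)
    # ay negligible next to ax: must return ax without overflowing
    @test hypot(floatmax(T), one(T)) == floatmax(T)
    # 3-4-5 triangle scaled near overflow and near underflow; exact answer
    for k in (exponent(floatmax(T)) - 3, exponent(floatmin(T)) + 3)
        @test hypot(ldexp(three, k), ldexp(four, k)) == ldexp(five, k)
    end
    # subnormal inputs: the naive sqrt(x*x + y*y) returns 0 here
    t = nextfloat(zero(T))
    @test hypot(3t, 4t) == 5t
end
```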


cfborges · 2019-05-23T17:26:06Z

I have noticed one compiler-related issue that I'd love to get some input on. Although there are three nearly identical calls to muladd in the code, the compiler on my machine only turns the last one into a fused multiply-add (it may have something to do with constant propagation, but I really don't know). The end result is that hypot(K*x,K*y), for K a large (or very small) power of 2, might not be the same as K*hypot(x,y). This problem disappears if I replace muladd with fma. I'd love to get some opinions on whether to just drop muladd and use fma instead. Since the fused multiply-add is so common on today's architectures, this seems like the way to go, even though someone using a platform without it would see a slowdown.

simonbyrne · 2019-05-23T17:42:33Z

That's interesting. @vchuravy any idea why this might happen?

One alternative that might help is to force them all through the same path, e.g.

    scale = one(T)
    if ax > sqrt(floatmax(T)/2)
        scale = 1/eps(sqrt(floatmin(T)))  #Rescaling constant
        ax = ax/scale
        ay = ay/scale
    elseif ay < sqrt(floatmin(T))
        scale = eps(sqrt(floatmin(T)))  #Rescaling constant
        ax = ax/scale
        ay = ay/scale
    end
    return scale*sqrt(muladd(ax,ax,ay*ay))

and add a test for the behaviour?

cfborges · 2019-05-23T17:44:00Z

I did that and it does work; I just hate adding a multiply by one. It rubs me the wrong way.

But if you think that's better than going to an fma call then I'll make the change.
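Here is a sketch of the single-path idea using an explicit fma, so every rescaled branch goes through the same fused operation. `hypot_fma_sketch` is a hypothetical name. It deliberately omits the widely-varying shortcut, the special-value checks and the later accuracy corrections, so it is not the merged algorithm. Because rescaling by a power of two is exact in the normal range, scaling the inputs by K = 2^±600 should scale the output by exactly K:

```julia
function hypot_fma_sketch(x::T, y::T) where {T<:AbstractFloat}
    ax, ay = abs(x), abs(y)
    if ax < ay
        ax, ay = ay, ax                   # ensure ax >= ay
    end
    c = eps(sqrt(floatmin(T)))            # rescaling constant from this thread
    scale = one(T)
    if ax > sqrt(floatmax(T) / 2)
        scale = inv(c)
        ax, ay = ax * c, ay * c           # shrink so the squares cannot overflow
    elseif ay < sqrt(floatmin(T))
        scale = c
        ax, ay = ax / c, ay / c           # enlarge so the squares do not underflow
    end
    return scale * sqrt(fma(ax, ax, ay * ay))   # always fused, in every branch
end

x, y = 1.1, 2.3
for K in (ldexp(1.0, 600), ldexp(1.0, -600))
    println(K, ": ", hypot_fma_sketch(K * x, K * y) == K * hypot_fma_sketch(x, y))
end
```

The normal path still pays the multiply by one mentioned above. The trade-off is one extra multiply in exchange for a single code path.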


… the accuracy. It is now far better than the clib hypot code, even without a fused multiply-add. If you look at the arXiv paper, there is a plot that shows the stunning performance difference. This comes at the cost of two more multiplies and one more divide (and some adds). Still way cheaper than the clib code.


simonbyrne · 2019-06-12T18:56:39Z

Very nice! Should be good to merge once tests finish, unless you have more changes planned.

cfborges · 2019-06-12T19:11:45Z

> Very nice! Should be good to merge once tests finish, unless you have more changes planned.

No more changes. I believe I've taken it as far as my skills allow. I kept hoping to find a simple form that was always correctly rounded (I didn't want to use any Veltkamp-Dekker type tricks) but this is as close as I could get. You can squeeze a tiny bit more out by adding more branches but that would slow it down for almost no gain so I don't see the point.

giordano · 2019-06-12T19:49:13Z

PR #30301 adds a few tests for edge cases; it would probably be useful to make sure they also pass with this PR.

simonbyrne · 2019-06-13T16:32:10Z

Thanks!

cfborges · 2019-06-13T16:47:34Z

Awesome. It's been fun. By the way, I do have a version that always returns the correctly rounded answer but it REQUIRES the fused multiply-add and gives poor results without it (results are similar to the Naive (Unfused) code in the paper if there is no fma). If that is of interest let me know how I might contribute it.

simonbyrne · 2019-06-13T17:00:42Z

I'm not sure: the simplest option is to create a simple package. I'm not sure if there are existing packages where it could fit (https://github.com/JuliaIntervals/CRlibm.jl maybe?). @dpsanders might have some suggestions.

What's the performance like vs what you currently have? We don't currently have a great way to feature gate (see #9855). There is currently a line in julia/base/special/log.jl (line 144 in 4a04600):

    const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) == -4.930380657631324e-32

but on my machine (which does have FMA) I get:

    julia> Base.Math.FMA_NATIVE
    false

so that doesn't really help.

StefanKarpinski · 2019-06-13T17:01:02Z

How does it do in terms of accuracy if the multiply add isn't fused? We have both fma (do the operation fused, even if that's slow) and muladd which uses an FMA instruction if present or just a normal multiply and add if there isn't an FMA instruction. We could always take the performance hit on older platforms that don't have an FMA instruction.

StefanKarpinski · 2019-06-13T17:03:08Z

    julia> fma(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2))
    4.930380657631324e-32

Wrong sign?

simonbyrne · 2019-06-13T17:04:35Z

ah ha!
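To spell out the sign issue, let a = nextfloat(1.0) = 1 + 2^-52 and b = nextfloat(1.0, 2) = 1 + 2^-51. The exact value of a*a - b is then +2^-104, which is the 4.930380657631324e-32 in Stefan's output. A fused evaluation keeps that term and an unfused one rounds it away. Neither can produce a negative number, so comparing against -4.930380657631324e-32 was always false. Here is a minimal check, relying on Julia's default of not contracting a*a - b into an fma:

```julia
a = nextfloat(1.0)          # 1 + 2^-52
b = nextfloat(1.0, 2)       # 1 + 2^-51
fused   = fma(a, a, -b)     # single rounding keeps the +2^-104 term
unfused = a * a - b         # a*a first rounds to 1 + 2^-51, leaving 0.0
either  = muladd(a, a, -b)  # one of the two above, depending on whether it is fused
println((fused, unfused, either == fused))
```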


cfborges · 2019-06-13T17:41:28Z

> How does it do in terms of accuracy if the multiply add isn't fused? We have both fma (do the operation fused, even if that's slow) and muladd which uses an FMA instruction if present or just a normal multiply and add if there isn't an FMA instruction. We could always take the performance hit on older platforms that don't have an FMA instruction.

Without the fma it performs exactly like the Naive (Unfused) code from my paper, i.e. one-ulp errors about 17% of the time on normally distributed inputs (compared to 13% for the clib hypot). It requires 4 fma calls.
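Here is one way to reproduce this kind of measurement, sketched under some assumptions. `misround_rate` is a hypothetical helper, the inputs are standard normal samples from a fixed seed, and the reference is a 256-bit BigFloat evaluation rounded once to Float64. The printed rates depend on the sample and the machine, so no figures are claimed here:

```julia
using Random

# Fraction of sample points where f(x, y) is not the correctly rounded hypot.
function misround_rate(f, n; rng = MersenneTwister(2019))
    bad = 0
    for _ in 1:n
        x, y = randn(rng), randn(rng)
        ref = Float64(sqrt(big(x)^2 + big(y)^2))   # high-precision reference
        bad += f(x, y) != ref
    end
    return bad / n
end

naive_hypot(x, y) = sqrt(x*x + y*y)   # unfused textbook formula
println("naive:      ", misround_rate(naive_hypot, 10^5))
println("Base.hypot: ", misround_rate(hypot, 10^5))
```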

dpsanders · 2019-06-13T17:52:20Z

CRlibm.jl is just a wrapper around the CRlibm library and I would prefer to leave it as such.

But it would be great if we could start writing a CorrectRounding.jl library to replace it!


stevengj · 2024-03-20T17:06:09Z

@cfborges, I notice that you published this code in ACM TOMS.

This is a bit of a concern, since if you assigned the copyright to ACM (the default, I think?), then the code (which you no longer own) is by default licensed under the ACM TOMS license, which is not free/open-source — it is only free for "noncommercial use".

However, the author of the work can request that ACM TOMS release the code under a more liberal license, or possibly, at the time of publication, request to retain copyright ownership of the work. Did you do this?

(The situation is a bit murky to me if you contributed this code to Julia before assigning copyright to ACM, but it seems better to clarify the situation by ensuring that the ACM TOMS code for your article is released under a free/open-source license like MIT or BSD.)

cfborges · 2024-03-20T17:13:14Z

As a federal employee my work cannot be copyrighted. Cheers, Carlos


stevengj · 2024-03-20T17:18:17Z

Ah, great, that is covered by the ACM copyright release:

Assuming you checked that box, it should be all good? Looks like it, since there is a public-domain notice in your article text:

though it is not obvious when downloading the code.

It's weird that the ACM TOMS 1014 page of your article does not make this clear when you download the Supplemental Material (which doesn't list any copyright information that I can see?). You might want to email TOMS to request that they add a public domain notice on your Supplemental Material to clarify the copyright status of your code. Otherwise a reader might by default assume it is restricted by the ACM TOMS semi-free license.

cfborges · 2024-03-20T17:22:20Z

I did. And it’s obvious from my affiliation. Cheers, Carlos


stevengj · 2024-03-20T17:23:45Z

I edited my post to note that a public-domain notice indeed appears in your article text. But it would be nice for readers if ACM TOMS also showed such a notice on the Supplementary Material too.

simonbyrne requested a review from stevengj May 3, 2019 21:31

simonbyrne reviewed May 3, 2019

Reviewed base/math.jl (outdated, resolved)

ararslan reviewed May 3, 2019

Reviewed base/math.jl (outdated, resolved)

giordano mentioned this pull request May 6, 2019

hypot with complex arguments #31941

Closed

Adds functionality for complex inputs.

a59490b

cossio mentioned this pull request May 20, 2019

fix hypot with more than two arguments #30301

Closed

cfborges added 2 commits May 21, 2019 13:08

Adds some testing.

d71afbf

Cleaned up the logic a bit by eliminating a nested if structure. And cleaned up the comments.

aef05c8

simonbyrne approved these changes May 21, 2019


StefanKarpinski changed the title ~~Replacement for hypot(a,b)~~ improved implementation of hypot(a,b) May 22, 2019

cfborges and others added 5 commits May 29, 2019 16:10

Update base/math.jl

15d6f3c

Co-Authored-By: Simon Byrne <simonbyrne@gmail.com>

Update base/math.jl

5a69e85

Co-Authored-By: Simon Byrne <simonbyrne@gmail.com>

Update base/math.jl

4ded5fb

Co-Authored-By: Simon Byrne <simonbyrne@gmail.com>

This version gives correctly rounded results more than 99% of the time. It is 10 times better than the C math library hypot function in this respect.

34a18b1

simonbyrne merged commit 4a04600 into JuliaLang:master Jun 13, 2019

cfborges deleted the hypot branch June 13, 2019 16:47

simonbyrne added a commit that referenced this pull request Jun 13, 2019

Fix FMA_NATIVE constant

418a6fb

As pointed out by @StefanKarpinski in #31922 (comment)

simonbyrne mentioned this pull request Jun 13, 2019

Fix FMA_NATIVE constant #32318

Merged

simonbyrne added a commit that referenced this pull request Jun 13, 2019

Fix FMA_NATIVE constant

82a42b9

As pointed out by @StefanKarpinski in #31922 (comment)

simonbyrne added a commit that referenced this pull request Jun 14, 2019

Fix FMA_NATIVE constant (#32318)

50306e0

As pointed out by @StefanKarpinski in #31922 (comment)

musm mentioned this pull request Sep 11, 2019

Clean up updated hypot function #33224

Merged

KlausC mentioned this pull request Jun 18, 2020

hypot performance questionable #36353

Closed

timholy mentioned this pull request Aug 22, 2023

Dates: Documentation, tests, and changed arithmetics for DateTime #50816

Merged
