fixed hypot(x::Number...) for under/overflow #27251

Closed · wants to merge 6 commits
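For context, a minimal illustration (my own sketch, not part of the PR; `old_hypot` is a hypothetical stand-in for the previous definition, which appears as the deleted line in the diff below) of the under/overflow this change fixes:

old_hypot(x::Number...) = sqrt(sum(abs2(y) for y in x))  # the one-liner this PR replaces

old_hypot(1e300, 1e300, 1e300)     # Inf -- abs2(1e300) overflows to Inf
old_hypot(1e-300, 1e-300, 1e-300)  # 0.0 -- abs2(1e-300) underflows to 0

With the rescaled algorithm in this PR, both calls should instead return finite, accurate results: hypot(1e300, 1e300, 1e300) ≈ 1.7320508075688772e300 and hypot(1e-300, 1e-300, 1e-300) ≈ 1.7320508075688772e-300.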
41 changes: 40 additions & 1 deletion base/math.jl
@@ -472,6 +472,7 @@ Stacktrace:
[...]
```
"""
hypot() = throw(ArgumentError("at least one argument is required"))
hypot(x::Number, y::Number) = hypot(promote(x, y)...)
function hypot(x::T, y::T) where T<:Number
    ax = abs(x)
@@ -503,7 +504,45 @@ end

Compute the hypotenuse ``\\sqrt{\\sum x_i^2}`` avoiding overflow and underflow.
"""
hypot(x::Number...) = sqrt(sum(abs2(y) for y in x))
hypot(x::Number...) = hypot(promote(x...)...)
function hypot(x::T...) where T<:Number

    # compute infnorm x (modeled on generic_vecnormMinusInf(x) in LinearAlgebra/generic.jl)
    (v, s) = iterate(x)::Tuple
    maxabs = abs(v)
    while true
        y = iterate(x, s)
        y === nothing && break
        (v, s) = y
        vnorm = abs(v)
        maxabs = ifelse(isnan(maxabs) | (maxabs > vnorm), maxabs, vnorm)
    end
    maxabsf = float(maxabs)
Member commented:

What is the advantage of this over:

maxabsf = float(maximum(abs, x))
isnan(maxabsf) && return maxabsf


    # compute vecnorm2(x) (modeled on generic_vecnorm2(x) in LinearAlgebra/generic.jl)
    (maxabsf == 0 || isinf(maxabsf)) && return maxabsf
    (v, s) = iterate(x)::Tuple
    Tfloat = typeof(maxabsf)
    if isfinite(length(x)*maxabsf*maxabsf) && maxabsf*maxabsf != 0 # Scaling not necessary
        sum::promote_type(Float64, Tfloat) = abs2(v)
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += abs2(v)
        end
        return convert(Tfloat, sqrt(sum))
@stevengj (Member) commented on May 29, 2018:

Why write out the loop, instead of just doing

return oftype(sqrt(sum(x -> oftype(maxabsf, abs2(x)), x)), maxabsf)

    else
        sum = (abs(v)/maxabsf)^2
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += (abs(v)/maxabsf)^2
        end
        return convert(Tfloat, maxabsf*sqrt(sum))
    end
Member commented:

Similarly, why not just:

return oftype(sqrt(sum(x -> (abs(x)/maxabsf)^2, x)), maxabsf)

@johnfgibson (Contributor, PR author) commented on May 29, 2018:

I copied generic_vecnorm2 from generic.jl in LinearAlgebra, inlining the call to vecnormInf on the way. I take it your compact code is as efficient. Let me pull it apart to understand it better and verify with some benchmarking.

function generic_vecnorm2(x)
    maxabs = vecnormInf(x)
    (maxabs == 0 || isinf(maxabs)) && return maxabs
    (v, s) = iterate(x)::Tuple
    T = typeof(maxabs)
    if isfinite(_length(x)*maxabs*maxabs) && maxabs*maxabs != 0 # Scaling not necessary
        sum::promote_type(Float64, T) = norm_sqr(v)
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += norm_sqr(v)
        end
        return convert(T, sqrt(sum))
    else
        sum = abs2(norm(v)/maxabs)
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += (norm(v)/maxabs)^2
        end
        return convert(T, maxabs*sqrt(sum))
    end
end

@johnfgibson (Contributor, PR author) commented on May 30, 2018:

Unrolling the loops as in the PR is faster and smaller in memory than calling sum(x -> oftype(maxabsf, abs2(x)), x) etc. For four arguments it's ~~three orders of magnitude~~ (oops) thirty times faster. As n gets into the thousands the ratio asymptotes to about two.

julia> x = randn(4);

julia> @benchmark hypot_unrolled(x...)
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  5
  --------------
  minimum time:     127.528 ns (0.00% GC)
  median time:      128.967 ns (0.00% GC)
  mean time:        142.552 ns (8.53% GC)
  maximum time:     55.249 μs (99.72% GC)
  --------------
  samples:          10000
  evals/sample:     886

julia> @benchmark hypot_callsum(x...)
BenchmarkTools.Trial: 
  memory estimate:  560 bytes
  allocs estimate:  21
  --------------
  minimum time:     3.965 μs (0.00% GC)
  median time:      4.032 μs (0.00% GC)
  mean time:        4.902 μs (16.56% GC)
  maximum time:     6.260 ms (99.86% GC)
  --------------
  samples:          10000
  evals/sample:     8

julia> versioninfo()
Julia Version 0.7.0-DEV.5219
Commit fef8b1fffa* (2018-05-25 17:26 UTC)
Platform Info:
  OS: Linux (x86_64-suse-linux)
  CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, sandybridge)

for the following code:

hypot_unrolled(x::Number...) = hypot_unrolled(promote(x...)...)
function hypot_unrolled(x::T...) where T<:Number

    # compute infnorm x (modeled on generic_vecnormMinusInf(x) in LinearAlgebra/generic.jl)
    (v, s) = iterate(x)::Tuple
    maxabs = abs(v)
    while true
        y = iterate(x, s)
        y === nothing && break
        (v, s) = y
        vnorm = abs(v)
        maxabs = ifelse(isnan(maxabs) | (maxabs > vnorm), maxabs, vnorm)
    end
    maxabsf = float(maxabs)

    # compute vecnorm2(x) (modeled on generic_vecnorm2(x) in LinearAlgebra/generic.jl)
    (maxabsf == 0 || isinf(maxabsf)) && return maxabsf
    (v, s) = iterate(x)::Tuple
    Tfloat = typeof(maxabsf)
    if isfinite(length(x)*maxabsf*maxabsf) && maxabsf*maxabsf != 0 # Scaling not necessary
        sum::promote_type(Float64, Tfloat) = abs2(v)
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += abs2(v)
        end
        return convert(Tfloat, sqrt(sum))
    else
        sum = (abs(v)/maxabsf)^2
        while true
            y = iterate(x, s)
            y === nothing && break
            (v, s) = y
            sum += (abs(v)/maxabsf)^2
        end
        return convert(Tfloat, maxabsf*sqrt(sum))
    end
end

hypot_callsum(x::Number...) = hypot_callsum(promote(x...)...)
function hypot_callsum(x::T...) where T<:Number

    maxabsf = float(maximum(abs, x))
    isnan(maxabsf) && return maxabsf

    (maxabsf == 0 || isinf(maxabsf)) && return maxabsf

    if isfinite(length(x)*maxabsf*maxabsf) && maxabsf*maxabsf != 0 # Scaling not necessary
        return oftype(maxabsf, sqrt(sum(x -> oftype(maxabsf, abs2(x)), x)))
    else
        return oftype(maxabsf, maxabsf*sqrt(sum(x -> (abs(x)/maxabsf)^2, x)))
    end
end

Member commented:

Don't benchmark calls involving a splatted global. Try

tst_hypot(x) = hypot(x...)
@btime tst_hypot($x)

etcetera

@johnfgibson (Contributor, PR author) commented on May 30, 2018:

Still a factor of sixteen (I misplaced a decimal when I reported 10^3).

julia> x = randn(4);

julia> tst_hypot_unrolled(x) = hypot_unrolled(x...);

julia> tst_hypot_callsum(x) = hypot_callsum(x...);

julia> @btime tst_hypot_unrolled($x)
  126.336 ns (5 allocations: 80 bytes)
1.125466836605211

julia> @btime tst_hypot_callsum($x)
  2.070 μs (21 allocations: 560 bytes)
1.125466836605211

@johnfgibson (Contributor, PR author) commented:

hypot_callsum is better in terms of code size and maintainability, but it seems to me that hypot_unrolled (which matches the current PR for base/math.jl's hypot) is more in tune with Julia's aim of optimizing speed and accuracy simultaneously.

An alternative, if no one likes adding this function to base, is to remove it and require LinearAlgebra for variable-argument hypot calls. Then this function could be implemented as a call to vecnorm2, as it was prior to #27141.
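A minimal sketch of that alternative (my own illustration, not code from the PR), assuming the varargs method were moved out of Base and defined alongside LinearAlgebra, delegating to its generic rescaling 2-norm (vecnorm in 0.7, later renamed norm; generic_vecnorm2 quoted above is its internal kernel):

# Hypothetical placement inside LinearAlgebra rather than Base:
Base.hypot(x::Number...) = vecnorm(promote(x...))  # `norm(promote(x...))` after the rename

Since vecnorm/norm iterates its argument, passing the promoted tuple reuses the same under/overflow-safe scaling that is benchmarked above.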

end

"""
atan2(y, x)
4 changes: 4 additions & 0 deletions test/math.jl
@@ -247,6 +247,10 @@ end
@test isnan_type(T, log1p(convert(T,NaN)))
@test_throws DomainError log1p(convert(T,-2.0))
@test hypot(T(0), T(0)) === T(0)
@test hypot(1/prevfloat(typemax(T)), T(0)) == 1/prevfloat(typemax(T))
@test hypot(1/prevfloat(typemax(T)), T(0), T(0)) == 1/prevfloat(typemax(T))
@test hypot(prevfloat(typemax(T)), T(0)) == prevfloat(typemax(T))
@test hypot(prevfloat(typemax(T)), T(0), T(0)) == prevfloat(typemax(T))
@test hypot(T(Inf), T(Inf)) === T(Inf)
@test hypot(T(Inf), T(x)) === T(Inf)
@test hypot(T(Inf), T(NaN)) === T(Inf)