Base.two_mul is constant folding incorrectly on 1.10 (but not 1.9 or 1.11) #52079

Closed
Moelf opened this issue Nov 8, 2023 · 17 comments · Fixed by #52194
Labels: maths (Mathematical functions), performance (Must go faster), regression (Regression in behavior compared to a previous version)
Milestone: 1.10

Comments

@Moelf (Contributor) commented Nov 8, 2023

It's useful for something: https://cds.cern.ch/record/2866130/files/ANA-EXOT-2022-18-PAPER.pdf#page=9

(this is used in a fit, so the exponential needs to be very fast)

julia> @benchmark x^y setup=begin x=rand(); y=rand() end
BenchmarkTools.Trial: 10000 samples with 988 evaluations.
 Range (min … max):  47.248 ns … 440.931 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     47.567 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.616 ns ±   4.787 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▁     ▂▁       ▁▁▁                ▆▇▄       ▁             ▂
  ████▁▃▄▁▇██▇▇▆▆▆▆▇███▇▇▆▆▆▆▆▅▇▆▆▆▆▆▆█████▇▇▇▇▇███▆▆▆▅▅▆▅▇▇█▇ █
  47.2 ns       Histogram: log(frequency) by time      55.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark x^y setup=begin x=rand(); y=rand(3:9) end
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.408 ns … 20.031 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.003 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.631 ns ±  2.849 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇                           ▇       █     █   ▂
  █▃▂▂▂▂▂▁▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂█▂▂▁▃▂▇▄█▇▂▂▂▃█▆▂▂█▃▃▇▂▇▆▂▂▂▂▂ ▃
  2.41 ns        Histogram: frequency by time        12.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Maybe we're hitting the physical limit of how fast this can go? For comparison, NumPy over 100-element arrays (so roughly 5 ns per element):

In [2]: a=np.random.rand(100);

In [3]: b=np.random.rand(100);

In [5]: %timeit np.power(a,b)
514 ns ± 3.64 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
@oscardssmith (Member) commented Nov 8, 2023

What CPU is that? I see timings of around 20 ns on my laptop (11th gen Intel). I think there are probably a few ns that could be shaved (roughly 1-2 on my laptop), primarily in our exp implementation, which isn't quite optimal. That said, this is largely expected: floating-point exponentiation is inherently somewhat slow, because the only workable formula is x^y = exp(log(x)*y), which puts a fairly strict upper bound on speed. It's even worse than it looks, because you need roughly 12 extra bits of accuracy in the log computation: exp magnifies absolute errors in its argument into relative errors in the result.

It's also worth noting that Float32 exponentiation is a lot faster, since it can use Float64 for the extended-precision intermediates rather than relying on double-double arithmetic.
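
To make the error-magnification point concrete, here is a minimal sketch (naive_pow and ulps are illustrative names, not anything in Base): a ~0.5 ulp rounding error in log(x) becomes an absolute error of roughly 0.5 ulp * y in the product, and exp turns an absolute error in its argument into a relative error of the same size in the result.

naive_pow(x, y) = exp(log(x) * y)   # the "only workable formula", done naively

x, y = 2.0, 600.0
exact = Float64(big(x)^big(y))      # reference value computed in BigFloat
ulps(z) = abs(z - exact) / eps(exact)

ulps(naive_pow(x, y))  # tens to hundreds of ulps off: log(x)*y ≈ 416, so tiny errors blow up
ulps(x^y)              # Base's extended-precision log keeps this within ~1 ulp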

@Richard-Sti commented Nov 8, 2023

Here is another minimal example that highlights this issue:

using BenchmarkTools

function test()
    x = 1.0
    for i in 1:100_000_000
        x += x^2.0   # Float64 exponent: hits the generic pow path every iteration
    end
    return x         # x overflows to Inf after a few iterations; the timing is dominated by ^
end

@btime test()

This runs in ~220 ms on Julia 1.7.3, ~900 ms on Julia 1.9.x, and ~2000 ms on Julia 1.10.x, so there is a several-fold slowdown compared to the older versions.

@Moelf (Contributor, Author) commented Nov 8, 2023

> I see timings of around 20ns on my laptop (11th gen Intel).

damn am I using fake Intel?

Vendor ID:               GenuineIntel
  Model name:            11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
    CPU family:          6
    Model:               140

@oscardssmith (Member)

What Julia version are you testing on?

@Moelf (Contributor, Author) commented Nov 8, 2023

Version 1.10.0-rc1 (2023-11-03)

@oscardssmith (Member)

OK, something is actually very wrong here: 1.10 is 2x slower than 1.9 or nightly, and I have no idea why.

@oscardssmith oscardssmith added this to the 1.10 milestone Nov 8, 2023
@oscardssmith oscardssmith added performance Must go faster regression Regression in behavior compared to a previous version maths Mathematical functions labels Nov 8, 2023
@Moelf (Contributor, Author) commented Nov 8, 2023

Well, at least 1.7 was equally slow. (Do we have a benchmark suite we can add this to?)

@stevengj (Member) commented Nov 8, 2023

On my computer (Apple M1 Max), I find that 1.9 is faster than 1.8 or nightly. Note also that there is a special-case optimization for x^y when y is an integer-valued float, which is worth benchmarking separately.

On Julia 1.8:

julia> @btime x^y setup=begin x=rand(); y=rand(); end;
  15.698 ns (0 allocations: 0 bytes)

julia> @btime $(Ref(1.2))[]^$(Ref(3.4))[];
  15.782 ns (0 allocations: 0 bytes)

julia> @btime $(Ref(1.2))[]^$(Ref(3.0))[];
  3.958 ns (0 allocations: 0 bytes)

On Julia 1.9 (fastest):

julia> @btime $(Ref(1.2))[]^$(Ref(3.4))[];
  11.637 ns (0 allocations: 0 bytes)

julia> @btime $(Ref(1.2))[]^$(Ref(3.0))[];
  3.916 ns (0 allocations: 0 bytes)

On nightly (1.11.0-DEV.860):

julia> @btime $(Ref(1.2))[]^$(Ref(3.4))[];
  14.863 ns (0 allocations: 0 bytes)

julia> @btime $(Ref(1.2))[]^$(Ref(3.0))[];
  5.167 ns (0 allocations: 0 bytes)
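
(The $(Ref(...))[] interpolation in these benchmarks keeps the compiler from constant-folding the operation away.) The integer-exponent case is fast because once y is known to be a small integer, x^y can be computed by repeated squaring instead of exp/log. A simplified sketch of that idea (Base's actual version also compensates rounding errors for accuracy; pow_by_squaring here is purely illustrative):

function pow_by_squaring(x::Float64, n::Integer)
    n < 0 && return inv(pow_by_squaring(x, -n))
    p = 1.0
    while n > 0
        isodd(n) && (p *= x)   # multiply in the current bit of the exponent
        x *= x                 # square for the next bit
        n >>= 1
    end
    return p
end

pow_by_squaring(1.2, 3)  # ≈ 1.2^3.0, using O(log n) multiplies instead of exp/log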

@Zentrik (Member) commented Nov 8, 2023

There is some discussion here as well: https://discourse.julialang.org/t/exponentiation-of-floats/105951/22. It seems to be some sort of weird bug.

@oscardssmith oscardssmith changed the title Exponential by a float is very slow Base.two_mul is constant folding incorrectly on 1.10 (but not 1.9 or 1.11) Nov 8, 2023
@oscardssmith (Member)

The MWE is (on 1.10):

julia> @btime x^y setup=begin x=rand(); y=rand() end;
  42.232 ns (0 allocations: 0 bytes)

julia> @eval Base.Math @assume_effects :consistent @inline function two_mul(x::Float64, y::Float64)
           if Core.Intrinsics.have_fma(Float64)
               xy = x*y
               return xy, fma(x, y, -xy)
           end
           return Base.twomul(x,y)
       end
two_mul (generic function with 3 methods)

julia> @btime x^y setup=begin x=rand(); y=rand() end;
  17.039 ns (0 allocations: 0 bytes)
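
For context on why that branch matters: two_mul is an error-free transformation, returning the rounded product together with its exact rounding error, which the double-double pow kernel needs. The fma path does this in a single instruction, while the Base.twomul fallback splits the operands and costs a handful of extra operations. A quick property check (illustrative only):

x, y = 1.1, 2.3
hi, lo = Base.Math.two_mul(x, y)
big(hi) + big(lo) == big(x) * big(y)   # true: hi + lo reproduces the product exactly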

@gbaraldi (Member) commented Nov 8, 2023

The big regression happened in 0f5f62c. cc @vchuravy @pchintalapudi

@stevengj (Member) commented Nov 8, 2023

So it's some kind of race condition associated with parallel image generation? That would explain why simply recompiling two_mul fixes the problem.

@gbaraldi (Member) commented Nov 8, 2023

I'm not sure if it's a race condition or just a plain bug.

@Moelf (Contributor, Author) commented Nov 8, 2023

What kind of bug, though? It seems to be fixed by recompiling the exact same function.

@gbaraldi (Member) commented Nov 8, 2023

A bug in our multiversioning code.
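
One way to poke at this locally (a diagnostic sketch, not a confirmed reproduction path; note that @code_llvm compiles the method fresh, so it can look fine even when the multiversioned copy baked into the system image is wrong, which is consistent with the redefinition above "fixing" it):

using InteractiveUtils
Core.Intrinsics.have_fma(Float64)                       # expected: true on FMA-capable hardware
@code_llvm debuginfo=:none Base.Math.two_mul(1.2, 3.4)  # expect branch-free IR with a single fma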

@Moelf (Contributor, Author) commented Nov 9, 2023

If this is also fixed on 1.11, can we bisect forward to find the fix?

@gbaraldi (Member)

So this got bisected to 0f5f62c. cc @pchintalapudi and @vchuravy

oscardssmith pushed a commit that referenced this issue Nov 24, 2023
So after struggling with this for a long while it seems there were two
different issues. The first one we lacked coverage over, but the other
was a very subtle issue when we sorted the fptrs.

~I still need to add test that does multiversioning where we call
between multiversioned functions~

Fixes #52079
KristofferC pushed a commit that referenced this issue Nov 27, 2023 (cherry picked from commit a386cd1)