Performance regression with non-integer powers (v1.6 and newer) #39976
is a bit smaller repro, isolating it to the `llvmcall`.
Here's my attempt to bisect this with

@staticfloat, does that bisect make sense to you?
This is hilarious. Apparently, you can verify this by using
@staticfloat Is there anything that should be done here? If the

We just need @oscardssmith to write a Julia-native
I'll see what I can do. Two-argument functions are about 1000x harder to verify, though, so it might take a bit.
The first thing I would need for this is someone to do the GPL dance with me on https://github.com/JuliaMath/openlibm/blob/master/src/e_pow.c (DO NOT OPEN IF YOU DON'T WANT THE GPL TO CONTAMINATE YOU). Any volunteers?
GPL dance as in clean-room implementer? I'd be up for it, provided there are no specific OS/system requirements (currently only Windows and/or WSL available here). I'd just need some information on when/how this would be done.

Thanks so much! Can you DM me on Slack or Discord?

Why GPL? Am I missing something, or is there no GPL anywhere in that code?
Yeah, did you mean to send a different link?
Wait, really? I thought openlibm was GPL? If not, that makes this much easier. Edit: I just assumed based on the library; I didn't actually open the link.
Not at all. Only a couple of test files are under the LGPL (not the GPL); the rest is under various more permissive licenses: https://github.com/JuliaMath/openlibm/blob/f052f42bb393918acd6fd51ec1f018c5679dfe30/LICENSE.md
This bites in many applications in economics, where power utility (

FWIW, this also bites with negative integer powers:

vs
Can we go back to using the libm so that things are the way they were? This big a regression in basic functionality stings quite a lot.
Just here to bump this.

Hi, bumping again.

Hi, I really hope this gets fixed. I am stuck on 1.5.4 because of a performance hit even in 1.6.3.

Should eventually be fixed by #42271?
The situation has improved significantly on

This was tested on all three versions with the following bash loop:

We're not quite to the same speed as the old
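The bash loop itself is elided above; it was presumably something along these lines (binary paths, versions, and the timed expression are all hypothetical):

```bash
# Run the same scalar-pow benchmark under several Julia builds.
for v in julia-1.5 julia-1.6 julia-master; do
    echo "=== $v ==="
    "$v/bin/julia" -e 'using BenchmarkTools; @btime x^y setup=(x=rand(); y=rand())'
done
```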
Darn, I was hoping I'd matched 1.5.
On the plus side, your

But of course, this issue is specifically about `^(::Float64, ::Float64)`.
Well, `float^float` calls `float^int` when applicable. Also, since it's power by squaring, `-2` is a decent bit faster than a bigger exponent.
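For reference, a minimal sketch of power by squaring, the strategy behind the integer-power path (this is not Base's actual implementation, which has extra accuracy and overflow handling):

```julia
# Minimal power-by-squaring sketch (not Base's actual implementation).
function pow_by_squaring(x::Float64, n::Integer)
    n < 0 && return inv(pow_by_squaring(x, -n))  # negative exponents via the reciprocal
    result = 1.0
    while n > 0
        isodd(n) && (result *= x)  # multiply in the factor for the current bit
        x *= x                     # square for the next bit of n
        n >>= 1
    end
    return result
end

pow_by_squaring(2.0, 10)  # 1024.0
pow_by_squaring(2.0, -2)  # 0.25
```

The loop runs once per bit of the exponent, which is why a small exponent like `-2` is cheaper than a larger one.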
Thanks for the work on this. This is good enough that I can finally switch my production code from 1.5 to 1.8. On my end, we (you) are halfway there:

Tagging @Jovansam and @Pramodh-G so they see the improvement, and @oscardssmith to say thanks <3
Try out 1.9. It should be in the ~18 ns range. (I sped up extended precision
The nightly is even faster than 1.5!
I believe that means we can close this. There's still more that could be done here, but we are no longer slower on Linux and Windows (I think M1 Mac is still faster with the Mac libm).
@oscardssmith many thanks for all the work you have put into this. Much appreciated. @eirikbrandsaas jealous of your use case.
That's very odd. On my computer I see 1.3 seconds for 1.9 (technically 1.10, but there haven't been any changes that would affect this since 1.9), compared to 4.8 seconds on 1.6.
Very interesting. I must be doing something wrong, then? How are you running the code? I am using `include("...")` in the 1.9 REPL on Windows 11 64-bit.
What do you get for the originally reported regression (i.e.)

on 1.5, 1.6, and 1.9? Also, what processor do you have?
1.10 = 47.785 ns

Processor: Intel Core i7-8750H @ 2.20 GHz
Just to point out the somewhat obvious: in the benchmark you just ran, you're getting significantly better performance in 1.10 than in 1.5 (as expected in this issue). I suggest you open a new issue for your code, as the problem is likely somewhere other than in non-integer powers.
However, those numbers make no sense, since 1.10 should be strictly better than 1.8. They're both pure-Julia implementations, and the 1.10 one is faster on everything I've tested it on.
Many thanks, both. I ran it again and obtained the same ranking. I will try on a different computer later today and report back.
I have now tested my use case pasted above on a second laptop with an 11th Gen Intel Core i5-1135G7 @ 2.4 GHz. This laptop is less powerful than my other one, except for the slightly faster processor. Julia 1.5 computes it in 0.7 ns, 1.8 in 3 s, and 1.10 in 2 s, so 1.5 is 6 and 4 times faster than 1.8 and 1.10, respectively.
Are you sure your example is bottlenecked by floating-point pow? If the pure floating-point pow is faster on 1.6/1.8 than on 1.5 and your code is slower, I have a hard time believing that floating-point pow is the problem.
Not sure if a new issue should be created or if this should be re-opened... but there's a Discourse post indicating that 1.10 is 2-4x slower than prior versions: https://discourse.julialang.org/t/exponentiation-of-floats/105951/18

Further comparison: 1.10.1 is ~2x slower than numpy and ~6x slower than pytorch on a Mac M2:

julia> @btime m .^ (1/12) setup=m=rand(10000,241)
  29.251 ms (2 allocations: 18.39 MiB)

Timing for v1.9.3 was 26.246 ms. I echo the sentiment above that fast powers are important in economics/finance.

Edit, some more testing: nightly (b8a0a39) is a little slower than 1.10:

julia> @btime m .^ (1/12) setup=m=rand(10000,241)
  31.360 ms (3 allocations: 18.39 MiB)

AppleAccelerate is 3x faster:
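The AppleAccelerate comparison is elided; it presumably looked something like the following (macOS only; `AppleAccelerate.pow` wrapping vForce's `vvpow` is assumed from the package README, so check the current AppleAccelerate.jl API before relying on it):

```julia
using AppleAccelerate, BenchmarkTools

m = rand(10_000, 241)
e = fill(1/12, size(m))              # vvpow takes an array-valued exponent
@btime $m .^ (1/12)                  # Base broadcast pow
@btime AppleAccelerate.pow($m, $e)   # Accelerate-backed elementwise pow
```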
This should be fixed in 1.11 by #52079.
On an M2 Mac, the scalar power timing doesn't seem to match what that issue describes (see below). Plus, it also doesn't explain the vector performance vs numpy/pytorch. This seems to be consistent with the OP in the linked issue, which shows numpy calculating 100 floats in a vector in the time it would take Julia to do 50.

julia> @btime x^y setup=begin x=rand(); y=rand() end # 1.9.4
  10.844 ns (0 allocations: 0 bytes)

julia> @btime x^y setup=begin x=rand(); y=rand() end # 1.10.1
  11.720 ns (0 allocations: 0 bytes)

julia> @btime x^y setup=begin x=rand(); y=rand() end # nightly
  11.887 ns (0 allocations: 0 bytes)
With the 1.6.0-rc1 release candidate and the most recent build from master, I have observed a performance regression when taking non-integer powers of floats on Linux (Fedora 33, and Ubuntu 20.04 under WSL2). Although the MWE is limited to Float64, similar behavior can be observed with Float32.
MWE on 1.6.0-rc1:
MWE on 1.5.3:
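The MWE bodies are elided above; the originally reported comparison was along these lines (a sketch using BenchmarkTools.jl, not the verbatim snippet):

```julia
using BenchmarkTools

# Scalar non-integer power; the regression shows up here:
@btime x^y setup=(x = rand(); y = rand())

# Float32 shows similar behavior:
@btime x^y setup=(x = rand(Float32); y = rand(Float32))
```

Running the same two lines on 1.5.3 and 1.6.0-rc1 and comparing the reported times reproduces the regression.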
I wonder whether this issue is related to LLVM, since the output of `@code_llvm` is pretty much identical on both 1.6.0-rc1 and 1.5.3.

1.6.0-rc1:

1.5.3: