
fma calls the system libm, not Julia's openlibm #9890

Closed
eschnett opened this issue Jan 23, 2015 · 19 comments · Fixed by #9898

@eschnett
Contributor

I am tracking down a test failure where fma returns a wrong result. This is a 64-bit Intel Linux system with glibc 2.12 installed.

code_native tells me that fma expands to a libcall on this system (as it should), and also gives me the address of this fma function, but without a symbol name. /proc/*/maps shows that this address lies in a range where /lib64/libm-2.12.so is mapped. That is, Julia's fma call goes to the system libm, not to Julia's openlibm.
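For reference, the check above is roughly the following (output is machine-specific):

julia> code_native(fma, (Float64, Float64, Float64))   # ends in a call to a bare address with no symbol name

followed by matching that address against /proc/<julia pid>/maps from a shell.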

The reason is that we expand Julia's fma to an LLVM intrinsic, and the libcall is then generated by LLVM because no fma machine instruction is available on this CPU. LLVM chooses to call the system libm.

I suggest checking whether the system has an fma instruction when expanding Julia's fma function, and, if not, expanding it to a call to openlibm instead of the LLVM intrinsic (a sketch of that fallback follows). This requires #9855 to be available.
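A rough sketch of the proposed fallback, written as plain Julia for illustration only (the real change would live in codegen; the library name assumes the bundled openlibm):

# Hypothetical fallback: call openlibm's fma directly instead of relying on the LLVM intrinsic.
fma_openlibm(x::Float64, y::Float64, z::Float64) =
    ccall((:fma, "libopenlibm"), Float64, (Float64, Float64, Float64), x, y, z)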

This issue may be related to #9847.

@tkelman
Contributor

tkelman commented Jan 23, 2015

I saw a similar fma-related test failure inside a CentOS 5 Docker container using the devtoolset, but it didn't reproduce on the Ubuntu 14.04 host system.

@eschnett
Contributor Author

The failure should depend on the libc version.

@vtjnash
Member

vtjnash commented Jan 23, 2015

You should be able to set UNTRUSTED_SYSTEM_LIBM if you want it to be much more likely that all libm calls in Julia end up getting redirected to openlibm (this is necessary on Windows, for example).

@eschnett
Contributor Author

Who looks at this variable -- Julia or LLVM? How does LLVM know about openlibm?

@vtjnash
Member

vtjnash commented Jan 23, 2015

it's a makefile variable
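For a source build, that override would typically go in Make.user (a sketch; see Make.inc for the exact semantics):

# Make.user
UNTRUSTED_SYSTEM_LIBM = 1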

@tkelman
Contributor

tkelman commented Jan 23, 2015

Last time I looked at it, it statically links libopenlibm.a into libjulia / the julia executable.

@ViralBShah
Member

The real solution is for openlibm to have a Makefile variable that allows prefixing its exported symbols with olm_.

@tkelman
Contributor

tkelman commented Jan 23, 2015

Probably a good idea. We should be able to apply the same method I used for OpenBLAS.

@nalimilan
Member

But that won't fix things for distribution packages. :-/ Or should we ship both standard names and names with the olm_ prefix?

@eschnett
Contributor Author

Setting UNTRUSTED_SYSTEM_LIBM=1 did not help. The resulting julia executable does have fma linked in statically, but LLVM still picks a dynamically linked libm at run time.

@vtjnash
Member

vtjnash commented Jan 23, 2015

I'm not sure how that would happen, unless libLLVM is also dynamically linked. You can't distrust your system libm but trust your system LLVM.

@eschnett
Contributor Author

I installed LLVM 3.5.1 manually. Yes, it's linked dynamically -- good point.

@MikaelSlevinsky

Bizarre fma behaviour. I wrote simple Horner macros using Base.fma_llvm and Base.fma_libm. The LLVM result is incorrect, and the LIBM result is 100x slower:

julia> versioninfo()
Julia Version 0.4.1-pre+22
Commit 669222e* (2015-11-01 00:06 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> macro hornerfma_llvm(x, p...)
           ex = esc(p[end])
           for i = length(p)-1:-1:1
               ex = :(Base.fma_llvm(t, $ex, $(esc(p[i]))))
           end
           Expr(:block, :(t = $(esc(x))), ex)
       end

julia> macro hornerfma_libm(x, p...)
           ex = esc(p[end])
           for i = length(p)-1:-1:1
               ex = :(Base.fma_libm(t, $ex, $(esc(p[i]))))
           end
           Expr(:block, :(t = $(esc(x))), ex)
       end

julia> myexp1(x::Float64) = @hornerfma_llvm(x,1.0,1.0,0.5,0.16666666666666666,0.041666666666666664,0.008333333333333333,0.001388888888888889,0.0001984126984126984,2.48015873015873e-5,2.7557319223985893e-6,2.755731922398589e-7,2.505210838544172e-8,2.08767569878681e-9,1.6059043836821613e-10,1.1470745597729725e-11,7.647163731819816e-13,4.779477332387385e-14,2.8114572543455206e-15,1.5619206968586225e-16)
myexp1 (generic function with 1 method)

julia> myexp2(x::Float64) = @hornerfma_libm(x,1.0,1.0,0.5,0.16666666666666666,0.041666666666666664,0.008333333333333333,0.001388888888888889,0.0001984126984126984,2.48015873015873e-5,2.7557319223985893e-6,2.755731922398589e-7,2.505210838544172e-8,2.08767569878681e-9,1.6059043836821613e-10,1.1470745597729725e-11,7.647163731819816e-13,4.779477332387385e-14,2.8114572543455206e-15,1.5619206968586225e-16)
myexp2 (generic function with 1 method)

julia> @vectorize_1arg Float64 myexp1
myexp1 (generic function with 4 methods)

julia> @vectorize_1arg Float64 myexp2
myexp2 (generic function with 4 methods)

julia> myexp1(1.0)
2.528361447231352

julia> myexp2(1.0)
2.718281828459045

julia> x = linspace(-1,1,1_000_000);

julia> exp(x);@time exp(x);
  0.018781 seconds (6 allocations: 7.630 MB)

julia> myexp1(x);@time myexp1(x);
  0.012698 seconds (6 allocations: 7.630 MB)

julia> myexp2(x);@time myexp2(x);
  1.020566 seconds (6 allocations: 7.630 MB)

@simonbyrne
Contributor

@MikaelSlevinsky You have a Haswell machine, which has a native fma instruction, so LLVM is able to use that. The openlibm fma is implemented in software in terms of basic arithmetic and bit-twiddling functions, see here. This is going to be much slower than a hardware fma, or even separate multiply and add instructions.

Even if the libm fma did call the hardware instruction, it would still likely be slightly slower due to the overhead of the function call (this is the same reason we call the LLVM sqrt instead of the libm one).
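One quick way to confirm that the hardware instruction is being used (output is machine-dependent; on a Haswell chip one would expect a vfmadd instruction rather than a call):

julia> @code_native fma(1.0, 2.0, 3.0)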

@MikaelSlevinsky

Thanks, that clarifies the timing. But why is LLVM FMA incorrect in the macro?

@simonbyrne
Contributor

I'm not sure why you get that: I get the correct answer with the same code. Maybe try it on 0.4.2?

@simonbyrne
Contributor

Also, the built-in @horner macro uses the muladd function, which should use fma instructions in this case, but looking at @code_native it doesn't seem to work (at least on my Haswell machine). Hopefully this will get fixed when we upgrade LLVM.

@MikaelSlevinsky

Ok, thanks. Merry Christmas!

@eschnett
Contributor Author

Julia 0.4 lowers muladd to LLVM 3.3 multiply and add instructions, setting flags indicating that they may be combined. Apparently, LLVM 3.3 does not combine them. LLVM 3.7 offers a special muladd intrinsic that Julia uses there, which leads to a native fma instruction.
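A way to see the difference, assuming builds against each LLVM version (the IR shown will differ accordingly):

julia> @code_llvm muladd(2.0, 3.0, 1.0)
# with LLVM 3.3: separate multiply and add instructions, flagged as combinable (per the above)
# with LLVM 3.7: the dedicated muladd intrinsic, which the backend can lower to a native fma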
