test/numbers.jl fails on 32-bit platform #9847

Closed
rickhg12hs opened this issue Jan 20, 2015 · 27 comments · Fixed by #9898 or #9925
Labels
system:32-bit Affects only 32-bit systems

Comments

@rickhg12hs
Contributor

$ make test-numbers
    JULIA test/numbers
     * numbers             exception on 1: ERROR: LoadError: test failed: ("0.3" != "0.3")
 in expression: repr(0.1 + 0.2) != "0.3"
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:66
 in anonymous at multi.jl:642
 in run_work_thunk at multi.jl:603
 in remotecall_fetch at multi.jl:676
 in remotecall_fetch at multi.jl:691
 in anonymous at task.jl:1614
while loading numbers.jl, in expression starting on line 318
ERROR: LoadError: LoadError: test failed: ("0.3" != "0.3")
 in expression: repr(0.1 + 0.2) != "0.3"
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:66
 in anonymous at multi.jl:642
 in run_work_thunk at multi.jl:603
 in remotecall_fetch at multi.jl:676
 in remotecall_fetch at multi.jl:691
 in anonymous at task.jl:1614
while loading numbers.jl, in expression starting on line 318
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 42

make[1]: *** [numbers] Error 1
make: *** [test-numbers] Error 2
[Rick@steelers julia]$ ./julia -e 'versioninfo()'
Julia Version 0.4.0-dev+2824
Commit 6bc53e6* (2015-01-19 20:14 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Banias)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Interestingly ...

$ ../julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+2824 (2015-01-19 20:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6bc53e6* (0 days old master)
|__/                   |  i686-redhat-linux

julia> repr(0.1 + 0.2)
"0.30000000000000004"

julia> include("testdefs.jl")
2-element Array{Union(UTF8String,ASCIIString),1}:
 "/usr/local/src/julia/julia/usr/local/share/julia/site/v0.4"
 "/usr/local/src/julia/julia/usr/share/julia/site/v0.4"      

julia> repr(0.1 + 0.2)
"0.30000000000000004"

julia> include("numbers.jl")
ERROR: LoadError: test failed: ("0.3" != "0.3")
 in expression: repr(0.1 + 0.2) != "0.3"
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in include at ./boot.jl:249
 in include_from_node1 at ./loading.jl:128
while loading /usr/local/src/julia/julia/test/numbers.jl, in expression starting on line 318

julia> repr(0.1 + 0.2)
"0.3"

@staticfloat
Member

This is the reason we set a few make flags on our 32-bit builds. Specifically, we set JULIA_CPU_TARGET=pentium4 explicitly so that we don't run into weird floating-point issues. I believe this is due to 32-bit processors using their 80-bit float registers, although I could be wrong.

@rickhg12hs
Contributor Author

Why did repr(0.1 + 0.2) change in value before/after including test/numbers.jl?

@tkelman
Contributor

tkelman commented Jan 20, 2015

Is something potentially leaking state in the numbers test wrt rounding mode maybe? I know if you run the numbers test under gdb there is a SIGFPE from GMP at some point in there.

@tkelman tkelman added the system:32-bit Affects only 32-bit systems label Jan 20, 2015
@rickhg12hs
Contributor Author

get_rounding value doesn't seem to change.

$ ../julia 
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+2824 (2015-01-19 20:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6bc53e6* (0 days old master)
|__/                   |  i686-redhat-linux

julia> get_rounding(Float64)
Base.Rounding.RoundingMode{:Nearest}()

julia> 0.1 + 0.2
0.30000000000000004

julia> include("testdefs.jl")
2-element Array{Union(UTF8String,ASCIIString),1}:
 "/usr/local/src/julia/julia/usr/local/share/julia/site/v0.4"
 "/usr/local/src/julia/julia/usr/share/julia/site/v0.4"      

julia> 0.1 + 0.2
0.30000000000000004

julia> get_rounding(Float64)
Base.Rounding.RoundingMode{:Nearest}()

julia> include("numbers.jl")
ERROR: LoadError: test failed: ("0.3" != "0.3")
 in expression: repr(0.1 + 0.2) != "0.3"
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in include at ./boot.jl:249
 in include_from_node1 at ./loading.jl:128
while loading /usr/local/src/julia/julia/test/numbers.jl, in expression starting on line 318

julia> 0.1 + 0.2
0.3

julia> get_rounding(Float64)
Base.Rounding.RoundingMode{:Nearest}()

@rickhg12hs
Contributor Author

Perhaps another clue: fma results are unaffected by including test/numbers.jl.

$ ../julia 
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+2824 (2015-01-19 20:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6bc53e6* (0 days old master)
|__/                   |  i686-redhat-linux

julia> fma(1.0,0.2,0.1)
0.30000000000000004

julia> include("testdefs.jl")
2-element Array{Union(UTF8String,ASCIIString),1}:
 "/usr/local/src/julia/julia/usr/local/share/julia/site/v0.4"
 "/usr/local/src/julia/julia/usr/share/julia/site/v0.4"      

julia> fma(1.0,0.2,0.1)
0.30000000000000004

julia> include("numbers.jl")
ERROR: LoadError: test failed: ("0.3" != "0.3")
 in expression: repr(0.1 + 0.2) != "0.3"
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in include at ./boot.jl:249
 in include_from_node1 at ./loading.jl:128
while loading /usr/local/src/julia/julia/test/numbers.jl, in expression starting on line 318

julia> fma(1.0,0.2,0.1)
0.30000000000000004

@staticfloat
Member

Hmmm. Actually, I just noticed this happening even on builds with JULIA_CPU_TARGET=pentium4. It looks like this is due to 06e2137. @eschnett care to take a look at this?

@rickhg12hs
Contributor Author

Looks like this is a side effect of fma.

$ ./julia 
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+2824 (2015-01-19 20:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 6bc53e6* (0 days old master)
|__/                   |  i686-redhat-linux

julia> 0.1 + 0.2
0.30000000000000004

julia> fma(1.0, 1.0, 1.0)
2.0

julia> 0.1 + 0.2
0.3

julia> 

@rickhg12hs
Contributor Author

Got some wacky rounding and fma action going on:

julia> 0.1 + 0.2
0.30000000000000004

julia> get_rounding(Float64)
Base.Rounding.RoundingMode{:Nearest}()

julia> fma(1.0, 1.0, 1.0)
2.0

julia> get_rounding(Float64)
Base.Rounding.RoundingMode{:Nearest}()

julia> 0.1 + 0.2
0.3

julia> set_rounding(Float64, RoundNearest)
0

julia> 0.1 + 0.2
0.30000000000000004

julia> 
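
A note on why the rounding mode alone can flip this result: the exact sum of the doubles nearest to 0.1 and 0.2 falls exactly halfway between two adjacent doubles, so round-to-nearest-even picks the upper one while any downward or toward-zero mode picks the lower one. The sketch below checks this with exact rational arithmetic in Python; the helper names `prev_double` and `classify_sum` are made up for this illustration and are not part of Julia or of the code under discussion.

```python
from fractions import Fraction
import struct


def prev_double(x):
    """Return the largest double below a positive, finite x."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<q", bits - 1))[0]


def classify_sum(a, b):
    """Compare the exact sum of two doubles with two adjacent doubles."""
    exact = Fraction(a) + Fraction(b)   # exact rational value of a + b
    nearest = a + b                     # result under round-to-nearest
    below = prev_double(nearest)        # adjacent double just below it
    # For 0.1 + 0.2 the exact sum lies between `below` and `nearest`.
    is_tie = (exact - Fraction(below)) == (Fraction(nearest) - exact)
    return below, nearest, is_tie


below, nearest, is_tie = classify_sum(0.1, 0.2)
print(repr(below), repr(nearest), is_tie)  # 0.3 0.30000000000000004 True
```

Because the sum is an exact tie, the two printed values are the only candidates, which matches the two outputs seen in the sessions above.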

@rickhg12hs
Contributor Author

Could this be a libm bug?

julia> a = 1.0
1.0

julia> b = 1.0
1.0

julia> c = 1.0
1.0

julia> 0.1 + 0.2
0.30000000000000004

julia> ccall((:fma,"libopenlibm"),Cdouble,(Cdouble,Cdouble,Cdouble),a,b,c)
2.0

julia> 0.1 + 0.2
0.30000000000000004

julia> ccall((:fma,"libm"),Cdouble,(Cdouble,Cdouble,Cdouble),a,b,c)
2.0

julia> 0.1 + 0.2
0.3

julia> set_rounding(Float64, RoundNearest)
0

julia> 0.1 + 0.2
0.30000000000000004

julia> 

@nalimilan
Member

Good catch. But your versioninfo() from above states LIBM: libopenlibm. So why would the bug appear when you run the tests? Which distribution are you using, BTW?

@rickhg12hs
Contributor Author

I'm puzzled too regarding libm/libopenlibm. I don't know why fma hoses the rounding mode.

I am currently using 32-bit Fedora 19. I'll be upgrading to F21 soon since F19 has gone EOL.

@rickhg12hs
Contributor Author

This is disturbing. Is there a test that could be run from the REPL that would determine whether libm or libopenlibm is being used? I'm beginning to doubt that libopenlibm is being used.

@staticfloat
Member

You could profile a loop of math function calls and see what the backtrace says.


@rickhg12hs
Contributor Author

Looks like libopenlibm is being used.

$ ./julia -e '@profile for k=1:1000000 sin(exp(1.0)) end;Profile.print(C=true,cols=90)' | grep libopen
              1   ...../lib/libopenlibm.so; __kernel_sin; (unknown line)
              104 ...../lib/libopenlibm.so; exp; (unknown line)
              5   ...../lib/libopenlibm.so; sin; (unknown line)
               78 ...../lib/libopenlibm.so; sin; (unknown line)
                21 ...../lib/libopenlibm.so; __ieee754_rem_pio2; (unknown line)
                42 ...../lib/libopenlibm.so; __kernel_sin; (unknown line)

@staticfloat
Member

Can you do a grep libm to see if any other libm implementations are being used as well?

@rickhg12hs
Contributor Author

Here's the grep libm for the previous example. Looks like it's just libopenlibm.

$ ./julia -e '@profile for k=1:1000000 sin(exp(1.0)) end;Profile.print(C=true,cols=90)' | grep libm
              2   ...../lib/libopenlibm.so; __kernel_sin; (unknown line)
              101 ...../lib/libopenlibm.so; exp; (unknown line)
              2   ...../lib/libopenlibm.so; sin; (unknown line)
               1  ...../lib/libopenlibm.so; exp; (unknown line)
               75 ...../lib/libopenlibm.so; sin; (unknown line)
                23 ...ib/libopenlibm.so; __ieee754_rem_pio2; (unknown line)
                38 ...ib/libopenlibm.so; __kernel_sin; (unknown line)

For some reason fma never shows up in the profiler even when I loop over millions of calls.

@nalimilan
Member

> For some reason fma never shows up in the profiler even when I loop over millions of calls.

Maybe the effect of inlining?

@eschnett
Contributor

fma is not called via libm. Instead, it is translated into an LLVM intrinsic. I don't know how LLVM implements fma if there is no hardware instruction -- this should be a function in a run-time library of LLVM, presumably called fma.

Since Julia loads openlibm dynamically, I doubt that LLVM uses it to resolve names. I thus assume that this call goes to the fma routine of the libm that's visible from the julia executable, although I cannot tell how that would work with cross-compiling.

I am right now trying to track down why an fma call (on a 64-bit architecture) yields the wrong result. I'm facing the same issue -- I don't know which fma is actually being called...
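
For reference when checking which result an fma routine should give: a fused multiply-add computes a*b + c exactly and rounds once, whereas `a*b + c` written out rounds twice. The sketch below is a Python reference model built on exact rationals, not the LLVM or libm implementation; `fma_reference` is a name chosen here for illustration, and it handles only finite inputs under round-to-nearest.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float conversion rounds correctly


# The call from the thread: 1.0*0.2 + 0.1 is a tie that rounds to even.
print(repr(fma_reference(1.0, 0.2, 0.1)))  # 0.30000000000000004

# A case where fusing matters: the product 1 - 2**-60 is not a double.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(a * b + c)               # two roundings: the product rounds to 1.0
print(fma_reference(a, b, c))  # one rounding: the tiny residual survives
```

Comparing a suspect fma against such a model on a handful of inputs is one way to tell a wrong result from a merely unexpected one.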

@eschnett
Contributor

I also just remembered that Julia's profiler strips certain "unimportant" functions from the backtrace. Maybe fma is accidentally among those.

@eschnett
Contributor

(Apologies for the piecemeal information.) code_native tells me the address of the fma routine that is called, and Linux's /proc/*/maps tells me that this range is indeed mapped to /lib64/libm-2.12.so.
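
The address lookup described here can be sketched as a small parser for the /proc/<pid>/maps text format (address range, permissions, offset, device, inode, optional pathname). The function name `find_mapping` and the sample text are hypothetical; the addresses and inode numbers in the sample are made up, and only the libm path is taken from the comment above.

```python
def find_mapping(maps_text, address):
    """Return the pathname of the mapping that contains address, or None."""
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start_hex, end_hex = fields[0].split("-")
        if int(start_hex, 16) <= address < int(end_hex, 16):
            # Anonymous mappings have no pathname field.
            return fields[5].strip() if len(fields) > 5 else ""
    return None


# Hypothetical excerpt in /proc/<pid>/maps format (addresses are made up).
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 123456 /usr/bin/example
7f0000000000-7f0000083000 r-xp 00000000 08:01 654321 /lib64/libm-2.12.so
7f0000400000-7f0000420000 r-xp 00000000 08:01 777777 /usr/lib/libopenlibm.so
"""

print(find_mapping(SAMPLE_MAPS, 0x7F0000012345))  # /lib64/libm-2.12.so
```

On a Linux system the same function could be fed the contents of /proc/self/maps or of another process's maps file, together with the call target read from the disassembly.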

@staticfloat
Member

So it sounds like we might need to force LLVM to use openlibm if possible?

@eschnett
Contributor

See #9890. There I propose to make that decision ourselves in Julia, and then generate either an intrinsic (if LLVM can translate that into an instruction) or an openlibm libcall.

@simonbyrne
Contributor

If I was to hazard a guess on what is happening: there are two floating point status registers, the old x87 one, and the SSE one (mxcsr). The system libm fma code is changing one of these (and not setting it back), but the get_rounding code (which uses openlibm) checks the other one (which remains unchanged). Based on this comment it appears that openlibm checks the x87 register, so it is probably the SSE register that is causing the problems.

If someone wants to check it out, I have some code here for playing around with the SSE status register manually in julia:
https://gist.github.com/simonbyrne/9c1e4704be46b66b1485
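
The hypothesis in this comment can be pictured with a toy model. The Python sketch below is not real register access: `ToyFpu` and its methods are invented for illustration, the "leaky" routine is a stand-in for the suspected fma behaviour, and the assumption that the setter writes both fields is a guess that would explain why set_rounding(Float64, RoundNearest) repaired the result earlier in the thread.

```python
ROUNDING_NAMES = {0b00: "Nearest", 0b01: "Down", 0b10: "Up", 0b11: "ToZero"}


class ToyFpu:
    """Toy model of an x86 CPU with two independent rounding-control fields."""

    def __init__(self):
        self.x87_rc = 0b00  # rounding control in the x87 control word
        self.sse_rc = 0b00  # rounding control in MXCSR

    def get_rounding_x87_only(self):
        # Assumption from the comment: the query reads only the x87 field.
        return ROUNDING_NAMES[self.x87_rc]

    def effective_sse_rounding(self):
        # What SSE arithmetic would actually use.
        return ROUNDING_NAMES[self.sse_rc]

    def leaky_fma(self, a, b, c):
        # Hypothetical stand-in: changes the SSE field and never restores it.
        self.sse_rc = 0b11
        return a * b + c

    def set_rounding(self, rc):
        # Assumption: the setter writes both fields.
        self.x87_rc = rc
        self.sse_rc = rc


fpu = ToyFpu()
fpu.leaky_fma(1.0, 1.0, 1.0)
print(fpu.get_rounding_x87_only(), fpu.effective_sse_rounding())  # Nearest ToZero
fpu.set_rounding(0b00)
print(fpu.get_rounding_x87_only(), fpu.effective_sse_rounding())  # Nearest Nearest
```

In this model the query keeps reporting Nearest while arithmetic is affected, which is the pattern of symptoms reported above.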

@staticfloat
Member

@simonbyrne I tried running that in Julia and it gave me:

ERROR: error compiling getmxcsr: Failed to parse LLVM Assembly:
julia: <string>:4:11: error: use of undefined value '@llvm.x86.sse.stmxcsr'
call void @llvm.x86.sse.stmxcsr(i32 * %ptr)

Is this because I'm using too old of a version of LLVM? (3.3)

@tkelman tkelman reopened this Jan 24, 2015
@rickhg12hs
Contributor Author

Using @simonbyrne's getmxcsr from https://gist.github.com/simonbyrne/9c1e4704be46b66b1485 , I can see that fma does change the SSE rounding mode.

julia> decmx(getmxcsr())
Flags:     100000
Den = 0:   0
Masks:     111111
Rounding:  00
Flush den: 0

julia> 0.1 + 0.2
0.30000000000000004

julia> fma(1.0,1.0,1.0)
2.0

julia> 0.1 + 0.2
0.3

julia> decmx(getmxcsr())
Flags:     100000
Den = 0:   0
Masks:     111111
Rounding:  11
Flush den: 0

julia> 
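
To read the output above: in MXCSR the rounding-control field occupies bits 13 and 14, with 00 meaning round to nearest and 11 meaning round toward zero. The Python sketch below decodes a raw MXCSR value into the documented fields. It is not the gist's decmx; `decode_mxcsr` is a name chosen here, and matching the gist's "Den = 0" and "Flush den" labels to the DAZ and FTZ bits is an assumption.

```python
ROUNDING_MODES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_mxcsr(value):
    """Split a 32-bit MXCSR value into its documented fields."""
    return {
        "flags": value & 0x3F,         # bits 0-5: sticky exception flags
        "daz": (value >> 6) & 0x1,     # bit 6: denormals-are-zero
        "masks": (value >> 7) & 0x3F,  # bits 7-12: exception masks
        "rounding": ROUNDING_MODES[(value >> 13) & 0x3],  # bits 13-14
        "ftz": (value >> 15) & 0x1,    # bit 15: flush-to-zero
    }


# 0x1F80 is the power-on default: all exceptions masked, round to nearest.
print(decode_mxcsr(0x1F80)["rounding"])              # nearest
# Setting both rounding-control bits selects round toward zero.
print(decode_mxcsr(0x1F80 | (0b11 << 13))["rounding"])  # toward zero
```

Under round toward zero, 0.1 + 0.2 lands on the lower neighbouring double, which prints as 0.3.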

@simonbyrne
Contributor

@staticfloat No, it's due to the problem with declaring LLVM intrinsics. Running the function a second time seems to make it work correctly.

@rickhg12hs Thanks, that's good to know.

@staticfloat
Member

@simonbyrne On my 32-bit machine, I get the same results as @rickhg12hs, and I can affirm that set_rounding does indeed reset the rounding bits properly.
