Test results inconsistent between machines #27630

Closed
PetrKryslUCSD opened this issue Jun 18, 2018 · 25 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@PetrKryslUCSD

PetrKryslUCSD commented Jun 18, 2018

For my package FinEtools, tests on Linux yield different results for the same Julia version on different machines. In this constructor the first assert fires on Travis CI (Linux 64-bit). However, it does not fire when the @show lines above the assert are uncommented(!?). (The @show output confirms that the two types are identical.)

function FEMMDeforLinear(mr::Type{MR}, integdata::IntegData{S, F}, material::M) where {MR<:DeforModelRed, S<:FESet, F<:Function, M<:MatDefor}
    # @show mr 
    # @show material.mr
    @assert mr === material.mr "Model reduction is mismatched"
    @assert (integdata.axisymmetric) || (mr != DeforModelRed2DAxisymm) "Axially symmetric requires axisymmetric to be true"
    return FEMMDeforLinear(mr, integdata, CSys(manifdim(integdata.fes)), material)
end

Tested on my local Linux machine. All tests pass just fine.

julia> versioninfo()
Julia Version 0.7.0-alpha.147
Commit 5e3259e98e (2018-06-16 18:43 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Tested by Travis CI: The assert discussed above fires when the @shows are disabled.

Julia Version 0.7.0-alpha.147
Commit 5e3259e98e (2018-06-16 18:43 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7401P 24-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, znver1)
@staticfloat
Member

It will help a lot if you can give a minimal working example of this. I see that FinEtools.jl is generally available; what is the minimal script that we would need to run to trigger this behavior?
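
For reference, a dependency-free reduction of the pattern in the constructor above might look like the sketch below. All names here (`ModelRed`, `Axisymm`, `Material`, `check`) are hypothetical stand-ins, not FinEtools' actual definitions, and there is no claim that this small version triggers the failure; it only isolates the `===` comparison between a type argument and a type stored in a field.

```julia
# Hypothetical stand-ins for DeforModelRed, DeforModelRed2DAxisymm and MatDefor.
abstract type ModelRed end
struct Axisymm <: ModelRed end

# The material stores the model-reduction type in a field, as `material.mr` does.
struct Material{MR<:ModelRed}
    mr::Type{MR}
end

# Same shape as the first assert in the constructor under discussion.
function check(mr::Type{MR}, material) where {MR<:ModelRed}
    @assert mr === material.mr "Model reduction is mismatched"
    return true
end

println(check(Axisymm, Material(Axisymm)))
```

If a reduction like this passes everywhere, the next step would be to add pieces of the real constructor back one at a time until the behavior diverges between machines.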

@PetrKryslUCSD
Author

PetrKryslUCSD commented Jun 18, 2018

Okay, it took me a number of iterations, but I do have a minimal working example now.

Mind you, at this point the version is 0.7.0-alpha.153, both on Travis and locally on my machine.

Please run ] test FinEtools. When I do that locally I get

(FinEtools) pkg>  test FinEtools
   Testing FinEtools
 Resolving package versions...
success?
mr = FinEtools.DeforModelRedModule.DeforModelRed2DAxisymm
material.mr = FinEtools.DeforModelRedModule.DeforModelRed2DAxisymm
failure?
Test Summary: | Pass  Total
Debug         |    1      1
 18.412751 seconds (18.85 M allocations: 940.914 MiB, 3.81% gc time)
   Testing FinEtools tests passed

but on Travis the result is

...
$ julia --check-bounds=yes --color=yes -e "Pkg.test(\"${JL_PKG}\", coverage=true)"
WARNING: Base.Pkg is deprecated, run `using Pkg` instead
 in module Main
WARNING: Base.Pkg is deprecated, run `using Pkg` instead
 in module Main
   Testing FinEtools
 Resolving package versions...
success? 
mr = FinEtools.DeforModelRedModule.DeforModelRed2DAxisymm
material.mr = FinEtools.DeforModelRedModule.DeforModelRed2DAxisymm
failure? 
Debug: Error During Test at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104
  Got exception LoadError("/home/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl", 38, AssertionError("Model reduction is mismatched")) outside of a @test
  LoadError: AssertionError: Model reduction is mismatched
  Stacktrace:
   [1] Type at /home/travis/build/PetrKryslUCSD/FinEtools.jl/src/FEMMDeforLinearModule.jl:33 [inlined]
   [2] test() at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl:32
   [3] top-level scope at none:0
   [4] include at ./boot.jl:317 [inlined]
   [5] include_relative(::Module, ::String) at ./loading.jl:1089
   [6] include(::Module, ::String) at ./sysimg.jl:29
   [7] include(::String) at ./client.jl:393
   [8] macro expansion at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:1090 [inlined]
   [9] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Test/src/Test.jl:1080 [inlined]
   [10] macro expansion at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104 [inlined]
   [11] top-level scope at ./util.jl:156 [inlined]
   [12] top-level scope at ./<missing>:0
   [13] include at ./boot.jl:317 [inlined]
   [14] include_relative(::Module, ::String) at ./loading.jl:1089
   [15] include(::Module, ::String) at ./sysimg.jl:29
   [16] include(::String) at ./client.jl:393
   [17] top-level scope at none:0
   [18] eval(::Module, ::Any) at ./boot.jl:319
   [19] exec_options(::Base.JLOptions) at ./client.jl:243
   [20] _start() at ./client.jl:427
  in expression starting at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl:38
Test Summary: | Pass  Error  Total
Debug         |    1      1      2
ERROR: LoadError: Some tests did not pass: 1 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104
ERROR: Package FinEtools errored during testing
Stacktrace:
 [1] #test#57(::Bool, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/Types.jl:119
 [2] #test at ./none:0 [inlined]
 [3] #test#40(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:220
 [4] #test at ./none:0 [inlined]
 [5] #test#39 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:203 [inlined]
 [6] #test at ./none:0 [inlined]
 [7] #test#38 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:202 [inlined]
 [8] #test at ./none:0 [inlined]
 [9] #test#37 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:201 [inlined]
 [10] (::getfield(Pkg.API, Symbol("#kw##test")))(::NamedTuple{(:coverage,),Tuple{Bool}}, ::typeof(Pkg.API.test), ::String) at ./none:0
 [11] top-level scope at none:0
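
One difference visible in the log above is that the Travis invocation passes `--check-bounds=yes` and `coverage=true`, which a plain `] test` run may not. Whether that matters here is not established in this thread, but it is cheap to record the session settings on both machines and compare them. A minimal sketch, assuming the internal `Base.JLOptions()` struct keeps its `check_bounds`, `code_coverage` and `opt_level` fields (it is unexported and may change between versions); `session_fingerprint` is a hypothetical helper name:

```julia
# Collect settings that can influence the code Julia generates.
function session_fingerprint()
    opts = Base.JLOptions()  # internal API: field names are an assumption
    return (
        version      = VERSION,
        cpu          = Sys.CPU_NAME,
        check_bounds = Int(opts.check_bounds),   # nonzero when --check-bounds was given
        coverage     = Int(opts.code_coverage),  # nonzero when coverage is enabled
        opt_level    = Int(opts.opt_level),
    )
end

fp = session_fingerprint()
for (name, value) in pairs(fp)
    println(rpad(string(name), 14), value)
end
```

Printing this at the top of `runtests.jl` on both the local machine and CI would show at a glance whether the two runs were started with the same flags.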

Success and failure are distinguished by calling two different constructors: one that @shows some arguments (that one doesn't fail on Travis), and one that doesn't print anything (that one fails on Travis).

function FEMMDeforLinear(mr::Type{MR}, integdata::IntegData{S, F}, material::M) where {MR<:DeforModelRed, S<:FESet, F<:Function, M<:MatDefor}
    @assert mr === material.mr "Model reduction is mismatched"
    @assert (integdata.axisymmetric) || (mr != DeforModelRed2DAxisymm) "Axially symmetric requires axisymmetric to be true"
    return FEMMDeforLinear(mr, integdata, CSys(manifdim(integdata.fes)), material)
end

function FEMMDeforLinear(mr::Type{MR}, integdata::IntegData{S, F}, material::M, print) where {MR<:DeforModelRed, S<:FESet, F<:Function, M<:MatDefor}
    @show mr 
    @show material.mr
    @assert mr === material.mr "Model reduction is mismatched"
    @assert (integdata.axisymmetric) || (mr != DeforModelRed2DAxisymm) "Axially symmetric requires axisymmetric to be true"
    return FEMMDeforLinear(mr, integdata, CSys(manifdim(integdata.fes)), material)
end

Hope this helps?

@PetrKryslUCSD
Author

BTW, the same Travis failure occurs with the Mac (also alpha.153). Except in this case I cannot explore whether or not a local test would pass, as I don't have access to the Mac hardware.

@PetrKryslUCSD
Author

PetrKryslUCSD commented Jun 19, 2018

I should also mention at this point that the problem does not appear on Windows locally on my machine:

julia> versioninfo()
Julia Version 0.7.0-alpha.164
Commit 04b391a8ea (2018-06-19 04:48 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

Right now appveyor also tests okay with 0.7-alpha.157.

@PetrKryslUCSD
Author

PetrKryslUCSD commented Jun 21, 2018

The test still fails with alpha.216 on Travis (Linux), but passes locally on my machine. The test passes with the same Julia version on Windows (locally and on CI). The test fails with alpha.15 on Travis (MacOS).

@PetrKryslUCSD
Author

Fails with beta.16 on Travis.

@PetrKryslUCSD
Author

Still fails with beta.103 on Travis.

@PetrKryslUCSD
Author

@staticfloat: Is the example sufficiently minimal?

@ViralBShah ViralBShah added the bug Indicates an unexpected problem or unintended behavior label Jul 2, 2018
@staticfloat
Member

Sorry @PetrKryslUCSD, I haven't been able to reproduce this on any of my machines. I think it's possible this is a codegen or LLVM bug. I'll ping @vtjnash to see if he has ideas for debugging information we can try to extract from Travis to better understand why Travis fails while locally-running Julia is fine.

@PetrKryslUCSD
Author

Yeah, I haven't seen it locally fail with any version of Julia. It is a Travis thing, apparently...

@nalimilan
Member

Maybe print the generated LLVM/native code on both machines and compare it?
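
A minimal sketch of how that could be done: capture the IR as a string with the function form `code_llvm(io, f, types)` from InteractiveUtils and write it to a file that can be diffed against the other machine's dump. The function `same` and the helper `llvm_text` are hypothetical stand-ins; in practice you would pass the real constructor and its argument types.

```julia
using InteractiveUtils  # provides code_llvm and code_native

# Hypothetical stand-in for the function under investigation.
same(a, b) = a === b

# Capture the LLVM IR for one method signature as a string.
function llvm_text(f, argtypes)
    io = IOBuffer()
    code_llvm(io, f, argtypes)
    return String(take!(io))
end

ir = llvm_text(same, (DataType, DataType))

# Name the file after the CPU so dumps from two machines are easy to tell apart.
outfile = joinpath(tempdir(), "same_llvm_$(Sys.CPU_NAME).txt")
write(outfile, ir)
println("wrote ", outfile)
```

The same pattern works with `code_native(io, f, types)` if the difference only shows up after LLVM has lowered the IR to machine code.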

@PetrKryslUCSD
Author

This problem still occurs with Julia Version 0.7.0-beta2.0, both Linux and Mac.

@PetrKryslUCSD
Author

PetrKryslUCSD commented Jul 14, 2018

@staticfloat , @nalimilan , @ViralBShah :
The LLVM code generated on Travis and locally is indeed different. I presume that is due to the differences in architecture (AMD versus Intel)?

Local.txt
Travis.txt

The output in the files is composed of:

...
    println("success? ")
    @code_llvm FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material, true)
    femm = FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material, true)
    println("failure? ")
    @code_llvm FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material)
    femm = FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material)

@PetrKryslUCSD
Author

Still occurs with beta2-27 on Linux.

@PetrKryslUCSD
Author

@staticfloat , @nalimilan , @ViralBShah
The problem of different test results (locally versus Travis) still occurs with Julia Version 1.0.0-DEV.13.

@PetrKryslUCSD
Author

PetrKryslUCSD commented Aug 4, 2018

@staticfloat , @nalimilan , @ViralBShah
At this point the code of test_debug() fails for ALL architectures (Linux, Windows, Mac) for Julia 1.0-DEV on Appveyor and Travis. Locally the same code runs fine on Windows 10 and Linux (I don't have the local result for the Mac). Local Linux: Version 1.0.0-DEV.13, local Win: Version 1.0.0-DEV.3.

The issue is again the assert which fails for one constructor but succeeds for another.

...
   Testing FinEtools
 Resolving package versions...
┌ Warning: __precompile__() is now the default
│   caller = top-level scope at none:0
└ @ Core none:0
Debug: Error During Test at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104
  Got exception LoadError("/Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl", 84, AssertionError("Model reduction is mismatched")) outside of a @test
  LoadError: AssertionError: Model reduction is mismatched
  Stacktrace:
   [1] Type at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/src/FEMMDeforLinearModule.jl:33 [inlined]
   [2] test() at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl:57
   [3] top-level scope at none:0
   [4] include at ./boot.jl:317 [inlined]
   [5] include_relative(::Module, ::String) at ./loading.jl:1040
   [6] include(::Module, ::String) at ./sysimg.jl:29
   [7] include(::String) at ./client.jl:398
   [8] macro expansion at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104 [inlined]
   [9] macro expansion at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1079 [inlined]
   [10] macro expansion at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/runtests.jl:104 [inlined]
   [11] top-level scope at ./util.jl:156 [inlined]
   [12] top-level scope at ./none:0
   [13] include at ./boot.jl:317 [inlined]
   [14] include_relative(::Module, ::String) at ./loading.jl:1040
   [15] include(::Module, ::String) at ./sysimg.jl:29
   [16] include(::String) at ./client.jl:398
   [17] top-level scope at none:0
   [18] eval(::Module, ::Any) at ./boot.jl:319
   [19] exec_options(::Base.JLOptions) at ./logging.jl:317
   [20] _start() at ./client.jl:432
  in expression starting at /Users/travis/build/PetrKryslUCSD/FinEtools.jl/test/test_debug.jl:84

@PetrKryslUCSD
Author

When I first saw this discrepancy on Linux, I thought it made sense because the LLVM code was perhaps different due to the differences in computer architecture. However, now it also happens on Win 10: do you suppose the generated code could also be different locally vs. on CI?

@PetrKryslUCSD
Author

PetrKryslUCSD commented Aug 6, 2018

Here is the comparison of generated code for the constructors, both the one that does not fail (the one that takes an additional argument and prints the types), and the original one that does, both generated on my local configuration (a laptop) and CI, Windows 10:

function FEMMDeforLinear(mr::Type{MR}, integdata::IntegData{S, F}, material::M, print) where {MR<:DeforModelRed, S<:FESet, F<:Function, M<:MatDefor}
    @show mr 
    @show material.mr
    @assert mr === material.mr "Model reduction is mismatched"
    @assert (integdata.axisymmetric) || (mr != DeforModelRed2DAxisymm) "Axially symmetric requires axisymmetric to be true"
    return FEMMDeforLinear(mr, integdata, CSys(manifdim(integdata.fes)), material)
end

Julia versions:
CI:
Julia Version 1.0.0-DEV.3
Commit cfc7475 (2018-08-03 08:56 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

Local:
Julia Version 1.0.0-DEV.3
Commit cfc7475 (2018-08-03 08:56 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Generated LLVM code for the constructor with printing:
with-printing-CI.txt
with-printing-local.txt

Generated LLVM code for the constructor WITHOUT printing:
original-CI.txt
original-local.txt

The generated code is DIFFERENT. Does it explain why the comparison of the types fails? I don't know; I can't read the generated code well enough to tell. Does it help you? @staticfloat @ViralBShah @nalimilan

@PetrKryslUCSD
Author

Maybe it is meaningless, but I also checked the code-lowered version: it appears to be identical both locally and on CI.
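
Identical lowered code is probably what one would expect, since lowering happens before type inference and code generation. A stage in between that could also be compared is the type-inferred code. A minimal sketch using `code_lowered` and `code_typed`, with a hypothetical stand-in function `same` in place of the real constructor:

```julia
using InteractiveUtils

# Hypothetical stand-in for the function under investigation.
same(a, b) = a === b

argtypes = (DataType, DataType)

# Stage 1: lowered code, produced before any type inference.
lowered = code_lowered(same, argtypes)

# Stage 2: type-inferred code, returned as CodeInfo => return type pairs.
typed = code_typed(same, argtypes)

println(first(lowered))
println(first(typed))
```

If the typed code also matches between machines, that would point at the later LLVM or native stages as the place where the two runs diverge.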

@PetrKryslUCSD
Author

Because I wanted to repair the testing of the package FinEtools to reflect the status of porting the package to 1.0, I have removed the code that exercises the erroring constructors from the tests. If you wish to reproduce the results above (the ones that generate the LLVM code), please execute

module m111ocylpull14n1 # From miscellaneous
using FinEtools
using Test
using InteractiveUtils
function test()
    E1=1.0;
    nu23=0.19;
    rin=1.;
    rex =1.2;
    Length = 1*rex

    MR = DeforModelRed2DAxisymm
    fens,fes = Q4block(rex-rin,Length,5,20);
    material = MatDeforElastIso(MR, 00.0, E1, nu23, 0.0)
    
    femm = FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material, true)
    println("========== With printing ==========")
    @show @code_llvm FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material, true)
    println("========== Original ==========")
    @show @code_llvm FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material)
    femm = FEMMDeforLinear(MR, IntegData(fes, GaussRule(2, 2), true), material)
    
    true
end
end
using .m111ocylpull14n1
m111ocylpull14n1.test()

@stev47
Contributor

stev47 commented Aug 27, 2018

Seeing that MR is a type: maybe test for equality with == instead of ===. Also take heed of this note.

@PetrKryslUCSD
Author

Switching from === to == seems to have resolved the "bug". However it isn't clear to me why that should happen:

julia> x = Float64
Float64

julia> x == Float64
true

julia> x === Float64
true

So both operators seem to work for types. Perhaps there is something in the dialects of LLVM that makes egal fail to compare types correctly in some situations?
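
For what it's worth, the two operators ask different questions. As far as I understand it, `===` (egal) asks whether two values are indistinguishable, while `==` on types asks whether they are equal in the subtyping sense, i.e. each is a subtype of the other. A minimal sketch; `type_equal` is a hypothetical helper written out only to make that second meaning explicit, and no claim is made about what the last line prints:

```julia
# Equality in the subtyping sense: each type is a subtype of the other.
type_equal(a::Type, b::Type) = a <: b && b <: a

x = Float64
println(x === Float64)           # egal comparison
println(x == Float64)            # type equality
println(type_equal(x, Float64))  # the same question spelled out

# The same type written in two different ways.
written_out = Array{T,1} where T
println(written_out == Vector)   # equal as types
println(written_out === Vector)  # egal may or may not agree here
```

For a check like the assert in the constructor, `==` (or an explicit mutual-subtype test) states the intent "these denote the same type" without depending on how egal treats type objects.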

@StefanKarpinski
Member

This could be a bug—I've seen a few === bugs here and there but not for a while.

@User-764Q

Hi, I tested the issue on OSX with Julia 1.6.2 and it passed.

Is this issue still a problem with other hardware? Sorry I can only test with what I have.

Matt.

@KristofferC
Member

This is pretty old now, and since it seems to pass, it is hopefully a codegen bug that got fixed. I'll close this issue, but if there are more up-to-date reproducers, just comment and we will reopen.

8 participants