Pkg [or help()!] crashes Julia in Windows binary #4362

Closed
stevengj opened this issue Sep 25, 2013 · 56 comments
Labels
bug Indicates an unexpected problem or unintended behavior packages Package management and loading priority This should be addressed urgently system:linux Affects only Linux system:windows Affects only Windows upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@stevengj
Member

Two of my students, one with Windows 7 and one with Windows 8, are experiencing a strange issue where Pkg.add("RPMmd") (or any other package, apparently) immediately crashes Julia. They are using the latest 64-bit Julia 0.2 binary snapshot for Windows, and deleted their AppData\Roaming\julia\packages (i.e. .julia) directories from earlier install attempts.

The problem does not seem entirely reproducible; on another Windows 7 or 8 machine we tried it works.

The versioninfo() output from the student with 64-bit Windows 8 is:

Julia Version 0.2.0-prerelease+3768
Commit d6f7c7c 2013-09-18 04:01:51 UTC
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

A screenshot of his Julia session is attached below:
[screenshot: capture]

If you delete the AppData\Roaming\julia\packages directory, Julia prints a little more, outputting:

INFO: Initializing package repository C:\Users\...\AppData\Roaming\Julia\packages
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl

and then exiting. (With no output on subsequent tries.)

Running git-cmd in the Git subdirectory and running git clone git://github.com/JuliaLang/METADATA.jl manually seemed to work.

Any ideas?

cc: @StefanKarpinski, @vtjnash

@stevengj
Member Author

The METADATA repo seems to be cloned correctly, so the crash must be in a subsequent step. I'm still guessing it's some problem interacting with Windows git (possibly related to #4349). It would be nice if there were a simple way to get more verbose output from Pkg. @StefanKarpinski?

@stevengj
Member Author

cc: @staticfloat

@stevengj
Member Author

Looks like it's not a Pkg or git bug specifically: turns out that the help() command also crashes Julia.

Even stranger, the same student reports that help crashes on the latest build, but works on Julia Version 0.2.0-prerelease+3441, on which Pkg.add("Color") works, but Pkg.add("RPMmd") crashes.

@stevengj
Member Author

Executing the following my_init_help() function (based on Base.Help.init_help()) crashes Julia:

function my_init_help()
    println("Loading help data...")
    helpdb = evalfile(Base.Help.helpdb_filename())
    CATEGORY_LIST = {}
    CATEGORY_DICT = Dict()
    MODULE_DICT = Dict()
    FUNCTION_DICT = Dict()
    for (cat,mod,func,desc) in helpdb
        if !haskey(CATEGORY_DICT, cat)
            push!(CATEGORY_LIST, cat)
            CATEGORY_DICT[cat] = {}
        end
        if !isempty(mod)
            mfunc = mod * "." * func
            desc = Base.Help.decor_help_desc(func, mfunc, desc)
        else
            mfunc = func
        end
        push!(CATEGORY_DICT[cat], mfunc)
        if !haskey(FUNCTION_DICT, mfunc)
            FUNCTION_DICT[mfunc] = {}
        end
        push!(FUNCTION_DICT[mfunc], desc)
        if !haskey(MODULE_DICT, func)
            MODULE_DICT[func] = {}
        end
        if !in(mod, MODULE_DICT[func])
            push!(MODULE_DICT[func], mod)
        end
    end
end

but the following does not crash:

function my_init_help()
    println("Loading help data...")
    helpdb = evalfile(Base.Help.helpdb_filename())
    CATEGORY_LIST = {}
    CATEGORY_DICT = Dict()
    MODULE_DICT = Dict()
    FUNCTION_DICT = Dict()
    for (cat,mod,func,desc) in helpdb
        if !haskey(CATEGORY_DICT, cat)
            push!(CATEGORY_LIST, cat)
            CATEGORY_DICT[cat] = {}
        end
        if !isempty(mod)
            mfunc = mod * "." * func
            desc = Base.Help.decor_help_desc(func, mfunc, desc)
        else
            mfunc = func
        end
        push!(CATEGORY_DICT[cat], mfunc)
#        if !haskey(FUNCTION_DICT, mfunc)
#            FUNCTION_DICT[mfunc] = {}
#        end
#        push!(FUNCTION_DICT[mfunc], desc)
#        if !haskey(MODULE_DICT, func)
#            MODULE_DICT[func] = {}
#        end
#        if !in(mod, MODULE_DICT[func])
#            push!(MODULE_DICT[func], mod)
#        end
    end
end

Furthermore, uncommenting just the if !haskey(FUNCTION_DICT, mfunc) block causes it to crash!

Something is seriously weird here.
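
For reference, the block that triggers the crash comes down to a `haskey` check followed by a `Dict` assignment. Below is a minimal sketch of that pattern in current Julia syntax. `Any[]` stands in for the 0.2-era `{}`, and `reduced_case` is a hypothetical name. The thread does not establish that this reduced form crashes on an affected machine.

```julia
# Hypothetical reduction of the FUNCTION_DICT block from my_init_help().
function reduced_case(names)
    table = Dict{Any,Any}()
    for name in names
        if !haskey(table, name)
            table[name] = Any[]      # the Dict assignment (setindex!) under suspicion
        end
        push!(table[name], name)
    end
    return table
end

table = reduced_case(["help", "Pkg.add", "help"])
println(length(table["help"]), " entries stored under \"help\"")
```

On a machine that is not affected this simply fills the dictionary, which is what makes it usable as a yes/no probe.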

@stevengj
Member Author

@vtjnash, could there possibly be something left over in the Registry (or something?) from previous install attempts that is causing problems? @loladiro?

@Keno
Member

Keno commented Sep 25, 2013

Not to my knowledge.

@vtjnash
Member

vtjnash commented Sep 25, 2013

I don't put anything in the registry. It's a long shot, but perhaps the debug version of julia would give more info as it crashed? Unfortunately, the example didn't crash on any of the test machines I tried.

@stevengj
Member Author

Unfortunately, the debug version doesn't give any more info as it crashes.

@WestleyArgentum
Member

@keesvp and I are seeing the same issue, Pkg and help calls cause a crash. Windows 8, 64bit (the binary release)

@vtjnash
Member

vtjnash commented Sep 25, 2013

A backtrace, or access to a machine, would be awesome. PM me if you are willing to allow me to control your machine for a little while.

Also, if you could check whether it happens with the 32-bit build on your machine or not, that may also be useful information.

@gevahn

gevahn commented Sep 25, 2013

I'm one of the original problematic students.
32-bit gives the same results.

@keesvp

keesvp commented Sep 25, 2013

Tried 32bit language distribution.
Directly in Julia shell, help() crashes immediately, Pkg.update seems to work.

Package update from Julia Studio results in:

julia> Updating packages
INFO: Updating METADATA...
fatal: Unable to look up github.com (port 9418) (A non-recoverable error occurred during a database lookup. )

and crash (of Julia language)

@vtjnash
Member

vtjnash commented Sep 25, 2013

can you see if any of the following crash:
code_lowered(help,())
code_typed(help,())
code_llvm(help,())
code_native(help,())
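
These four calls stop at successive points in the compilation pipeline, so the first one that fails narrows down where the problem lies. A rough sketch of running them in order in current Julia follows. `probe` is a hypothetical stand-in for the function under suspicion, and `code_llvm`/`code_native` are assumed to come from the `InteractiveUtils` standard library. A hard crash kills the process before any `catch` runs, so each stage name is printed before the stage executes.

```julia
using InteractiveUtils  # provides code_llvm and code_native in current Julia

# Hypothetical stand-in for the function being investigated.
probe(d::Dict{Any,Any}, k::String) = (d[k] = Any[]; d)

argtypes = (Dict{Any,Any}, String)

# Each stage goes one step further down the compilation pipeline.
stages = [
    "code_lowered" => () -> code_lowered(probe, argtypes),
    "code_typed"   => () -> code_typed(probe, argtypes),
    "code_llvm"    => () -> sprint(io -> code_llvm(io, probe, argtypes)),
    "code_native"  => () -> sprint(io -> code_native(io, probe, argtypes)),
]

results = Dict{String,Bool}()
for (name, stage) in stages
    print(name, " ... ")
    flush(stdout)            # make sure the name is visible if the process dies
    results[name] = try
        stage()
        true
    catch
        false
    end
    println(results[name] ? "ok" : "threw an exception")
end
```

If the process dies, the last name printed identifies the stage that crashed.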

@gevahn

gevahn commented Sep 25, 2013

update: on 32-bit I can install IJulia, but in an IJulia notebook I get a "kernel is dead" error

@keesvp

keesvp commented Sep 25, 2013

none of these crashes, on both 32 and 64

@gevahn

gevahn commented Sep 25, 2013

There was an error message I missed when installing IJulia (via Pkg.add("IJulia")):

WARNING: An exception occured while building binary dependencies.
You may have to take manual steps to complete the installation, see the error message below.
To reattempt the installation, run Pkg.fixup("ZMQ").

ERROR: Provider Binaries failed to satisfy dependency zmq
at C:\Users\gevah_000\AppData\Roaming\Julia\packages\ZMQ\deps\build.jl:18

@WestleyArgentum
Member

I just tried in a relatively clean 32bit vm of Windows 7, ran all the commands above, everything worked fine

@gevahn

gevahn commented Sep 25, 2013

In the 32-bit version I get the following error on crash of help()

julia> help()
Loading help data...
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x24433ae -- setindex! at dict.jl:468
setindex! at dict.jl:468
jl_apply_generic at ???: offset 6a60f686
free at ???: offset 76de9b03
jl_apply_generic at ???: offset 6a60f686
jl_dump_function at ???: offset 6a639058

@vtjnash
Member

vtjnash commented Sep 26, 2013

"access to a machine" can just mean that you ping me on the IRC channel (freenode.net#julia) and i ask you to run various commands for me

@stevengj
Member Author

@gevahn, the problem with ZMQ in 32-bit Windows is due to JuliaInterop/ZMQ.jl#30, which is unrelated to the present issue.

@gevahn

gevahn commented Sep 26, 2013

Ok, I've played around with Pkg.add in the 32-bit version, and they all seem to work (other than the unrelated non-working packages). The help crash might be a separate, unrelated bug

@gevahn

gevahn commented Sep 26, 2013

I'm in the IRC channel. PM me there when you have time and we'll work this out

@ihnorton
Member

@vtjnash kind of a shot in the dark (I am aware that we don't use gnulib): isatty may have changed on win 8 (see also here)

@stevengj
Member Author

@ihnorton, I've actually tried Windows 8 and it works...on some machines... the problem here seems unreliable to reproduce, so I don't think it would be a problem with Windows 8 in general. Also, a student saw a similar problem on Windows 7.

@vtjnash
Member

vtjnash commented Sep 28, 2013

i don't have time to look at this today or tomorrow. but i don't think libuv uses that function -- it appears to implement the "correct" check (i'm not entirely sure why gnu libc wasn't using their method). also, it seems their repl is fine, just certain commands are problematic.

they don't both have AMD processors by any chance? or strange intel variants?

can one of you with a machine where this crashes post the output of code_native(setindex!,(Dict{Any,Any},Vector{Any},ASCIIString). Or GDB backtrace & disassembly at the crash, which would be even more helpful.
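
For anyone collecting this information on a current Julia install, here is a sketch of the same two requests: the processor model and the native code for the `Dict` assignment. It assumes current names that differ from Julia 0.2: `code_native` comes from `InteractiveUtils`, and `String` replaces `ASCIIString`.

```julia
using InteractiveUtils  # code_native lives here in current Julia

# Report the processor, since the crash appears tied to particular CPUs.
info = Sys.cpu_info()
cpu_model = isempty(info) ? "unknown" : info[1].model
println("CPU model: ", cpu_model)
println("CPU name as detected by Julia: ", Sys.CPU_NAME)

# Capture the generated machine code as a string so it can be pasted into a report.
# String is used where the thread's 0.2-era command uses ASCIIString.
native = sprint(io -> code_native(io, setindex!, (Dict{Any,Any}, Vector{Any}, String)))
println(native)
```

Capturing the listing with `sprint` also makes it easy to write to a file if the terminal closes on a crash.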

@WestleyArgentum
Member

The computer @keesvp was having trouble with was an Intel core i7, but
neither of us is near it right now

@gevahn

gevahn commented Sep 28, 2013

the command doesn't return anything

@vtjnash
Member

vtjnash commented Sep 28, 2013

Strange. I see I forgot a closing parenthesis: does it print anything if you complete the command statement?

@gevahn

gevahn commented Sep 28, 2013

Immediately crashes with no error messages

@ihnorton
Member

I tried running win8 under virtualbox and could not reproduce either.

@gevahn (or anyone else who can reproduce the crash): please see these instructions to download and run gdb.

@vtjnash
Member

vtjnash commented Sep 29, 2013

with @gevahn's help, i've decided this is a bug in llvm's support for the new shlx instruction on Haswell processors.

Dump of assembler code from 0x2760a0 to 0x2760dc:
  0x00000000002760a0:  jne    0x276359
  0x00000000002760a6:  mov    %rsi,%rax
  0x00000000002760a9:  jmpq   0x276136
  0x00000000002760ae:  mov    $0x2,%eax
  0x00000000002760b3:  cmpq   $0xfa01,0x28(%rsi)
  0x00000000002760bb:  mov    -0x80(%rbp),%rbx
  0x00000000002760bf:  jl     0x2760ca
  0x00000000002760c5:  mov    $0x1,%eax
  0x00000000002760ca:  shlx   %rax,%rdi,%rsi
=> 0x00000000002760cf:  (bad)
  0x00000000002760d0:  loopne 0x276134
  0x00000000002760d2:  mov    $0xd,%cl
  0x00000000002760d4:  mov    $0x9f,%al
  0x000000000027608f:  movabs $0x6a40e1a0,%rax
  0x0000000000276099:  mov    (%rax),%rax
  0x000000000027609c:  cmp    -0x40(%rbp),%rax
  0x00000000002760a0:  jne    0x276359
  0x00000000002760a6:  mov    %rsi,%rax
  0x00000000002760a9:  jmpq   0x276136
  0x00000000002760ae:  mov    $0x2,%eax
  0x00000000002760b3:  cmpq   $0xfa01,0x28(%rsi)
  0x00000000002760bb:  mov    -0x80(%rbp),%rbx
End of assembler dump.

@StefanKarpinski
Member

That's pretty amazing collective detective work, guys. So what's the next step? File an LLVM bug. How do we avoid hitting this in the meantime?

@vtjnash
Member

vtjnash commented Sep 30, 2013

@ihnorton has a patch for disabling avx2 in our JIT.

disassemble /r 0x00000000002966e5,+40
Dump of assembler code from 0x2966e5 to 0x29670d:
  0x00000000002966e5:  b8 01 00 00 00  mov    $0x1,%eax
  0x00000000002966ea:  c4 e2 f9 f7 f7  shlx   %rax,%rdi,%rsi
=> 0x00000000002966ef:  82      (bad)
  0x00000000002966f0:  c0 2d 72 0d c0 15 9a    shrb   $0x9a,0x15c00d72(%rip) # 0x15e97469
  0x00000000002966f7:  04 48   add    $0x48,%al
  0x00000000002966f9:  b8 b0 02 2c 00  mov    $0x2c02b0,%eax
  0x00000000002966fe:  00 00   add    %al,(%rax)
  0x0000000000296700:  00 00   add    %al,(%rax)
  0x0000000000296702:  48 89 f1        mov    %rsi,%rcx
  0x0000000000296705:  ff d0   callq  *%rax

82 c0 2d 72 0d c0 15 9a 04 seems to be meaningless. i'm not sure where this came from.
48 b8 b0 02 2c 00 00 00 00 00 is the next expected instruction (i.e. movabs $0x2c02b0,%rax)
ff d0 then was also correct (callq *%rax)
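
As a cross-check on the byte dump above, the five bytes `c4 e2 f9 f7 f7` can be decoded by hand as a three-byte VEX instruction. The sketch below is illustrative only: `decode_vex3` is a hypothetical helper, it ignores the R/X/B register-extension bits (all unset here), and it says nothing about where the stray bytes after the instruction come from.

```julia
# Bytes of the instruction gdb printed just before the fault.
bytes = UInt8[0xc4, 0xe2, 0xf9, 0xf7, 0xf7]

# Split a three-byte VEX instruction (prefix, opcode, ModRM) into its fields.
function decode_vex3(b::Vector{UInt8})
    b[1] == 0xc4 || error("not a three-byte VEX prefix")
    p1, p2, opcode, modrm = b[2], b[3], b[4], b[5]
    return (
        map    = p1 & 0x1f,             # 2 selects the 0F 38 opcode map
        w      = (p2 >> 7) & 0x01,      # 1 means 64-bit operands
        vvvv   = (~(p2 >> 3)) & 0x0f,   # stored inverted: the extra source register
        pp     = p2 & 0x03,             # 1 stands for an implied 66 prefix
        opcode = opcode,
        mod    = modrm >> 6,            # 3 means register-to-register
        reg    = (modrm >> 3) & 0x07,
        rm     = modrm & 0x07,
    )
end

regs64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
fields = decode_vex3(bytes)

# In AT&T operand order: shift count, source, destination.
operands = (regs64[fields.vvvv + 1], regs64[fields.rm + 1], regs64[fields.reg + 1])
println("opcode f7 in map ", Int(fields.map), ", pp = ", Int(fields.pp), ", operands ", operands)

# The immediate of the following movabs is stored little-endian.
imm_bytes = UInt8[0xb0, 0x02, 0x2c, 0x00, 0x00, 0x00, 0x00, 0x00]
imm = sum(UInt64(byte) << (8 * (i - 1)) for (i, byte) in enumerate(imm_bytes))
println("movabs immediate: 0x", string(imm, base = 16))
```

Opcode `f7` in the 0F 38 map with an implied 66 prefix and W = 1 is the encoding of `shlx`, and the decoded operands match gdb's `shlx %rax,%rdi,%rsi`, so the five bytes form a complete instruction.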

@ihnorton
Member

pr here -- it only disables BMI2, which is the binary ops part of the new haswell instructions that introduces SHLX... AVX2 is separate, though maybe we want to disable that too to be safe.

@Keno
Member

Keno commented Sep 30, 2013

Do we have any idea as to the LLVM IR causing this so we can submit an upstream bug report?

@vtjnash
Member

vtjnash commented Sep 30, 2013

no, i haven't been able to reproduce this using llc.

@vtjnash
Member

vtjnash commented Oct 1, 2013

http://llvm.org/bugs/show_bug.cgi?id=17422

LLVM's JIT doesn't support Haswell (confirmed that this fails on linux too)

ihnorton added a commit to ihnorton/julia that referenced this issue Oct 1, 2013
@ihnorton
Member

ihnorton commented Oct 1, 2013

I have merged the fix after discussion with @vtjnash. We confirmed the bug (and the fix) on linux using the Intel SDE tool. Unfortunately msys is still trying to melt my windows box so I don't have a windows package to try yet.

@WestleyArgentum @keesvp I assume you have a working windows build system - would be great if you can update and test on windows.

@WestleyArgentum
Member

Unfortunately we're still stuck on #4300, @keesvp knows more than I do

@ihnorton
Member

ihnorton commented Oct 1, 2013

here is a binary to test (note: it is missing git and probably some other extras; I'll try to fix my x-compile setup tonight, or hopefully Jameson can generate a new official one soon)

@gevahn

gevahn commented Oct 1, 2013

help() still crashes with this one

@ihnorton
Member

ihnorton commented Oct 1, 2013

one more time, sorry. grabbed the old package earlier. link is updated now.

@keesvp

keesvp commented Oct 1, 2013

I have a build now on a Haswell. Will test further today and report.

@keesvp

keesvp commented Oct 1, 2013

help() is fine.

Pkg.update() doesn't crash, but still gives:
Unable to look up github.com (port 9418) (A non-recoverable error occurred during a database lookup. )

run(git)
gives regular git help output, but then:

ERROR: failed process: Process(git, ProcessExited(1)) [1]
in error at error.jl:22
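
Note that `run` raises an error whenever the child process exits with a nonzero status, and `git` invoked with no subcommand typically prints its usage text and exits with status 1, so the ERROR above may not indicate a problem by itself. Here is a minimal sketch of that behaviour in current Julia. It uses a child `julia` process as a stand-in for git, an assumption made so that the example does not depend on git being installed.

```julia
# Stand-in command that exits with status 1, like git with no subcommand.
cmd = `$(Base.julia_cmd()) -e "exit(1)"`

# run throws when the exit status is nonzero.
threw = try
    run(cmd)
    false
catch
    true
end
println("run threw an error: ", threw)

# ignorestatus suppresses the error so the exit code can be inspected.
proc = run(ignorestatus(cmd))
println("exit code: ", proc.exitcode)

# success reports the outcome as a Bool without throwing.
println("success: ", success(cmd))
```

A command that is expected to succeed, such as git with a `--version` argument, would be a more telling check of whether git itself can be launched.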

@keesvp

keesvp commented Oct 1, 2013

that should be:
run(`git`)

@gevahn

gevahn commented Oct 1, 2013

help() is fine on my machine as well. Pkg.update() doesn't work, but as ihnorton mentioned, there is no git in this version.

@keesvp

keesvp commented Oct 1, 2013

I'm not running that version. I have built a complete one.

@ihnorton
Member

ihnorton commented Oct 1, 2013

@keesvp what version of git are you using? Can you try using the Git folder from the current download package on the homepage?

@keesvp

keesvp commented Oct 1, 2013

I created a new issue #4409.

@ihnorton
Member

ihnorton commented Oct 2, 2013

I posted a new 64-bit package, with git and all extras, here.

Closing this as @gevahn confirmed that help(), Pkg, IJulia, and PyPlot all work -- thanks very much to @gevahn for sticking with this, it will save us many bug reports in the next few months as more people get Haswell-based computers.

@stevengj
Member Author

stevengj commented Oct 2, 2013

Thanks @ihnorton; can we get this on the official download page?

@vtjnash
Member

vtjnash commented Oct 25, 2013

LLVM folks have been kind enough to provide a patch: http://llvm.org/bugs/attachment.cgi?id=11428

I've tested this and it seems to work.
