IJulia kernel doesn't work for Julia 1.6 on macOS #968
Comments
Does the IJulia build succeed? Mine is failing due to JuliaLang/Pkg.jl#2270 (i.e. an unrelated Pkg issue) |
Can you try turning on debugging output? https://julialang.github.io/IJulia.jl/stable/manual/troubleshooting/#Debugging-IJulia-problems |
Hm, I can't manage to get any useful debugging output. It seems like setting I removed all instances of IJulia (in Starting the notebook server from the cmd directly ( Is there something else that I could try? |
I have the same issue with IJulia 1.23.1 on Julia 1.6-DEV.1678. Running
Calling
|
Julia master is currently pretty broken so I wouldn't be surprised if that is the problem here rather than IJulia. |
If that's the case, fair enough. However, let me note that I see the same issue on the fresh 1.6 release branch. Isn't the latter supposed to be at least "beta" stable? |
Still seeing this for the 1.6 release branch. Any pointers to how I could investigate this further would be great. |
It works for me. |
Interesting... What OS are you on and what version of IJulia are you using? |
I tried it again with IJulia 1.23.1 on the latest Julia nightly (1.7.0-DEV.136), but still have the same issue. I'm on macOS.
|
Alright, I have tested julia#master (Version 1.7.0-DEV.197 (2020-12-30)) with IJulia v1.23.1 on Ubuntu 20.04, Windows 10, and macOS 11.1 (Big Sur, x86). As it turns out, everything works fine on linux and windows, so this appears to be a macOS issue. @fredrikekre I assume you're on linux or windows? Given that @tomyun saw the same issue on macOS 10.15 Catalina I assume it's a "generic" macOS issue and not tied to a particular macOS version. Any ideas how to debug/fix this? |
It might be security related (just a guess) because IJulia doesn't even manage to create the 1.7 kernel file in my case. |
If by kernel file you mean the kernel spec ( |
I'm trying to debug this. I started by manually executing every line in
julia> start_heartbeat(heartbeat[])
[1] 50603 segmentation fault $HOME/repos/julia/usr/bin/Julia |
Here is a MWE which segfaults on macOS with 1.7 but works fine with 1.5.3:
using ZMQ # v1.2.1
using ZMQ: libzmq
const threadid = zeros(Int, 128)
function heartbeat_thread(sock::Ptr{Cvoid})
    ccall((:zmq_proxy,libzmq), Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
          sock, sock, C_NULL)
    nothing
end
const heartbeat = Ref{Socket}()
heartbeat[] = Socket(ROUTER)
sock = heartbeat[]
# function start_heartbeat(sock)
heartbeat_c = @cfunction(heartbeat_thread, Cvoid, (Ptr{Cvoid},))
ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}), threadid, heartbeat_c, sock) # this line segfaults |
Yes, and I have to create them manually on my side, basically by copying previous ones. Even after doing so, it fails for me in much the same way as it does for you. |
Hm, strange, but I think this is an orthogonal issue. |
The following crashes even without involving ZMQ:
const threadid = zeros(Int, 128)
function heartbeat_thread(sock::Ptr{Cvoid})
    ccall(:printf, Cint, (Cstring, Ptr{Cvoid}), "got sock = %p\n", sock)
    nothing
end
sock = Ptr{Cvoid}(0x0123456789)
# function start_heartbeat(sock)
heartbeat_c = @cfunction(heartbeat_thread, Cvoid, (Ptr{Cvoid},))
ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}), threadid, heartbeat_c, sock) # this line segfaults
@Keno, has something changed in Julia 1.6 that would affect calling
Correction: the above crashes even in Julia 1.5, so may be unrelated. |
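One way to narrow this down, sketched under the assumption that the callback itself is sound: invoke the `@cfunction` pointer directly on the main thread, with no `uv_thread_create` involved. If this works while the threaded version crashes, the fault more likely lies in running Julia code on a foreign thread than in the callback. The names `seen`, `record_sock`, and `call_directly` are hypothetical and used only for illustration.

```julia
# Stores the last pointer the callback received (hypothetical helper state).
const seen = Ref{Ptr{Cvoid}}(C_NULL)

# Same signature as heartbeat_thread, but it only records its argument.
function record_sock(sock::Ptr{Cvoid})
    seen[] = sock
    nothing
end

# Build the C-callable pointer and call it on the current (main) thread.
function call_directly(sock::Ptr{Cvoid})
    cb = @cfunction(record_sock, Cvoid, (Ptr{Cvoid},))
    ccall(cb, Cvoid, (Ptr{Cvoid},), sock)
    return seen[]
end

fake_sock = Ptr{Cvoid}(UInt(0x1234))  # arbitrary value, never dereferenced
result = call_directly(fake_sock)
```

Because the pointer is only passed through and never dereferenced, this exercises the `@cfunction` machinery without touching ZMQ or libuv threads.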
I've been trying to chase this down, and it looks like the problem is in
result = ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}),
               threadid, heartbeat_c, sock)
I've been trying to pin it down more precisely without much success, but adding |
My bad, I hadn't looked at this whole thread. Still digging, but it looks like the address of the called cfunction is getting clobbered, so create_pthread crashes instantly. Hopefully more details once I get a better MWE--so far I haven't been able to find one that works in 1.5 but not 1.6 without ZMQ. |
Can we re-implement the heartbeat thread in terms of the Update: no, |
I've been trying a few things: compiling with clang on Linux, compiling with gcc on MacOS, and building with the thread sanitizer. So far none of those have worked. Clang builds on Linux run into some "not a compile-time constant" issues, gcc on Mac runs into a problem with library versions, and the thread sanitizer fails because it doesn't load soon enough in some phases of the build process. By the way, I'm also seeing that the MWE fails intermittently in 1.5.3:
I'm seeing that about 20% of the time. The only change I made to the code listed above is that it prints the value of |
We could just omit the heartbeat thread on MacOS — I think it is optional in Jupyter these days? It's just a bit frustrating not to know why this is crashing. |
Still trying to build with thread sanitization turned on, but it's tripping over (among other things) this (src/task.c, lines 55-65):
#if defined(JL_TSAN_ENABLED)
static inline void tsan_destroy_ctx(jl_ptls_t ptls, void *state) {
    if (state != &ptls->root_task->state) {
        __tsan_destroy_fiber(ctx->state);
    }
    ctx->state = NULL;
}
static inline void tsan_switch_to_ctx(void *state) {
    __tsan_switch_to_fiber(state, 0);
}
#endif
|
Breaking news: I have a working version, though it needs a bunch of cleanup to get rid of all the debugging junk I threw in. I commented out all of the heartbeat stuff, which got it to this:
For debugging purposes, I pulled apart the function definition loop in stdio.jl into three separate functions, which got me to:
I ditched the three separate functions in favor of a single, parameterized one:
function redirect_one(io::IJuliaStdio, which::String)
    js = io[:jupyter_stream]
    js != which && throw(ArgumentError("expecting $(which) stream, got $(js)"))
    Core.eval(Base, Expr(:(=), Symbol(which), io))
    return io
end
and that works! In init.jl, the calls to
if capture_stdout
    read_stdout[], = redirect_stdout()
    redirect_one(IJuliaStdio(stdout,"stdout"), "stdout")
end
if capture_stderr
    read_stderr[], = redirect_stderr()
    redirect_one(IJuliaStdio(stderr,"stderr"), "stderr")
end
redirect_one(IJuliaStdio(stdin,"stdin"), "stdin")
I'll do a PR once I get this cleaned up and tested for compatibility with Linux and Windows. I also tried putting the heartbeat back in after getting a working version and verified that it's still causing a segfault, so whatever's causing that, it's still at large. |
FWIW, that was the goal of the changes to base: to make it easier to consolidate the code and use less meta programming for these. To that end, there's also now |
Note that the heartbeat problem still remains. |
Yeah, I'm looking into it. I believe it actually has nothing to do with ZMQ and is instead compiler-internal changes around what you're allowed to do inside of |
@staticfloat, it's confusing to me that (As mentioned above, this needs to be in a thread, not a task, because otherwise a long-running task that fails to |
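For context on the thread-versus-task point, here is a minimal sketch, assuming the concern is Julia's cooperative task scheduling: a task created with `@async` only runs when the current task yields, so a heartbeat implemented as a task would go silent during a long computation that never yields. `starvation_demo` and `beats` are hypothetical names used only for illustration and are not part of IJulia.

```julia
# Returns the "heartbeat" count observed before and after the main task yields.
function starvation_demo()
    beats = Ref(0)
    hb = @async (beats[] += 1)   # stand-in for a heartbeat reply (assumption)

    acc = 0
    for i in 1:1_000_000         # compute loop with no yield points
        acc += i % 7
    end

    before = beats[]             # the task has not been scheduled yet
    wait(hb)                     # yielding lets the scheduler run it
    after = beats[]
    return before, after, acc
end

before, after, _ = starvation_demo()
```

The counter stays unchanged until `wait` hands control to the scheduler, which is the behavior a separate OS thread avoids.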
@vtjnash tells me that looking up global values is not allowed within a As for the reason why 1.5 didn't crash here, it's because the |
(This is just a pthread under the hood on MacOS.) Why can’t a thread access a constant global, @vchuravy? A runtime |
Doing the
using ZMQ, Libdl
const zmq_proxy = dlsym(dlopen(ZMQ.libzmq), :zmq_proxy)
const threadid = zeros(Int, 128)
function heartbeat_thread(sock::Ptr{Cvoid})
    ccall(zmq_proxy, Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
          sock, sock, C_NULL)
    nothing
end
const heartbeat = Ref{Socket}()
heartbeat[] = Socket(ROUTER)
# function start_heartbeat(sock)
heartbeat_c = @cfunction(heartbeat_thread, Cvoid, (Ptr{Cvoid},))
ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}), threadid, heartbeat_c, heartbeat[]) |
I'm finding that I get very different results depending on the contents of .julia/compiled. After deleting all precompiled code, the code below works in both v1.5.3 and v1.6--mostly:
using ZMQ # v1.2.1
using ZMQ, Libdl #: libzmq
const zmq_proxy = dlsym(dlopen(ZMQ.libzmq), :zmq_proxy)
const threadid = zeros(Int, 128)
ccall(:jl_safe_printf, Cint, (Ptr{UInt8},), "TRAP1\n")
@ccall jl_safe_printf("TRAP2\n"::Ptr{UInt8})::Cint
function heartbeat_thread(sock::Ptr{Cvoid})
    ccall(zmq_proxy, Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
          sock, sock, C_NULL)
    ccall(:jl_safe_printf, Cint, (Ptr{UInt8},), "TRAP3\n")
    @ccall jl_safe_printf("TRAP4\n"::Ptr{UInt8})::Cint
    nothing
end
const heartbeat = Ref{Socket}()
heartbeat[] = Socket(ROUTER)
sock = heartbeat[]
# function start_heartbeat(sock)
heartbeat_c = @cfunction(heartbeat_thread, Cvoid, (Ptr{Cvoid},))
ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}), threadid, heartbeat_c, sock) # this line segfaults
If I comment out one or both of the |
I'm confused — are you putting the MWE into a module (IJulia?) and precompiling that? This won't work because #985 seems to work fine for me with both 1.6 and 1.5.3. |
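The precompilation concern can be illustrated with a minimal sketch, assuming the problem is that a raw pointer computed at module top level gets stored in the precompiled image and is stale in the next process. The usual Julia pattern is to keep a `Ref` and fill it in `__init__`, which runs each time the module is loaded. The module name `PointerAtInit`, the helper `c_strlen`, and the use of libc's `strlen` (looked up with `cglobal` rather than `Libdl.dlsym` on `ZMQ.libzmq`, to keep the sketch free of dependencies) are all assumptions for illustration; this is not IJulia's actual code.

```julia
module PointerAtInit  # hypothetical module name

# Placeholder only. A pointer computed here at top level would be saved by
# precompilation and would not be valid in a later session.
const strlen_ptr = Ref{Ptr{Cvoid}}(C_NULL)

function __init__()
    # Runs at load time in every session, so the address is resolved afresh.
    # strlen stands in for zmq_proxy (assumption for illustration).
    strlen_ptr[] = cglobal(:strlen)
end

# Call through the pointer that was resolved at run time.
function c_strlen(s::String)
    ptr = strlen_ptr[]
    ptr == C_NULL && error("pointer not initialized")
    return Int(ccall(ptr, Csize_t, (Cstring,), s))
end

end # module

len = PointerAtInit.c_strlen("hello")
```

Storing the pointer in a `Ref` keeps the binding `const` while still allowing the value to be filled in once the library is actually loaded.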
I'm still seeing an intermittent failure, even with my test case modified per #985. Here's the complete source of what I'm running, warts and all:
using ZMQ, Libdl #: libzmq
#const zmq_proxy = dlsym(dlopen(ZMQ.libzmq), :zmq_proxy)
const zmq_proxy = Ref(C_NULL)
const threadid = zeros(Int, 128)
ccall(:jl_safe_printf, Cint, (Ptr{UInt8},), "TRAP1\n")
#@ccall jl_safe_printf("TRAP2\n"::Ptr{UInt8})::Cint
function heartbeat_thread(sock::Ptr{Cvoid})
    ccall(zmq_proxy[], Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
          sock, sock, C_NULL)
    #ccall(:jl_safe_printf, Cint, (Ptr{UInt8},), "TRAP3\n")
    #@ccall jl_safe_printf("TRAP4\n"::Ptr{UInt8})::Cint
    nothing
end
const heartbeat = Ref{Socket}()
heartbeat[] = Socket(ROUTER)
sock = heartbeat[]
# function start_heartbeat(sock)
zmq_proxy[] = Libdl.dlsym(Libdl.dlopen(ZMQ.libzmq), :zmq_proxy)
heartbeat_c = @cfunction(heartbeat_thread, Cvoid, (Ptr{Cvoid},))
ccall(:uv_thread_create, Cint, (Ptr{Int}, Ptr{Cvoid}, Ptr{Cvoid}), threadid, heartbeat_c, sock) # this line segfaults
Here's what's happening when I run it:
Sometimes it runs without blowing up, sometimes not. Before the first run, I cleaned out |
Could it be that Julia is calling the |
Tried that, didn’t seem to make any difference. It’s still about 50/50 whether it runs to completion or crashes. |
@rgobbel, I can't reproduce your problem (on MacOS 10.15.7 with Julia |
@stevengj Unfortunately, I can reproduce on macOS 11.1 with Julia
➜ segfaulttest julia-dev --project=. segfaulttest.jl
TRAP1
Assertion failed: nbytes == sizeof (dummy) (src/signaler.cpp:391)
signal (6): Abort trap: 6
in expression starting at none:0
[1] 61009 segmentation fault $HOME/repos/julia/usr/bin/julia --project=. segfaulttest.jl
➜ segfaulttest julia-dev --project=. segfaulttest.jl
TRAP1
➜ segfaulttest julia-dev --project=. segfaulttest.jl
TRAP1
Assertion failed: nbytes == sizeof (dummy) (src/signaler.cpp:391)
signal (6): Abort trap: 6
in expression starting at none:0
[1] 61018 segmentation fault $HOME/repos/julia/usr/bin/julia --project=. segfaulttest.jl
➜ segfaulttest julia-dev --project=. segfaulttest.jl
TRAP1
➜ segfaulttest julia-dev --project=. segfaulttest.jl
TRAP1 |
I tried Is IJulia crashing for you with #985 too? |
In that case I'm going to merge #985, since getting something working is an urgent issue. We can open a new issue if people see intermittent crashes in actual practice. |
(The |
After doing a deep clean of everything (rm -rf ~/.julia /Applications/Julia-* ; uninstall every possible version of libzmq ), I get a working IJulia, if and only if I run using |
(Sounds like a separate issue in any case from this one — the problems in this issue happened in processes where ZMQ was already precompiled.) |
I tested both |
I have the same problem with version 1.7.0-dev.
Here is the versioninfo
|
Running IJulia in the debug mode gives the following error message:
|
@fkguo I would open a new issue because you are using Julia 1.7 / master while this issue is about 1.6. (BTW, I can't reproduce on Julia#master where even IJulia precompilation fails for me.) |
I can't get the IJulia kernel to work when using Julia 1.6/master. I tried the latest IJulia release and IJulia#master. The kernel seems either to not start at all or to die immediately.
Julia <= 1.5.3 kernels work just fine.