julia REPL doesn't start in 9term #9063

Closed
sqweek opened this issue Nov 19, 2014 · 15 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc. REPL Julia's REPL (Read Eval Print Loop)
Milestone

Comments

@sqweek
Contributor

sqweek commented Nov 19, 2014

Heya,

I tried to run julia (0.3.2) within 9term from p9p (https://github.com/9fans/plan9port), but the REPL doesn't start. It hangs in:

syscall at /usr/bin/../lib/libc.so.6 (unknown line)
uv__epoll_wait at /usr/bin/../lib/julia/libjulia.so (unknown line)
uv__io_poll at /usr/bin/../lib/julia/libjulia.so (unknown line)
uv_run at /usr/bin/../lib/julia/libjulia.so (unknown line)
process_events_3B_1750 at /usr/bin/../lib/julia/sys.so (unknown line)
wait_3B_1750 at /usr/bin/../lib/julia/sys.so (unknown line)
wait_3B_1791 at /usr/bin/../lib/julia/sys.so (unknown line)
jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
anonymous at task.jl:513
unknown function (ip: 1064656435)
julia_trampoline at /usr/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 4199613)
__libc_start_main at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 4199667)
unknown function (ip: 0)

(which doesn't look right: in v0.3.2, base/task.jl only has 342 lines...)

9term is not a typical TTY - it doesn't support character addressing or colours or any escape codes. So I run with TERM=dumb by default, which is where I see the hang.

Ironically if I set TERM=xterm then the REPL runs fine, it just fills the terminal with escape sequences ;)

It seems to be something to do with epoll/libuv, but I don't know enough about those to reproduce it myself.

@sqweek
Contributor Author

sqweek commented Nov 19, 2014

cat | julia is an OK workaround, but slightly awkward without a prompt. I really like the stream REPL support, though; I find Python's insistence on using readline within the REPL incredibly annoying (I can't even paste code into a Python REPL anymore, because I use tabs for indentation and the REPL tries to tab-complete -_-).

@staticfloat
Member

If you are building Julia yourself, could you build a debug version of Julia so we can get line numbers? You can do so via make debug in the main Julia source directory.

@sqweek
Contributor Author

sqweek commented Nov 19, 2014

I set make off earlier with that goal in mind - here it is:

#0  0x00007fc7135790d9 in syscall () from /usr/lib/libc.so.6
#1  0x00007fc71484308a in uv__epoll_wait (epfd=<optimized out>, events=events@entry=0x7fff38aa0200, nevents=nevents@entry=1024, timeout=timeout@entry=-1)
    at src/unix/linux-syscalls.c:312
#2  0x00007fc7148417e5 in uv__io_poll (loop=loop@entry=0x7fc715754d40 <default_loop_struct>, timeout=-1) at src/unix/linux-core.c:202
#3  0x00007fc714835977 in uv_run (loop=0x7fc715754d40 <default_loop_struct>, mode=UV_RUN_ONCE) at src/unix/core.c:294
#4  0x00007fc714802bc8 in jl_run_once (loop=0x7fc715754d40 <default_loop_struct>) at jl_uv.c:247
#5  0x00007fc71225a152 in julia_process_events;39176 () at stream.jl:537
#6  0x00007fc712259dda in julia_wait;39175 () at task.jl:273
#7  0x00007fc7122708b8 in julia_wait;39912 () at task.jl:194
#8  0x00007fc71475c02e in jl_apply (f=0x3babc20, args=0x7fff38aa3660, nargs=1) at julia.h:983
#9  0x00007fc714761762 in jl_apply_generic (F=0x3bab9e0, args=0x7fff38aa3660, nargs=1) at gf.c:1624
#10 0x00007fc71277bd41 in ?? ()
#11 0x0000000000000002 in ?? ()
#12 0x00007fff38aa3780 in ?? ()
#13 0x0000000003f53990 in ?? ()
#14 0x0000000000000000 in ?? ()

This build is from master now, at 24cc711.

There is one other thread running, which I presume isn't relevant yet but include for completeness:

#0  0x00007fc71406e8bf in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007fc70f7e453b in blas_thread_server () from /q/src/julia/usr/bin/../lib/libopenblas.so
#2  0x00007fc714069314 in start_thread () from /usr/lib/libpthread.so.0
#3  0x00007fc71357d3ed in clone () from /usr/lib/libc.so.6

@JeffBezanson JeffBezanson added io Involving the I/O subsystem: libuv, read, write, etc. REPL Julia's REPL (Read Eval Print Loop) labels Nov 25, 2014
@sqweek
Contributor Author

sqweek commented Nov 27, 2014

The stack trace only revealed an extremely generic I/O loop, but I managed to figure out what is going on through other means:

It happens when julia is exec'd by a process which has blocked SIGCHLD. The signal mask is inherited across exec, and I guess julia ends up waiting for a signal that is never delivered.

strace supports this hypothesis - everything stops shortly after julia forks & execs tput setaf 0.
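The inheritance is easy to check from inside the exec'd process. A minimal sketch, assuming a POSIX system; signal_is_blocked and report_sigchld are hypothetical helper names, not part of Julia or libuv. A program whose main calls report_sigchld(stdout) should report SIGCHLD as blocked when it is started by a parent that blocked it, because exec keeps the signal mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if `sig` is blocked in the calling
   thread's signal mask, 0 if it is not, and -1 if the mask can't be read. */
static int
signal_is_blocked(int sig)
{
    sigset_t cur;

    /* With a NULL new set, `how` is ignored and the mask is only read. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}

/* Hypothetical helper: print a one-line report about SIGCHLD to `out`. */
static void
report_sigchld(FILE *out)
{
    switch (signal_is_blocked(SIGCHLD)) {
    case 1:
        fputs("SIGCHLD is blocked\n", out);
        break;
    case 0:
        fputs("SIGCHLD is not blocked\n", out);
        break;
    default:
        fputs("could not read the signal mask\n", out);
        break;
    }
}
```

Because the check only reads the mask, it can be dropped into any program's startup without changing its behavior.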

Execing any process with SIGCHLD blocked seems like an extremely impolite thing to be doing, so I'll be in touch with the plan9port authors to get 9term fixed.

For julia (or perhaps libuv? I'm not sure what is running tput), it may be worth explicitly unblocking the signals it relies on. Here is a wrapper that reproduces the issue in any terminal:

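#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int sig = SIGCHLD;
    sigset_t set;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 2;
    }

    /* Block SIGCHLD; the blocked mask is inherited across the exec below. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_BLOCK, &set, NULL);

    /* Drop our own name and replace this process with the given command. */
    argv++; argc--;
    execvp(argv[0], argv);
    perror("execvp");  /* only reached if the exec failed */
    abort();
}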

Save to blkchld.c, compile with gcc blkchld.c -o blkchld and then run ./blkchld julia. By replacing SIG_BLOCK with SIG_UNBLOCK I can use the same approach to get julia running under 9term.

It looks like julia uses pthreads, so I think the correct call in this situation is pthread_sigmask rather than sigprocmask.
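POSIX leaves the use of sigprocmask unspecified in a multithreaded process, which is why pthread_sigmask is the safer call there. It changes only the calling thread's mask, and threads created afterwards inherit the creating thread's mask. A minimal sketch of clearing the mask, assuming that unblocking every signal is acceptable for the program; unblock_all_signals is a hypothetical name and is not Julia's restore_signals, whose code isn't shown in this thread. On older glibc, compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: clear the calling thread's signal mask so that no
   signal inherited as blocked across exec stays blocked.
   Returns 0 on success or an error number (pthread_sigmask does not set errno). */
static int
unblock_all_signals(void)
{
    sigset_t empty;

    sigemptyset(&empty);
    /* SIG_SETMASK replaces the whole mask with the empty set, so every
       blockable signal becomes unblocked for this thread. */
    return pthread_sigmask(SIG_SETMASK, &empty, NULL);
}
```

SIG_SETMASK replaces the mask wholesale, so this also undoes anything else the parent happened to block, which is the point when the parent's mask can't be trusted.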

@tkelman
Contributor

tkelman commented Nov 27, 2014

What distribution are you on? I do remember some patches to Julia's fork of libuv a few months ago that involved execvp, I think.

@sqweek
Contributor Author

sqweek commented Nov 27, 2014

Arch Linux. I reproduced it with julia from git, though (24cc711, Nov 19; deps/libuv @ 5599b8c).

@vtjnash
Member

vtjnash commented Dec 15, 2014

Does it work if you unblock all signals in _julia_init in init.c, e.g. by calling restore_signals()? At some point the REPL used to do that, but I think it may have gotten lost in the big change from readline to REPL.jl.

@sqweek
Contributor Author

sqweek commented Dec 18, 2014

Sorry, dropped the ball! I can confirm that calling restore_signals() in _julia_init fixes the problem. There appear to be no ordering constraints: inserting the call at the start of _julia_init worked, as did inserting it at the end.
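Unblocking helps even if a child has already exited, because a blocked signal is not discarded: it stays pending and is delivered as soon as the mask allows it. A minimal single-threaded sketch of that on Linux, assuming (as the thread suggests) that the hang comes from waiting for a SIGCHLD handler to run; sigchld_pending_demo is a hypothetical name. It uses waitid with WNOWAIT to wait for the child's exit without reaping it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

static void
on_sigchld(int sig)
{
    (void)sig;
    handler_ran = 1;
}

/* Hypothetical demo. With SIGCHLD blocked, fork a child that exits at once.
   On success returns 0 and sets:
     *ran_while_blocked - did the handler run before unblocking (expected 0)
     *was_pending       - was SIGCHLD pending while blocked (expected 1)
     *ran_after_unblock - did the handler run once unblocked (expected 1)
   Returns -1 if any system call fails. */
static int
sigchld_pending_demo(int *ran_while_blocked, int *was_pending,
                     int *ran_after_unblock)
{
    struct sigaction sa;
    sigset_t chld, pending;
    siginfo_t info;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chld, NULL) != 0)
        return -1;

    handler_ran = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);  /* child: exiting generates SIGCHLD for the parent */

    /* Wait for the exit but leave the child unreaped (WNOWAIT). */
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    *ran_while_blocked = handler_ran;
    if (sigpending(&pending) != 0)
        return -1;
    *was_pending = sigismember(&pending, SIGCHLD);

    /* Unblocking delivers the pending SIGCHLD before sigprocmask returns. */
    if (sigprocmask(SIG_UNBLOCK, &chld, NULL) != 0)
        return -1;
    *ran_after_unblock = handler_ran;

    /* Reap the child now that the demo is done. */
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

Installing the handler with sigaction does not touch the mask, so a handler alone never runs while the inherited block is in place; the explicit unblock is what lets the pending signal through.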

@vtjnash vtjnash added this to the 0.4 milestone Dec 18, 2014
@vtjnash vtjnash modified the milestones: 0.3.4, 0.4 Dec 18, 2014
@tkelman
Contributor

tkelman commented Dec 21, 2014

@vtjnash did you have some local unpushed work on this that you intend to be stable and tested enough to backport before 0.3.4? This is the kind of thing I'd usually like to have evaluated by people on master for at least several days before backporting.

@vtjnash
Member

vtjnash commented Dec 21, 2014

No. All this needs, however, is a call to restore_signals() in _julia_init().

@tkelman
Contributor

tkelman commented Dec 21, 2014

Is that completely harmless? Should someone just do that on master?

@vtjnash
Member

vtjnash commented Dec 21, 2014

Yes (we're replacing all of the signal handlers anyway, so we might as well unblock them). And yes.

@tkelman
Contributor

tkelman commented Dec 21, 2014

It looks like it was lost even earlier than the readline change. On the Unix side of the history, from before the Windows port, it was done in C in julia_init (b842bf4), but was removed in 79d95bf#diff-2 shortly after a merge.

On the Windows branch side of the history, it was added as a Julia ccall in base/client.jl in 752e1bd, but was lost in a different merge, 96a97d8#diff-14.

So now we have the implementations but never use them anywhere. Looks like it's been missing for 2 years purely by accident.

@tkelman
Contributor

tkelman commented Dec 21, 2014

We may want to add a test for this to make sure it doesn't come back. @sqweek it sounds like latest master should work again.
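A regression test along those lines could start the program under test with SIGCHLD blocked, reproducing what 9term did, and fail if it does not exit within a timeout. A sketch of such a harness in C; exits_with_sigchld_blocked is a hypothetical name, and wiring it into Julia's test suite is not covered here. The harness polls with waitpid(..., WNOHANG) so that it does not itself depend on SIGCHLD.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical harness: run argv[0] with SIGCHLD blocked (mimicking 9term)
   and report whether it exits within `timeout_ms` milliseconds.
   Returns 1 if it exited in time, 0 if it was killed on timeout, and -1 on
   error (a command that can't be exec'd exits 127 and is reported as -1). */
static int
exits_with_sigchld_blocked(char *const argv[], long timeout_ms)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 };  /* poll every 10 ms */
    long waited = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGCHLD);
        sigprocmask(SIG_BLOCK, &set, NULL);  /* inherited across the exec */
        execvp(argv[0], argv);
        _exit(127);                          /* exec failed */
    }

    /* Poll with WNOHANG so the harness never waits on SIGCHLD itself. */
    while (waited < timeout_ms) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 127) ? -1 : 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
        waited += 10;
    }

    /* Timed out: treat it as a hang, kill the child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 0;
}
```

A hypothetical invocation would pass {"julia", "-e", "run(`true`)", NULL}, which makes julia spawn a child and wait for it, the same kind of step that hung when the REPL ran tput.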

This was referenced Dec 23, 2014
@tkelman tkelman modified the milestones: 0.3.5, 0.3.4 Dec 25, 2014
tkelman added a commit that referenced this issue Dec 31, 2014
(cherry picked from commit ab50bf8)

Conflicts:
	src/init.c
@tkelman
Contributor

tkelman commented Dec 31, 2014

Backported in b209dc8.


5 participants