require polymake.jl 0.9 for polymake 4.9 #1902

Merged
benlorenz merged 2 commits into master from bl/pm49
Feb 9, 2023

Conversation

benlorenz
Member

@benlorenz benlorenz commented Feb 3, 2023

@benlorenz
Member Author

On my laptop I get weird crashes when I load Oscar and then suspend it with Ctrl+Z; I haven't seen this anywhere else yet. It looks like it may be caused by mixing different ncurses libraries:

Thread 1 "julia-1.8.5" received signal SIGSEGV, Segmentation fault.
0x00007f7b4562b6ae in vidputs_sp () from /home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6
(gdb) bt
#0  0x00007f7b4562b6ae in vidputs_sp ()
   from /home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6
#1  0x00007f7bdd6c25ee in _nc_screen_wrap_sp () from /lib64/libncurses.so.6
#2  0x00007f7bdd6b030e in endwin_sp () from /lib64/libncurses.so.6
#3  0x00007f7bdd6bc2a9 in handle_SIGTSTP () from /lib64/libncurses.so.6
#4  <signal handler called>
#5  0x00007f7bf3b7e8fc in __pthread_kill_implementation () from /lib64/libc.so.6
#6  0x00007f7bf3b32a42 in raise () from /lib64/libc.so.6
#7  0x00007f7bdf72f826 in julia_run_interface_64197 ()
    at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2507
#8  0x00007f7bdefc8a0f in julia_run_frontend_63772 ()
    at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
#9  0x00007f7bdefc8f33 in julia_#49_65087 () at task.jl:484
#10 0x00007f7bdefc8f49 in jfptr_YY.49_65088.clone_1 ()
   from /home/lorenz/software/polymake/julia/julia/julia-1.8.5/lib/julia/sys.so
#11 0x00007f7bf2e49dbe in _jl_invoke (world=<optimized out>, 
    mfunc=0x7f7be0091c10 <jl_system_image_data+2267472>, nargs=0, args=0x7f7bea2388e8, F=0x7f7bea2c8350)
    at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377
#12 ijl_apply_generic (F=<optimized out>, args=args@entry=0x7f7bea2388e8, nargs=nargs@entry=0)
    at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#13 0x00007f7bf2e6c500 in jl_apply (nargs=1, args=0x7f7bea2388e0)
    at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843
#14 start_task () at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/task.c:931

I still don't know why the new polymake has such a big effect on ncurses. The only relevant change is a new dependency on SCIP_jll (which also depends on Readline_jll), but I don't think there is any extra ncurses initialization running. cc: @fingolfin
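A minimal sketch for checking whether more than one ncurses build is mapped into a running process, assuming Linux and its `/proc/<pid>/maps` format (the pathname is the sixth field). The function name `curses_libs_in_maps` and the `JULIA_PID` variable are made up for illustration and are not part of Oscar, Julia, or ncurses.

```shell
# List the distinct ncurses/tinfo shared objects mapped into a process.
# Expects a Linux /proc/<pid>/maps-style file as its only argument.
curses_libs_in_maps() {
    # Field 6 of each maps line is the backing file of the mapping.
    awk '$6 ~ /(ncurses|tinfo)/ { print $6 }' "$1" | sort -u
}

# Example (JULIA_PID is a placeholder for the PID of the Julia session):
# curses_libs_in_maps "/proc/$JULIA_PID/maps"
```

If this prints paths both under `~/.julia/artifacts` and under a system directory such as `/lib64`, two ncurses builds are present in the same process, which matches the mix of frames in the backtrace above.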

@fingolfin
Member

I will have a look. I just tried reproducing this on macOS, but no crash, yet some other "funny behaviour": pressing Ctrl-Z once did not get me to a shell, but left me in a weird limbo state. Pressing it again then got me to a shell, from which I could resume.

But it was also this way on master, so Polymake.jl 0.9.0 is not at fault for that. But perhaps GAP is involved, and/or it's an older issue? It might of course also be unrelated to the crash you are seeing (I'll test on a Linux machine next).

This does look a bit as if two SIGTSTP handlers might be involved and interfering with each other? IIRC on macOS, Julia normally bypasses the usual signal handling and directly interacts with XNU (the Mach kernel). So perhaps that is triggered and then the "regular" signal handler (or the other way around)?

Perhaps GAP's Browse package, resp. the fact that it uses ncurses, is at the core of the problem here? To investigate, we can disable the call GAP.Packages.load("browse"; install=true) in src/Oscar.jl and then carefully delete its compiled extension, i.e., rm -rf ~/.julia/gaproot/v4.12/pkg/Browse-*

(BTW we are installing an outdated version of Browse.... That's another bug I'll look into now).

@fingolfin
Member

I was not yet able to trigger a crash with this branch, even on Linux. But also on Linux I get this weird "need to press Ctrl-Z twice" behavior.

I also checked what happens if I load OSCAR without Browse, and the behavior is still this way...

And now I realize that Julia seems to always do this, even if I just start julia, not loading anything, and also on macOS and Linux, and in two different terminal emulators?!?!? Is that just me?

@benlorenz
Member Author

benlorenz commented Feb 6, 2023

TL;DR: I think the main problem is that Browse is compiled against the system ncurses instead of the artifact, and then both are loaded. The configuration of the system ncurses and the load order influence the effects, e.g. crash / weird terminal output / ... So I guess oscar-system/GAP.jl#614 might help here as well.
(The corrupt terminal output is similar to oscar-system/GAP.jl#741.)

> Perhaps GAP's Browse package, resp. the fact that it uses ncurses, is at the core of the problem here? To investigate, we can disable the call GAP.Packages.load("browse"; install=true) in src/Oscar.jl and then carefully delete its compiled extension, i.e., rm -rf ~/.julia/gaproot/v4.12/pkg/Browse-*

With that line removed and the Browse folder deleted, the crash goes away and the terminal stays fine even after loading Oscar:

Version 0.11.3-DEV ... 
 ... which comes with absolutely no warranty whatsoever
Type: '?Oscar' for more information
(c) 2019-2023 by The OSCAR Development Team

julia> 
[1]+  Stopped                 julia-1.8.5 --project=.
lorenz@eddie ~/software/polymake/julia/Oscar-tmp.jl $ fg
julia-1.8.5 --project=.
julia> 

julia> filter(x->contains(x,"curses"),dllist())
1-element Vector{String}:
"/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6"

> I was not yet able to trigger a crash with this branch, even on Linux. But also on Linux I get this weird "need to press Ctrl-Z twice" behavior.

  1. On my openSUSE desktop Browse fails to install (v4.12/pkg/log/Browse-1.8.20.log):
    src/ncurses.c:27:17: fatal error: panel.h: No such file or directory
    I get no crashes there and the terminal output stays correct even after suspending and resuming.

  2. I have seen the crash only on my Gentoo Linux Laptop so far:

    • First start after re-enabling the install line in Oscar.jl: suspend works, but after fg the whole terminal is cleared and all Julia output looks weird because the line length seems wrong:

      Version 0.11.3-DEV ... 
      ...
      julia> filter(x->contains(x,"curses"),dllist())
      3-element Vector{String}:
       "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6"
       "/home/lorenz/.julia/gaproot/v4.12/pkg/Browse-1.8.20/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so"
       "/lib64/libncurses.so.6"
      
      julia> 
      [1]+  Stopped                 julia-1.8.5 --project=.
      $ fg
      julia> dllist()
      170-element Vector{String}:
                                  "linux-vdso.so.1"
                                                    "/lib64/libdl.so.2"
                                                                        "/lib64/libpthread.so.0"
                                                                                                             "/lib64/libc.so.6"
           "/home/lorenz/software/polymake/julia/julia/julia-1.8.5/bin/../lib/libjulia.so.1"
      
    • Second start: the terminal flickers while Oscar is loading, and Julia crashes when suspended after Oscar is loaded:

      Version 0.11.3-DEV ... 
      ...
      julia> filter(x->contains(x,"curses"),dllist())
      3-element Vector{String}:
       "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6"
       "/home/lorenz/.julia/gaproot/v4.12/pkg/Browse-1.8.20/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so"
       "/lib64/libncurses.so.6"
      
      julia> 
      signal (11): Segmentation fault
      in expression starting at none:0
      vidputs_sp at /home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6 (unknown line)
      _nc_screen_wrap_sp at /lib64/libncurses.so.6 (unknown line)
      endwin_sp at /lib64/libncurses.so.6 (unknown line)
      ...
      

    I think it depends on the configuration of the system ncurses installation. The difference between the first and the second start is probably the point at which Browse is loaded (during GAP init or during Oscar init). The loaded libraries are listed further down.

    Edit: The top two frames of the backtrace show vidputs_sp in ~/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6 being called from
    _nc_screen_wrap_sp in /lib64/libncurses.so.6, i.e. the system library calls into the artifact library.

> And now I realize that Julia seems to always do this, even if I just start julia, not loading anything, and also on macOS and Linux, and in two different terminal emulators?!?!? Is that just me?

Maybe that is just you. Without Oscar it looks like this for me on Gentoo Linux (eddie) and macOS (munk):

julia> 
[1]+  Stopped                 julia-1.8.5
lorenz@eddie ~ $ fg
julia-1.8.5
julia> 
julia> 
zsh: suspended  /Applications/Julia-1.8.app/Contents/Resources/julia/bin/julia
lorenz@munk ~ % fg
[1]  + continued  /Applications/Julia-1.8.app/Contents/Resources/julia/bin/julia
julia> 

No second Ctrl+Z and no weird terminal behaviour. This is with urxvt and bash.

Details

First start:

...
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libform.so"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libmenu.so"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libpanel.so"
 "/home/lorenz/.julia/artifacts/9bcd3bb0f3644029835b64523caf43d4881b24fe/lib/libhistory.so"
 "/home/lorenz/.julia/artifacts/9bcd3bb0f3644029835b64523caf43d4881b24fe/lib/libreadline.so"
 "/home/lorenz/.julia/artifacts/a7a51f050ae9f280687f459aa628ab00b6907075/lib/libgap.so"
 "/lib64/libutil.so.1"
 "/home/lorenz/.julia/artifacts/3178f2673dfbca96a1b950d66be9d4b60f1268f6/lib/gap/JuliaInterface.so"
 "/home/lorenz/.julia/artifacts/b9c99d2b2538a54bc5b916d41c98a1668c141ac0/lib/libcxxwrap_julia.so"
 "/home/lorenz/.julia/artifacts/b9c99d2b2538a54bc5b916d41c98a1668c141ac0/lib/libcxxwrap_julia_stl.so"
 "/home/lorenz/.julia/artifacts/07ef9c18296d78f06f5b946ed2c127b4aae6ac2c/lib/libperl.so"
...
 "/home/lorenz/.julia/artifacts/51aeebd7aa37184cb3796181547a7148af3bd674/lib/libpolymake.so"
 "/home/lorenz/.julia/artifacts/51aeebd7aa37184cb3796181547a7148af3bd674/lib/libpolymake-apps-rt.so"
...
 "/home/lorenz/.julia/artifacts/52a5e906a5e0fa3b1f54f01c7000738ad93356fb/lib/libpolymake_julia.so"
...
 "/home/lorenz/.julia/artifacts/04a3e844b10e5d434a337eb602d5849c342f08d3/lib/libSingular.so"
 "/home/lorenz/.julia/artifacts/a747cd08725526af0fd723a541e1bec772a613da/lib/libsingular_julia.so"
 "/home/lorenz/.julia/gaproot/v4.12/pkg/ferret-1.0.9/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ferret.so"
 "/home/lorenz/.julia/gaproot/v4.12/pkg/Browse-1.8.20/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so"
 "/usr/lib64/libpanel.so.6"
 "/lib64/libncurses.so.6"
 "/lib64/libtinfo.so.6"

Second start:

...
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libform.so"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libncursesw.so.6"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libmenu.so"
 "/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib/libpanel.so"
 "/home/lorenz/.julia/artifacts/9bcd3bb0f3644029835b64523caf43d4881b24fe/lib/libhistory.so"
 "/home/lorenz/.julia/artifacts/9bcd3bb0f3644029835b64523caf43d4881b24fe/lib/libreadline.so"
 "/home/lorenz/.julia/artifacts/a7a51f050ae9f280687f459aa628ab00b6907075/lib/libgap.so"
 "/lib64/libutil.so.1"
 "/home/lorenz/.julia/artifacts/3178f2673dfbca96a1b950d66be9d4b60f1268f6/lib/gap/JuliaInterface.so"
 "/home/lorenz/.julia/gaproot/v4.12/pkg/Browse-1.8.20/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so"
 "/usr/lib64/libpanel.so.6"
 "/lib64/libncurses.so.6"
 "/lib64/libtinfo.so.6"
 "/home/lorenz/.julia/artifacts/b9c99d2b2538a54bc5b916d41c98a1668c141ac0/lib/libcxxwrap_julia.so"
 "/home/lorenz/.julia/artifacts/b9c99d2b2538a54bc5b916d41c98a1668c141ac0/lib/libcxxwrap_julia_stl.so"
 "/home/lorenz/.julia/artifacts/07ef9c18296d78f06f5b946ed2c127b4aae6ac2c/lib/libperl.so"
...
 "/home/lorenz/.julia/artifacts/51aeebd7aa37184cb3796181547a7148af3bd674/lib/libpolymake.so"
 "/home/lorenz/.julia/artifacts/51aeebd7aa37184cb3796181547a7148af3bd674/lib/libpolymake-apps-rt.so"
...
 "/home/lorenz/.julia/artifacts/04a3e844b10e5d434a337eb602d5849c342f08d3/lib/libSingular.so"
 "/home/lorenz/.julia/artifacts/a747cd08725526af0fd723a541e1bec772a613da/lib/libsingular_julia.so"
 "/home/lorenz/.julia/gaproot/v4.12/pkg/ferret-1.0.9/bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ferret.so"
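In the first listing libpolymake.so appears before Browse's ncurses.so and the system ncurses, while in the second listing it appears after them. A small sketch for comparing such positions, assuming each listing was saved to a text file with one path per line and that the order printed by dllist() reflects load order. The function name `load_positions` and the file names in the usage comment are hypothetical.

```shell
# Print the 1-based line number of selected libraries in a saved library
# listing, so that two listings can be compared by relative position.
load_positions() {
    # -n prefixes each match with its line number; -E enables alternation.
    grep -n -E 'ncurses|libgap\.so|libpolymake\.so' "$1"
}

# Example with two hypothetical dump files:
# load_positions first_start.txt
# load_positions second_start.txt
```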

@benlorenz
Member Author

I have now moved Polymake up in our imports so that it loads before GAP. This seems to fix the crashes, but I don't really understand why: there are still two ncurses libraries active.

Another workaround for me locally is to manually build Browse with the correct ncurses:

$ cd ~/.julia/gaproot/v4.12/pkg/Browse-1.8.20
$ rm bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so
$ make CFLAGS='-I/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/include/ncurses' LDFLAGS='-L/home/lorenz/.julia/artifacts/d56f6bc8675ad021bf8070a7bce573e3fbe2e737/lib'

This removes the system-ncurses library from dllist and fixes the crash.
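A rough way to check a rebuilt ncurses.so before starting Julia is to look at what `ldd` resolves for it. The sketch below assumes `ldd`-style output ("name => /path (0xaddr)") on stdin and labels anything under `.julia/artifacts` as "artifact"; the function name `curses_deps` is made up for illustration. Note that `ldd` resolves libraries in a fresh environment, so the result inside a running Julia process can differ; `dllist()` remains the authoritative check.

```shell
# Report which ncurses-family libraries a shared object resolves to and
# whether each one comes from a Julia artifact or from the system.
curses_deps() {
    awk '$1 ~ /(ncurses|tinfo|panel)/ && $2 == "=>" {
        origin = ($3 ~ /\/\.julia\/artifacts\//) ? "artifact" : "system"
        print origin, $3
    }'
}

# Usage (path taken from the commands above; adjust to the local install):
# ldd bin/x86_64-pc-linux-gnu-julia1.8-64-kv8/ncurses.so | curses_deps
```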

@benlorenz benlorenz marked this pull request as ready for review February 8, 2023 14:59
@benlorenz
Member Author

I have tried various Linux platforms and haven't seen any crashes with this new import order, so I would merge this later today if there are no objections (this update is needed for a few other things).

@benlorenz benlorenz merged commit 92cf456 into master Feb 9, 2023
@benlorenz benlorenz deleted the bl/pm49 branch February 9, 2023 14:18
@fingolfin
Member

Thank you

3 participants