Frequent segfaults in push_fs_result #644

Closed
happenslol opened this issue Apr 6, 2023 · 16 comments · Fixed by #662

@happenslol

I'm using the nightly neovim build and am regularly encountering segfaults. I haven't yet 100% narrowed down when they occur, but it mostly seems to be when files are changed while the editor is open.

This is what the stack trace looks like:

                #0  0x00007f7c9db4e939 push_fs_result (libluv.so.1 + 0xf939)
                #1  0x00007f7c9db542b3 luv_fs_cb (libluv.so.1 + 0x152b3)
                #2  0x00007f7c9d9333fd uv__work_done (libuv.so.1 + 0xc3fd)
                #3  0x00007f7c9d9370cd uv__async_io.part.0 (libuv.so.1 + 0x100cd)
                #4  0x00007f7c9d94ae6c uv__io_poll (libuv.so.1 + 0x23e6c)
                #5  0x00007f7c9d937a14 uv_run (libuv.so.1 + 0x10a14)
                #6  0x0000000000538948 loop_uv_run (nvim + 0x138948)
                #7  0x0000000000643cad inbuf_poll.lto_priv.0 (nvim + 0x243cad)
                #8  0x0000000000643eed os_inchar (nvim + 0x243eed)
                #9  0x00000000006cdaad state_enter (nvim + 0x2cdaad)
                #10 0x0000000000609214 normal_enter (nvim + 0x209214)
                #11 0x00000000004552d0 main (nvim + 0x552d0)
                #12 0x00007f7c9d73d24e __libc_start_call_main (libc.so.6 + 0x2924e)
                #13 0x00007f7c9d73d309 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29309)
                #14 0x0000000000457585 _start (nvim + 0x57585)

I'm not sure what else to include here, so please tell me if there's any additional information you require.

@squeek502 squeek502 added the bug label Apr 7, 2023
@squeek502
Member

squeek502 commented Apr 7, 2023

Is this reproducible with default neovim or is it possible it only happens with some combination of plugins?

If possible, knowing the line number of where the crash is happening in push_fs_result would likely be very helpful.

@squeek502
Member

Stacktrace looks similar to one in neovim/neovim#21467

@happenslol
Author

happenslol commented Apr 7, 2023

I'm 99% sure it happens when I have neo-tree enabled, but I can't say for certain. However, I've tried disabling their usage of libuv, and the crashes have persisted.

How would I go about getting the line number? Loading the segfault into gdb only provides me with the stacktrace I have posted above. I'm assuming I can only get the line number if debug information is compiled in, but I'm not sure how I would do that in this case.

Edit: Just had a look at that thread. One of the stacktraces in there is the exact same as mine, however the command posted in there (:lua require("luv").handle_get_type(newproxy())) also causes a segfault for me, albeit with a different stacktrace.

@squeek502
Member

squeek502 commented Apr 7, 2023

Compiling neovim from source with debug info would probably work. You could also try the instructions here if that makes anything easier:

NixOS/nixpkgs#219400 (comment)

EDIT: Just tried it, and compiling neovim from source is pretty painless; you can use CMAKE_BUILD_TYPE=Debug to make sure debug info will be available. Wasn't able to reproduce this crash, though.


The :lua require("luv").handle_get_type(newproxy()) crash is different, and should be fixed by #634

@happenslol
Author

I'll try that tomorrow and report back. Thanks for the link!

@happenslol
Author

Alright, that was quite the journey since the nix build has been broken for a few weeks on NixOS due to treesitter not being up to date in nixpkgs, but I got one step further. libluv still seems to not have debug symbols even though I built neovim-debug, but at least there are some line numbers for the neovim portion now:

#0  0x00007fe385c789ee in push_fs_result () from /nix/store/fsdy4sq9pi4ibp0p6gjzp9lgi5ap77yq-libluv-1.43.0-0/lib/libluv.so.1
#1  0x00007fe385c7e524 in luv_fs_cb () from /nix/store/fsdy4sq9pi4ibp0p6gjzp9lgi5ap77yq-libluv-1.43.0-0/lib/libluv.so.1
#2  0x00007fe385a603c2 in uv.work_done () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#3  0x00007fe385a6409d in uv.async_io.part () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#4  0x00007fe385a780d5 in uv.io_poll () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#5  0x00007fe385a649bc in uv_run () from /nix/store/avbmp3dcrbzrckrprx48cxx2mwlh825l-libuv-1.44.2/lib/libuv.so.1
#6  0x000000000051c267 in loop_uv_run (loop=0x7ed518 <main_loop>, ms=ms@entry=0, once=true) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/event/loop.c:65
#7  loop_poll_events (loop=0x7ed518 <main_loop>, ms=ms@entry=0) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/event/loop.c:87
#8  0x0000000000604b2d in os_breakcheck () at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/os/input.c:197
#9  0x000000000055dc18 in vgetorpeek (advance=140) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/getchar.c:2378
#10 0x000000000055cfae in vpeekc () at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/getchar.c:1635
#11 0x000000000068b029 in state_enter (s=s@entry=0x7ffed28cc4d0) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/state.c:61
#12 0x00000000005d4b26 in normal_enter (cmdwin=false, noexmode=false) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/normal.c:497
#13 0x0000000000456ea8 in main (argc=<optimized out>, argv=<optimized out>) at /build/ab6rvrg81mvwsivc1rhdlfp07qgnsyrg-source/src/nvim/main.c:641

@happenslol happenslol changed the title Regular segfaults in push_fs_result Frequent segfaults in push_fs_result Apr 8, 2023
@squeek502
Member

squeek502 commented Apr 8, 2023

Unfortunately the libluv line numbers would be the helpful bit: push_fs_result contains a switch statement, so if the crash is happening in a particular case, that would narrow down the possible reproductions significantly.

If you're using that nixpkgs branch, maybe adding separateDebugInfo = true; to here would give you debug info for libluv? (note that this is a total guess on my part, I have no experience with nixpkgs)

@happenslol
Author

Mhm, no luck so far I'm afraid. I've tried compiling the debug symbols separately and loading them into gdb, but the nixpkgs version seems to be different since I'm getting bogus line numbers. I'm not too experienced with overriding nixpkgs either, I'll try to get some help on the forums for that. Man, nix is amazing when it works, but it makes things like these so complicated...

Thanks for your patience!

@teto

teto commented Apr 18, 2023

Enabling debug symbols can differ between projects; separateDebugInfo might be one of those cases. If you can point me at instructions to enable debug symbols in libuv, we can see how to modify the nix expression together.

@happenslol
Author

happenslol commented Apr 18, 2023

Yeah, libuv didn't have separateDebugInfo, but I managed to enable it myself by overriding libluv in the rust flake and setting the cmake build type as well as dontStrip (that last one took a bit to figure out..), and I have libluv with debug symbols now. Turns out my last crash was so long ago that coredumpctl already cleaned out the stack traces though, so I'll have to wait for the next crash to get you that line number :-P

> Enabling debug symbols can differ between projects; separateDebugInfo might be one of those cases. If you can point me at instructions to enable debug symbols in libuv, we can see how to modify the nix expression together.

Thanks a lot for the offer still! I learned a lot about overriding things in nix, and I can at least do it for separate targets now. My current way would be building libluv by itself with debug symbols, stripping them out using objcopy and then loading them dynamically in coredumpctl with gdb. Writing an overlay to modify the libluv that neovim builds with would probably be a lot easier, but I haven't done a deep dive into how overlays work yet.

@zhaozg
Member

zhaozg commented Aug 19, 2023

Pay attention to neovim/neovim#21413 (comment)

@squeek502
Member

The lines the backtrace is pointing to:

luv/src/fs.c

Line 103 in e2fbfba

lua_pushstring(L, ent->name);

luv/src/fs.c

Line 352 in e2fbfba

luv_push_dirent(L, dir->dirents+i, 1);

luv/src/fs.c

Line 377 in e2fbfba

int nargs = push_fs_result(L, req);

@zhaozg
Member

zhaozg commented Aug 19, 2023

Let's do some analysis.

  1. In the uv.fs_opendir result callback, luv_dir is created with newuserdata, luv_dir->handle->dirents is also created with newuserdata, and luv_dir->dirents_ref is set to reference dirents.
  2. luv_dir->dirents_ref is unreferenced in uv.fs_closedir or luv_fs_dir_gc, which lets dirents be garbage collected and become invalid.
  3. After fs_readdir is called, luv_dir may be garbage collected before the fs_readdir callback runs.
  4. So we should ref luv_dir in fs_readdir and unref it in the readdir callback, to avoid losing the dirents memory.
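
The lifetime rule in step 4 can be sketched with a toy model in plain C. This is not luv's code: every `toy_*` name and `run_demo` are hypothetical, and a plain reference counter stands in for the Lua registry reference that would keep the userdata alive. The only point it illustrates is that the pending request takes its own reference when it starts and releases it in the callback, so dropping the script's reference in between does not free the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names exist in luv or libuv. */

static int toy_free_count = 0; /* how many toy_dir objects have been freed */

/* Stand-in for luv_dir: owns the buffer an async readdir writes into. */
typedef struct {
    int refs;      /* live references: the script's handle plus pending requests */
    char *dirents; /* stand-in for the dirents array */
} toy_dir;

/* Stand-in for a pending fs request. */
typedef struct {
    toy_dir *dir;
} toy_req;

static toy_dir *toy_dir_open(size_t size) {
    toy_dir *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    d->dirents = calloc(size, 1);
    if (d->dirents == NULL) { free(d); return NULL; }
    d->refs = 1; /* the reference held by the script */
    return d;
}

/* Drop one reference; free on the last one. Returns 1 if the object was freed. */
static int toy_dir_unref(toy_dir *d) {
    if (--d->refs > 0) return 0;
    free(d->dirents);
    free(d);
    toy_free_count++;
    return 1;
}

/* Step 4, first half: pin the directory while the request is in flight. */
static void toy_readdir_start(toy_req *req, toy_dir *d) {
    d->refs++;
    req->dir = d;
}

/* Step 4, second half: use the buffer, then release the pin in the callback. */
static void toy_readdir_done(toy_req *req) {
    strcpy(req->dir->dirents, "entry"); /* safe: the request kept the buffer alive */
    printf("read: %s\n", req->dir->dirents);
    toy_dir_unref(req->dir);
    req->dir = NULL;
}

/* Returns 1 if the buffer outlived the script's reference and was freed exactly once. */
int run_demo(void) {
    int frees_before = toy_free_count;
    toy_req req;
    toy_dir *d = toy_dir_open(16);
    if (d == NULL) return 0;

    toy_readdir_start(&req, d);
    int freed_early = toy_dir_unref(d); /* models `dir = nil` followed by a GC cycle */
    toy_readdir_done(&req);

    return freed_early == 0 && toy_free_count == frees_before + 1;
}
```

Calling `run_demo()` from a `main` function exercises the whole sequence. Without the extra reference taken in `toy_readdir_start`, the early `toy_dir_unref` would free the buffer and the callback would touch freed memory, which is the shape of the crash described above.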

@zhaozg
Member

zhaozg commented Aug 19, 2023

Reproduced

  test("fs.{open,read,close}dir ref check", function(print, p, expect, uv)
    local dir = assert(uv.fs_opendir('.', nil, 50))

    local function readdir_cb(err, dirs)
      assert(not err)
      if dirs then
        p(dirs)
        uv.fs_readdir(dir, readdir_cb)
      else
        assert(uv.fs_closedir(dir)==true)
      end
    end

    uv.fs_readdir(dir, readdir_cb)
    -- Drop the script's reference while the readdir request is still pending,
    -- then force collection so the dir userdata can be collected before the callback runs.
    dir = nil
    collectgarbage()
    collectgarbage()
    collectgarbage()

  end, "1.28.0")

zhaozg added a commit to zhaozg/luv that referenced this issue Aug 19, 2023
@squeek502
Member

squeek502 commented Aug 19, 2023

That reproduction produces a different stack trace for me when I run it via gdb:

#0  0x00007ffff7c321dc in uv__fs_readdir (req=<optimized out>, req=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/fs.c:610
610	    dirent->name = uv__strdup(res->d_name);
#1  uv__fs_work (w=<optimized out>) at /home/ryan/Programming/luvit/luv-tmp/deps/libuv/src/unix/fs.c:1709
#2  0x00007ffff7c2a34e in worker (arg=0x0) at /home/ryan/Programming/luvit/luv-tmp/deps/libuv/src/threadpool.c:122
#3  0x00007ffff7be4609 in start_thread (arg=<optimized out>) at pthread_create.c:477

but I think the fix might solve the luv_push_dirent segfault, too (it's likely the same problem; the garbage collection is just happening at a different time).

@squeek502
Member

Never mind, the stack trace is the same as the neovim one if I run it with LuaJIT (I was using PUC Lua since sometimes that makes things easier to debug):

Thread 1 "luajit" received signal SIGSEGV, Segmentation fault.
luv_push_dirent (L=L@entry=0x7ffff7fa9380, ent=0x0, table=table@entry=1) at /home/ryan/Programming/luvit/luv/src/fs.c:121
121	  lua_pushstring(L, ent->name);

#0  luv_push_dirent (L=L@entry=0x7ffff7fa9380, ent=0x0, table=table@entry=1) at /home/ryan/Programming/luvit/luv/src/fs.c:121
#1  0x00007ffff7bfb1d8 in push_fs_result (L=L@entry=0x7ffff7fa9380, req=req@entry=0x7ffff7fc84d8) at /home/ryan/Programming/luvit/luv/src/fs.c:371
#2  0x00007ffff7bfb5b1 in luv_fs_cb (req=0x7ffff7fc84d8) at /home/ryan/Programming/luvit/luv/src/fs.c:401
#3  0x00007ffff7c10240 in uv__work_done (handle=0x7ffff7fbc1f0) at /home/ryan/Programming/luvit/luv/deps/libuv/src/threadpool.c:329
#4  0x00007ffff7c1407b in uv__async_io (loop=0x7ffff7fbc140, w=0x7fffffff9580, events=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/async.c:176
#5  0x00007ffff7c25ff3 in uv__io_poll (loop=loop@entry=0x7ffff7fbc140, timeout=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/linux.c:1303
#6  0x00007ffff7c14cc3 in uv_run (loop=0x7ffff7fbc140, mode=mode@entry=UV_RUN_DEFAULT) at /home/ryan/Programming/luvit/luv/deps/libuv/src/unix/core.c:447
#7  0x00007ffff7c0bc00 in luv_run (L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/src/loop.c:36
#8  0x00005555555ca03b in lj_BC_FUNCC () at buildvm_x86.dasc:859
#9  0x00005555555bbe03 in lua_pcall (L=0x7ffff7fa9380, nargs=<optimized out>, nresults=-1, errfunc=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/luajit/src/lj_api.c:1116
#10 0x000055555555c8ab in docall (L=0x7ffff7fa9380, narg=0, clear=0) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:122
#11 0x000055555555dbd2 in handle_script (argx=<optimized out>, L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:292
#12 pmain (L=0x7ffff7fa9380) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:550
#13 0x00005555555ca03b in lj_BC_FUNCC () at buildvm_x86.dasc:859
#14 0x00005555555bbfa1 in lua_cpcall (L=<optimized out>, func=<optimized out>, ud=<optimized out>) at /home/ryan/Programming/luvit/luv/deps/luajit/src/lj_api.c:1173
#15 0x000055555555c70e in main (argc=2, argv=0x7fffffffda48) at /home/ryan/Programming/luvit/luv/deps/luajit/src/luajit.c:581

zhaozg added a commit to zhaozg/luv that referenced this issue Aug 19, 2023
zhaozg added a commit that referenced this issue Aug 19, 2023