uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed. #33966

Closed
wmertens opened this issue Jun 19, 2020 · 15 comments
Labels
libuv Issues and PRs related to the libuv dependency or the uv binding.

Comments

@wmertens

What steps will reproduce the bug?

I have a rather download-heavy script, and it often crashes with:

node: ../deps/uv/src/unix/core.c:918: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

I found some other bugs that seem related to datagrams, so to be sure I added the host it connects to to /etc/hosts, hoping that would mean no UDP connections. That didn't help. The only connections my script makes are TCP.

How often does it reproduce? Is there a required condition?

Within a few minutes, but only when the server that I'm fetching from times out (which it does very often).

Additional information

I'm using node-fetch to do the downloading, but that just uses the http module, of course.

@addaleax
Member

Is there any chance you can share a full reproduction?

@wmertens
Author

Unfortunately, it's proprietary code connecting to proprietary services :(

Maybe I can make a repro by building a service that times out often.

@wmertens
Author

Is there something I can do to get more information from the crash? Core dumps, or something similar?

@addaleax
Member

@wmertens A stack trace or knowing what kind of libuv handle is involved might already be helpful. It should be possible to extract that information from a core dump, or through a debugger attached to a crashing process.

@wmertens
Author

@addaleax I have gdb paused on a crash, what can I get?

(gdb) run
Starting program: /nix/store/f22j6islmg8nagagpcxxg26plvsjj7m7-user-environment/bin/node fsckAp.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libthread_db.so.1".
[New Thread 0x7ffff57ac700 (LWP 31986)]
[New Thread 0x7ffff4fab700 (LWP 31987)]
[New Thread 0x7fffeffff700 (LWP 31988)]
[New Thread 0x7fffef7fe700 (LWP 31989)]
[New Thread 0x7fffeeffd700 (LWP 31990)]
[New Thread 0x7ffff47aa700 (LWP 31991)]
[Detaching after fork from child process 31992]
[New Thread 0x7fffee7fc700 (LWP 31994)]
[New Thread 0x7fffedffb700 (LWP 31995)]
[New Thread 0x7fffed7fa700 (LWP 31996)]
[New Thread 0x7fffecff9700 (LWP 31997)]

[...script runs...]

node: ../deps/uv/src/unix/core.c:918: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Thread 1 "node" received signal SIGABRT, Aborted.
0x00007ffff57ea17a in raise () from /nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libc.so.6
(gdb) bt
#0  0x00007ffff57ea17a in raise () from /nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libc.so.6
#1  0x00007ffff57d4548 in abort () from /nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libc.so.6
#2  0x00007ffff57d442f in __assert_fail_base.cold.0 () from /nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libc.so.6
#3  0x00007ffff57e2ad2 in __assert_fail () from /nix/store/qvf11lymvw6n8g66xgj1wsm28z1viqdv-glibc-2.30/lib/libc.so.6
#4  0x0000000001446101 in uv.io_close ()
#5  0x0000000001451881 in uv.stream_close ()
#6  0x0000000001444b75 in uv_close ()
#7  0x0000000000921342 in node::HandleWrap::Close(v8::FunctionCallbackInfo<v8::Value> const&) ()
#8  0x0000000000b680d9 in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) ()
#9  0x0000000000b68490 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#10 0x0000000000b68d0a in v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) ()
#11 0x0000000000b695b9 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#12 0x0000000001317a39 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#13 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#14 0x00003f63c3e404b1 in ?? ()
#15 0x000028a3a763cb41 in ?? ()
#16 0x0000000600000000 in ?? ()
#17 0x00003f63c3e40591 in ?? ()
#18 0x000008564f080f81 in ?? ()
#19 0x00003226a9d6ace9 in ?? ()
#20 0x00003f63c3e404b1 in ?? ()
#21 0x00003226a9d6ace9 in ?? ()
#22 0x000028a3a763cb41 in ?? ()
#23 0x000005f127f03221 in ?? ()
#24 0x000008564f080f81 in ?? ()
#25 0x000000c500000000 in ?? ()
#26 0x00000a42de316199 in ?? ()
#27 0x00003f6b3edf0201 in ?? ()
#28 0x000008564f080f41 in ?? ()
#29 0x00007fffffff92e0 in ?? ()
#30 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#31 0x000008564f080f09 in ?? ()
#32 0x00003226a9d6ad19 in ?? ()
#33 0x00003f63c3e404b1 in ?? ()
#34 0x000008564f080f09 in ?? ()
#35 0x00003226a9d6ad19 in ?? ()
#36 0x00003f6b3edf0201 in ?? ()
#37 0x000008564f080e99 in ?? ()
#38 0x00003370750d1699 in ?? ()
#39 0x00003f63c3e401b9 in ?? ()
#40 0x0000014200000000 in ?? ()
#41 0x00000a42de335551 in ?? ()
#42 0x000015e2e3754451 in ?? ()
#43 0x000008564f080ed1 in ?? ()
#44 0x00007fffffff9368 in ?? ()
#45 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#46 0x000008564f080e61 in ?? ()
#47 0x00003f63c3e401b9 in ?? ()
#48 0x00003226a9d6aad9 in ?? ()
#49 0x00003f63c3e404b1 in ?? ()
#50 0x000008564f080e61 in ?? ()
#51 0x00003f63c3e401b9 in ?? ()
#52 0x00003226a9d6aad9 in ?? ()
#53 0x000015e2e3754451 in ?? ()
#54 0x00003370750dc749 in ?? ()
#55 0x00003f63c3e406e9 in ?? ()
#56 0x00003f63c3e406e9 in ?? ()
#57 0x0000014400000000 in ?? ()
#58 0x00000a42de318a71 in ?? ()
#59 0x00003370750e04e9 in ?? ()
#60 0x000008564f080e21 in ?? ()
#61 0x00007fffffff93c8 in ?? ()
#62 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#63 0x00003f63c3e404b1 in ?? ()
#64 0x00003f63c3e404b1 in ?? ()
#65 0x00003226a9d6aad9 in ?? ()
#66 0x00003b698e93d971 in ?? ()
#67 0x00003226a9d6aad9 in ?? ()
#68 0x00003370750e04e9 in ?? ()
#69 0x0000007d00000000 in ?? ()
#70 0x0000301d99c32f91 in ?? ()
#71 0x00000d3210b3f251 in ?? ()
#72 0x0000255d14dc1239 in ?? ()
#73 0x00007fffffff9420 in ?? ()
#74 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#75 0x00003226a9d6a291 in ?? ()
#76 0x00003f63c3e404b1 in ?? ()
#77 0x00003f63c3e404b1 in ?? ()
#78 0x00003226a9d6a291 in ?? ()
#79 0x00000d3210b3f251 in ?? ()
#80 0x0000004200000000 in ?? ()
#81 0x0000301d99c32e19 in ?? ()
#82 0x00003226a9d69b39 in ?? ()
#83 0x00003226a9d69ab9 in ?? ()
#84 0x00007fffffff9478 in ?? ()
#85 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#86 0x00003f63c3e404b1 in ?? ()
#87 0x000007c66c3b1d01 in ?? ()
#88 0x000008564f080361 in ?? ()
#89 0x000008564f080381 in ?? ()
#90 0x00003226a9d69b39 in ?? ()
#91 0x0000006900000000 in ?? ()
#92 0x0000301d99c32a69 in ?? ()
#93 0x00003226a9d6c461 in ?? ()
#94 0x00003226a9d69ab9 in ?? ()
#95 0x00007fffffff9520 in ?? ()
#96 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#97 0x00003226a9d6c4a1 in ?? ()
#98 0x00003f63c3e404b1 in ?? ()
#99 0x00003f63c3e404b1 in ?? ()
#100 0x00003f63c3e404b1 in ?? ()
#101 0x00003226a9d6c461 in ?? ()
#102 0x000039e5c047a451 in ?? ()
#103 0x000008564f080269 in ?? ()
#104 0x000030e995412b01 in ?? ()
#105 0x00003f63c3e404b1 in ?? ()
#106 0x00003f63c3e404b1 in ?? ()
#107 0x000008564f080269 in ?? ()
#108 0x0000ea6000000000 in ?? ()
#109 0x00003226a9d6c4a1 in ?? ()
#110 0x00003f63c3e40639 in ?? ()
#111 0x0000ea6000000000 in ?? ()
#112 0x000001b000000000 in ?? ()
#113 0x00000502992e3b71 in ?? ()
#114 0x000039e5c047f811 in ?? ()
#115 0x000039e5c047a451 in ?? ()
#116 0x00007fffffff9588 in ?? ()
#117 0x000000000129d2e4 in Builtins_InterpreterEntryTrampoline ()
#118 0x0001116300000000 in ?? ()
#119 0x00003226a9d6c511 in ?? ()
#120 0x00003f63c3e404b1 in ?? ()
#121 0x000030e995412761 in ?? ()
#122 0x000039e5c047f811 in ?? ()
#123 0x00003f63c3e40639 in ?? ()
#124 0x00003226a9d6c511 in ?? ()
#125 0x000000c700000000 in ?? ()
#126 0x00000502992e3701 in ?? ()
#127 0x000030e995401579 in ?? ()
#128 0x000039e5c047a451 in ?? ()
#129 0x00007fffffff95b8 in ?? ()
#130 0x000000000129a85d in Builtins_JSEntryTrampoline ()
#131 0x0001116300000000 in ?? ()
#132 0x000028a3a763f6d1 in ?? ()
#133 0x000030e995401579 in ?? ()
#134 0x0000000000000022 in ?? ()
#135 0x00007fffffff9620 in ?? ()
#136 0x000000000129a638 in Builtins_JSEntry ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@addaleax
Member

@wmertens I think you should be able to go to the node::HandleWrap::Close frame, figure out the this value, and print *this from gdb – that should provide the actual class name.

Since you seem to be able to reproduce this quite well – if you can do so with a debug build of Node.js, that might be helpful here, although it requires building Node.js from source and takes quite a while as well.

@wmertens
Author

wmertens commented Jun 23, 2020

I'm still trying to get a debug build of node to work with.

In the meantime, I discovered that if I download 1 file at a time, it doesn't happen. So it looks like some sort of race condition? Downloading 5 at a time triggers it for me (I didn't test 2 at a time).
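
A minimal sketch of that workaround, assuming the downloads are independent promise-returning tasks: a small pool that keeps at most `limit` requests in flight. `mapWithLimit` is a hypothetical helper name, not something from node-fetch or the original script. It only sidesteps the crash and does not address the underlying libuv assertion.

```javascript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// `mapWithLimit` is a hypothetical helper, not part of node-fetch.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Calling `mapWithLimit(urls, 1, (url) => fetch(url))` would correspond to the one-at-a-time case that did not crash, and a limit of 5 to the case that did.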

@gireeshpunathil
Member

@wmertens - are there any updates on this? Were you able to build a debug build and try it? Or are you running with the workaround that you identified yourself?

@wmertens
Author

wmertens commented Nov 4, 2020

Sorry for dropping the ball on this. Indeed, I'm using the workaround right now.
My problem is getting the debug build; once I have it, I can probably reproduce the crash quickly.
An alternative is for me to make a simpler repro script. I'm not sure which will take longer.

Of course, if you, @gireeshpunathil, or someone else can repro, that would be great :) The errors happen when running 5 concurrent node-fetch calls, each doing a GET that returns around 1MB (not sure of this number).
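
A rough sketch of what such a repro script could look like, using only the built-in http module. Everything here is an assumption: the stalling local server stands in for the upstream that times out, the function names are hypothetical, and nothing in this thread confirms that this actually triggers the assertion.

```javascript
const http = require("http");

// Hypothetical stand-in for the flaky upstream: sends headers and a first
// chunk, then never finishes the response.
function startStallingServer() {
  return new Promise((resolve) => {
    const server = http.createServer((req, res) => {
      res.writeHead(200, { "Content-Type": "application/octet-stream" });
      res.write(Buffer.alloc(64 * 1024)); // partial body, then silence
    });
    server.listen(0, "127.0.0.1", () => resolve(server));
  });
}

// GET `url`; if it has not completed after `timeoutMs`, destroy the request.
// Resolves once the request is closed, reporting whether the timer fired.
function getWithTimeout(url, timeoutMs) {
  return new Promise((resolve) => {
    let bytes = 0;
    let timedOut = false;
    const req = http.get(url, (res) => {
      res.on("data", (chunk) => { bytes += chunk.length; });
      res.on("error", () => {}); // abort is reported via req 'close'
    });
    const timer = setTimeout(() => {
      timedOut = true;
      req.destroy();
    }, timeoutMs);
    req.on("error", () => {}); // outcome is reported via 'close'
    req.on("close", () => {
      clearTimeout(timer);
      resolve({ timedOut, bytes });
    });
  });
}

// Fire `concurrency` requests at once, mirroring the 5-at-a-time case.
function runBatch(url, concurrency, timeoutMs) {
  const jobs = Array.from({ length: concurrency }, () =>
    getWithTimeout(url, timeoutMs)
  );
  return Promise.all(jobs);
}
```

To exercise it, start the server, build a URL from `server.address().port`, and call `runBatch(url, 5, 1000)` in a loop on an affected Node.js version.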

@gireeshpunathil added the libuv label on Nov 4, 2020
@gireeshpunathil
Member

I will see if I can produce and supply a debug build

@gireeshpunathil
Member

@wmertens - I have built a debug build from the v12.x line, put it in the gireeshpunathil/noded:v5 Docker image, and published it. You can volume-mount the /node folder to some host location, copy the executables, gunzip them, and then run and debug your test.

# docker run -it gireeshpunathil/noded:v5 sh
/ # cd node
/node # ls -lrt
total 404244
-rwxr-xr-x    1 root     root     396495685 Nov  4 15:23 node_d.gz
-rwxr-xr-x    1 root     root      17443476 Nov  4 15:50 node_r.gz
/node # 

@wmertens
Author

wmertens commented Jan 4, 2021

@gireeshpunathil Thanks! I had trouble setting up Docker on my repro VM, though.

In any case, I tried what @addaleax asked:

(gdb) frame
#9  0x0000000000a4d22e in node::HandleWrap::Close (this=0x25a44a0, close_callback=...) at ../src/handle_wrap.cc:74
74      in ../src/handle_wrap.cc
(gdb) print *this
$2 = {<node::AsyncWrap> = {<node::BaseObject> = {<node::MemoryRetainer> = {_vptr.MemoryRetainer = 0x1fb3f78 <vtable for node::TCPWrap+16>}, persistent_handle_ = {<v8::PersistentBase<v8::Object>> = {val_ = 0x232ab00}, <No data fields>}, env_ = 0x20f4d40, pointer_data_ = 0x0}, static kInvalidAsyncId = -1, 
    provider_type_ = node::AsyncWrap::PROVIDER_TCPWRAP, init_hook_ran_ = true, async_id_ = 19471, trigger_async_id_ = 0}, handle_wrap_queue_ = {prev_ = 0x258c198, next_ = 0x20f5590}, state_ = node::HandleWrap::kInitialized, handle_ = 0x25a4540}

Does that help?

@onozaty

onozaty commented Jul 20, 2021

I also encountered the same problem.
I investigated and found that the problem has already been fixed.

The fix has been included in Node.js 14 and 16.
If you can upgrade to Node.js 14 or 16, that might be one way to handle it.

@wmertens
Author

Awesome! I'll upgrade :)

@kid551

kid551 commented Dec 29, 2021

Adding more information based on #33966 (comment): libuv-fix#2686 was released in libuv-v1.41.0 (it's VERY hard to find which libuv release contains libuv-fix#2686), and that libuv release was merged in node-v14.17.0, not v14.0.0.

So, if anyone wants to upgrade Node.js, upgrade to at least node-v14.17.0.
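
A small sketch for checking a running process against those thresholds. `atLeast` is a hypothetical helper written for illustration; `process.versions.node` and `process.versions.uv` are the standard fields reporting the Node.js and bundled libuv versions. The version numbers are the ones quoted in this thread, not independently verified here.

```javascript
// Compare dotted numeric versions; returns true when `actual` >= `minimum`.
// `atLeast` is a hypothetical helper, not a Node.js API.
function atLeast(actual, minimum) {
  const a = actual.split(".").map(Number);
  const m = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const diff = (a[i] || 0) - (m[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // all components equal
}

// Thresholds taken from the comments in this thread.
const nodeOk = atLeast(process.versions.node, "14.17.0");
const uvOk = atLeast(process.versions.uv, "1.41.0");
console.log(`node ${process.versions.node}: ${nodeOk ? "ok" : "too old"}`);
console.log(`libuv ${process.versions.uv}: ${uvOk ? "ok" : "too old"}`);
```

Checking the libuv version as well is the more direct test, since the fix lives in libuv rather than in Node.js itself.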
