
unix: Assertion `events == 1' failed #826

Closed
emilsedgh opened this issue Jun 5, 2013 · 30 comments

@emilsedgh

node: ../deps/uv/src/unix/stream.c:494: uv__server_io: Assertion `events == 1' failed.

I'm on node 0.10.10 and therefore libuv 0.10.10.
This happens when node-phantom tries to spawn phantom's process.
I'm on Debian 7.0.

@bnoordhuis (Contributor)

Can you post the output of strace -fe \!mmap,munmap,mprotect,futex,nanosleep,gettimeofday node script.js? If you add -ff -o trace.log, it'll write to per-pid log files.

@emilsedgh (Author)

Unfortunately, I cannot reproduce this anymore. I will report back if the error comes back.

@cmisare commented Jun 17, 2013

I am occasionally seeing this with node 0.10.11. The strace output should be available here:

https://mega.co.nz/#!VUsC3RCY!XRDjrwj0b-Xm0Ayqz6gJ5hKpW-8ipaow9DFzCOZ4iCQ

Thanks in advance, and let me know if there is anything else I can provide.

@bnoordhuis (Contributor)

@cmisare Can you gist the log? That website is quite annoying and doesn't work all that well in my browser.

@cmisare commented Jun 18, 2013

Apologies for that. I had run the strace with the -ff option and wasn't sure where to upload a bundle of the output files. Here is a gist of the output without the -ff:

https://gist.github.com/cmisare/ea88e84a7d1a932a73cf

@bnoordhuis (Contributor)

Thanks. Are you on a recent-ish Fedora system, by any chance? Your version of strace suffers from a bug that I fixed a while ago where it doesn't process the return value from process_vm_readv() correctly.

Can you clone git://strace.git.sourceforge.net/gitroot/strace/strace and run ./configure && make -j4? Don't worry, it builds in a few seconds and you don't have to install it to run it.

Failing that, do you have some kind of test case that I can try?

@bnoordhuis (Contributor)

I forgot to mention what the effect of the bug is... effectively you get this:

10129 epoll_wait(5, {?} 0x7fff1a81e4f0, 1024, 45) = 8

Rather than this:

[pid  5688] epoll_wait(5, {{EPOLLIN, {u32=16, u64=16}}, {EPOLLIN, {u32=17, u64=17}}, {EPOLLIN, {u32=18, u64=18}}}, 1024, 779) = 3

@cmisare commented Jun 19, 2013

I am indeed on Fedora 18. Good catch.

It looks like configure is gitignore'd in the strace repo, so I tried creating it using "aclocal -I m4 && autoheader && automake --add-missing && autoconf". If that is incorrect, please let me know what I should do instead.

I do now see lines with epoll_wait and EPOLLIN, so here is a gist of the output from running with the compiled strace: https://gist.github.com/cmisare/b72bfdc0b2655a3b208b

Unfortunately, this isn't my code to share. If nothing comes out of this strace log, I can try to isolate a test case to reproduce it, but I will need to get approval for that. Sorry.

@dule commented Jun 20, 2013

I am also seeing this when using node cluster on node v0.10.10 running Ubuntu server 12.04 LTS. I would provide an strace, but it's a private project and the trace would probably contain a lot of noise. I'm trying to build a small test case to isolate it, but I have yet to reproduce it in a smaller test project.

The crash doesn't occur if I don't use clustering, and it also doesn't occur if I disable the logger lib we're using (ain2). The ain2 logger seems to use UDP to write to syslog, and removing the dgram socket send stops the crash.
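
A minimal sketch of that setup, assuming the relevant part of ain2 boils down to one UDP datagram per log line (hypothetical code, not the actual project):

```js
// Hypothetical reduction of the setup described above: cluster workers that
// serve HTTP while also firing UDP datagrams, roughly what a syslog logger
// like ain2 does. Ports and addresses are placeholders.
var cluster = require('cluster');
var http = require('http');
var dgram = require('dgram');

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) cluster.fork();
} else {
  var sock = dgram.createSocket('udp4');
  http.createServer(function(req, res) {
    var msg = new Buffer('log line');
    sock.send(msg, 0, msg.length, 514, '127.0.0.1'); // UDP write to syslog
    res.end('ok');
  }).listen(8000);
}
```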

@dgrelaud

I have the same error on Ubuntu 12.04 LTS x64 (it works on OS X). I have isolated the bug into a small test case: https://gist.github.com/dgrelaud/5872380
The bug appears only on Node.js 0.10.x (not on 0.8.x) when we start a server, use cluster, and launch a command with exec...
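
A rough sketch of that pattern (the linked gist may differ in detail; the port and command here are placeholders):

```js
// Rough shape of the reported trigger: a cluster worker that both listens
// on an http server and launches commands with exec. Placeholder values.
var cluster = require('cluster');
var http = require('http');
var exec = require('child_process').exec;

if (cluster.isMaster) {
  cluster.fork();
} else {
  http.createServer(function(req, res) {
    res.end('ok');
  }).listen(8001);

  // Each exec opens and closes pipe fds, which makes fd-number reuse likely.
  setInterval(function() {
    exec('true', function(err) { if (err) throw err; });
  }, 10);
}
```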

@goffrie (Contributor) commented Jun 27, 2013

I discovered the cause of our (= @dule's) problem. When a file descriptor is closed, it is not removed from epoll. As described in epoll(7), there are cases in which a file description will stay registered when the file descriptor is closed. If the same fd number is reused in a future socket (specifically, a server socket), then it will receive epoll events for the old file as well, and libuv will not be able to distinguish between the two types of events, because the fds are the same. So we might have e.g. EPOLLOUT events sent to a watcher that only expects EPOLLIN events - which is what causes this assertion failure.
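
A schematic sketch of that sequence (illustrative only: in a lone process the close does deregister the fd, so the crash additionally needs the file description to be shared, e.g. via cluster, to stay registered):

```js
// Schematic illustration of the fd-reuse sequence described above. In a
// single process close() does drop the fd from epoll; the hazard needs the
// underlying file description to be shared (e.g. via fork/cluster) so it
// survives the close and stays registered.
var net = require('net');

var srv = net.createServer().listen(0, function() {
  var conn = net.connect(srv.address().port, function() {
    conn.destroy();                // step 1: fd N is closed...
    net.createServer().listen(0);  // step 2: a new listen socket may reuse
                                   // fd N and then see the old description's
                                   // events, e.g. EPOLLOUT delivered to a
                                   // watcher expecting only EPOLLIN
  });
});
```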

I've created a small testcase here: https://gist.github.com/goffrie/5878871 + its strace. I also wrote a patch that fixes the problem for me (not-complete-fix.patch). However, it needs some work: I don't know what to do about platforms other than Linux; it probably won't compile there because the function I added will be missing.

@dgrelaud's testcase appears to hit a different problem, where uv__server_io expects events always to be EPOLLIN but it might be EPOLLIN|EPOLLHUP. This seems like it's normal behaviour, so I put another patch other-fix.patch in that gist, which relaxes the requirement. I don't know if this is correct though.

@muhammadghazali

Hi guys, I'm experiencing the same issue here on Node v0.10.12. Like @cmisare said, it seems to happen occasionally. I hit it while using https://github.com/yeoman/generator-webapp, and the bottom line is that, based on the log, the issue seems to originate from libuv.

grunt && grunt server
Running "jshint:all" (jshint) task
>> 3 files lint free.

Running "clean:server" (clean) task

Running "concurrent:test" (concurrent) task

    Running "coffee:dist" (coffee) task
    File .tmp/scripts/hello.js created.

    Running "coffee:test" (coffee) task

    Done, without errors.

Running "connect:test" (connect) task
Starting connect web server on localhost:9000.

Running "mocha:all" (mocha) task
Testing: http://localhost:9000/index.html

  ․

  1 test complete (3 ms)

>> 1 passed! (0.00s)

Running "clean:dist" (clean) task
Cleaning ".tmp"...OK
Cleaning "dist/.htaccess"...OK
Cleaning "dist/404.html"...OK
Cleaning "dist/favicon.ico"...OK
Cleaning "dist/index.html"...OK
Cleaning "dist/robots.txt"...OK
Cleaning "dist/scripts"...OK
Cleaning "dist/styles"...OK

Running "useminPrepare:html" (useminPrepare) task
Going through app/index.html to update the config
Looking for build script HTML comment blocks

Found a block:
        <!-- build:css({.tmp,app}) styles/main.css -->
        <link rel="stylesheet" href="styles/main.css">
        <!-- endbuild -->
Updating config with the following assets:
    - {.tmp,app}/styles/main.css

Found a block:
        <!-- build:js scripts/vendor/modernizr.js -->
        <script src="bower_components/modernizr/modernizr.js"></script>
        <!-- endbuild -->
Updating config with the following assets:
    - app/bower_components/modernizr/modernizr.js

Found a block:
        <!-- build:js scripts/main.js -->
        <script src="bower_components/jquery/jquery.js"></script>
        <script src="scripts/main.js"></script>
        <!-- endbuild -->
Updating config with the following assets:
    - app/bower_components/jquery/jquery.js
    - app/scripts/main.js

Found a block:
        <!-- build:js(.tmp) scripts/coffee.js -->
        <script src="scripts/hello.js"></script>
        <!-- endbuild -->
Updating config with the following assets:
    - .tmp/scripts/hello.js

Configuration is now:

  cssmin:
  { 'dist/styles/main.css': 'dist/styles/main.css' }

  concat:
  { 'dist/styles/main.css': [ '{.tmp,app}/styles/main.css' ],
  'dist/scripts/vendor/modernizr.js': [ 'app/bower_components/modernizr/modernizr.js' ],
  'dist/scripts/main.js': 
   [ 'app/bower_components/jquery/jquery.js',
     'app/scripts/main.js' ],
  'dist/scripts/coffee.js': [ '.tmp/scripts/hello.js' ] }

  uglify:
  { 'dist/scripts/vendor/modernizr.js': 'dist/scripts/vendor/modernizr.js',
  'dist/scripts/main.js': 'dist/scripts/main.js',
  'dist/scripts/coffee.js': 'dist/scripts/coffee.js' }

  requirejs:
  {}

Running "concurrent:dist" (concurrent) task

    Running "htmlmin:dist" (htmlmin) task
    File dist/404.html created.
    File dist/index.html created.

    Done, without errors.

    Running "imagemin:dist" (imagemin) task

    Done, without errors.

    Running "coffee:dist" (coffee) task
    File .tmp/scripts/hello.js created.

    Running "coffee:test" (coffee) task

    Done, without errors.

    Running "svgmin:dist" (svgmin) task

    Done, without errors.

    Running "compass:dist" (compass) task

    Running "compass:server" (compass) task

    Done, without errors.

Running "concat:dist/styles/main.css" (concat) task
File "dist/styles/main.css" created.

Running "concat:dist/scripts/vendor/modernizr.js" (concat) task
File "dist/scripts/vendor/modernizr.js" created.

Running "concat:dist/scripts/main.js" (concat) task
File "dist/scripts/main.js" created.

Running "concat:dist/scripts/coffee.js" (concat) task
File "dist/scripts/coffee.js" created.

Running "cssmin:dist/styles/main.css" (cssmin) task
File dist/styles/main.css created.

Running "uglify:dist/scripts/vendor/modernizr.js" (uglify) task
File "dist/scripts/vendor/modernizr.js" created.

Running "uglify:dist/scripts/main.js" (uglify) task
File "dist/scripts/main.js" created.

Running "uglify:dist/scripts/coffee.js" (uglify) task
File "dist/scripts/coffee.js" created.

Running "copy:dist" (copy) task
Copied 3 files

Running "rev:dist" (rev) task
dist/scripts/coffee.js >> 121395a4.coffee.js
dist/scripts/main.js >> 36c8c666.main.js
dist/scripts/vendor/modernizr.js >> f7f27360.modernizr.js
dist/styles/main.css >> f3b12f69.main.css

Running "usemin:html" (usemin) task

Processing as HTML - dist/404.html
Update the HTML to reference our concat/min/revved script files
Update the HTML with the new css filenames
Update the HTML with the new img filenames
Update the HTML with data-main tags
Update the HTML with the data tags
Update the HTML with background imgs, case there is some inline style
Update the HTML with anchors images
Update the HTML with reference in input

Processing as HTML - dist/index.html
Update the HTML to reference our concat/min/revved script files
<script src="scripts/vendor/modernizr.js" changed to <script src="scripts/vendor/f7f27360.modernizr.js"
<script src="scripts/main.js" changed to <script src="scripts/36c8c666.main.js"
<script src="scripts/coffee.js" changed to <script src="scripts/121395a4.coffee.js"
Update the HTML with the new css filenames
<link rel="stylesheet" href="styles/main.css" changed to <link rel="stylesheet" href="styles/f3b12f69.main.css"
Update the HTML with the new img filenames
Update the HTML with data-main tags
Update the HTML with the data tags
Update the HTML with background imgs, case there is some inline style
Update the HTML with anchors images
Update the HTML with reference in input

Running "usemin:css" (usemin) task

Processing as CSS - dist/styles/f3b12f69.main.css
Update the CSS with new img filenames

Done, without errors.
Running "server" task

Running "clean:server" (clean) task
Cleaning ".tmp"...OK

Running "concurrent:server" (concurrent) task

    Running "coffee:dist" (coffee) task
    File .tmp/scripts/hello.js created.

    Done, without errors.

    Running "compass:dist" (compass) task

    Running "compass:server" (compass) task

    Done, without errors.

Running "connect:livereload" (connect) task
Starting connect web server on localhost:9000.

Running "open:server" (open) task

Running "watch" task
Waiting...node: ../deps/uv/src/unix/stream.c:494: uv__server_io: Assertion `events == 1' failed.
Aborted (core dumped)

Btw, here are my dev environment details from when this issue happened:

  • Ubuntu 13.04
  • Node v0.10.12

If you guys need any more details, please let me know.

Thanks,
Best regards.

@iskitsas commented Aug 9, 2013

Using the cluster module and forking processes, each of which created an http server listening on a port, caused this issue.

Specifically, when you call .listen(80), you get this error during cluster init.

A workaround was to wrap the .listen(PORT_NUM) call in a setTimeout with a 5 ms delay, as sketched below.
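
A sketch of that workaround (a hypothetical reading of the description above; PORT_NUM and the worker count are placeholders):

```js
// Sketch of the workaround: delay the worker's listen() by a few ms so it
// doesn't race cluster initialization. PORT_NUM is a placeholder.
var cluster = require('cluster');
var http = require('http');
var PORT_NUM = 80;

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) cluster.fork();
} else {
  var server = http.createServer(function(req, res) { res.end('ok'); });
  setTimeout(function() {
    server.listen(PORT_NUM); // instead of calling listen() immediately
  }, 5);
}
```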

@bnoordhuis (Contributor)

Does anyone have a reduced test case that triggers the assert and doesn't tamper with libuv/node.js internals?

@goffrie is correct that:

When a file descriptor is closed, it is not removed from epoll. As described in epoll(7), there are cases in which a file description will stay registered when the file descriptor is closed. If the same fd number is reused in a future socket (specifically, a server socket), then it will receive epoll events for the old file as well

But that cannot happen unless you start messing with libuv or node.js internals or call fork() yourself. Libuv doesn't dup the file descriptor by itself.

@dgrelaud's testcase appears to hit a different problem, where uv__server_io expects events always to be EPOLLIN but it might be EPOLLIN|EPOLLHUP.

I've never seen this. The only explanation I can think of is a race between a (probably accidental) close() of the listen socket file descriptor in another thread while the event loop has entered epoll_wait(). I wouldn't expect EPOLLIN to be set in that case, though. What kernel is this on?

(For context: the kernel only sets the EPOLLHUP flag when a TCP socket is in the shutdown or closed state. I'm reasonably sure that for listen sockets that only happens when someone calls close().)

@goffrie (Contributor) commented Aug 9, 2013

@bnoordhuis did you look at the Gist I linked? I have an all-Node test case that reliably triggers the bug. The only "internal" it uses is the fd number, which it does for expediency; we have triggered this bug in regular production code, but it only happens on one server, which isn't too surprising given that it relies on a very unlikely fd race. And the patch that I posted does fix our problem (though we're just using node 0.8 for now).

@goffrie (Contributor) commented Aug 9, 2013

One more thing - the cluster module does duplicate fds: the same fd being open in another process is duplicate enough for this bug.

@goffrie (Contributor) commented Aug 9, 2013

Here's a test case that doesn't use any internal-y stuff for its operation, using the cluster module: http://sprunge.us/FPjU

@bnoordhuis (Contributor)

The only "internal" it uses is the fd number

Well, the fact that it's called sock._handle.fd should be a strong hint that it's not for public consumption. The only reason I added the fd property is to facilitate logging / debugging; it's definitely not intended to be passed around.

the cluster module does duplicate fds

If you mean that the master sends a copy of the file descriptor to worker processes, then that's true. Libuv and node.js go to some lengths to make sure everything cooperates peacefully however. It's possible there's a bug somewhere but I'll need a little more to go on.

As to your patch, I'm afraid it's not suitable for inclusion even if it fixes the issue for you. epoll_ctl() is an expensive system call; calling it indiscriminately has a pretty severe performance impact. Compare the libuv benchmark numbers with and without your patch.

EDIT: That said, keeping track of whether the file descriptor has been shared and deregistering based on that might be an option. That won't fix all cases though.

@bnoordhuis (Contributor)

Here's a test case that doesn't use any internal-y stuff for its operation, using the cluster module: http://sprunge.us/FPjU

Right, I see what you mean. That's fixed in master actually. It's probably a bug in the v0.10 cluster module. I'll look into it.

@goffrie (Contributor) commented Aug 9, 2013

Compare the libuv benchmark numbers with and without your patch.

https://gist.github.com/goffrie/6195465. I'm not seeing any significant changes.

That said, keeping track of whether the file descriptor has been shared and deregistering based on that might be an option. That won't fix all cases though.

I think this could be reasonable.

That's fixed in master actually. It's probably a bug in the v0.10 cluster module. I'll look into it.

Thanks for looking :) By the way, http://sprunge.us/OZJh is another incarnation of the same bug, which still fails on the latest version of node (0.11.5). Also, in the original cluster test case, even though the assertion failure doesn't occur, the strace output still has thousands of lines like this:

epoll_ctl(6, EPOLL_CTL_DEL, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 EBADF (Bad file descriptor)
epoll_wait(6, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, -1) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 EBADF (Bad file descriptor)
epoll_wait(6, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, -1) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 EBADF (Bad file descriptor)
epoll_wait(6, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, -1) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 EBADF (Bad file descriptor)
epoll_wait(6, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, -1) = 1
epoll_ctl(6, EPOLL_CTL_DEL, 11, {EPOLLOUT, {u32=11, u64=11}}) = -1 EBADF (Bad file descriptor)
epoll_wait(6, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, -1) = 1

so there's still a problem there.

@c4milo commented Aug 12, 2013

I just ran across this same issue in node 0.8.25.

epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 786971786}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3604) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 787245843}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3603) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 787536382}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3603) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 787781665}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3603) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 788172218}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3602) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 788478961}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3602) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 788872028}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3602) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 789178212}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3601) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 789465678}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3601) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 789770466}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3601) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 790086707}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3600) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 790423342}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3600) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 790797412}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3600) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 791163659}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3599) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 791463977}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3599) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)
clock_gettime(CLOCK_MONOTONIC, {6600, 791741386}) = 0
epoll_wait(3, {{EPOLLHUP, {u32=0, u64=4294967296}}}, 64, 3599) = 1
epoll_ctl(3, EPOLL_CTL_DEL, 4294967296, {0, {u32=0, u64=4294967296}}) = -1 EBADF (Bad file descriptor)

@c4milo commented Aug 12, 2013

Worth noting that the process consumes 100% CPU in this state.

indutny added a commit that referenced this issue Nov 12, 2013
When an fd is closed and a new one (with the same number) is opened inside a kqueue/epoll/port loop's callback, stale events might invoke callbacks on the wrong watchers.

Check if the watcher was changed after invocation and invalidate all events with the same fd.

fix #826
@subnetmarco

Seeing this in node v0.10.26 on CentOS:

Linux localhost 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

@indutny (Contributor) commented Apr 4, 2014

@thefosk could you please try running strace as suggested by @bnoordhuis and post some data here?

@tsukasagenesis

Same problem here, unfortunately.
Linux 73118 3.13.0-30-generic #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Node v0.10.29 from the chris-lea PPA on trusty.

epoll_ctl(5, EPOLL_CTL_MOD, 36, {EPOLLIN, {u32=36, u64=36}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=8, u64=8}}}, 1024, 0) = 1
read(8, "\1\0\0\0\0\0\0\0", 1024)       = 8
epoll_wait(5, {{EPOLLIN, {u32=25, u64=25}}}, 1024, 88) = 1
read(25, "", 65536)                     = 0
futex(0x7fbe8c0008c8, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 25, {EPOLLRDNORM|EPOLLWRNORM|EPOLLHUP|0x1b3d820, {u32=0, u64=122681170965037056}}) = 0
close(25)                               = 0
futex(0xe13d84, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xe13d80, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0xe13d40, FUTEX_WAKE_PRIVATE, 1)  = 1
write(1, "fd: 24\n", 7)                 = 7
write(1, "11byebye\n", 9)               = 9
epoll_wait(5, {{EPOLLIN, {u32=11, u64=11}}, {EPOLLIN, {u32=8, u64=8}}}, 1024, 76) = 2
accept4(11, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 24
accept4(11, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(8, "\1\0\0\0\0\0\0\0", 1024)       = 8
write(1, "11close2\n", 9)               = 9
futex(0xe13d84, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xe13d80, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0xe13d40, FUTEX_WAKE_PRIVATE, 1)  = 1
epoll_ctl(5, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=24}}) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(18944, 18944, SIGABRT)           = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=18944, si_uid=0} ---
+++ killed by SIGABRT (core dumped) +++

It happens every time, after a couple of hours :/

@indutny (Contributor) commented Jul 23, 2014

This is very very odd... What I do see from your log is:

accept4(11, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 24

and then:

epoll_ctl(5, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=24}}) = -1 EBADF (Bad file descriptor)

Are you sure the kernel is up-to-date on your server?

@saghul (Contributor) commented Jul 23, 2014

Also note these:

write(1, "fd: 24\n", 7)                 = 7
write(1, "11byebye\n", 9)               = 9

Are you sending handles over pipes?

@indutny (Contributor) commented Jul 23, 2014

i.e. are you using cluster?

@tsukasagenesis

No cluster.
But I found a way to bypass this problem.
I pipe a file read stream through a throttle, which is piped to the user's stream. The problem appears if I cut it off with fs.close() on the FD that was being piped.
If I use close() on the file stream instead, I don't have this problem anymore. I will try to make a test script; a sketch of the difference is below.
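
A sketch of the difference (hypothetical code, pending the real test script; the throttle stage is elided and the file name is a placeholder):

```js
// Hypothetical sketch of the two teardown styles described above; the
// throttle stage is elided and the file name is a placeholder.
var fs = require('fs');

var file = fs.createReadStream('input.dat'); // normally piped via a throttle
file.pipe(process.stdout);

// Problematic: closing the raw fd out from under the live stream, while
// libuv may still have it registered:
//   fs.close(file.fd, function() {});

// Safe: let the stream tear itself down instead:
file.close();
```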

@indutny (Contributor) commented Jul 24, 2014

Thank you!
