Skip to content
This repository has been archived by the owner on May 4, 2018. It is now read-only.

closing a shared socket can lead to 100% CPU usage #1099

Closed
goffrie opened this issue Feb 5, 2014 · 17 comments
Closed

closing a shared socket can lead to 100% CPU usage #1099

goffrie opened this issue Feb 5, 2014 · 17 comments

Comments

@goffrie
Copy link
Contributor

goffrie commented Feb 5, 2014

Another manifestation of #826. When a socket is closed in one process, if it is open in another process, it may stay registered in epoll and thus continously generate junk events that can't be silenced. Reproducible by the following node.js program:

var cluster = require('cluster');

if (cluster.isMaster) {
    console.log("%d: master", process.pid);
    cluster.fork();
} else {
    // Create a new fd. The fd will be shared with the master.
    var sock = require('dgram').createSocket('udp4');
    // Send a packet to get the socket registered in the epoll.
    sock.send(new Buffer(1000), 0, 1000, 3000, '127.0.0.1', end);
    function end() {
        // Close the socket.
        sock.close();
        console.log("%d: worker: done", process.pid);
        // At this point, the socket is not deregistered from the epoll fd,
        // since it is still open - but not in this process.
        // It is still in EPOLLIN|EPOLLOUT mode.
        // It also cannot be removed, since we no longer have an fd for it.
        // The process will continuously receive EPOLLOUT events, causing 100% CPU usage.
    }
}

This patch corrects this behaviour, although as I understand it, it is not acceptable as-is because of the performance of the extra system call.

diff --git a/deps/uv/src/unix/linux-core.c b/deps/uv/src/unix/linux-core.c
index 74294eb..351fdd0 100644
--- a/deps/uv/src/unix/linux-core.c
+++ b/deps/uv/src/unix/linux-core.c
@@ -98,21 +98,11 @@ void uv__platform_loop_delete(uv_loop_t* loop) {


 void uv__platform_invalidate_fd(uv_loop_t* loop, int fd) {
-  struct uv__epoll_event* events;
-  uintptr_t i;
-  uintptr_t nfds;
-
-  assert(loop->watchers != NULL);
-
-  events = (struct uv__epoll_event*) loop->watchers[loop->nwatchers];
-  nfds = (uintptr_t) loop->watchers[loop->nwatchers + 1];
-  if (events == NULL)
-    return;
-
-  /* Invalidate events with same file descriptor */
-  for (i = 0; i < nfds; i++)
-    if ((int) events[i].data == fd)
-      events[i].data = -1;
+  /* Pass in a dummy epoll_event,
+   * to work around a bug in old kernels. */
+  struct uv__epoll_event e;
+  if (loop->backend_fd == -1) return;
+  uv__epoll_ctl(loop->backend_fd, UV__EPOLL_CTL_DEL, fd, &e);
 }
@jwalton
Copy link

jwalton commented Feb 5, 2014

+1

@indutny
Copy link
Contributor

indutny commented Feb 5, 2014

Are you sure that the supplied javascript code sample is reproducing the problem? Seems to be working just fine for me.

Also, I don't understand how this patch is fixing a problem, could you please explain what it does in a little bit more detail?

@goffrie
Copy link
Contributor Author

goffrie commented Feb 5, 2014

Sorry, I should have specified that this only happens on Linux (not sure what system you're on). The testcase works for me (causes 100% cpu usage) on node 0.10.25, on every machine I tried. The problem occurs because the fd stays regstered with epoll despite being closed; the patch deregisters it from the epoll instance entirely.

@trevnorris
Copy link
Contributor

Create a new fd. The fd will be shared with the master.

I don't believe this is any longer the case [1][2]. Or is it only that fd's won't be shared from master to child?

[1] 6abe1e4
[2] nodejs/node-v0.x-archive#6905

@indutny
Copy link
Contributor

indutny commented Feb 5, 2014

@trevnorris this is what actually happening, because .bind() is called internally in .send(). I am able to reproduce the issue, btw.

And I think that this is a kernel bug, will try on a newer linux just to verify it.

@indutny
Copy link
Contributor

indutny commented Feb 5, 2014

Linux 7527bd77-ab3e-474b-ace7-eed6053931e7 3.1.10joyent-ubuntu-10-opt #1 SMP Fri Jan 20 09:55:31 PST 2012 x86_64 GNU/Linux

@bnoordhuis
Copy link
Contributor

And I think that this is a kernel bug

It's not a bug but it's definitely a quirk.

epoll polls on file descriptions, not file descriptors. If the file description is alive because of a file descriptor in another process, then any activity on the file description wakes up epoll_wait() in your process.

The fun thing is, epoll_ctl(EPOLL_CTL_DEL) won't work if your file descriptor is already closed.

You can mitigate it somewhat by switching to edge-triggered mode. That doesn't really solve the issue but at least you won't get repeated wake-ups, just a single one.

@goffrie
Copy link
Contributor Author

goffrie commented Feb 5, 2014

It happens for me on node master, kernel Linux asplode 3.12.5-1-ARCH #1 SMP PREEMPT Thu Dec 12 12:57:31 CET 2013 x86_64 GNU/Linux.

@indutny
Copy link
Contributor

indutny commented Feb 5, 2014

So this is an expected behavior?! The epoll is totally screwed then :) Especially considering that I could open a socket/file with the same fd as of the closed one and receive events for stale closed fd.

Anyway, I think that the only thing we could do is to call EPOLL_CTL_DEL in invalidate. However removing the existing code does seem to be wrong, because it is there for a completely different purpose.

@goffrie if you wish - you could contribute a Pull Request and I'll review it and land it, but if you don't have time for this - just let me know and I'll fix it myself.

@bnoordhuis
Copy link
Contributor

Anyway, I think that the only thing we could do is to call EPOLL_CTL_DEL in invalidate

That will catch most of it. There are some edge cases like dup'ed stdio file descriptors that probably need to be addressed separately. I remember looking into that some months ago but the details are a little fuzzy by now.

@indutny
Copy link
Contributor

indutny commented Feb 5, 2014

Hm... what's up with stdio? Could you please provide additional info? ;)

@goffrie
Copy link
Contributor Author

goffrie commented Feb 5, 2014

I created #1100.

@indutny
Copy link
Contributor

indutny commented Feb 6, 2014

@bnoordhuis yeah, confirming https://gist.github.com/indutny/8842007 . Seems like it is an intended behavior.

@indutny
Copy link
Contributor

indutny commented Feb 6, 2014

And a quote from man page:

       Q6  Will closing a file descriptor cause it to be removed from all epoll sets automatically?

       A6  Yes, but be aware of the following point.  A file descriptor is a reference to an open file description (see open(2)).  Whenever a descriptor is duplicated via dup(2),  dup2(2),  fcntl(2)
           F_DUPFD,  or fork(2), a new file descriptor referring to the same open file description is created.  An open file description continues to exist until all file descriptors referring to it
           have been closed.  A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or before if  the
           descriptor  is explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL).  This means that even after a file descriptor that is part of an epoll set has been closed, events may be reported for
           that file descriptor if other file descriptors referring to the same underlying file description remain open.

@Raynos
Copy link

Raynos commented Dec 19, 2014

@indutny @goffrie can we back port this to 0.10 ?

@goffrie
Copy link
Contributor Author

goffrie commented Dec 19, 2014

It's in a 0.10 release already, though I can't remember the version number
off hand.
On Dec 18, 2014 5:26 PM, "Raynos (Jake Verbaten)" notifications@github.com
wrote:

@indutny https://github.com/indutny @goffrie
https://github.com/goffrie can we back port this to 0.10 ?


Reply to this email directly or view it on GitHub
#1099 (comment).

@Raynos
Copy link

Raynos commented Dec 19, 2014

@goffrie I found the commit, thank you :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants