Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

child_process.fork() and read from parent process.stdin causes 100% CPU usage #6271

Closed
avz opened this issue Sep 26, 2013 · 6 comments
Closed

Comments

@avz
Copy link

avz commented Sep 26, 2013

This simple script causes 100% CPU usage by parent process after reaching stdin's EOF:

if(process.argv[2] === 'child') {
    console.log('child');

    setInterval(function() {}, 1000); // just sleep
} else {
    console.log('parent');

    process.stdin.on('readable', function() {
        process.stdin.read(); // read stdin until EOF
    });

    require('child_process').fork(__filename, ['child']);
}
% echo 1 | node bug.js

Checked on v0.10.12 (arch and ubuntu), v0.10.17 (arch and ubuntu) and v0.10.19 (arch)

@bnoordhuis
Copy link
Member

Confirmed, thanks. I think we're looking at two things here:

a) inadvisable behavior in node (implicitly allowing fd 0 to get closed), and
b) an honest-to-$DEITY kernel bug. epoll_wait() keeps reporting EPOLLHUP for fd 0 but subsequently calling epoll_ctl(EPOLL_CTL_DEL) fails with EBADF. Here is what the strace output looks like:

epoll_ctl(5, EPOLL_CTL_ADD, 0, {EPOLLIN, {u32=0, u64=0}}) = 0
epoll_wait(5, {{EPOLLIN|EPOLLHUP, {u32=0, u64=0}}}, 1024, -1) = 1
read(0, ".\n", 65536)                   = 2
epoll_ctl(5, EPOLL_CTL_MOD, 0, {EPOLLIN, {u32=0, u64=0}}) = 0
epoll_wait(5, {{EPOLLHUP, {u32=0, u64=0}}}, 1024, -1) = 1
read(0, "", 65536)                      = 0
close(0)                                = 0
epoll_wait(5, {{EPOLLHUP, {u32=0, u64=0}}}, 1024, -1) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 0, {EPOLLHUP, {u32=0, u64=0}}) = -1 EBADF (Bad file descriptor)
epoll_wait(5, {{EPOLLHUP, {u32=0, u64=0}}}, 1024, -1) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 0, {EPOLLHUP, {u32=0, u64=0}}) = -1 EBADF (Bad file descriptor)
epoll_wait(5, {{EPOLLHUP, {u32=0, u64=0}}}, 1024, -1) = 1
epoll_ctl(5, EPOLL_CTL_DEL, 0, {EPOLLHUP, {u32=0, u64=0}}) = -1 EBADF (Bad file descriptor)
# etc.

A simple test case:

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define E(x) do { errno = 0; { x; } if (errno) perror(#x), exit(1); } while (0)

int main(void) {
  struct epoll_event e;
  char buf[32];
  ssize_t n;
  int epfd;
  int fd;

  fd = 0;
  E(epfd = epoll_create1(0));
  e.events = EPOLLIN;
  e.data.u64 = fd;
  E(epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &e));
  E(n = epoll_wait(epfd, &e, 1, -1));
  assert(n == 1);
  assert(e.data.u64 == fd);
  do
    E(n = read(fd, buf, sizeof(buf)));
  while (n != 0);
  close(fd);

  for (;;) {
    E(n = epoll_wait(epfd, &e, 1, -1));
    assert(n == 1);
    assert(e.events == EPOLLHUP);
    assert(e.data.u64 == fd);
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &e))  // fails with EBADF
      perror("epoll_ctl(EPOLL_CTL_DEL)");
  }

  return 0;
}

The call to close() is supposed to remove the file descriptor from the epoll set but it doesn't, doesn't matter if fds 1 and 2 are closed or redirected to something else (i.e. so they don't share a file description with fd 0.)

Interestingly, the EPOLLHUP/EBADF busy loop only happens when stdin is a pipe. When it's a tty, fd 0 keeps generating EPOLLIN events after the close().

I guess I'll have to recheck with the latest mainline (my current kernel is 3.10.10) and report it to the LKML. Working around this in node.js should be relatively straightforward:

a) we shouldn't allow fds 0-2 to get closed, and
b) libuv could add fds 0-2 in EPOLLET or EPOLLONESHOT mode

@ghost ghost assigned bnoordhuis Sep 27, 2013
@bnoordhuis
Copy link
Member

Confirmed to also happen with 3.12-rc3.

bnoordhuis added a commit to joyent/libuv that referenced this issue Oct 1, 2013
Ensure that close() system calls don't close stdio file descriptors
because that is almost never the intention.

This is also a partial workaround for a kernel bug that seems to affect
all Linux kernels when stdin is a pipe that gets closed: fd 0 keeps
signalling EPOLLHUP but a subsequent call to epoll_ctl(EPOLL_CTL_DEL)
fails with EBADF.  See nodejs/node-v0.x-archive#6271 for details and a test case.
@bnoordhuis
Copy link
Member

After thinking it through some more, I've come to the conclusion that it's not an out-and-out kernel bug but an epoll design flaw. epoll_wait() reports events for file descriptions, not file descriptors. Closing the file descriptor in our process doesn't necessarily close the file description because other processes may still have open file descriptors that refer to the same file description.

In other words, this busy loop behavior is not unique to stdio file descriptors. I don't think we can solve this in a general way in v0.10 but maybe we can add a special case for stdio fds because those are most prone to triggering it (and because closing fds 0-2 is usually a bad idea anyway.)

In master, this is probably best mitigated (but not fully solved) by switching to edge-triggered I/O. Guess I have a motivation now for finishing up my patch from November 2012.

When I say 'not fully solved', I mean that it's still possible to get spurious wakeups (see example below) but libuv is equipped to deal with that and at least you won't get flat-out busy loops that way because the kernel will report the event only once.

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

#define E(x) do { errno = 0; { x; } if (errno) perror(#x), exit(1); } while (0)

int main(void) {
  int pipefd[2];
  pid_t pid;

  E(pipe(pipefd));
  E(pid = fork());
  if (pid == 0) {
    struct epoll_event e;
    int epfd;
    int n;
    E(epfd = epoll_create1(EPOLL_CLOEXEC));
    e.events = EPOLLIN | EPOLLOUT | EPOLLET;
    e.data.fd = pipefd[0];
    E(epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &e));
    E(close(pipefd[0]));
    E(close(pipefd[1]));
    for (;;) {
      E(do
          n = epoll_wait(epfd, &e, 1, -1);
        while (n == -1 && errno == EINTR));
      assert(n == 1);
      printf("child wakeup %d %x\n", e.data.fd, e.events);
    }
  } else {
    char buf[1];
    ssize_t n;
    for (;;) {
      E(do
          n = write(pipefd[1], "", 1);
        while (n == -1 && errno == EINTR));
      usleep(25e4);
      E(do
          n = read(pipefd[0], buf, sizeof(buf));
        while (n == -1 && errno == EINTR));
    }
  }

  return 0;
}

When run, it keeps printing child wakeup 3 1 ad infinitum because the parent process keeps generating events that the child process cannot process or disable because it no longer has an open file descriptor referencing the pipe.

generalhenry added a commit to CodeNow/node that referenced this issue Dec 7, 2013
Ensure that close() system calls don't close stdio file descriptors
because that is almost never the intention.

This is also a partial workaround for a kernel bug that seems to affect
all Linux kernels when stdin is a pipe that gets closed: fd 0 keeps
signalling EPOLLHUP but a subsequent call to epoll_ctl(EPOLL_CTL_DEL)
fails with EBADF.  See nodejs#6271 for details and a test case.
@jwalton
Copy link

jwalton commented Jul 28, 2014

an honest-to-$DEITY kernel bug. epoll_wait() keeps reporting EPOLLHUP for fd 0 but subsequently calling epoll_ctl(EPOLL_CTL_DEL) fails with EBADF.

This is not a kernel bug. From epoll(7):

   Q6  Will closing a file descriptor cause it to be removed from all
       epoll sets automatically?

   A6  Yes, but be aware of the following point.  A file descriptor is a
       reference to an open file description (see open(2)).  Whenever a
       descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD,
       or fork(2), a new file descriptor referring to the same open file
       description is created.  An open file description continues to
       exist until all file descriptors referring to it have been
       closed.  A file descriptor is removed from an epoll set only
       after all the file descriptors referring to the underlying open
       file description have been closed (or before if the descriptor is
       explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL).  This means
       that even after a file descriptor that is part of an epoll set
       has been closed, events may be reported for that file descriptor
       if other file descriptors referring to the same underlying file
       description remain open.

See also:

@indutny
Copy link
Member

indutny commented Jul 28, 2014

@jwalton I agree, but why resurrecting the old issue?

@indutny indutny closed this as completed Jul 28, 2014
@jwalton
Copy link

jwalton commented Jul 28, 2014

I didn't reopen it; it was open when I got here. :) I commented on it because we're seeing an epoll busy loop today, so I was looking for open related bugs. Although, I just realized @goffrie's fix made it into Node 0.10.27, and we're still on 0.10.25. :( So you can safely ignore me. :P

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants