Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Child-Process: file-descriptor auto-closes when it shouldn't without triggering close events. #14046

Closed
AjaniBilby opened this issue Jul 3, 2017 · 9 comments
Labels
child_process Issues and PRs related to the child_process subsystem. stdio Issues and PRs related to stdio.

Comments

@AjaniBilby
Copy link

  • Version: 6.11.0
  • Platform: Windows 10.0.15063 (64bit)
  • Subsystem: child_process (spawn component)

I have a master and a child process of which both repeatedly ping each other over fd: 3, and fd:4, however after 1 second the master stops receiving any messages from the child, and no close, end, finish, or error event is ever triggered.

Master:

var spawn = require('child_process').spawn;

var child = spawn(process.execPath, [__dirname+'/child.js'], {
  stdio: ['pipe', 'inherit', 'inherit', 'pipe', 'pipe']
});

child.stdio[3].on('data', function(chunk){
  console.log('Master Replayed to 3');
  child.stdio[3].write('Reply 3');
});

child.stdio[4].on('data', function(chunk){
  console.log('Master Replayed to 4');
  child.stdio[4].write('Reply 4');
});

child.stdio[3].on('close', function(){
  console.log('Master lost connection though fd3');
});
child.stdio[4].on('close', function(){
  console.log('Master lost connection though fd4');
});

child.stdio[3].on('end', function(){
  console.log('Master lost connection though fd3');
});
child.stdio[4].on('end', function(){
  console.log('Master lost connection though fd4');
});

child.stdio[3].on('finish', function(){
  console.log('Master lost connection though fd3');
});
child.stdio[4].on('finish', function(){
  console.log('Master lost connection though fd4');
});

child.stdio[3].on('error', function(err){
  console.log('Master lost connection though fd3', err);
});
child.stdio[4].on('error', function(err){
  console.log('Master lost connection though fd4', err);
});

child.stdio[3].write('ready');
child.stdio[4].write('ready');

console.log('init child');

Child:

var net = require('net');
var pipe3 = new net.Socket({ fd: 3, writeable: true, readable: true});
var pipe4 = new net.Socket({ fd: 4, writeable: true, readable: true});

pipe3.on('data', function(){
  console.log('Child received reply though fd3');
  pipe3.write('auto');
});
pipe4.on('data', function(){
  console.log('Child received reply though fd4');
  pipe4.write('auto');
});

setInterval(function () {
  console.log('Child says hi though fd3');
  pipe3.write('Boo');

  console.log('Child says hi though fd4');
  pipe4.write('Boo');
}, 1000);

pipe3.on('close', function(){
  console.log('Child lost connection to fd3');
});
pipe4.on('close', function(){
  console.log('Child lost connection to fd4');
});

pipe3.on('end', function(){
  console.log('Child lost connection to fd3');
});
pipe4.on('end', function(){
  console.log('Child lost connection to fd4');
});

pipe3.on('finish', function(){
  console.log('Child lost connection to fd3');
});
pipe4.on('finish', function(){
  console.log('Child lost connection to fd4');
});

pipe3.on('error', function(err){
  console.log('FD3:', err);
});
pipe4.on('error', function(err){
  console.log('FD4:', err);
});

Partial log:

...[child and master pinging each other allot]
Child says hi though fd3
Child says hi though fd4
Child says hi though fd3
Child says hi though fd4
...
[forced close process]
@vsemozhetbyt vsemozhetbyt added the child_process Issues and PRs related to the child_process subsystem. label Jul 3, 2017
@AjaniBilby
Copy link
Author

AjaniBilby commented Jul 24, 2017

Tested in latest LTS (6.11.1) and current (8.2.1) still have the problem. However my friend (runs Ubuntu, with node v8.2.1) does not have the same problem.

@gireeshpunathil
Copy link
Member

can't reproduce with the latest. Can you increase the timeout (say 10 sec) to see if internal buffering is an issue?

@AjaniBilby
Copy link
Author

Using 10.0.0 I still get the same problem with that code, however then I up the interval to 10sec it works properly for quite a while. Then it just freezes... :\

I suspect that this may be some weird OS level obscurity, making it difficult to debug.
It has been so long that I have moved on from this, and completely forgot I opened this issue.

Since the problem is still occurring I do not feel comfortable losing this issue, however, I wouldn't expect anyone to focus on it right now due to its niche.

@gireeshpunathil
Copy link
Member

thanks for clarifying. I will see what happens underneath and let you know.

@gireeshpunathil
Copy link
Member

In Linux everything works fine, the fd numbers seem to map correctly.

[pid  4516] socketpair(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0, [11, 12]) = 0
[pid  4516] socketpair(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0, [13, 14]) = 0
[pid  4516] socketpair(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0, [15, 16]) = 0
...

[pid  4522] dup2(12, 0)                 = 0
[pid  4522] close(11)                   = 0
[pid  4522] dup2(14, 3)                 = 3
[pid  4522] close(13)                   = 0
[pid  4522] dup2(16, 4)                 = 4
[pid  4522] close(15)                   = 0
...
[pid  4522] write(3, "auto", 4)         = 4
[pid  4516] <... epoll_wait resumed> {{EPOLLIN, {u32=13, u64=13}}}, 1024, -1) = 1
[pid  4516] read(13, "auto", 65536)     = 4
[pid  4522] read(4, "ready", 65536)     = 5
[pid  4522] write(11, "Child received reply though fd4\n", 32 <unfinished ...>
[pid  4516] write(9, "Master Replayed to 3\n", 21 <unfinished ...>
[pid  4516] write(13, "Reply 3", 7 <unfinished ...>
[pid  4522] write(4, "auto", 4 <unfinished ...>
...

In Windows however, no data is transported exept the initial handshake. I don't know about a similar tracing mechanism in Windows so don't know what is happening inside.

Somewhat simplified test case:

$ cat 14046.js

var spawn = require('child_process').spawn
var net = require('net')

if(process.argv[2] === 'child') {
var count = 0
var p1 = new net.Socket({ fd: 3, writeable: true, readable: true})
var p2 = new net.Socket({ fd: 4, writeable: true, readable: true})
p1.on('data', (d) => {
  console.log(`p1 data: ${d.toString()}`)
})
p2.on('data', (d) => {
  console.log(`p2 data: ${d.toString()}`)
})

setInterval(function () {
  p1.write('p1')
  p2.write('p2')
}, 10000)
  
} else {
var child = spawn(process.execPath, [process.argv[1], 'child'], 
  {stdio: ['pipe', 'inherit', 'inherit', 'pipe', 'pipe']}
)

child.stdio[3].on('data', (d) => {
  console.log(`child data: ${d.toString()}`)
  child.stdio[3].write('olleh')
})
child.stdio[4].on('data', (d) => {
  console.log(`child data: ${d.toString()}`)
  child.stdio[4].write('olleh')
})

child.stdio[3].write('start')
child.stdio[4].write('start')
}

runs good on mac and linux, but data do not transfer in windows.
if the second fd (4) is not used, everything works fine.
changing the interval does not seem to make anything better.

I am just wondering whether the random selection of fd 3 and 4 were correct in the first place or not. How do we know what fd number is used internally? Doc does not seem to be very explicit in that regard.

/cc @nodejs/child_process

@bnoordhuis
Copy link
Member

I am just wondering whether the random selection of fd 3 and 4 were correct in the first place or not. How do we know what fd number is used internally?

On Windows you don't; the test looks flawed to me. POSIX systems (Linux, macOS) open new fds at the lowest available slot but Windows is not POSIX and doesn't have such semantics.

@gireeshpunathil
Copy link
Member

oh, yes! that fills all the gap, thanks!

But then how, given the parent code, a child will handle this? there seem to be no way for identifying what descriptors the pipe has opened for it.

@bnoordhuis
Copy link
Member

You can't right now but perhaps libuv could map the handles to CRT file descriptors in the child process. Completely untested but it would look something like this:

diff --git a/deps/uv/src/win/process-stdio.c b/deps/uv/src/win/process-stdio.c
index 032e30935c..2a2afa19f3 100644
--- a/deps/uv/src/win/process-stdio.c
+++ b/deps/uv/src/win/process-stdio.c
@@ -470,9 +470,13 @@ void uv__stdio_noinherit(BYTE* buffer) {
   count = CHILD_STDIO_COUNT(buffer);
   for (i = 0; i < count; i++) {
     HANDLE handle = CHILD_STDIO_HANDLE(buffer, i);
-    if (handle != INVALID_HANDLE_VALUE) {
-      SetHandleInformation(handle, HANDLE_FLAG_INHERIT, 0);
+    if (handle == INVALID_HANDLE_VALUE) {
+      continue;
+    }
+    if (i > 2) {
+      _open_osfhandle((intptr_t) handle, _O_RDWR);
     }
+    SetHandleInformation(handle, HANDLE_FLAG_INHERIT, 0);
   }
 }
 

@jasnell jasnell added the stdio Issues and PRs related to stdio. label Jun 25, 2020
@AjaniBilby
Copy link
Author

It's been 2yrs, I presume this is fixed, otherwise someone else would have ran into it and added something here - or created a new issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. stdio Issues and PRs related to stdio.
Projects
None yet
Development

No branches or pull requests

5 participants