ptys unexpectedly interrupted and exit early in GitHub Actions, Linux with io_uring enabled #630
Comments
node-pty is currently only really tested with Node up to v18, as that is what vscode ships as part of its Electron builds.
This bug has also been significantly impacting us on cocalc.com, and I made an issue in our repo about it here with some more details, including some strace output: sagemathinc/cocalc#6963 In particular, node-pty has this bug on Node.js 18.18.0 (released about 2 weeks ago), but not with Node 18.17.1 (the previous version). My guess is you'll hit this with vscode in December (?), when they might upgrade to Node v18.18. Right now electron is on v18.16, I think.
I can confirm it happens on 18.18.0: https://github.com/lydell/node-pty-bug/actions/runs/6415765977 (As you can see, that was on attempt 2 of that run – first time it didn’t happen. Seems to not be 100 % reproducible. Also note how 20.8.0 succeeded there, but fails here: https://github.com/lydell/node-pty-bug/actions/runs/6415782042/job/17418282266.)
Thanks for testing on 18.18.0.
For me at least this problem really is 100% reproducible via the exact same steps. What it takes to reproduce it is extremely weird, since it involves doing something once, then restarting a server, then doing the exact same thing again and then it fails 100% after that. It should be the other way around, where restarting the server should clear any state and make the problem go away, so I'm extremely puzzled by this bug. I also have no idea if this is a bug in nodejs or in node-pty. It seems more likely to be a nodejs bug, but I didn't find anything clearly relevant when searching their issue tracker. I tried reverting to older versions of node-pty and they also exhibit this behavior. The changelog from 18.17.x --> 18.18.0 includes many updates of dependencies, and also three changes to child_process. This change looks potentially relevant: They rewrote bits of code specifically related to how they handle subprocess termination in these changes, and when I reproduce this bug, I basically see:
So I bet one of those child_process commits related to subprocesses exiting introduced a bug in node.js.
Please set I'm debugging for this, yeah it's 100% reproducible, the core path is: enable io_uring in libuv, create pty1, create pty2, destroy pty1. The order is important: destroying in stack order will not hit this bug.
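The ordering described above (create pty1, create pty2, then destroy in stack order) can be sketched as a small helper that tears handles down last-in, first-out. This is only an illustration: `Killable` and `PtyStack` are hypothetical names, the only thing assumed about a pty handle is a `kill()` method like the one on node-pty's `IPty`, and whether LIFO teardown avoids the bug in every case is just the observation reported in this comment.

```typescript
// Hypothetical minimal shape of a pty handle. node-pty's IPty exposes kill(),
// which is the only method this sketch relies on.
interface Killable {
  kill(signal?: string): void;
}

// Hypothetical helper: remembers handles in creation order and kills them
// newest first, i.e. in stack order.
class PtyStack<T extends Killable> {
  private readonly handles: T[] = [];

  // Register a handle right after it has been created.
  track(handle: T): T {
    this.handles.push(handle);
    return handle;
  }

  // Kill every tracked handle, newest first, and return them in kill order.
  killAll(): T[] {
    const killed: T[] = [];
    while (this.handles.length > 0) {
      const handle = this.handles.pop() as T;
      handle.kill();
      killed.push(handle);
    }
    return killed;
  }
}
```

With real ptys, each value returned by node-pty's `spawn` would be passed to `track`, and `killAll` would be called during shutdown, so that pty2 is destroyed before pty1.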
Thanks for finding that!
Is there anyone who has managed to make a shorter reproduction than mine, that also reproduces outside of GitHub Actions? I think having such a reproduction here would increase the chances of fixing this.
Here is my conclusion of the issue. I’m not 100 % certain I’m right, but I think I have a plausible explanation.
It looks like there is nothing node-pty can do about this currently. I think it’s better to wait for Node.js to keep working on their
Environment details
Summary of the below updates: If Node.js has io_uring enabled, the bug occurs.
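For reference, libuv reads an environment variable named UV_USE_IO_URING, and setting it to 0 turns its io_uring backend off. That variable name comes from libuv rather than from the comments quoted here, so treat the following as a sketch under that assumption. `Env` and `withIoUringDisabled` are hypothetical names; the function only builds an environment map and does not start any process.

```typescript
// A plain string map standing in for process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a copy of `env` with libuv's io_uring backend
// switched off (assumes libuv's UV_USE_IO_URING variable). The input map is
// left untouched.
function withIoUringDisabled(env: Env): Env {
  return { ...env, UV_USE_IO_URING: "0" };
}
```

The result could be passed as the `env` option when launching the Node.js process that uses node-pty, for example through `child_process.spawn`, so that the variable is already present when that process starts.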
Update: It works with Node.js 18! https://github.com/lydell/node-pty-bug/actions/runs/6283383630/job/17063792075
And 19: https://github.com/lydell/node-pty-bug/actions/runs/6283390976/job/17063807172
And 20.0.0: https://github.com/lydell/node-pty-bug/actions/runs/6283396940/job/17063819451
So it must have been introduced in some 20.x release … the search continues.
Aha! It’s Node.js 20.3.0 that introduced it: https://github.com/lydell/node-pty-bug/actions/runs/6283409434
Switched the child process from node to bash and it got a bit flaky: https://github.com/lydell/node-pty-bug/actions/runs/6283443471/attempts/1 vs https://github.com/lydell/node-pty-bug/actions/runs/6283443471/attempts/2
Update: Can happen on Node.js 18.18.0 too: https://github.com/lydell/node-pty-bug/actions/runs/6415765977
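The version findings above can be condensed into a small lookup, for example to decide in a CI script whether extra caution is needed. This is a sketch that encodes only what is reported in this thread: 18.17.1, 19 and 20.0.0 passed, while 18.18.0, 20.3.0 and 20.8.0 failed. Extending those data points to the rest of the 18.x and 20.x lines is an assumption, and `parseVersion` and `reportedStatus` are hypothetical names.

```typescript
type Status = "reported-failing" | "reported-ok" | "unknown";

// Parse "v18.18.0" or "18.18.0" into [major, minor, patch]; null if malformed.
function parseVersion(version: string): [number, number, number] | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (match === null) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Classify a Node.js version using only the reports from this thread.
function reportedStatus(version: string): Status {
  const parsed = parseVersion(version);
  if (parsed === null) return "unknown";
  const [major, minor] = parsed;
  // 18.17.1 was reported fine, 18.18.0 was reported failing.
  if (major === 18) return minor >= 18 ? "reported-failing" : "reported-ok";
  // Node.js 19 was reported fine.
  if (major === 19) return "reported-ok";
  // 20.0.0 was reported fine, 20.3.0 is where the failure first appeared.
  if (major === 20) return minor >= 3 ? "reported-failing" : "reported-ok";
  // No data in this thread for other release lines.
  return "unknown";
}
```

Returning "unknown" for release lines that nobody tested here keeps the helper from claiming more than the thread shows.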
Issue description
When running ptys in parallel, some of them unexpectedly receive “signal 1” (SIGHUP?) and exit early.
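On Linux, signal number 1 is indeed SIGHUP. As a minimal sketch of how an exit report like the one above could be turned into a readable message, the following assumes the event shape that node-pty passes to `onExit` listeners (`exitCode` plus an optional numeric `signal`). `describeExit` and the small signal table are hypothetical helpers, and the table is deliberately not exhaustive.

```typescript
// Shape of the event node-pty passes to onExit listeners.
interface ExitEvent {
  exitCode: number;
  signal?: number;
}

// A few common Linux signal numbers; not exhaustive.
const LINUX_SIGNALS = new Map<number, string>([
  [1, "SIGHUP"],
  [2, "SIGINT"],
  [9, "SIGKILL"],
  [15, "SIGTERM"],
]);

// Turn an exit event into a short human-readable description.
function describeExit(event: ExitEvent): string {
  if (event.signal !== undefined && event.signal !== 0) {
    const name = LINUX_SIGNALS.get(event.signal) ?? "unknown signal";
    return `killed by signal ${event.signal} (${name})`;
  }
  return `exited with code ${event.exitCode}`;
}
```

Logging this from every pty's exit listener makes it easier to spot which of the parallel ptys were interrupted rather than finishing normally.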
This happens in ubuntu-latest on GitHub Actions, but not in macOS-latest, and not on my own Ubuntu machine (or on my own macOS machine either).
I made a minimal repo to show it happening: https://github.com/lydell/node-pty-bug
Example failing output (ubuntu-latest): https://github.com/lydell/node-pty-bug/actions/runs/6283275300/job/17063579148
Example successful output (macOS-latest): https://github.com/lydell/node-pty-bug/actions/runs/6283275300/job/17063579184
This is so strange! I wonder what’s special about Linux on GitHub Actions that makes this happen. Any ideas?