
Cannot be killed after running for a while on Linux 6.1.113 or 6.6.57 #120

Closed
UlyssesZh opened this issue Oct 30, 2024 · 7 comments

@UlyssesZh

UlyssesZh commented Oct 30, 2024

I tried Linux kernels 6.1.113, 6.1.112, 6.6.56, and 6.6.57 on NixOS 24.05 with arRPC 3.5.0 and Node.js 20.17 (the bug also exists with arRPC 3.4.0 and Node.js 20.15, and it may be present in other versions as well).

I found that on Linux 6.1.113 or 6.6.57, after arRPC has been running for a while (usually more than 10 minutes; the longer it runs, the more likely the bug is to trigger), it cannot be killed, even with kill -9. This leaves the computer stuck during shutdown, which is pretty bad. The bug is not present on Linux 6.1.112 or 6.6.56.

It is hard to change the kernel version on distros other than NixOS, so I didn't try other distros. Maybe I can try Gentoo, though I have never used it.

I suspect this may be a kernel bug or a Node.js bug, but I haven't run into any other new bugs with other applications since upgrading the kernel.

@GsakuL

GsakuL commented Oct 31, 2024

I've had this issue for a few days but only just had time to investigate.
For what it's worth, I'm using Home Manager's services.arrpc.enable, but that's just a simple systemd unit wrapper.
In case someone wants to dig deeper:
I just updated my nixos-unstable (main) channel to rev 807e9154dcb16384b1b765ebe9cd2bba2ac287fd, but the bug was also present in rev 32e940c7c420600ef0d1ef396dc63b04ee9cad37.
My current kernel is 6.6.58.

Since the arRPC package in nixpkgs is still on 3.4.0, I also don't think this comes from a recent change to arRPC.
Funny thing, though: you also use NixOS. So I think it may have something to do with Nix's version of Node. Maybe they now accidentally use the wrong one to build the arRPC package because they changed the default or something? But I'm not experienced with digging through the nixpkgs repo/source.

For illustrative purposes, here are the messages shown during shutdown:
[Two screenshots of the shutdown messages]
The second line appears once more after some (unmeasured) delay.
And yes, that PID did correspond to "node arrpc".
I also independently tried killing the process (also with -9), but it did not respond to any of those attempts.

@UlyssesZh
Author

> So I think it may have something to do with Nix's version of Node. Maybe they now accidentally use the wrong one to build the arRPC package because they changed the default or something? But I'm not experienced with digging through the nixpkgs repo/source.

I have already investigated that, and it is probably not a Node problem. I ran the same arRPC from exactly the same nixpkgs rev on different Linux kernel versions (so exactly the same Node and the same arRPC), and the results differed.

@UlyssesZh
Author

UlyssesZh commented Oct 31, 2024

> Funny thing, though: you also use NixOS.

I think it's just because NixOS is the only popular distro on which you can end up with a very new patch version of an old kernel. Other distros tend to ship either the newest/LTS kernel or an old kernel version. This is why I wanted to see whether I could reproduce this bug on Gentoo, where I can control the kernel version. However, Gentoo is a painful distro to set up, I have never touched it before, and I haven't had enough time recently. If you know a better distro for testing this bug, please suggest one.


Edit: I found out that Arch's linux-lts also uses kernel 6.6.58. I can try that.

@UlyssesZh
Author

Not reproduced on an Arch Linux VM with Linux 6.6.58.

@Ciflire

Ciflire commented Nov 2, 2024

This made my computer reboot every 15 minutes or so.
This is the relevant part of journalctl:

nov. 02 11:55:48 vivobook14 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov. 02 11:55:48 vivobook14 kernel:       Not tainted 6.6.58 #1-NixOS
nov. 02 11:55:48 vivobook14 kernel: INFO: task node:2261 blocked for more than 245 seconds.

It kept repeating this every few minutes until it reached the point where the kernel decided to force a reboot, if I read the journal correctly.
It is probably also blocking some Home Manager rebuilds.

@UlyssesZh
Author

OK, I think this is not a bug in arRPC. An npm install command for a totally unrelated project got stuck too.

@UlyssesZh
Author

Reproducing example:

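```js
// Save as repro.mjs (ES module) and run with: node repro.mjs
import { readdir, readFile } from "fs/promises";
let i = 0;
setInterval(async () => {
  console.log(i++);
  // Start (without awaiting) a read of /proc/<pid>/cmdline for every numeric /proc entry;
  // errors such as a process exiting mid-scan are ignored instead of crashing Node.
  (await readdir("/proc")).forEach(pid => +pid > 0 && readFile(`/proc/${pid}/cmdline`, "utf8").catch(() => {}));
}, 5000);
```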

Not a bug in arRPC. Closing.

UlyssesZh closed this as not planned on Nov 13, 2024.
3 participants