[Bug] Loading big memory snapshot can put processes in uninterruptible sleep #3020
Comments
To get a better understanding of the issue, could you please share the guest kernel and rootfs? It would also be useful to have a script that replicates your running scenario (starting the processes that you see failing). Could you please also share some info on the host setup (distro and kernel config, if available)?
Hi @CompuIves, are you still experiencing this issue? I see that you merged a fix in your fork; has that solved the problem for you?
Hey! The commit in our fork fixed the issue. That said, we're removing the fix in a future version where we rely on UFFD to handle page faults. We've updated the host kernel to Linux 6 and the guest kernel to 5.15 in this scenario, and we cannot reproduce the issue anymore.
Hi @CompuIves, if you are able to reproduce this issue in currently supported versions of Firecracker and kernels, please feel free to post the results and re-open.
Describe the bug
Whenever we load a reasonably sized memory snapshot (8GB) of a VM that has been running several node processes, we notice that some processes get stuck in an uninterruptible sleep. This happens both with the uffd handler and with the "default" snapshot loading. They seem to get stuck on a syscall waiting for a page fault:
These are some processes in the guest that get stuck:
strace:
There are no particular dmesg logs.
We've seen this happen on hosts that have AMD EPYC CPUs. We haven't tested with other CPUs.
Interestingly, this only happens in certain host/guest kernel version combinations.
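For anyone trying to confirm the same symptom inside a guest, the following is a minimal sketch (not part of the original report) that lists processes currently in uninterruptible sleep. It relies on the state field of /proc/<pid>/stat, where D means uninterruptible sleep, and reads /proc/<pid>/wchan as a best-effort hint of where the task is waiting. The function names parse_stat and find_uninterruptible are hypothetical, and wchan may be empty or 0 depending on the kernel configuration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return a list of (pid, comm, wchan) for tasks in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan is not always readable
        stuck.append((pid, comm, wchan))
    return stuck


if __name__ == "__main__":
    for pid, comm, wchan in find_uninterruptible():
        print(f"{pid}\t{comm}\t{wchan}")
```

Running this in the guest a few times after restoring the snapshot should show whether the same PIDs stay in state D; a process that only appears once is probably just doing ordinary I/O.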
To Reproduce
Expected behaviour
The processes should continue responding.
Environment
Checks