Trying to find the kernel commit which makes WSL non-responsive #1
Comments
I installed v5.4-4535-g9a3d7fd275b on my laptop this morning and hibernated it while traveling to my office. About an hour after coming out of hibernation, I hit the unresponsive/high-CPU-usage issue and needed to kill the WSL service to recover. |
@unwiredben Thanks a lot for testing it out, that is really helpful! Can you also check if v5.4 is working for you? |
I just switched over to 5.4 and will report back in a few days unless I see it hang first. |
@carlfriedrich, nice setup you have here! :) |
Well, then one might say the update fixed the issue for you. 😋 |
So far, no hangs with 5.4 across three hibernate cycles. |
Which CPU do you guys use, AMD or INTEL? |
switched to 5.4 today |
@mannfuri Thanks for your feedback. That's quite interesting, actually. I am on Intel on both my work and my home machine, and I get the issue on both. So AMD vs. Intel does not seem to determine whether the issue appears. I remember someone reporting in the upstream issue that they also get the issue on ARM. |
I've just had the usual hang with the current 5.15 kernel version today. I'm keen to help with this effort and have switched to 5.4.0 just now. I'll give that a few days before moving on to v5.4-4535 |
@tobyvinnell Great, thanks a lot for your help! |
Still no freezing with 5.4. Just to add to the platform discussion, I'm using a Dell Latitude 7430 with an Intel i7-1270P. |
For kernel v5.4-4535-g9a3d7fd275b, I get the following error message when trying to start WSL:
Anyone else facing the same issue? |
@aquohn I might have seen something similar, but in my case it worked the second time I tried to start WSL. Is this reproducible for you? |
@aquohn Just checked again: Yes, I get the same message, but calling |
FYI: I have been running v5.4 for over a week now on both my work and home machine without any hangs, and since nobody else has reported a hang so far, I am marking it as "good" in the issue description. I also added a column with the number of good/bad reports for each kernel version, just to keep track of how much feedback each decision is based on. So please keep reporting your experiences, even if we have already marked a version as "good" or "bad". I will switch to v5.4-4535-g9a3d7fd275b now. |
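A minimal sketch of how such a per-version tally of good/bad reports could be kept. The function names, the thresholds `min_good` and `min_bad`, and the sample reports are all hypothetical illustrations; they are not the rule or the data actually used in this issue.

```python
from collections import Counter


def tally(reports):
    """Count "good" and "bad" reports per kernel version.

    reports: iterable of (version, verdict) pairs, verdict is "good" or "bad".
    """
    counts = {}
    for version, verdict in reports:
        counts.setdefault(version, Counter())[verdict] += 1
    return counts


def classify(counter, min_good=3, min_bad=2):
    """Turn a tally into a label. The thresholds are assumptions, not from the thread."""
    if counter["bad"] >= min_bad:
        return "bad"
    if counter["bad"] == 0 and counter["good"] >= min_good:
        return "good"
    return "undecided"


# Made-up sample reports, for illustration only.
sample_reports = [
    ("kernel-a", "good"), ("kernel-a", "good"), ("kernel-a", "good"),
    ("kernel-b", "bad"), ("kernel-b", "good"), ("kernel-b", "bad"),
    ("kernel-c", "good"),
]

for version, counter in tally(sample_reports).items():
    print(version, dict(counter), classify(counter))
```

Keeping the raw counts next to the label makes it visible how much feedback stands behind each "good" or "bad" mark.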
This is like the higgs boson search 🙂 |
@mungojam I am quite optimistic that we will need less than 40 years for this. :-) |
Hello. I've been tracking the interrupt storm issue for a while now. Due to some unrelated stuff, I needed to reinstall my distro and do a complete setup from scratch. Since I needed complete systemd to have proper LVM mounting on boot, I installed the XanMod kernel: 5+ days with no hangs or CPU usage issues. Would any of you be willing to give it a test run for a couple of days? |
@seebeen Interesting project, haven't heard of that before. We're trying to bisect to a certain commit here, though, so while trying some other kernel images might be interesting in general, it will not help with the progress of this work. |
Hi @carlfriedrich, hibernation does sometimes work for me. The last time, after a successful return from hibernation, WSL with the v5.4-4535-g9a3d7fd275b kernel hung. |
Somebody in the other thread observed that Windows sometimes seems to start in an immune state, and other times not, so make sure you are restarting as well as hibernating when testing a given version. Sorry I haven't got the space to help with this search. |
That was me, haha :) |
@carlfriedrich Unfortunately even with four consecutive
lines. However, with the 5.4 kernel, I boot on the second |
@onereal7 Thanks for your feedback! We now have two reports of the issue with v5.4-4535-g9a3d7fd275b, so I marked it "bad" in the issue description and continued the bisection. The next test candidate is v5.4-2622-g386403a115f. I just switched to this version and will test it over the next few days. @aquohn Can you check whether your boot issue also appears with this version? I encountered it again 2-3 times when trying to boot the new candidate, but on the next try it worked. I don't know why this happens, though. @ everyone: please keep reporting your experiences with all previous versions as well. The more data we have, the better. |
I'll switch to that shortly. I never had any suspend issue with 5.4.0 but did have a problem using Docker Desktop with it because a /proc/sys/vm/compaction_proactiveness was missing on that build. Will check to see when that setting was enabled for current WSL kernels. |
Glad to hear that! Let me know if I can be useful in any way. |
I have also been using kernel version "5.10.102.2-microsoft-standard-WSL2" for several days now. Hibernated several times but so far no issues. I will now test the latest linux-msft-wsl-5.15.153.2 and see how it goes. |
So far no issues with linux-msft-wsl-5.15.153.2 after a week |
Hi - been following your excellent efforts for some time, and decided to join in on the testing. |
Tested "linux-msft-wsl-5.15.153.2" for over a week now. So far no issues. Since we seem to have found the commit which was causing the problem, is there a plan to share it with the Microsoft team so that they can integrate it into their official releases? Or is more testing needed? |
I've been in contact with the Microsoft people on the Hyper-V and WSL teams about the issue. They are aware of the situation and the relationship between the Linux commit and the underlying root cause, which is in Hyper-V. I'm expecting an update from them on how they want to proceed. Many people extended the U.S. July 4th public holiday last week into a longer vacation, so I expect progress has been slowed by people being out. |
Had the same issue; kernel linux-msft-wsl-5.15.153.2 has now been running flawlessly for a couple of days. |
I'm trying 5.15.153.2 now, but I'm already seeing an improvement. On the old version (5.15.153.1) I was getting spammed by the warning below constantly, every 6 seconds. With the custom .2 version, I am no longer getting that error. I have no idea if it's related, but if it's not causation, at least it's correlation, which is almost as good ;)
|
@carlfriedrich, I can also confirm that linux-msft-wsl-5.15.153.2 has been running smoothly for 3 weeks with multiple hibernations and restarts |
Have been using 5.15.153.2 for several more weeks now, through multiple hibernations. No issues. :) |
Hello @kelleymh,
do you have any news about this issue? :) |
I guess moving to Linux completely solves this |
No news. :-( On Monday, I pinged my former colleagues on the Hyper-V and WSL teams again because I hadn't heard anything from them for a while regarding this issue. But they are still looking at the best way to proceed. I'll be a little more proactive in following up. |
Is anyone else here running into issues using specific kernels? An issue was reported at docker/for-win#14240 that might impact the ability to use custom kernels to address the issue reported here. |
Emacs has some problems with this configuration, but it works reasonably well.
Known problems:
- The program's performance degrades over time. This shows up, for example, in the delay before Emacs displays characters typed on the keyboard.
- Emacs often stops responding after the computer hibernates.
The first problem can be partially solved by restarting the program. But, as time goes on, Emacs's performance will drop again. Although the Emacs profiler has not helped to isolate the code causing the problem, it is reasonable to assume that we are using one or more packages with bugs. The downside of using straight.el to manage packages is that we cannot select their stable versions automatically. This has to be done manually, and we have not done that check. Another cause of the performance drop could lie in the version of Emacs available in openSUSE-Tumbleweed. This is the only Linux distribution on WSL on which we have tested this configuration. Other bugs that are no longer reproducible with it were fixed by updating the program.
The second problem is due to a bug in the most recent WSL kernel. See the following discussion to learn more about it: carlfriedrich/wsl-kernel-build#1
Environment:
- openSUSE-Tumbleweed running on WSL version 2 (up-to-date Windows 10).
- Kernel: Linux DESKTOP-NULNQSE 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux.
- GNU Emacs 29.4 (build 2, x86_64-suse-linux-gnu, GTK+ Version 3.24.43, cairo version 1.18.0)
- Org mode 9.7-pre (release_9.6.25-1345-gb45b39) This version of Org mode is vulnerable to attacks: it allows arbitrary execution of both Lisp code and shell commands. Org version 9.7.5 does not have these security problems.
This commit includes the straight.el lockfile needed to reproduce the state of each installed package. It also includes early-init.el, an auxiliary file that lets us configure Emacs to use only straight.el as its package manager.
Yes, there's now some specific action underway from the Microsoft folks to provide a resolution. I have suggested that they might want to update this thread and the WSL Issue #6982 thread when they are ready, which will hopefully be in the next few weeks. |
I never managed to get hibernate working on Linux (Kubuntu). Is it even a thing on other Linuxes? |
just ran |
I tried the kernel that was included with the latest WSL release (2.3.24), and I had the problem again so I reverted to linux-msft-wsl-5.15.153.2 |
@Crypto-Spartan As noted in this issue's description, 5.15.153.2 is a custom build from here which we built as a community effort in order to verify the bad commit and provide a quick fix to the affected users. The version will not go upstream and hence not be available via |
Ah, I see that now. Thank you for the explanation/clarification. Apologies, I should have read more carefully. |
Tested & deployed the patched kernel one week ago with the latest WSL version and a custom config for 3 users in my company who experienced the hibernate/unlock/lock problem with WSL2 almost every time. No issues; I think we can confirm the root cause has been found. |
I can't wait for this to be deployed in Windows 12. |
@carlfriedrich By the way, do you have a .patch file or something similar so we can patch the kernel/commit automatically in CI/CD? It seems not to be as straightforward as I thought: conflicts & stuff. Thanks |
@borjamunozf You can generate the patch from the commit on v5.15. It did not, however, apply cleanly on every kernel version. For some versions I had to resolve conflicts manually. v5.15 is the latest version I applied the patch to. If you port it to newer versions, feel free to open a PR on my WSL kernel fork, then I can provide a release here as well. |
We're trying to find the kernel commit which makes WSL non-responsive after hibernation, which is described in the issues microsoft/WSL#8696 and microsoft/WSL#6982.
Our starting point
Bisecting the kernel
We have about 13,000 commits between v5.4 and v5.5-rc1. Using git bisect we should be able to track down the commit introducing the issue within 14 rounds. As a start, I have built the start and end versions and one in between. I will update this table as soon as the versions are confirmed to be working or non-working and add new versions as I continue the bisection. The links in the table lead to the release page for the corresponding version where you can download the kernel image.
How you can help
uname --kernel-release
in your comment.
I will wait for a reasonable number of reports for each version, so even if somebody else has already reported a working or non-working version, please do report your experience as well.
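For anyone who prefers to script their report, here is a small sketch that reads the same kernel release string that uname --kernel-release prints on Linux, via Python's standard platform module, and formats it for pasting into a comment. The function name and the report layout are assumptions made for illustration; no particular format is required here.

```python
import platform


def format_report(verdict: str, notes: str = "") -> str:
    """Build a short test report for the running kernel.

    verdict must be "good" (no hang observed) or "bad" (hang observed).
    """
    if verdict not in ("good", "bad"):
        raise ValueError('verdict must be "good" or "bad"')
    # On Linux this is the same string that `uname --kernel-release` prints.
    release = platform.release()
    lines = [f"Kernel: {release}", f"Result: {verdict}"]
    if notes:
        lines.append(f"Notes: {notes}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report("good", "three hibernate cycles, no hang"))
```

Run it inside the WSL distribution under test, so that the reported release is the custom kernel and not the host's.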
How you cannot help
We're not looking for any workarounds or environment information related to the issue here. I am not a Microsoft developer, so I am not debugging the issue or collecting any information to help solve it.
If you want to share any information of this kind, please do so in one of the upstream issues.
Thanks a lot for your help in advance. 💚
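As a rough check of the 14-round estimate from the "Bisecting the kernel" section: a bisection halves the candidate range in every round, so about 13,000 commits need at most ceil(log2(13000)) tests. The sketch below is a plain binary search over a linear list of commits. This is a simplifying assumption, since git bisect works on the real commit graph and its step count can differ slightly. The function names are made up for illustration.

```python
import math


def max_bisect_rounds(num_commits: int) -> int:
    """Upper bound on the rounds needed to isolate one commit among num_commits."""
    return math.ceil(math.log2(num_commits))


def simulate_bisect(num_commits: int, first_bad: int) -> int:
    """Binary-search for the index of the first bad commit.

    Commits 0..first_bad-1 are assumed good, first_bad..num_commits-1 bad.
    Returns the number of kernels that had to be built and tested.
    """
    lo, hi = 0, num_commits - 1  # the first bad commit lies in lo..hi
    rounds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        rounds += 1
        if mid >= first_bad:  # the kernel built at mid hangs: "bad"
            hi = mid
        else:                 # the kernel built at mid works: "good"
            lo = mid + 1
    assert lo == first_bad
    return rounds


if __name__ == "__main__":
    n = 13_000
    print("upper bound:", max_bisect_rounds(n))
    print("worst case over all positions:",
          max(simulate_bisect(n, i) for i in range(n)))
```

Each round here stands for one kernel build plus several days of hibernate testing, which is why the number of rounds matters far more than the number of commits.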
Update
We have found the kernel commit introducing the issue:
Merge commit:
microsoft/WSL2-Linux-Kernel@64d6a12094f3
Atomic commit:
microsoft/WSL2-Linux-Kernel@dce7cd62754b5
From here on I will try to build more recent kernel versions with the commit reverted. Feel free to use these and report your experience.