WSL is unkillable without rebooting (permissions/security problems for admins) #1086
Comments
Thanks for reporting the issue. The service is a "locked down" process since it runs as a protected process (light) so access is limited from other non-protected processes - https://msdn.microsoft.com/en-us/library/windows/desktop/dn313124(v=vs.85).aspx. The service hang reported in the other issue is a bug we should fix. |
@stehufntdev So is there a way for an admin to manually kill this kind of protected process without rebooting (like a protected-mode task killer? I have no idea), or is that kind of thing only available when running in a less secure mode used for debugging (like SELinux permissive mode)? Also, even when you fix the underlying bug in #1085, it seems reasonable that you might want to kill the Linux instance from the Windows side if it is malfunctioning. Could you maybe make it so stopping the service kills the Linux instance more aggressively (for example, by setting a timeout after which it will force kill the instance)? Also, I understand why you wouldn't want to allow random code injection into the service process, but it's not an antimalware program, so it seems like protecting it from being force-killed is a bit of overkill. |
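The timeout suggestion above can be sketched as a small control loop. This is only an illustration of the idea, not how LxssManager or the service control manager actually behaves: `request_stop`, `is_stopped`, and `force_kill` are hypothetical callables supplied by the caller, and the sketch assumes a force-kill path exists at all, which the protected-process restrictions discussed in this thread may rule out.

```python
import time
from typing import Callable


def stop_with_timeout(
    request_stop: Callable[[], None],
    is_stopped: Callable[[], bool],
    force_kill: Callable[[], None],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> str:
    """Ask for a graceful stop, then escalate if it does not finish in time.

    All three callables are hypothetical hooks; nothing here talks to Windows.
    Returns "stopped" if the graceful stop finished, "killed" otherwise.
    """
    request_stop()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_stopped():
            return "stopped"
        time.sleep(poll_s)
    # One last check so a stop that completed right at the deadline is honored.
    if is_stopped():
        return "stopped"
    # Graceful stop did not complete before the deadline: escalate.
    force_kill()
    return "killed"
```

The escalation step only helps if whatever `force_kill` does can still make progress when the graceful path is stuck.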
I just about asked this same question last week when I managed to hose WSL a bunch of times. Unfortunately I don't have a useful repro, which is why I didn't bring it up. In my case the process couldn't be killed, and if I closed the shell and tried to open another it would hang with no prompt. Reboot was the only way out. It would be nice to have a
|
@therealkenc Well luckily I spent an hour and found a pretty minimal configuration on Trusty to get a repro. Hopefully they will fix both the underlying bug as well as provide the nukeinstance interface (or add a nukeinstance timer to the session manager service on manual stop). I also had a problem trying to do an scp of a large file inside of tmux, but I did it with the messed up zsh/tmux, so who knows what caused the bug. |
Can you please share some information about why lxss even needs to run as a protected process? This would be very useful information since I may need to change the way our kernel-mode 'anti-rootkit' feature (included in Process Hacker) is presently able to terminate the LxssManager service and WSL processes. You can do one of two things to terminate the LxssManager service:
That will enable full access to the LxssManager service (including termination rights) but you're likely going to open up a security hole by doing so. Alternatively, Process Hacker is able to terminate 'protected' processes such as WSL:
|
@dmex How is process hacker doing that? An exploit? If process hacker can do this, mustn't it be using a local privilege escalation? What would stop malware doing the same? Edit: Oh, I guess only if you are running with the kernel mode driver? Edit2: I guess your kernelmode driver is signed, but what is stopping third party malware from using your signed driver to circumvent security features like this one? |
Process Hacker simply calls ZwTerminateProcess from kernel-mode using a kernel driver. That function allows 'protected' processes to be terminated (when called from kernel-mode) and is officially documented and supported by Microsoft - feel free to review our source code, you'll find calls to that function are very well protected :) The major problem is that anti-virus/malware software is unable to defend against zero-day attacks (or the majority of malicious executables compiled within the last month or so) and vendors usually take up to several weeks (or even months) to create signatures and remove malicious software from your machine - instead Process Hacker allows you to terminate and remove rootkits and malware yourself. Rootkits and malware can obviously use the same function if they create a kernel driver and obtain an EV certificate to sign it, but EVs are expensive and require a lot of legal documentation to be granted (and their expensive certificate is easily revoked). The serious question is why LXSS needs to run as a protected process... If something executing inside Bash manages to exploit WSL, then it would be unkillable by everything except Process Hacker (at minimum for several weeks until the majority of anti-virus software create signatures and start blocking the malicious code). Hopefully @stehufntdev or someone can share some details about why this protection level is needed so I can decide if I need to make changes to Process Hacker to prevent LxssManager from being terminated (if it's just to protect calls to/from lxcore.sys then I can suggest a much better alternative to protection levels!). |
@dmex Yeah I thought about it a bit more, and it seems like it doesn't matter that much bc in order to call the process hacker kill functionality you already need ring 3 admin, in which case you can probably find some other way to obtain persistence. The point, I guess, is that unless the driver itself is vulnerable, it probably won't allow arbitrary execution of code in Supervisor mode even from Ring 3 adminland, and I assume that even from Ring 3 adminland you can't disable secure boot without exploiting the firmware, in which case certs will be enforced and unsigned/self signed rootkits can't be installed with only Ring3 admin (unless you find a bug that allows you to jump from admin into the kernel in the first place). Makes sense, and I guess I trust process hacker to install drivers into kernel mode as much as I trust NVidia, lol. |
For now, we only want Windows signed code accessing the lxss device driver. This is accomplished today by marking the lxss device driver ACL as local system Windows PPL, and running the lxssmanager service as local system Windows PPL service. |
@stehufntdev Is that also enforced at the COM interface on the service? Is that the reason why I'm getting an immediate crash with @ionescu007 's lxlaunch (because bash.exe is a signed Windows component, it's being allowed to connect to the service, but other binaries may not be able to do so)? Or is it just a weird thing that's happening bc of something I screwed up while compiling it?? |
@fpqc - Binaries that are not Windows-signed should be able to connect to LxssManager (just not the driver). We are still refining the COM interface, which is a large part of the reason we have not yet documented it. |
@benhillis He just rewrote the relevant part of the program 3 days ago, I'm pretty sure it's up-to-date. It actually does a check to see if the build is >=15000, bizarrely enough (maybe someone over at MS is giving him nightly builds? No idea). |
The build number check is simply a bug on my end. It was meant to be 14500. The code does not crash for me, so I will have to debug what's wrong with it. The COM interface changed to allow the creation of an unnamed IPC channel with the launcher. Also, your questions on Protected Processes & etc, and why MS locked down the driver this way (I asked them to) are explained in the BlackHat presentation. |
@benhillis You're shipping an IDL file for Lxss? ;-) |
@ionescu007 We plan on documenting the COM interfaces at some point, but we want to make sure we don't do that while they are still in flux. |
That's actually really cool. And yes, the proper thing is to update the IID Version when you add a parameter ;-) Understandable not to have done it for the 'beta' I guess. No source/privates used in my research, sadly (except ole32/combase which are on the symbol server) -- too much of an NDA risk. Public symbols + the intense debug output is enough :) |
Turns out kernel developers aren't the best at following COM best practices :) |
It might be an idea to add a /kill parameter to lxrun or somewhere (as @therealkenc suggested) that sends an ioctl to lxcore and terminates all running pico processes? It should be possible to terminate/recover your session without having to reboot (or use 3rd party software). I know you asked for PPL ;) but I was hoping ntdev might consider some changes that allow non-critical PPL processes (such as LxssManager) to be terminated using Task Manager and other software. If Lxss was somehow compromised or became a runaway process, the ability to terminate it would improve both reliability and security? |
The COM interface does have a Stop/Terminate command that sends the IOCTL.
|
Stopping the service through the SCM will end up doing the same thing as the COM interface's stop/terminate command. In the hang caused by zsh, a thread was deadlocked in the kernel (non-alertable wait), so it wouldn't have been possible to kill without a restart. WSL bugs aside, stopping the service is the right way to terminate the running instance. |
Thanks! Please add that into the next version ❤️ |
I could not stop the service. Env: Windows 10 ver 1607 (build 14955.1000) |
@gdh1995 In my case, I booted into safe mode. I just deleted the most recent files from |
I somehow managed to freeze the entire WSL by running a mv command in bash. Just renaming a folder which itself contains a single non-critical file. Could not kill the mv process by any means, nor could I kill LxssManager even with the Process Explorer method mentioned above. |
@emberquill let me guess you're on Windows 10 1607 right? |
quietly closing this up @benhillis . Should be marked "bydesign" by merit of it being a kernel driver. |
FWIW, I realize the discussion went elsewhere, but going back to the OP on this issue, I just encountered this. Ran a normal session, then hit Ctrl-D to log out. Got "logout" in the terminal, and it hung. Ctrl-C does nothing, attempts to kill it do nothing, and of course it doesn't show up in Task Manager. I tried stopping LxssManager, which timed out, and now its status is just pegged at "Stopping". |
@jefferai This is gonna happen with any deadlock in the driver. There are instructions in some of the documentation on enabling a system memory dump and manually crashing Windows to generate it if you get into a deadlocked state like this. Memory will be written out to disk, and you can host the dump on OneDrive and submit it to secure@microsoft.com attn: Ben Hillis or Sunil Muthuswamy. Or, if you have a reliable repro, you can submit it as a new bug report. The problem is that if the driver deadlocks, restarting the service sends an ioctl to the kernel to stop the driver, but the driver is deadlocked so it just sorta hangs. |
@fpqc any chance you can point me to those docs? Happy to submit the dump if so. |
I have found a solution, but it requires the use of third party software. Step 1: Right-click on ProcessHacker.exe and select Run as administrator. Step 2: Go to the Services tab, search for LxssManager, right-click on it and select Go to process. It will automatically highlight the svchost.exe process associated with LxssManager. Step 5: Since it is automatically selected, just press the Del key on your keyboard to terminate the process and click Terminate to confirm. Step 6: Go to the Services tab again, search for LxssManager, right-click and click Start to start the LxssManager service. Step 7: Since KProcessHacker3 creates a SYSTEM level service to manage and kill system services, open Administrator Command Prompt and use the command Now enjoy WSL without requiring a system reboot. |
Looks like even KProcessHacker won't help if you have already initiated the service stop. It just brings up an error about trying to access a process that is already shutting down. |
Can't believe this hasn't been fixed after 4 years... |
@mewbow1 Nope. Running Process Hacker as administrator. |
Try using the nightly builds of Process Hacker - the v2.39 release doesn't support Windows 10 @LoganDark |
@dmex When running as administrator, it says it can't load the kernel module because access is denied. When not running as administrator it didn't. So being the idiot that I am, I tried to start up bash and then tested if PH could terminate it. It couldn't. Now I have to reboot again, and I don't say that lightly. |
The kernel driver is required for WSL process termination. That access_denied error can only occur when Antivirus software and/or malicious software has blocked the driver from loading. You will need to add exceptions to your security software or remove the malicious software to be able to use the driver. |
Which is why I included it in my post.
It was my antivirus, thanks. I have it on silent to stop ad popups (shitty I know), but when I turned off silent mode, it made a popup about the kernel module. Should be able to add an exception. I am not going to test this again till later. |
Just encountered this recently. When I boot up my PC, the WSL doesn't launch properly and when I try to stop it through the And so far the only solution that I found was to reboot. Windows version: Version 1909 (OS Build 18363.720) |
Happens to me every day. Should be somehow related to the EACCES issue in WSL1 because the process that can't be killed is @mewbow1 thank you so much, this actually helped. (I used the current 2.39 release of Process Hacker.) Killing svchost didn't stop the LxssManager though, so I clicked restart, Process Hacker became unresponsive, I forcibly closed it, reopened it, and LxssManager showed as stopped this time. Then I started the service, started the Ubuntu terminal, and the unkillable processes were gone. Yay, no more reboots.
I didn't try to stop the service via Task Manager this time, maybe that's why it worked. |
Unfortunately this led to another problem: I couldn't rename folder which was probably occupied by that process. Search for handle didn't show anything. At the same time I could restart LxssManager service perfectly fine. So it has to be some issue with WSL/NTFS interop... |
I have the same issue recently. I'm also using VS Code Remote with WSL and I also had mysterious issues with renaming or removing folders while it's open (or running npm install). Furthermore, a reproducible way for me to cause a hung WSL is running Jekyll with the watch option for a while and changing some files. It won't take more than a few minutes to get everything stuck. I'm on 18363.1139. Is this fixed in newer versions? The problem is just that for many months my update screen says that the 2004 version is "being prepared" and I will get notified when it's available for my machine... (still waiting to get that notification...) |
Still not fixed after 5 years. All commands hang and LxssManager is permanently in the "Stop Pending" state even after Reboot. Spent 6+ hours trying different suggestions mentioned in this thread and none working. This needs to be fixed ASAP though. |
That doesn't sound possible. Are you sure you rebooted and didn't just turn it "off" and on again (possibly not executing an actual restart but, for example, hibernation)? |
@CherryDT I'm sure I "rebooted". I tried all possible ways I found on internet including Registry Edit. It's just not working and seems constantly in "Stop Pending" state. No command works except Elevated CMD and elevated Powershell are not helping too. |
Will this ever actually be fixed or are we supposed to reboot our system every time LxssManager fails forever? |
I'm at Version 20H2 (OS Build 19042.985) and haven't encountered this issue anymore. But yeah, as far as I know, the only solution is to reboot. |
I can confirm this still happens in 20H2 (19042.985). I'm not sure if it's related to hibernation or just WSL being unused for a while. |
Just happened to me on 21H1 so still not fixed. I believe the Bitdefender AV caused this glitch due to a PHP script virus it encountered during a GIT clone operation. |
This just happened to me on the insider builds of Windows 11.
Can relate that I'm having this issue hibernating my PC, too. |
This also happens on my corporate laptop. PS C:\> Restart-Service -Name "LxssManager" |
I used the taskkill command (https://www.youtube.com/watch?v=e7IsO51eTYw). |
Did someone find a solution after all these years? |
Can't believe this hasn't been fixed after 7 years... |
Install linux as main OS, easiest fix. No need to use windows anymore :). Here's a nice run once powershell script to restart it:
|
@FinalFortune This doesn't work, as mentioned it is unkillable until rebooting. Doing what you wrote will get the process stuck in a "terminating" state. It stays and when you try a second time you get the error that the process is already being terminated so you can't terminate it again. From a service perspective it will stay stuck "stopping" so start-service will also fail because the service is in an invalid state for starting it. |
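To make the state problem described above concrete, here is a minimal Python sketch that reads the `STATE` line from `sc query` output and checks whether a start request could be expected to succeed. The sample text is illustrative rather than captured from a real machine, and `parse_service_state` and `can_start` are hypothetical helper names, not part of any Windows tooling.

```python
import re
from typing import Optional

# Illustrative sample of `sc query LxssManager` output (not a real capture).
SAMPLE = """\
SERVICE_NAME: LxssManager
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 3  STOP_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
"""


def parse_service_state(sc_output: str) -> Optional[str]:
    """Return the state name (e.g. "STOP_PENDING") from `sc query` text, or None."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def can_start(state: Optional[str]) -> bool:
    """Assumption: a start request is only expected to succeed from STOPPED."""
    return state == "STOPPED"


state = parse_service_state(SAMPLE)
print(state, can_start(state))
```

On a real machine the input text could come from running `sc query LxssManager` and capturing its output. A service stuck in `STOP_PENDING` fails the check, which matches the "invalid state for starting" behavior described above.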
Well, I'm fairly certain my WSL had borked from hibernate. It wasn't terminating, so I ran the above command, which was derived from the Stack Overflow solution, and it worked. However, I'm not sure it was the exact same state mentioned earlier in this issue; I will confirm next time with the appropriate commands. This issue has been a factor in making Docker on Windows a colossal pain in the ass to use. I just come from the C# side, so using Docker from WSL was a natural progression, and plain Hyper-V Docker was far slower. However, gorging myself on a bed of blades may have been a more enjoyable experience. |
Since this is 'by design', has an official workaround to this issue also been designed? 😆
|
I am having a much better time after creating a powershell script that runs |
In my previous issue, I gave a way to turn WSL into a zombie process that is unkillable.
I continued testing ways to kill it, and I think it's worth having its own issue.
The only way to kill the service is rebooting, once it hangs. I tried launching a command prompt as NT Authority\System with psexec from the sysinternals tools, and this happened:
Not sure how access is being denied to NT Authority\System, which should not have any restrictions in Ring 0, or so I thought. Maybe this means that it's being locked by the Windows 10 hypervisor (I have hyper-V installed, so maybe it's a hypervisor-level lockout using VSM? The process explorer screenshot below says that it's not using Virtualization, so what is securing it?!)
It's running with such a high level of security that even process explorer running as admin cannot remove its protections: