Failing (and invisible in strace) clone syscall related to privileged namespaces and chrome's setuid sandbox #2242
Comments
This is the fail:
I didn't track it very far. Takes on Real Linux natch. Running suid binaries on WSL works as far as it goes. In any case, the suid sandbox is nearly dead. Still, it would be valuable (on its own merits) to look into why the |
Oh yeah, repro:
|
Yeah, I was mainly interested to see if privileged namespace functionality was working at all. So is this a clone problem, or is it just reporting a clone problem incorrectly? |
Maybe. But probably not. That error message to Note chrome goes out of its way to thwart All of which would be noteworthy if I thought Stephen actually put chrome in play. |
Oh, by the way, an update: I contacted the chromium security team, and they confirmed that the setuid sandbox will continue to be included for the foreseeable future, but they have no plans to patch it. It will be removed probably when it accumulates enough bugs or breakages or security flaws as to become counterproductive (unless someone wants to take up the responsibility of maintaining patches for it). I haven't yet forwarded the email I received to the Arch kernel and chromium maintainers (real life intervened and I forgot), but I can probably get an answer back in the next week or so to find out what they're going to do. If they plan to stick with the setuid sandbox also for the foreseeable future, WSL supporting it should keep WSL compatible with chromium (by reporting from the kernel that user namespaces are disabled and therefore letting chromium fall back gracefully to the setuid sandbox). The security-focused guys (like the maintainer of the linux-hardened package for Arch) have told me that they feel that user namespaces are insecure by design (and indeed they have been exploited a few times in the past two years). I'll ping the WSL team with the conclusion of my investigation once I get an answer from the Arch people. |
Amusingly enough @benhillis' work on #962 moved this forward. 16251 now gets over the Having missed that, I did get the "subsequent
So @fpqc now ya know. Since I am here, I have to underline this ticket has nada to do with "namespaces", even though the word is used 18 times in this thread so far including in the issue title (so says |
@therealkenc I'm not sure how the suid sandbox works exactly on-the-nose, but my (possibly incorrect) understanding is that it makes explicit use of the privileged namespace functionality to construct a secure chroot in which the renderer process runs. In particular, from the suid sandbox page
So in particular, what I was asking was if privileged namespaces were the problem (in this case, I guess, the PID namespace). |
Oh I see what you mean. Yeah that counts, and "nada" is clearly overstated. It would be more correct to say "the problem right now is not related to namespaces". So apologies for that. [n.b. I just didn't want to see this ticket go down the same path as #1962, which is a |
All I know about the User Namespace functionality is that it could conceptually make sandboxes/containers easier to construct and work with, but also that user namespaces have already been escaped/exploited numerous times in the past 3-4 years and have been one of the largest sources of Linux Kernel CVEs. I've spoken with some of the Linux-Hardened patchset (the successor to grsecurity) maintainers, and they have argued that User Namespaces will continue to be insecure because they are conceptually flawed, rather than flawed in implementation. I don't know myself, but that seems to be the opinion of security-minded people. Anyway, quick question: Does WSL support PID namespaces? I could have sworn you commented recently to Ben that it doesn't, and that it probably should as part of the requirements to support systemd. |
I didn't think |
Haha that's super cool. I also noticed on that page there's a launch flag to start up the suid sandbox in a way that allows it to be ptraced, so if I try to trace it again in the new build, I'll turn that on too. |
So if they wanted to add nspawn/rkt support, I looked into it: They need the multiple devpts mount support, network namespace support as part of the base systemd requirements, and the following additional namespaces: UTS From your research here it looks like PID and mount namespaces are implemented. How about UTS, IPC, and Network namespaces? |
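One way to answer the "how about UTS, IPC, and Network namespaces?" question empirically is to call unshare() once per namespace flag in a throwaway child and record the errno. The following is only a sketch: the helper name `probe_unshare` and the `NAMESPACE_FLAGS` table are made up for illustration, the flag values are the ones from `<linux/sched.h>`, and on a stock kernel most of these calls need root (CAP_SYS_ADMIN), so an unprivileged run is expected to report EPERM rather than prove anything about the implementation.

```python
import ctypes
import os

# Namespace flag values as defined in <linux/sched.h>.
NAMESPACE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,    # CLONE_NEWUTS
    "ipc": 0x08000000,    # CLONE_NEWIPC
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def probe_unshare(flag):
    """Call unshare(flag) in a throwaway child process.

    Returns 0 on success, the errno on failure, or -1 if the child
    died before it could report anything (hypothetical helper).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: attempt the call, report the outcome, exit immediately.
        try:
            os.close(read_fd)
            rc = libc.unshare(flag)
            err = 0 if rc == 0 else ctypes.get_errno()
            os.write(write_fd, str(err).encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode()) if data else -1


if __name__ == "__main__":
    for name, flag in NAMESPACE_FLAGS.items():
        result = probe_unshare(flag)
        if result == 0:
            status = "ok"
        elif result > 0:
            status = os.strerror(result)
        else:
            status = "no report from child"
        print(f"{name}: {status}")
```

Running the probe in a child keeps the calling shell in its original namespaces, so it can be repeated across builds without side effects.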
Need some more bits for From your first post, the issue here was and remains:
...full stop... |
Haha! That's pretty cool! So maybe I should close this thread, and you could just open a new issue for just that CLONE_FS|SIGCHLD pattern, which is much more specific. PS does unshare()ing those other kernel namespaces work as expected as well? |
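For anyone reading along who wonders what a "CLONE_FS|SIGCHLD pattern" is: the flags argument to clone packs the child's exit signal into the low byte and the CLONE_* bits into the rest. A small decoder, sketched below, makes that visible when comparing flag words from a trace. The function name `decode_clone_flags` is invented for this example, and the table covers only a subset of the flags from `<linux/sched.h>`.

```python
import signal

CSIGNAL = 0x000000FF  # low byte of the flags word: signal sent to the parent on exit

# Subset of the CLONE_* bits from <linux/sched.h>.
CLONE_FLAG_NAMES = {
    0x00000100: "CLONE_VM",
    0x00000200: "CLONE_FS",
    0x00000400: "CLONE_FILES",
    0x00000800: "CLONE_SIGHAND",
    0x00020000: "CLONE_NEWNS",
    0x04000000: "CLONE_NEWUTS",
    0x08000000: "CLONE_NEWIPC",
    0x10000000: "CLONE_NEWUSER",
    0x20000000: "CLONE_NEWPID",
    0x40000000: "CLONE_NEWNET",
}


def decode_clone_flags(flags):
    """Render a clone flags word the way strace prints it (hypothetical helper)."""
    parts = [name for bit, name in sorted(CLONE_FLAG_NAMES.items()) if flags & bit]

    # Anything outside the table and the signal byte is shown as raw hex.
    known = 0
    for bit in CLONE_FLAG_NAMES:
        known |= bit
    unknown = flags & ~CSIGNAL & ~known
    if unknown:
        parts.append(hex(unknown))

    # The exit signal goes last, as in strace output.
    exit_signal = flags & CSIGNAL
    if exit_signal:
        try:
            parts.append(signal.Signals(exit_signal).name)
        except ValueError:
            parts.append(str(exit_signal))
    return "|".join(parts) if parts else "0"


if __name__ == "__main__":
    print(decode_clone_flags(0x00000200 | signal.SIGCHLD))  # CLONE_FS|SIGCHLD
```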
Heh, don't look at me. I don't even run Chrome on WSL. 😉
I haven't done an |
@therealkenc and @fpqc - Keep digging, you might find some surprises... :) |
Finding the Easter Eggs is indeed fun; which I suppose is the point. But on balance I would still rather see a Roadmap (like chakra, .NET Core, typescript, and vscode), better release notes, and elimination of the internal shadow issue tracker. That |
@therealkenc - I 100% agree with you. I'd like to be more open with the community and it's something that I'm pushing towards. Unfortunately there are a lot of politics to contend with. The issue with unsupported clone flag combinations is well-understood so no test repro is needed. Thank you for the offer though. |
@therealkenc - Dug into this a bit this afternoon. Once I implemented the CLONE_FS flag chromium makes it a bit further, but ultimately fails when trying to talk to udev over netlink sockets (I think).
|
Hmmm. I assumed I can't underline how easy it is to stub the netlink socket part in the WSL kernel if you guys feel like it. My userspace stub just opens an AF_UNIX dgram socket here. Which is totally the wrong family, but it isn't like userspace knows any different. The next thing userspace does with the socket descriptor is |
@therealkenc - the CLONE_FS flag was implemented for the unshare system call, but not for the clone system call. |
@therealkenc - Well would you look at that... The change to add CLONE_FS support is pretty small so I think I can squeeze it in for the Fall Creators Update. As always, thanks @therealkenc for all the help! |
Awesome. Fun, no? Now all you need to do is track why VSCode doesn't launch (toy electron apps are fine). Give an inch, they'll ask for a mile 😜. |
Did you also change the kernel to report user namespaces were disabled? I noticed you launched without the |
@fpqc - no that clone call still fails but chrome appears to recover from the error. |
Ah, neat! |
@therealkenc - I liked your idea of stubbing netlink support so I decided to see what would happen if I allowed creation of NETLINK_KOBJECT_UEVENT netlink sockets that will never receive any messages. With that change and the CLONE_FS change above I'm able to launch chromium without any workarounds. Unfortunately I can't do a similar thing for all netlink families. For example, allowing creation of NETLINK_AUDIT sockets without actually supporting the messages opens a can of worms (sudo, adduser, passwd all start failing). I'm submitting a CR for this change now. |
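To see from userspace which netlink families a given build accepts, a quick probe along these lines might help. It is a sketch only: `try_netlink` is a made-up helper name, the protocol numbers are the ones from `<linux/netlink.h>`, and binding to multicast group 1 for uevents is an assumption about what a udev-style listener would request. It only checks that the socket can be created and bound; it says nothing about whether any messages will ever arrive.

```python
import socket

# Protocol numbers from <linux/netlink.h>.
NETLINK_AUDIT = 9
NETLINK_KOBJECT_UEVENT = 15


def try_netlink(protocol, groups=0):
    """Create and bind a netlink socket for the given protocol.

    Returns 0 on success, otherwise the errno of the failing call
    (hypothetical helper, Linux only).
    """
    try:
        sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, protocol)
    except OSError as exc:
        return exc.errno or -1
    try:
        # Netlink addresses are (pid, groups); pid 0 lets the kernel pick one.
        sock.bind((0, groups))
    except OSError as exc:
        return exc.errno or -1
    finally:
        sock.close()
    return 0


if __name__ == "__main__":
    print("NETLINK_KOBJECT_UEVENT:", try_netlink(NETLINK_KOBJECT_UEVENT, groups=1))
    print("NETLINK_AUDIT:", try_netlink(NETLINK_AUDIT))
```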
Most excellent. But you forgot the PulseAudio That's the end of the hard blockers though. The rest of the messages you see in the console is one soft fail, plus configuration problems that would manifest on real Linux (including OpenGL). |
@therealkenc - I did not have to apply the pulseaudio .deb to get chrome to launch. Chrome launches but per your assertion I'm assuming remote audio will not work without said futex change? |
@benhillis It isn't 'remote audio' per se. The problem (IMO Google bug) is that chrome faceplants on #486 whether you want to hear noises from your webpages or not. Contrast Firefox which recovers. You don't need the server side up and running. You just need to get past the |
@therealkenc - You're correct, my screencap was from a machine that I was testing some things on 17.04 in parallel. However, my assertion that google-chrome launches without workarounds was based on trying this on a 16.04 install with none of your workarounds: I am seeing the FUTEX_CMP_REQUEUE errors in the strace but chrome seems to be handling it OK. |
Okay understood. It is possible (probable, even) that Google fixed it along the way and I never noticed because I've had the [edit] Or Stephen fixed |
I'm taking a stab at implementing the futex requeue options, doesn't look like it should be too bad (famous last words). |
How hard can it be. I have been thinking at least faking it should be possible since Sept 2016 (message). I just put up some theoretic install instructions for chrome (theoretic, because I have never run them in sequence). But if you pick out the OpenGL and dbus session setup good bits, you should be at console warning parity with Real Linux. "Should be", with the singular exception of #1353 (zombies). But oddly I am not seeing that (soft) fail in your screencap either. Typically it looks like:
So you are ahead of the game on a few fronts. |
@benhillis should be good to close up shop on this thread as well with a fixedininsiders flag |
@fpqc - will do, thanks! |
Build 16215
We already knew that chromium couldn't be launched with its usual sandbox due to the fact that user namespaces aren't implemented. I assumed privileged namespaces were not completely implemented either, though I had heard some of the devs say that privileged namespaces as a concept already exist in the underlying native NT context.
Chromium also includes a setuid sandbox implementation for kernels that don't implement user namespaces (notably Arch, though I used a clean Ubuntu Xenial image to do this test), and you can launch chromium in this way with the flag --disable-namespace-sandbox, which disables the user namespace sandbox but enables the classic setuid sandbox.
In the user namespace sandbox case, we also get a line in between that actually shows the failing syscall:
Now the weird part is this:
Let's try to run that without an strace first:
We get (among other errors, I know) a report that a clone syscall is receiving an invalid parameter or option.
But when you look into the strace, there's no EINVAL on any clone syscall. So somewhere a clone syscall is failing on unimplemented surface but is not being picked up in the strace, unlike in the namespace sandbox case.
Straces:
for setuid sandbox:
suidsandbox.txt
for user namespaces sandbox:
unamespaces.txt