-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit SELinux labeling approach #635
Comments
Would a new upstream systemd target for this help?
But could we run before those? If so, we can at least today add them to |
Sort of. It would if no other units make use of it. As soon as that's not the case, then we're mostly back to square one.
Yeah, that would work as I mentioned a little after that:
Basically, I'm just not sure if it's a good long-term solution. The list would be subject to change, and potentially distro-specific. Another issue with the current approach I hadn't mentioned is that we'd need to also split the relabeling operation in two: those in |
Fundamentally having the files labeled while in the initramfs would be way cleaner and more "Ignition-y". What would be really nice is a userspace tool that lets you say "I have this file, in this root; what should it be labeled as" but I don't think that exists. Then we could just manually figure out and set the correct label. If we do go down this route, it's probably worth having a kola test that checks that the current method and the new method give the same results. |
That tool does exist, it's |
Argg. And I suppose it's probably not best to just not load any policy in the initramfs at all. |
This is a workaround for an inherent issue with the current relabeling approach (see coreos#635). `systemd-sysctl.service` is definitely one of those early services that have a high probability of reading files from `/etc` before it's relabeled. They're both pulled in by `sysinit.target`, but `ignition-relabel.service` has an additional `After=local-fs.target` which makes it likelier to run later (also see coreos#635 about that). So for now, let's just hack around this by making sure `systemd-sysctl` runs after us.
To be clear, the current approach could be made to work just fine with e.g. split units for @poettering, could you chime in here? Is there reason to be reluctant of loading the policy in the initramfs? Or is this a fully supported scenario for systemd? A reading of |
Related: systemd/systemd#11131. This would allow us to let systemd relabel the files and thus drop |
Use tmpfiles.d for that. Drop a file in /run/tmpfiles.d/ and use the "z" line type. The new relabel-extra.d/ stuff is only necessary for stuff that needs to be labelled properly during earliest boot. For everything else tmpfiles.d should suffice. |
Ahh nice! Didn't know about the |
So, I'm working on adding support for |
`systemd-journal-catalog-update.service` writes to `/var`. However, it's not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means that it may run before or after. This is an issue for Fedora CoreOS, which uses Ignition. We want to be able to prepare `/var` on first boot from the initrd, where the SELinux policy is not loaded yet. This means that the hierarchy under `/var` is not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it gets relabeled once `/var` gets mounted post-switchroot. So any service that tries to access `/var` before `systemd-tmpfiles` relabels it is likely to hit `EACCES`. Fix this by simply ordering `systemd-journal-catalog-update.service` after `systemd-tmpfiles-setup.service`. This is also clearer since the tmpfiles entries are the canonical source of how `/var` should be populated. For more context on this, see: coreos/ignition#635 (comment)
`systemd-journal-catalog-update.service` writes to `/var`. However, it's not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means that it may run before or after. This is an issue for Fedora CoreOS, which uses Ignition. We want to be able to prepare `/var` on first boot from the initrd, where the SELinux policy is not loaded yet. This means that the hierarchy under `/var` is not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it gets relabeled once `/var` gets mounted post-switchroot. So any service that tries to access `/var` before `systemd-tmpfiles` relabels it is likely to hit `EACCES`. Fix this by simply ordering `systemd-journal-catalog-update.service` after `systemd-tmpfiles-setup.service`. This is also clearer since the tmpfiles entries are the canonical source of how `/var` should be populated. For more context on this, see: coreos/ignition#635 (comment)
`systemd-journal-catalog-update.service` writes to `/var`. However, it's not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means that it may run before or after. This is an issue for Fedora CoreOS, which uses Ignition. We want to be able to prepare `/var` on first boot from the initrd, where the SELinux policy is not loaded yet. This means that the hierarchy under `/var` is not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it gets relabeled once `/var` gets mounted post-switchroot. So any service that tries to access `/var` before `systemd-tmpfiles` relabels it is likely to hit `EACCES`. Fix this by simply ordering `systemd-journal-catalog-update.service` after `systemd-tmpfiles-setup.service`. This is also clearer since the tmpfiles entries are the canonical source of how `/var` should be populated. For more context on this, see: coreos/ignition#635 (comment)
`systemd-journal-catalog-update.service` writes to `/var`. However, it's not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means that it may run before or after. This is an issue for Fedora CoreOS, which uses Ignition. We want to be able to prepare `/var` on first boot from the initrd, where the SELinux policy is not loaded yet. This means that the hierarchy under `/var` is not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it gets relabeled once `/var` gets mounted post-switchroot. So any service that tries to access `/var` before `systemd-tmpfiles` relabels it is likely to hit `EACCES`. Fix this by simply ordering `systemd-journal-catalog-update.service` after `systemd-tmpfiles-setup.service`. This is also clearer since the tmpfiles entries are the canonical source of how `/var` should be populated. For more context on this, see: coreos/ignition#635 (comment) (cherry picked from commit 8e729d5)
So circling back to the initial proposal here, @poettering: would systemd support initrds that preload the policy? The current code seems to imply it is, but I'd like to confirm whether this is a configuration which systemd developers are not actively opposed to supporting. It would clearly simplify a lot of things for us and more generally, anything that needs to prepare files in the rootfs. One argument that was brought up against this (#569 (comment)) is that the policy might interfere with systemd itself loading the policy. Though ISTM like this is something we can plan for and fix up as required if we hit that? (FWIW, I did not hit any sort of obvious regressions when initially testing this on f29). |
If you load the policy in the initrd you need to include it in the initrd, and the policy isn't precisely tiny. You also need to be very careful with labelling everything in the initrd properly when building it, and you need to do something smart if you every allow the initrd and the host tree booted with it to get out of sync. But if you are willing to accept all the above then yes, systemd allows you to allow the policy from the initrd. I mean, few people do that, so we don't know if it actually workes, as it gets no testing, but most of the code should be written in a way that we load policy when its not loaded yet and if its available, and it doesn't matter much if that's the initrd or the host systemd that does that... what i don't get though: if you make changes to /var from the initrd, why do you unmount it again? (that's what you do, no?) systemd is happy if you just leave it mounted. And if you do, then you can use relabel-extra.d/ after all for it if you like. relabel-extra.d is intended to be used only for stuff that should be relabelled right during the transition, i.e. in a time where — in the generic case — /var is not available yet, and hence only /run and things like that are writable at all. But if in the atomic case you don't support fully-split out /var anyway (by which i mean: /var is not searched for at all whatsoever before the host systemd does that), then by all means, relabel-extra.d is something you can use. |
We would load the policy from the sysroot itself after it is mounted, so we wouldn't have to include it in the initrd (or risk getting out of sync). The point about making sure things in the initrd are labeled properly is a good one.
👍
That's correct, we unmount it (and everything else other than
Yeah, I was thinking about this. And I think we could do this (leave the mounts on and leverage I guess we could have a custom unit that does this with |
urks, but that's super ugly. systemd only supports loading policy when initializing (it caches the label db and explicitly labels many runtime fds after all). On an initrd-system systemd is supposed to initialize twice: the version in the initrd, and then the version on the host. If you wan to load the policy independently of these two points, you need to issue the equivalent of "systemctl daemon-reexec", and that's something you should never do in clean code paths really. Doing "systemctl daemon-reexec" is a hack support do allow RPM-style package upgrades without rebooting, and for testing stuff. It's not something you want to call during each and every boot. It's slow and it's a tiny bit fragile (that's because we drop off the bus temporarily for example and clients don't expect that). Hence: i would strongly advise you to load the policy only when pid 1 in the initrd first initializes or the one on the host does. But adding a reexec in between is ugly, very ugly. |
Well, but it's a tiny difference between first and subsequent boots only. Through the relabelling you'll have a ton more. If I were you I'd certainly trade leaving the mount point on for drastically simplifying the relabelling.
Nope, there's no hook for anything like that. systemd generally optimizes things for not doing stuff twice.
This is not going to work: by unmounting the /var mount you will cancel systemd's queued job to mount it, and then things are bad... Have you considered simply rebooting after the relabel? i.e. you said ignition runs once only anyway: so maybe make the changes you want t make, then continue booting into the system you just prepped, requesting a relabel with relabe-extra.d/ or tmpfiles, running only a few select services, then reboot automaticall. On the subsequent boot everything is properly set up, no relabelling necessary anymore, no ignition necessary anymore and so on. That would work a bit like RPM offline upgrades, where you boot into a special mode, which updates everything with only minimal services running, then reboots again. |
or, here's a third idea: when ignition runs in the initrd, on first boot, redirect "default.target" to a special target you define, maybe called "iginition-finish.target", this target pulls in sysinit.target (and thus systemd-tmpfiles.service and such, including its relabelling magic) and very few select other services. Then, as last service it pulls in a quick tool that does "systemctl daemon-reexec" followed by "systemctl isolate default.target", after removing the default.target redirect. This will then enqueue the full system. That saves you the extra reboot, by inserting an additional phase between the initrd and the regular system runtime, where PID 1 is already the one from the host, but we don't start all the many services a normal system would need, but just the basic minimum to complete the relabel, and not more, so that we can minimize the programs that run while mislabelled files/dirs still exist. |
Thanks, gotcha. Agreed that a
Hmm, that's an interesting idea. I'm unsure though about the added complexity if the best we can do is minimizing programs that run in a mislabeled environment vs eliminating it entirely. We're doing an OK (though at times painful) job in the current approach with minimizing these interactions; the main motivation for this ticket is to completely get rid of them. I should note another avenue we're exploring here is simply to make setting SELinux labels not require a loaded policy at all (see #569 (comment)). Overall, this would allow Ignition to completely work from the initrd and not require any modifications to the boot process post-switchroot. |
For the time being, we are also relabelling the hostname file that afterburn writes to, as part of the dracut module itself. See: coreos/afterburn#245 Longer term we should remember to revisit that as well. |
Now that we can relabel from the initrd, we should be able to categorically resolve all SELinux and Ignition issues. Here, we drop the hacky `ignition-relabel.service` and relabel all the files we need to on the spot! Requires: https://lore.kernel.org/selinux/20190819193032.848-1-jlebon@redhat.com/ Closes: coreos#635
Now that we can relabel from the initrd, we should be able to categorically resolve all SELinux and Ignition issues. Here, we drop the hacky `ignition-relabel.service` and relabel all the files we need to on the spot! Requires: https://lore.kernel.org/selinux/20190819193032.848-1-jlebon@redhat.com/ Closes: coreos#635
So, we had a large discussion in both #569 and coreos/bugs#2417 about the best way to tackle SELinux labeling. To summarize, the two options were:
(a) create an
ignition-relabel.service
unit that runsrestorecon
on all the created files(b) load the policy from the initramfs
In the end, we went with (a). I'd like us to now reconsider option (b). The reason is that there are some pitfalls with (a):
All mutations specified in the Ignition config happen pre-pivot. Dumping a relabeling service to "complete the job" post-pivot goes against that.
This is also mentioned in the Fedora devel thread I had linked in the PR: basically there is no reliable way to eliminate race conditions with other systemd services that may access mislabelled files. One example of this is
systemd-sysctl.service
, which runs earlier thanignition-relabel.service
currently, and thus can trip on drop-in files in/etc/sysctl.d/
. In fact, right nowsystemd-analyze plot
shows there are more than a dozen services that start beforeignition-relabel.service
.(Un)fortunately AFAIK systemd has no concept of "run this unit first"; i.e. if multiple units have the same
Before=
specification, it's still undefined when each will run; they're normally run in parallel. Of course, you could start playing the game of tacking on specific troublesome services in yourBefore=
but that's hacky and brittle.In contrast, the pitfall of (b) is that:
Loading the policy in the initramfs seems like a big change. The key piece of software this directly affects is systemd. Normally, post-pivot systemd is the one that loads the policy first. However, systemd is designed to deal with pre-loaded policies from the initramfs. First, it always tries to load the policy, regardless of whether a policy is already loaded. Even if it fails (e.g. somehow the policy prevents systemd from loading the policy, which it currently does not, but let's say), systemd will just keep going. Which is fine; we already have the correct policy loaded.
Note that even with the policy being loaded in the initramfs, it would still be best to run a relabeling pass (or using
setfscreatecon
at file creation time). Files created by utilities Ignition runs throughchroot
will likely be labeled properly. Other files Ignition directly creates might be labeled correctly through the default algorithm (e.g. file transition rules, or inheritance from parent dir), but it might not. So most of the current "file tracking" code used for relabeling would still be needed.My proposal is to add a new experimental compile time switch which loads the policy in Ignition and performs relabeling as part of the initramfs. We can then enable this switch in FCOS and/or RHCOS and see how it works. If there are major roadblocks, we just drop the experimental switch. If not, we stabilize on the new behaviour for the
selinuxRelabel
knob and drop theignition-relabel.service
path.Thoughts? Are there other pitfalls with (b) (or major advantages of (a))?
The text was updated successfully, but these errors were encountered: