Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit SELinux labeling approach #635

Closed
jlebon opened this issue Sep 21, 2018 · 18 comments · Fixed by #846
Closed

Revisit SELinux labeling approach #635

jlebon opened this issue Sep 21, 2018 · 18 comments · Fixed by #846

Comments

@jlebon
Copy link
Member

jlebon commented Sep 21, 2018

So, we had a large discussion in both #569 and coreos/bugs#2417 about the best way to tackle SELinux labeling. To summarize, the two options were:

(a) create an ignition-relabel.service unit that runs restorecon on all the created files
(b) load the policy from the initramfs

In the end, we went with (a). I'd like us to now reconsider option (b). The reason is that there are some pitfalls with (a):

  1. somewhat goes against the Ignition philosophy of doing everything pre-pivot

All mutations specified in the Ignition config happen pre-pivot. Dumping a relabeling service to "complete the job" post-pivot goes against that.

  1. (most importantly) running relabeling operations on a booting system is ill-defined/racy

This is also mentioned in the Fedora devel thread I had linked in the PR: basically there is no reliable way to eliminate race conditions with other systemd services that may access mislabelled files. One example of this is systemd-sysctl.service, which runs earlier than ignition-relabel.service currently, and thus can trip on drop-in files in /etc/sysctl.d/. In fact, right now systemd-analyze plot shows there are more than a dozen services that start before ignition-relabel.service.

(Un)fortunately AFAIK systemd has no concept of "run this unit first"; i.e. if multiple units have the same Before= specification, it's still undefined when each will run; they're normally run in parallel. Of course, you could start playing the game of tacking on specific troublesome services in your Before= but that's hacky and brittle.

In contrast, the pitfall of (b) is that:

  1. it's heavy-handed and might lead to systemd issues

Loading the policy in the initramfs seems like a big change. The key piece of software this directly affects is systemd. Normally, post-pivot systemd is the one that loads the policy first. However, systemd is designed to deal with pre-loaded policies from the initramfs. First, it always tries to load the policy, regardless of whether a policy is already loaded. Even if it fails (e.g. somehow the policy prevents systemd from loading the policy, which it currently does not, but let's say), systemd will just keep going. Which is fine; we already have the correct policy loaded.

Note that even with the policy being loaded in the initramfs, it would still be best to run a relabeling pass (or using setfscreatecon at file creation time). Files created by utilities Ignition runs through chroot will likely be labeled properly. Other files Ignition directly creates might be labeled correctly through the default algorithm (e.g. file transition rules, or inheritance from parent dir), but it might not. So most of the current "file tracking" code used for relabeling would still be needed.

My proposal is to add a new experimental compile time switch which loads the policy in Ignition and performs relabeling as part of the initramfs. We can then enable this switch in FCOS and/or RHCOS and see how it works. If there are major roadblocks, we just drop the experimental switch. If not, we stabilize on the new behaviour for the selinuxRelabel knob and drop the ignition-relabel.service path.

Thoughts? Are there other pitfalls with (b) (or major advantages of (a))?

@cgwalters
Copy link
Member

Would a new upstream systemd target for this help?

In fact, right now systemd-analyze plot shows there are more than a dozen services that start before ignition-relabel.service.

But could we run before those? If so, we can at least today add them to Before=.

@jlebon
Copy link
Member Author

jlebon commented Sep 24, 2018

Would a new upstream systemd target for this help?

Sort of. It would if no other units make use of it. As soon as that's not the case, then we're mostly back to square one.

But could we run before those? If so, we can at least today add them to Before=.

Yeah, that would work as I mentioned a little after that:

Of course, you could start playing the game of tacking on specific troublesome services in your Before= but that's hacky and brittle.

Basically, I'm just not sure if it's a good long-term solution. The list would be subject to change, and potentially distro-specific.

Another issue with the current approach I hadn't mentioned is that we'd need to also split the relabeling operation in two: those in /etc and those under /var (which is basically everything else) so that we can support /var mounts correctly. The ones in /etc can then run much earlier while the /var one would have to wait until after var.mount.

@ajeddeloh
Copy link
Contributor

Fundamentally having the files labeled while in the initramfs would be way cleaner and more "Ignition-y". What would be really nice is a userspace tool that lets you say "I have this file, in this root; what should it be labeled as" but I don't think that exists. Then we could just manually figure out and set the correct label.

If we do go down this route, it's probably worth having a kola test that checks that the current method and the new method give the same results.

@jlebon
Copy link
Member Author

jlebon commented Sep 24, 2018

What would be really nice is a userspace tool that lets you say "I have this file, in this root; what should it be labeled as" but I don't think that exists.

That tool does exist, it's matchpathcon. The issue is that you can't actually set the labels without first loading the policy: #569 (comment).

@ajeddeloh
Copy link
Contributor

Argg. And I suppose it's probably not best to just not load any policy in the initramfs at all.

jlebon added a commit to jlebon/ignition that referenced this issue Sep 27, 2018
This is a workaround for an inherent issue with the current relabeling
approach (see coreos#635). `systemd-sysctl.service` is definitely one of those
early services that have a high probability of reading files from `/etc`
before it's relabeled.

They're both pulled in by `sysinit.target`, but
`ignition-relabel.service` has an additional `After=local-fs.target`
which makes it likelier to run later (also see coreos#635 about that). So for
now, let's just hack around this by making sure `systemd-sysctl` runs
after us.
@jlebon
Copy link
Member Author

jlebon commented Sep 28, 2018

To be clear, the current approach could be made to work just fine with e.g. split units for /etc and /var and tacking on more troublesome services to Before= as they show up (I don't expect there'd be that many). I just want to make sure we're not avoiding the cleaner solution without good well-founded reasons.

@poettering, could you chime in here? Is there reason to be reluctant of loading the policy in the initramfs? Or is this a fully supported scenario for systemd? A reading of selinux-setup.c seems to indicate so but I'm not sure how much that path is exercised in practice.

@jlebon
Copy link
Member Author

jlebon commented Dec 12, 2018

Related: systemd/systemd#11131. This would allow us to let systemd relabel the files and thus drop ignition-files.service. Though once we support /var mounts, we'll still need some relabeling unit with e.g. After=local-fs.target and Before=basic.target or something?

@poettering
Copy link

Though once we support /var mounts, we'll still need some relabeling unit with e.g. After=local-fs.target and Before=basic.target or something?

Use tmpfiles.d for that. Drop a file in /run/tmpfiles.d/ and use the "z" line type.

The new relabel-extra.d/ stuff is only necessary for stuff that needs to be labelled properly during earliest boot. For everything else tmpfiles.d should suffice.

@jlebon
Copy link
Member Author

jlebon commented Dec 12, 2018

Use tmpfiles.d for that. Drop a file in /run/tmpfiles.d/ and use the "z" line type.

Ahh nice! Didn't know about the z type.

@jlebon
Copy link
Member Author

jlebon commented Mar 5, 2019

Use tmpfiles.d for that. Drop a file in /run/tmpfiles.d/ and use the "z" line type.

Ahh nice! Didn't know about the z type.

So, I'm working on adding support for /var mounts now, and this works well. Though I did have to tweak a few systemd units to get a clean boot: systemd/systemd#11903.

jlebon added a commit to jlebon/systemd that referenced this issue Mar 12, 2019
`systemd-journal-catalog-update.service` writes to `/var`. However, it's
not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means
that it may run before or after.

This is an issue for Fedora CoreOS, which uses Ignition. We want to be
able to prepare `/var` on first boot from the initrd, where the SELinux
policy is not loaded yet. This means that the hierarchy under `/var` is
not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it
gets relabeled once `/var` gets mounted post-switchroot.

So any service that tries to access `/var` before `systemd-tmpfiles`
relabels it is likely to hit `EACCES`.

Fix this by simply ordering `systemd-journal-catalog-update.service`
after `systemd-tmpfiles-setup.service`. This is also clearer since the
tmpfiles entries are the canonical source of how `/var` should be
populated.

For more context on this, see:
coreos/ignition#635 (comment)
poettering pushed a commit to systemd/systemd that referenced this issue Mar 14, 2019
`systemd-journal-catalog-update.service` writes to `/var`. However, it's
not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means
that it may run before or after.

This is an issue for Fedora CoreOS, which uses Ignition. We want to be
able to prepare `/var` on first boot from the initrd, where the SELinux
policy is not loaded yet. This means that the hierarchy under `/var` is
not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it
gets relabeled once `/var` gets mounted post-switchroot.

So any service that tries to access `/var` before `systemd-tmpfiles`
relabels it is likely to hit `EACCES`.

Fix this by simply ordering `systemd-journal-catalog-update.service`
after `systemd-tmpfiles-setup.service`. This is also clearer since the
tmpfiles entries are the canonical source of how `/var` should be
populated.

For more context on this, see:
coreos/ignition#635 (comment)
jkartseva pushed a commit to jkartseva/systemd that referenced this issue Mar 18, 2019
`systemd-journal-catalog-update.service` writes to `/var`. However, it's
not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means
that it may run before or after.

This is an issue for Fedora CoreOS, which uses Ignition. We want to be
able to prepare `/var` on first boot from the initrd, where the SELinux
policy is not loaded yet. This means that the hierarchy under `/var` is
not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it
gets relabeled once `/var` gets mounted post-switchroot.

So any service that tries to access `/var` before `systemd-tmpfiles`
relabels it is likely to hit `EACCES`.

Fix this by simply ordering `systemd-journal-catalog-update.service`
after `systemd-tmpfiles-setup.service`. This is also clearer since the
tmpfiles entries are the canonical source of how `/var` should be
populated.

For more context on this, see:
coreos/ignition#635 (comment)
keszybz pushed a commit to systemd/systemd-stable that referenced this issue Mar 29, 2019
`systemd-journal-catalog-update.service` writes to `/var`. However, it's
not explicitly ordered wrt `systemd-tmpfiles-setup.service`, which means
that it may run before or after.

This is an issue for Fedora CoreOS, which uses Ignition. We want to be
able to prepare `/var` on first boot from the initrd, where the SELinux
policy is not loaded yet. This means that the hierarchy under `/var` is
not correctly labeled. We add a `Z /var - - -` tmpfiles entry so that it
gets relabeled once `/var` gets mounted post-switchroot.

So any service that tries to access `/var` before `systemd-tmpfiles`
relabels it is likely to hit `EACCES`.

Fix this by simply ordering `systemd-journal-catalog-update.service`
after `systemd-tmpfiles-setup.service`. This is also clearer since the
tmpfiles entries are the canonical source of how `/var` should be
populated.

For more context on this, see:
coreos/ignition#635 (comment)

(cherry picked from commit 8e729d5)
@jlebon
Copy link
Member Author

jlebon commented May 31, 2019

So circling back to the initial proposal here, @poettering: would systemd support initrds that preload the policy? The current code seems to imply it is, but I'd like to confirm whether this is a configuration which systemd developers are not actively opposed to supporting. It would clearly simplify a lot of things for us and more generally, anything that needs to prepare files in the rootfs.

One argument that was brought up against this (#569 (comment)) is that the policy might interfere with systemd itself loading the policy. Though ISTM like this is something we can plan for and fix up as required if we hit that? (FWIW, I did not hit any sort of obvious regressions when initially testing this on f29).

@poettering
Copy link

If you load the policy in the initrd you need to include it in the initrd, and the policy isn't precisely tiny. You also need to be very careful with labelling everything in the initrd properly when building it, and you need to do something smart if you every allow the initrd and the host tree booted with it to get out of sync.

But if you are willing to accept all the above then yes, systemd allows you to allow the policy from the initrd. I mean, few people do that, so we don't know if it actually workes, as it gets no testing, but most of the code should be written in a way that we load policy when its not loaded yet and if its available, and it doesn't matter much if that's the initrd or the host systemd that does that...

what i don't get though: if you make changes to /var from the initrd, why do you unmount it again? (that's what you do, no?) systemd is happy if you just leave it mounted. And if you do, then you can use relabel-extra.d/ after all for it if you like.

relabel-extra.d is intended to be used only for stuff that should be relabelled right during the transition, i.e. in a time where — in the generic case — /var is not available yet, and hence only /run and things like that are writable at all. But if in the atomic case you don't support fully-split out /var anyway (by which i mean: /var is not searched for at all whatsoever before the host systemd does that), then by all means, relabel-extra.d is something you can use.

@jlebon
Copy link
Member Author

jlebon commented Jun 17, 2019

If you load the policy in the initrd you need to include it in the initrd, and the policy isn't precisely tiny. You also need to be very careful with labelling everything in the initrd properly when building it, and you need to do something smart if you every allow the initrd and the host tree booted with it to get out of sync.

We would load the policy from the sysroot itself after it is mounted, so we wouldn't have to include it in the initrd (or risk getting out of sync). The point about making sure things in the initrd are labeled properly is a good one.

But if you are willing to accept all the above then yes, systemd allows you to allow the policy from the initrd.

👍

what i don't get though: if you make changes to /var from the initrd, why do you unmount it again? (that's what you do, no?) systemd is happy if you just leave it mounted.

That's correct, we unmount it (and everything else other than /sysroot itself). The reason is that because Ignition runs only on first boot, we need to make sure that the user config has correctly set up mount options via a mount unit/fstab stanza right from the first boot. If we were to leave the mounts on, the way the system boots up and mounts filesystems would be different between the first boot and subsequent boots. And this difference could easily mask provisioning issues that only appear after the first reboot.

And if you do, then you can use relabel-extra.d/ after all for it if you like.

Yeah, I was thinking about this. And I think we could do this (leave the mounts on and leverage relabel-extra.d/), if there's a way to unmount things immediately after for the reasons above and let the regular mount units/fstab generator code mount things later on as usual. Would that be possible?

I guess we could have a custom unit that does this with Before=local-fs-pre.target, but it'd be even nicer if systemd could do this for us so that other user code that run even earlier don't wrongly key off of them (e.g. generators).

@poettering
Copy link

If you load the policy in the initrd you need to include it in the initrd, and the policy isn't precisely tiny. You also need to be very careful with labelling everything in the initrd properly when building it, and you need to do something smart if you every allow the initrd and the host tree booted with it to get out of sync.

We would load the policy from the sysroot itself after it is mounted, so we wouldn't have to include it in the initrd (or risk getting out of sync). The point about making sure things in the initrd are labeled properly is a good one.

urks, but that's super ugly. systemd only supports loading policy when initializing (it caches the label db and explicitly labels many runtime fds after all). On an initrd-system systemd is supposed to initialize twice: the version in the initrd, and then the version on the host. If you wan to load the policy independently of these two points, you need to issue the equivalent of "systemctl daemon-reexec", and that's something you should never do in clean code paths really. Doing "systemctl daemon-reexec" is a hack support do allow RPM-style package upgrades without rebooting, and for testing stuff. It's not something you want to call during each and every boot. It's slow and it's a tiny bit fragile (that's because we drop off the bus temporarily for example and clients don't expect that). Hence: i would strongly advise you to load the policy only when pid 1 in the initrd first initializes or the one on the host does. But adding a reexec in between is ugly, very ugly.

@poettering
Copy link

what i don't get though: if you make changes to /var from the initrd, why do you unmount it again? (that's what you do, no?) systemd is happy if you just leave it mounted.

That's correct, we unmount it (and everything else other than /sysroot itself). The reason is that because Ignition runs only on first boot, we need to make sure that the user config has correctly set up mount options via a mount unit/fstab stanza right from the first boot. If we were to leave the mounts on, the way the system boots up and mounts filesystems would be different between the first boot and subsequent boots. And this difference could easily mask provisioning issues that only appear after the first reboot.

Well, but it's a tiny difference between first and subsequent boots only. Through the relabelling you'll have a ton more. If I were you I'd certainly trade leaving the mount point on for drastically simplifying the relabelling.

And if you do, then you can use relabel-extra.d/ after all for it if you like.

Yeah, I was thinking about this. And I think we could do this (leave the mounts on and leverage relabel-extra.d/), if there's a way to unmount things immediately after for the reasons above and let the regular mount units/fstab generator code mount things later on as usual. Would that be possible?

Nope, there's no hook for anything like that. systemd generally optimizes things for not doing stuff twice.

I guess we could have a custom unit that does this with Before=local-fs-pre.target, but it'd be even nicer if systemd could do this for us so that other user code that run even earlier don't wrongly key off of them (e.g. generators).

This is not going to work: by unmounting the /var mount you will cancel systemd's queued job to mount it, and then things are bad...

Have you considered simply rebooting after the relabel? i.e. you said ignition runs once only anyway: so maybe make the changes you want t make, then continue booting into the system you just prepped, requesting a relabel with relabe-extra.d/ or tmpfiles, running only a few select services, then reboot automaticall. On the subsequent boot everything is properly set up, no relabelling necessary anymore, no ignition necessary anymore and so on.

That would work a bit like RPM offline upgrades, where you boot into a special mode, which updates everything with only minimal services running, then reboots again.

@poettering
Copy link

or, here's a third idea: when ignition runs in the initrd, on first boot, redirect "default.target" to a special target you define, maybe called "iginition-finish.target", this target pulls in sysinit.target (and thus systemd-tmpfiles.service and such, including its relabelling magic) and very few select other services. Then, as last service it pulls in a quick tool that does "systemctl daemon-reexec" followed by "systemctl isolate default.target", after removing the default.target redirect. This will then enqueue the full system. That saves you the extra reboot, by inserting an additional phase between the initrd and the regular system runtime, where PID 1 is already the one from the host, but we don't start all the many services a normal system would need, but just the basic minimum to complete the relabel, and not more, so that we can minimize the programs that run while mislabelled files/dirs still exist.

@jlebon
Copy link
Member Author

jlebon commented Jun 17, 2019

Hence: i would strongly advise you to load the policy only when pid 1 in the initrd first initializes or the one on the host does. But adding a reexec in between is ugly, very ugly.

Thanks, gotcha. Agreed that a daemon-reexec right in the middle would be unfortunate (when testing this originally, I was literally calling load_policy directly). So this approach indeed comes down to including the policy in the initrd, which has its own set of issues.

or, here's a third idea: when ignition runs in the initrd, on first boot, redirect "default.target" to a special target you define, maybe called "iginition-finish.target"

Hmm, that's an interesting idea. I'm unsure though about the added complexity if the best we can do is minimizing programs that run in a mislabeled environment vs eliminating it entirely. We're doing an OK (though at times painful) job in the current approach with minimizing these interactions; the main motivation for this ticket is to completely get rid of them.

I should note another avenue we're exploring here is simply to make setting SELinux labels not require a loaded policy at all (see #569 (comment)). Overall, this would allow Ignition to completely work from the initrd and not require any modifications to the boot process post-switchroot.

@yuqi-zhang
Copy link

For the time being, we are also relabelling the hostname file that afterburn writes to, as part of the dracut module itself. See: coreos/afterburn#245

Longer term we should remember to revisit that as well.

@jlebon jlebon added jira for syncing to jira and removed jira for syncing to jira labels Oct 17, 2019
jlebon added a commit to jlebon/ignition that referenced this issue Nov 29, 2019
Now that we can relabel from the initrd, we should be able to
categorically resolve all SELinux and Ignition issues.

Here, we drop the hacky `ignition-relabel.service` and relabel all the
files we need to on the spot!

Requires: https://lore.kernel.org/selinux/20190819193032.848-1-jlebon@redhat.com/

Closes: coreos#635
jlebon added a commit to jlebon/ignition that referenced this issue Nov 29, 2019
Now that we can relabel from the initrd, we should be able to
categorically resolve all SELinux and Ignition issues.

Here, we drop the hacky `ignition-relabel.service` and relabel all the
files we need to on the spot!

Requires: https://lore.kernel.org/selinux/20190819193032.848-1-jlebon@redhat.com/

Closes: coreos#635
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants