35coreos-ignition: skip reboot if changed kargs match current boot #1409

Open: wants to merge 6 commits into testing-devel

dustymabe
Member

If the requested kernel arguments in the Ignition config already match
the kernel arguments of the currently booted system then let's skip
the reboot because the reboot won't change anything.

One example of a use of this would be if someone is doing a PXE install
and they want to persistently use net.ifnames=0. They apply
net.ifnames=0 on the PXE boot and coreos-installer transparently
forwards it to the Ignition boot (for a single boot). Then the Ignition
config has net.ifnames=0 set in the kernel arguments section.
ignition-kargs.service will take care of setting it persistently,
but without this change the system will be rebooted. With this change
we skip the reboot.
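The core check can be sketched in a few lines of shell. This is an illustration, not the PR's code. The helper name `kargs_match_cmdline` is hypothetical. It only handles kargs that should exist, not `shouldNotExist` removals, and it treats the command line as plain whitespace-separated tokens. The PR's script relies on `rdcore kargs` for the comparison instead; one invocation, with `--current --create-if-changed`, is quoted later in the review.

```bash
# Hypothetical helper, for illustration only.
# kargs_match_cmdline CMDLINE_FILE KARG...
# Succeeds if every KARG appears as an exact whitespace-separated token in
# CMDLINE_FILE (normally /proc/cmdline). Only "should exist" kargs are checked.
kargs_match_cmdline() {
    local cmdline_file=$1
    shift
    local -a current
    # "|| true": read returns non-zero if the file lacks a trailing newline.
    read -r -a current < "$cmdline_file" || true
    local want have found
    for want in "$@"; do
        found=0
        for have in "${current[@]}"; do
            if [ "$have" = "$want" ]; then
                found=1
                break
            fi
        done
        # One missing karg means this boot differs, so a reboot is needed.
        [ "$found" -eq 1 ] || return 1
    done
    return 0
}
```

Called as `kargs_match_cmdline /proc/cmdline net.ifnames=0`, a zero exit status would mean the reboot can be skipped. Quoted values containing spaces (e.g. `foo="a b"`) would need real parsing.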

dustymabe requested a review from bgilbert January 7, 2022 17:12
echo "Kernel arguments were changed. Requesting reboot."
touch /run/coreos-kargs-reboot
else
echo "Kernel arguments were changed, but they match this boot. Skipping reboot."
Member Author

Unfortunately Ignition only prints stdout/stderr if the command fails, though I think this information would be quite useful. Anything we can do to surface it? Run logger?

Member

Maybe simplest is to just write to /dev/kmsg. That'll end up in both the console and the journal in the non-reboot case.

Member Author

Hmm. Writing to kmsg doesn't end up giving me an entry in the journal in my local experiments. I did see it in the kernel logs/serial console (which makes sense), but not in the journal.

I tried with logger, but apparently that isn't in the initramfs (neither is systemd-cat) and I'm not sure we should put it there. Any other interfaces you know of?

Member

> Hmm. writing to kmsg doesn't end up giving me an entry in the journal in my local experiments. I did see it in the kernel logs/serial console (which makes sense), but not in the journal.

Hmm weird, it works here.

It might not show up in a journalctl from the initramfs, but it should be there after switchroot. In the reboot case, no journal logs are preserved, so only the console log will be visible.

Member Author

Maybe we can do a screenshare and compare. Wasn't working for me in the "skip reboot" case so no journal logs should have been lost IIUC.

Member Author

It turns out writing to kmsg should work but we were seeing it get ratelimited. Rather than add an extra dependency into the initramfs we decided to just write to kmsg (best effort).
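Here is a sketch of the best-effort logging approach settled on above. The `log_info` helper is hypothetical. The `KMSG` variable exists only so the helper can be exercised without touching the real /dev/kmsg. Writes are allowed to fail silently. Userspace writes to /dev/kmsg are ratelimited by default (the `kernel.printk_devkmsg` sysctl defaults to `ratelimit`), which is likely the cause of the lost messages seen here.

```bash
# Hypothetical helper; KMSG is overridable only for testing.
# In the initramfs it would simply be /dev/kmsg.
KMSG=${KMSG:-/dev/kmsg}

# log_info MESSAGE...
# Best effort: append one informational line to the kernel log. Errors
# (missing device, no permission) are deliberately ignored, because
# Ignition discards this script's output unless it fails.
log_info() {
    { echo "coreos-kargs: $*" > "$KMSG"; } 2>/dev/null || true
}
```

The `coreos-kargs:` prefix is an arbitrary tag so the lines are easy to grep for in `dmesg` or the journal.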

jlebon
jlebon previously approved these changes Jan 10, 2022
Member

jlebon left a comment

This makes sense to me overall.

else
echo "Kernel arguments were changed, but they match this boot. Skipping reboot."
fi
fi
Member

Hmm, there is also the case where the kargs in the BLS config already match the Ignition config, but neither match the current boot. Before and after this patch, we don't reboot, but we likely should now that we care about firstboot kargs in the diskful path.
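This comment, together with the follow-up commit further down, implies a rule: the reboot decision should depend on the current boot, not on whether the BLS config needed updating. Below is a minimal sketch of that decision table. The function name is hypothetical, and the `yes`/`no` inputs stand in for the stamp files the script actually uses.

```bash
# Hypothetical decision helper, for illustration only.
# decide_kargs_action BLS_CHANGED THISBOOT_DIFFER   (each "yes" or "no")
# Prints the action to take after applying Ignition's kargs.
decide_kargs_action() {
    local bls_changed=$1 thisboot_differ=$2
    if [ "$thisboot_differ" = yes ]; then
        # Covers the case raised here: even if the BLS config already
        # matched, the running kernel lacks the desired kargs.
        echo reboot
    elif [ "$bls_changed" = yes ]; then
        # BLS was updated but this boot already matches: persist, no reboot.
        echo persist-only
    else
        echo nothing
    fi
}
```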

Member Author

Added another commit for this case. Untested right now. If the strategy looks good I'll try to add an ext test.

Member Author

Not 100% sure how to add an ext test for this. Since we don't have any of the logs from the first boot, it's hard to get that information. Any ideas?

Member

Can't we put a karg in appendFirstbootKernelArgs and in the Ignition config's shouldNotExist section and then verify that the karg isn't present in /proc/cmdline in the test? We don't have logs from first boot, but we can deduce that it worked since it was present on first boot.

Member Author

Hmm. I think the problem is that the implementation for appendFirstbootKernelArgs (the file that exists in /boot) doesn't get cleaned up until Ignition fully runs. So any karg we put there will be present.

We could add another reboot and test then.

Member

Offhand, I think one way to fix this is:

  1. store the fetched Ignition config in /boot
  2. store any NM keyfiles generated from kargs in /boot
  3. empty the stamp file but leave it there; it's still only deleted by firstboot complete

Member Author

That seems like a lot of effort and maybe not worth the risk.

We could prevent the bootloop by telling ourselves that we ran once before either through a stamp file or by adding another entry to the BLS config (coreos.karg.ran), but that wouldn't prevent the karg booted versus expected mismatch. It would only prevent the bootloop, though then we could error if we wanted to.

Interested to see what @bgilbert thinks here. If it's not worth it, we could just drop the "35coreos-ignition: handle the case where booted kargs don't match desired" commit too.
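The stamp-file idea for preventing a bootloop can be read as follows. This is a hedged sketch: the helper name and the stamp location are hypothetical. The stamp must live somewhere that survives the reboot, such as under /boot, because anything under /run is gone afterward. Something would also need to delete it once first boot completes.

```bash
# Hypothetical bootloop guard, for illustration only.
# request_kargs_reboot STAMP
# First time: create STAMP and print "reboot". If STAMP already exists, a
# kargs reboot already happened and the kargs still don't match, so fail
# loudly instead of rebooting forever.
request_kargs_reboot() {
    local stamp=$1
    if [ -e "$stamp" ]; then
        echo "Kernel arguments still don't match after a reboot; refusing to loop." >&2
        return 1
    fi
    touch "$stamp"
    echo reboot
}
```

As noted above, this only turns a bootloop into an error. It does not resolve the mismatch between the booted and expected kargs.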

Member

FWIW, I think caching the Ignition config in /boot is something we should probably do anyway. It's more efficient, of course, but it also means config servers like the MCS can enforce a "one fetch only" rule as a way to protect secrets.

From there, adding the NM keyfiles and emptying the stamp file doesn't seem like much work. The underlying mechanism for getting NM keyfiles from /boot is already in place.

Contributor

I'm not seeing a use case for adding a karg to firstboot-kargs and also Ignition shouldNotExist: that would imply the user wants to remove a default karg except on the first boot, and I don't think we have any default kargs where that's obviously reasonable. And in general, I don't think we need to be especially robust to invalid user configuration. If we deterministically boot-loop in this case, the user will notice and can fix their config.

If we think caching the config is worthwhile, I'm okay with pursuing it, but I don't think it needs to be a blocker here.

Member

An easy way this can happen I think is users relying on the auto-forwarding kargs at install time, and then wanting a different configuration at first boot time (e.g. different location, or different NIC used for install vs run). But agreed we don't need to block on this.

@@ -22,7 +22,8 @@ install() {
diff \
lsblk \
sed \
sgdisk
sgdisk \
systemd-cat
Member

Hmm OK, let's do that screenshare before going this way.

Member Author

Yeah, let's do that tomorrow. Sorry, a bunch of other things got in the way today.

systemd-cat -t coreos-kargs <<< "$msg"
fi
msg="Kernel arguments in BLS config were updated."
systemd-cat -t coreos-kargs <<< "$msg"
Member

Let's nuke the stamp file here?

Member Author

i.e. `rm /run/coreos-kargs-changed`? Does it hurt to leave it, since it's under /run?

bgilbert
Contributor

General logic flow LGTM.

dustymabe force-pushed the dusty-kargs-skip-reboot branch from 929203b to 6a4d3b0 February 2, 2022 18:32
dustymabe
Member Author

OK, rebased and also added a test. The test depends on coreos/coreos-assembler#2683.

…ired

Today when you boot a system it's possible a value for a karg you
specified in the Ignition config matches the BLS config but not the
kargs from the current boot (in /proc/cmdline). Let's detect this and
reboot in that case too.
We're going to add more tests here so let's put this one under
a subheading.
- Add description
- Convert Ignition to Butane
- Use ok/fatal from commonlib.sh
This test verifies that if a kernel argument that is set as "should_exist"
in the Ignition config already exists on the kernel command line of the machine
then we can skip the reboot when applying kernel arguments but we must still
update the BLS configs to make it permanent.
This will further sanity check the karg persists even to subsequent
boots.
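Below is a sketch of the check this test describes: the karg must be on the running command line and also persisted in the BLS entries. Paths are passed in as parameters so the helpers can run anywhere. On a real system these would be /proc/cmdline and a BLS entries directory such as /boot/loader/entries. The helper names are hypothetical, and a real kola ext test would more likely report failures through `fatal` from commonlib.sh than return non-zero.

```bash
# Hypothetical helpers, for illustration only.

# karg_in_line KARG LINE: succeed if KARG is an exact whitespace-separated
# token of LINE.
karg_in_line() {
    local karg=$1 tok
    local -a toks
    read -r -a toks <<< "$2"
    for tok in "${toks[@]}"; do
        [ "$tok" = "$karg" ] && return 0
    done
    return 1
}

# check_karg_persisted KARG CMDLINE_FILE BLS_DIR
# Passes only if KARG is on the running kernel command line AND on the
# "options" line of every *.conf BLS entry in BLS_DIR.
check_karg_persisted() {
    local karg=$1 cmdline_file=$2 bls_dir=$3 entry options found_entry=0
    karg_in_line "$karg" "$(cat "$cmdline_file")" ||
        { echo "$karg missing from kernel cmdline" >&2; return 1; }
    for entry in "$bls_dir"/*.conf; do
        [ -e "$entry" ] || continue   # glob matched nothing
        found_entry=1
        # Strip the "options" keyword and keep the karg string.
        options=$(sed -n 's/^options[[:space:]][[:space:]]*//p' "$entry")
        karg_in_line "$karg" "$options" ||
            { echo "$karg missing from $entry" >&2; return 1; }
    done
    [ "$found_entry" -eq 1 ] || { echo "no BLS entries in $bls_dir" >&2; return 1; }
}
```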
dustymabe force-pushed the dusty-kargs-skip-reboot branch from 6a4d3b0 to 83f3df2 March 18, 2022 17:16
dustymabe
Member Author

rebased on top of latest tip - can we revisit this and get it in soon?

# by Ignition if there is no failure. This forces the info into the journal,
# but sometimes the journal will miss these messages because of ratelimiting.
# We've decided to accept this limitation rather than add the systemd-cat or
# logger utlities to the initramfs.
Member

Typo: "utlities" should be "utilities".

if is-live-image; then
/usr/bin/rdcore kargs --current --create-if-changed /run/coreos-kargs-changed "$@"
if [ -e /run/coreos-kargs-changed ]; then
# If we're in a live system and the kargs don't match then we must error.
Member

jlebon, Mar 21, 2022

Minor: tab


if is-live-image; then
# If we're in a live system and the kargs don't match then we must error.
if [ -e /run/coreos-kargs-thisboot-differ ]; then
if [ -e /run/coreos-kargs-reboot ]; then
# Since we exit with error here the stderr will get shown by Ignition
echo "Need to modify kernel arguments, but cannot affect live system." >&2
Member

Should this go to /dev/kmsg for consistency?

Member Author

I don't think so. In the error case, Ignition will bubble that up to the user, so we need to write to stderr. We only need to write to kmsg when the messages are informational (i.e. Ignition swallows any I/O when there isn't an error).
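The split described here can be sketched as follows: errors go to stderr because Ignition shows them, and informational messages go to the kernel log because Ignition otherwise drops them. The function name, the `yes`/`no` argument, the `KMSG` override, and the success message are hypothetical. Only the error message is taken from the hunk above.

```bash
# Hypothetical sketch of the live-image branch; KMSG overridable for testing.
KMSG=${KMSG:-/dev/kmsg}

# handle_live_kargs THISBOOT_DIFFER   ("yes" or "no")
handle_live_kargs() {
    if [ "$1" = yes ]; then
        # Error path: the command fails, so Ignition surfaces this stderr.
        echo "Need to modify kernel arguments, but cannot affect live system." >&2
        return 1
    fi
    # Success path: Ignition discards output, so log to the kernel ring
    # buffer instead (best effort; failures ignored).
    { echo "Kernel arguments already match this live boot." > "$KMSG"; } 2>/dev/null || true
}
```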

Comment on lines +47 to +48
msg="Desired kernel arguments don't match current boot. Requesting reboot."
echo "$msg" > /dev/kmsg
Member

Minor: feels like we can collapse these here and elsewhere now that we're only outputting to /dev/kmsg?

@@ -2,6 +2,7 @@
# TODO: Doc

set -xeuo pipefail
# This test runs on all platforms and verifies Ignition kernel argument setting.
Member

The message of the commit introducing this hunk says "- Use ok/fatal from commonlib.sh" but that seems to already be the case now.

fi
if grep mitigations /proc/cmdline; then
fatal "found mitigations in kernel cmdline"
fi
Member

Should this similarly also check the BLS?
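If the test should also check the BLS config, a sketch could look like this. The `grep mitigations /proc/cmdline` above matches any substring. This helper differs: it matches by karg name (`mitigations` or `mitigations=...`). The function name and the directory parameter are hypothetical.

```bash
# Hypothetical helper, for illustration only.
# bls_lacks_karg NAME BLS_DIR
# Fails if any *.conf entry in BLS_DIR has a karg called NAME on its
# "options" line, either bare (NAME) or with a value (NAME=...).
bls_lacks_karg() {
    local name=$1 bls_dir=$2 entry tok
    local -a toks
    for entry in "$bls_dir"/*.conf; do
        [ -e "$entry" ] || continue   # glob matched nothing
        read -r -a toks <<< "$(sed -n 's/^options[[:space:]][[:space:]]*//p' "$entry")"
        for tok in "${toks[@]}"; do
            case "$tok" in
                "$name" | "$name"=*)
                    echo "found $tok in $entry" >&2
                    return 1
                    ;;
            esac
        done
    done
    return 0
}
```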
