-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
35coreos-ignition: skip reboot if changed kargs match current boot #1409
base: testing-devel
Are you sure you want to change the base?
Conversation
echo "Kernel arguments were changed. Requesting reboot." | ||
touch /run/coreos-kargs-reboot | ||
else | ||
echo "Kernel arguments were changed, but they match this boot. Skipping reboot." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately Ignition only prints stdout/stderr if the command fails, though I find having this information would be quite useful. Anything we can do to get them? Run logger
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simplest is to just write to /dev/kmsg
. That'll end up in both the console and the journal in the non-reboot case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. writing to kmsg
doesn't end up giving me an entry in the journal in my local experiments. I did see it in the kernel logs/serial console (which makes sense), but not in the journal.
I tried with logger
, but apparently that isn't in the initramfs (neither is systemd-cat
) and I'm not sure we should put it there. Any other interfaces you know of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. writing to
kmsg
doesn't end up giving me an entry in the journal in my local experiments. I did see it in the kernel logs/serial console (which makes sense), but not in the journal.
Hmm weird, it works here.
It might not show up in a journalctl
from the initramfs, but should be there after switchroot. In the reboot case, no journal logs are conserved, so only the console log will be visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do a screenshare and compare. Wasn't working for me in the "skip reboot" case so no journal logs should have been lost IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out writing to kmsg should work but we were seeing it get ratelimited. Rather than add an extra dependency into the initramfs we decided to just write to kmsg (best effort).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me overall.
else | ||
echo "Kernel arguments were changed, but they match this boot. Skipping reboot." | ||
fi | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, there is also the case where the kargs in the BLS config already match the Ignition config, but neither match the current boot. Before and after this patch, we don't reboot, but we likely should now that we care about firstboot kargs in the diskful path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another commit for this case. Untested right now. If the strategy looks good I'll try to add an ext test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not 100% sure how to add an ext test for this. Since we don't have any of the logs from the first boot it's hard to get that information. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we put a karg in appendFirstbootKernelArgs
and in the Ignition config's shouldNotExist
section and then verify that the karg isn't present in /proc/cmdline
in the test? We don't have logs from first boot, but we can deduce that it worked since it was present on first boot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think the problem is that the implementation for appendFirstbootKernelArgs
(the file that exists in /boot
) doesn't get cleaned up until Ignition fully runs. So any karg we put there will be present.
We could add another reboot and test then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offhand, I think one way to fix this is:
- store the fetched Ignition config in
/boot
- store any NM keyfiles generated from kargs in
/boot
- empty the stamp file but leave it there; it's still only deleted by firstboot complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a lot of effort and maybe not worth the risk.
We could prevent the bootloop by telling ourselves that we ran once before either through a stamp file or by adding another entry to the BLS config (coreos.karg.ran
), but that wouldn't prevent the karg booted versus expected mismatch. It would only prevent the bootloop, though then we could error if we wanted to.
Interested to see what @bgilbert thinks here. We could just drop 35coreos-ignition: handle the case where booted kargs don't match desired
too if it's not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think caching the Ignition config in /boot
is something we should probably do anyway. It's more efficient of course, but also because then config servers like the MCS can enforce a "one fetch only" rule as a way to protect secrets.
From there, adding the NM keyfiles and emptying the stamp file doesn't seem like much work. The underlying mechanism for getting NM keyfiles from /boot
is already in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing a use case for adding a karg to firstboot-kargs and also Ignition shouldNotExist
: that would imply the user wants to remove a default karg except on the first boot, and I don't think we have any default kargs where that's obviously reasonable. And in general, I don't think we need to be especially robust to invalid user configuration. If we deterministically boot-loop in this case, the user will notice and can fix their config.
If we think caching the config is worthwhile, I'm okay with pursuing it, but I don't think it needs to be a blocker here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An easy way this can happen I think is users relying on the auto-forwarding kargs at install time, and then wanting a different configuration at first boot time (e.g. different location, or different NIC used for install vs run). But agreed we don't need to block on this.
echo "Kernel arguments were changed. Requesting reboot." | ||
touch /run/coreos-kargs-reboot | ||
else | ||
echo "Kernel arguments were changed, but they match this boot. Skipping reboot." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simplest is to just write to /dev/kmsg
. That'll end up in both the console and the journal in the non-reboot case.
06bae91
to
929203b
Compare
@@ -22,7 +22,8 @@ install() { | |||
diff \ | |||
lsblk \ | |||
sed \ | |||
sgdisk | |||
sgdisk \ | |||
systemd-cat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK, let's do that screenshare before going this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah - let's do that tomorrow. Sorry a bunch of other things got in the way today.
systemd-cat -t coreos-kargs <<< "$msg" | ||
fi | ||
msg="Kernel arguments in BLS config were updated." | ||
systemd-cat -t coreos-kargs <<< "$msg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's nuke the stamp file here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. rm /run/coreos-kargs-changed
- does it hurt since its under /run
?
General logic flow LGTM. |
929203b
to
6a4d3b0
Compare
ok rebased and also added a test. The test depends on coreos/coreos-assembler#2683 |
If the requested kernel arguments in the Ignition config already match the kernel arguments of the currently booted system then let's skip the reboot because the reboot won't change anything. One example of a use of this would be if someone is doing a PXE install and they want to persistently use `net.ifnames=0`. They apply `net.ifnames=0` on the PXE boot and coreos-installer transparently forwards it to the Ignition boot (for a single boot). Then the Ignition config has `net.ifnames=0` set in the kernel arguments section. ignition-kargs.service will take care of setting it persistently, but without this change the system will be rebooted. With this change we skip the reboot.
…ired Today when you boot a system it's possible a value for a karg you specified in the Ignition config matches the BLS config but not the kargs from the current boot (in /proc/cmdline). Let's detect this and reboot in that case too.
We're going to add more tests here so let's put this one under a subheading.
- Add description - Convert Ignition to Butane - Use ok/fatal from commonlib.sh
This test verifies that if a kernel argument that is set as "should_exist" in the Ignition config already exists on the kernel command line of the machine then we can skip the reboot when applying kernel arguments but we must still update the BLS configs to make it permanent.
This will further sanity check the karg persists even to subsequent boots.
6a4d3b0
to
83f3df2
Compare
rebased on top of latest tip - can we revisit this and get it in soon? |
# by Ignition if there is no failure. This forces the info into the journal, | ||
# but sometimes the journal will miss these messages because of ratelimiting. | ||
# We've decided to accept this limitation rather than add the systemd-cat or | ||
# logger utlities to the initramfs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utilities
if is-live-image; then | ||
/usr/bin/rdcore kargs --current --create-if-changed /run/coreos-kargs-changed "$@" | ||
if [ -e /run/coreos-kargs-changed ]; then | ||
# If we're in a live system and the kargs don't match then we must error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: tab
|
||
if is-live-image; then | ||
# If we're in a live system and the kargs don't match then we must error. | ||
if [ -e /run/coreos-kargs-thisboot-differ ]; then | ||
if [ -e /run/coreos-kargs-reboot ]; then | ||
# Since we exit with error here the stderr will get shown by Ignition | ||
echo "Need to modify kernel arguments, but cannot affect live system." >&2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go to /dev/kmsg
for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. In the case there is an error Ignition will bubble that up the user so we need to write to stderr. We only need to write to kmsg when the messages are informational (i.e. Ignition swallows any I/O when there isn't an error).
msg="Desired kernel arguments don't match current boot. Requesting reboot." | ||
echo "$msg" > /dev/kmsg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: feels like we can collapse these here and elsewhere now that we're only outputting to /dev/kmsg
?
@@ -2,6 +2,7 @@ | |||
# TODO: Doc | |||
|
|||
set -xeuo pipefail | |||
# This test runs on all platforms and verifies Ignition kernel argument setting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message of the commit introducing this hunk says "- Use ok/fatal from commonlib.sh" but that seems to already be the case now.
fi | ||
if grep mitigations /proc/cmdline; then | ||
fatal "found mitigations in kernel cmdline" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this similarly also check the BLS?
If the requested kernel arguments in the Ignition config already match
the kernel arguments of the currently booted system then let's skip
the reboot because the reboot won't change anything.
One example of a use of this would be if someone is doing a PXE install
and they want to persistently use
net.ifnames=0
. They applynet.ifnames=0
on the PXE boot and coreos-installer transparentlyforwards it to the Ignition boot (for a single boot). Then the Ignition
config has
net.ifnames=0
set in the kernel arguments section.ignition-kargs.service will take care of setting it persistently,
but without this change the system will be rebooted. With this change
we skip the reboot.