-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
service: exit and prevent restarts if booted into a container image #896
Conversation
Thanks for assembling this patch, I like this direction. It sounds like we are now generally trying to address the case of Zincati somehow running on unexpected OS/state, which is a good idea from the point of view of UX. One of the goals here seem to be avoiding an infinite restart loop. I think systemd has a dedicated Aside, I personally think it would be a good time to try to plumb an "error object with optional exit code" till here: Lines 58 to 61 in ebb241c
If you don't want to waste too much time on that, I can give it a try using |
So, automatic updates on by default are core to the CoreOS philosophy. Is your rationale behind having zincati explicitly fail (and hence that having that failure at least minimally appear via CLHM and other automation tools) to represent this state in a very visible way? It might be of course that admins have set up something to upgrade the system not via zincati; it's easy to have an external orchestration system effectively just run In that scenario, the user can mask zincati and hence mask the spurious failure. On the other hand, what I was thinking with this patch is that their choice to use an image also kind of signals such an intent. |
This is part of coreos/fedora-coreos-tracker#1263 We don't yet have an official stance on how zincati and custom container images interact. Today, zincati just crash loops. This changes things so that we exit (but still with an error) if we detect the booted system is using a container image origin. One nicer thing here is that the unit status is also updated, e.g. `systemctl status zincati` will show: `Status: "Automatic updates disabled; booted into container image ..."`
95d1c39
to
de798dd
Compare
OK, I reworked this a bit to use a custom error type and still exit with an error. |
I think so. Overall I don't think that "booted from a containerized commit" and "user wants to opt-out auto-updates" are two things that we should conflate at this time. Instead I'd scope this PR to "Zincati inspected this booted commit and doesn't know how to auto-update it". If Zincati manages to start it can surface different kind of failures (both transient and permanent) through metrics to the monitoring infra. But if it fails to complete startup for whatever reason, it needs some other kind of system to expose this failure state. Auto-updates are critical enough that "keep restarting" is often a good solution to eventually bypass any transient error. But in this case we know we can't get out of this situation without manual intervention, so I think it's fair to stay in a perma-failed state. |
if let Some(img) = status.container_image_reference.as_ref() { | ||
let msg = format!("Automatic updates disabled; booted into container image {img}"); | ||
crate::utils::update_unit_status(&msg); | ||
return Err(anyhow::Error::new(FatalError(msg))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more expecting this additional step to be performed in the error case in fedora_coreos_stream_from_deployment()
(so that both cases would prevent restarts), but I guess just here it could work as well.
For sure this is not the final word on this...I am now realizing how much work it will be to plumb through all the data into Cincinnati etc. around the container image metadata. There's an argument that we should still crash-loop - that would handle the case where a user rebases back to an ostree commit today live. (But OTOH the most likely scenario is to reboot there anyways which will then trigger zincati retries) |
Agree, this PR is just having a better error in the latter case. |
This is part of coreos/fedora-coreos-tracker#1263
We don't yet have an official stance on how zincati and custom container images interact. Today, zincati just crash loops. This changes things so that we gracefully exit if we detect the booted system is using a container image origin.
(The code here isn't quite as clean as it could be; calling
std::process::exit()
in the middle of the call chain isn'telegant but doing better would require plumbing through an
Option<T>
through many layers)