Restarting the Zincati Service fails randomly #671
Comments
It looks like you are restarting Zincati in the middle of an upgrade, which then leaves […] Taking a step back, I think what you need is a […]
For reference, the underlying bug is that we leave behind a transaction running in the rpm-ostree daemon, even if the client has disappeared. This came up already in coreos/rpm-ostree#3194 (comment), and we should enhance the daemon so that the lifetime of a transaction is automatically bound to its caller.
@lucab thank you very much for the clarifications. I was suspecting that something along those lines is happening. To your question on how I ended up in this situation:
So that is how I arrived at this situation. The problem appears to be that Zincati is trying to finalize an update and is waiting until its time window to restart the server comes around. Then, once a month, my CI/CD pipeline runs and tries to restart the service, which Zincati doesn't like since it is trying to finalize the update (understandable). So if I swap the order, i.e. allow Zincati to finalize updates before running the CI/CD pipeline, I should be able to mitigate this problem to a large extent. Aside: by "finalizing the update" I mean that Zincati is either actively installing updates or just waiting to reboot the server 😇; for my case and the suggested approach it is not relevant which one it is. I hope I understood you correctly :)
Thanks for the additional context. Yes, it looks like you are currently racing with Zincati trying to eagerly fetch/stage updates beforehand (so that they are ready to be applied as soon as your configuration allows it). Unfortunately I don't currently have a perfect solution to suggest. Some mitigations could be:
Good morning :) Thank you very much for your help and your suggested mitigations. I will try these :) Thank you again and have a great day and weekend.
Ack, thanks! I will forward the last two bullet items to separate tickets (no ETA though, both of them may require quite a bit of work) and then close this.
Followup tickets at #673 and coreos/rpm-ostree#3206.
Bug Report
The following happened:
Our CI/CD pipeline updates the Zincati configuration whenever changes are made. After the new .toml files have been uploaded, the Zincati service is restarted to load the latest configuration, using the command:
sudo systemctl restart zincati.service
But lately we have been running into a problem where the restart fails. The error message can be seen below. This causes not only the CI/CD pipeline to fail but also puts the server into a deadlocked state, where no application running on the server is responsive and even establishing an SSH connection to it fails. What is causing this issue and how can I prevent it?
Environment
What hardware/cloud provider/hypervisor is being used?
Exoscale FCOS Template
Expected Behavior
The command
sudo systemctl restart zincati.service
should restart the service without failing.
Actual Behavior
Reproduction Steps
sudo systemctl restart zincati.service
This may randomly fail and cause a timeout.
Other Information
I wasn't quite sure what information would be helpful, so I tried to include only what I thought was most relevant. If you would like any other logs, or would like me to test anything, I am more than happy to oblige.
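As a stopgap in the spirit of the reordering discussed in the comments, the restart step described in this report could be guarded so that CI only restarts Zincati once rpm-ostree has no transaction in flight. The Python sketch below is hypothetical: the helper names, the 600-second timeout and the 10-second poll interval are illustrative choices, and it assumes that rpm-ostree status --json reports the active daemon transaction under a top-level "transaction" key that is null when the daemon is idle (verify this against your rpm-ostree version).

```python
"""Hypothetical CI helper: wait for rpm-ostree to go idle, then restart Zincati."""
import json
import subprocess
import time
from typing import Callable


def transaction_in_progress(status: dict) -> bool:
    # Assumption: `rpm-ostree status --json` exposes the active daemon
    # transaction under a top-level "transaction" key, null when idle.
    return status.get("transaction") is not None


def read_rpm_ostree_status() -> dict:
    # Query the rpm-ostree daemon and parse its JSON status output.
    out = subprocess.run(
        ["rpm-ostree", "status", "--json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def wait_until_idle(
    get_status: Callable[[], dict] = read_rpm_ostree_status,
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once no transaction is running, False if timeout_s elapses first."""
    waited = 0.0
    while True:
        if not transaction_in_progress(get_status()):
            return True
        if waited >= timeout_s:
            return False
        # A transaction is still running: wait one poll interval and re-check.
        sleep(poll_s)
        waited += poll_s


def restart_zincati_when_idle(timeout_s: float = 600.0) -> None:
    # Refuse to restart while rpm-ostree is busy, instead of racing with it.
    if not wait_until_idle(timeout_s=timeout_s):
        raise RuntimeError("rpm-ostree transaction still running; not restarting zincati")
    subprocess.run(["sudo", "systemctl", "restart", "zincati.service"], check=True)
```

A CI step would call restart_zincati_when_idle() after uploading the new .toml files. This only narrows the race rather than closing it: Zincati can still start a new transaction between the idle check and the restart, so it is a mitigation, not a fix for the underlying transaction-lifetime bug tracked in the follow-up tickets.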