Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Make actually initiating reboot asynchronous #2848

Merged
merged 1 commit into from
May 25, 2021

Conversation

cgwalters
Copy link
Member

Alternative to #2845
which moved the reboot invocation into the client (which I think
still makes sense in some cases).

Previously we were starting the reboot before we're returned
a success reply to the client, so it could see the daemon killed
by SIGTERM and hence emit a spurious error.

(Really in this case any 3 of the calling process, the client
binary or the daemon could be killed in any order, so it's messy
but this just closes one race)

For cleanliness we reject starting any new transactions after the daemon has
started a reboot.

Closes: #1348

Alternative to coreos#2845
which moved the `reboot` invocation into the client (which I think
still makes sense in some cases).

Previously we were starting the reboot before we're returned
a success reply to the client, so it could see the daemon killed
by `SIGTERM` and hence emit a spurious error.

(Really in this case any 3 of the calling process, the client
 binary or the daemon could be killed in any order, so it's messy
 but this just closes one race)

For cleanliness we reject starting any new transactions after the daemon has
started a reboot.

Closes: coreos#1348
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jlebon
Copy link
Member

jlebon commented May 21, 2021

Leaving open for @lucab or @kelvinfan001 since they weighed in on the first iteration.

*/
const char *child_argv[] = { "systemctl", "reboot", NULL };
g_autoptr(GError) local_error = NULL;
if (!g_spawn_sync (NULL, (char**)child_argv, NULL, (GSpawnFlags)(G_SPAWN_CHILD_INHERITS_STDIN | G_SPAWN_SEARCH_PATH),
Copy link
Member

@kelvinfan001 kelvinfan001 May 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is a systemd inhibitor lock that is blocking shutdown? Would the systemctl reboot command spawned this function fail later when the reboot request is attempted, or error immediately and the daemon exits?
If the former, when used in finalize_deployment(), would we have an unlocked staged deployment since we'd unlink() _OSTREE_SYSROOT_RUNSTATE_STAGED_LOCKED but not actually reboot and return a success message to the client even if we later fail to actually reboot?
So it seems like both of these situations are not ideal. (I may also just have misunderstood systemd-inhibit and these bits of code entirely)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! So...Per this point I think really the only correct thing here is for e.g. zincati to not try to finalize if there are inhibitors. I can't think of a way to solve this without adding some sort of new inhibitor API to do something more read/write lock like so that one could do "grab an exclusive lock only when the last (read) locker has dropped".

That said honestly I was just playing with this locally and I assume I'm doing something wrong, but I can't get systemd-inhibit to block reboots. I'm doing systemd-inhibit sleep 1h in one shell, and systemctl reboot in another instantly reboots.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(OK some investigation shows I need to use systemctl reboot --check-inhibitors=yes which...seems like a bug, the man page claims auto will do that on a tty, but that's not what I'm seeing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the --check-inhibitors option was made available in a recent-ish commit, and is not in FCOS yet; but even so, reboots should be inhibited successfully if called on a tty. I tried locally with systemd-inhibit sleep 1h &, then if I do systemctl reboot, it is indeed inhibited.

systemctl poweroff
Operation inhibited by "sleep 1h" (PID 2653 "systemd-inhibit", user root), reason is "Unknown reason".
Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl poweroff -i'.

However, another issue I see right now regarding systemd-inhibit is that I haven't been able to get my inhibitor locks to inhibit rpm-ostree-driven reboots, even if I set the lock as root. I assume this is because the rpm-ostree daemon is privileged and the systemctl reboot called by the daemon is non-interactive, and so in e.g. systemd 246 (before systemd/systemd#18210), any inhibitors are ignored by rpm-ostree. So IIUC, systemd-inhibitors won't really work with rpm-ostree today, at least not until systemd 248, after which we can add the --check-inhibitors=yes option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yeah if you follow issue links there's a huge amount of debate about this.

Copy link
Member

@kelvinfan001 kelvinfan001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for solving the daemon/client race!

Copy link
Member Author

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. Honestly, it seems to me that inhibitors were pretty "desktop focused" before --check-inhibitors=yes existed.

That said, we can actually just check the inhibitors in our process even for earlier systemd versions. That's pretty easy to do, it's just a DBus call. Now one might argue that that's racy, but it's also racy to do it in systemctl reboot so that's fine.

@cgwalters
Copy link
Member Author

OK merging since I think this will help, and the inhibitor behavior won't be regressed by it (inhibitors need more + orthogonal work).

@cgwalters cgwalters merged commit 9244d7a into coreos:main May 25, 2021
@kelvinfan001
Copy link
Member

kelvinfan001 commented May 25, 2021

Looks like --check-inhibitors isn't what we want either, since it also checks for inhibitor locks in delay mode, which to me sounds like overkill, since programs that take the delay lock probably don't want to actually totally inhibit shutdowns, as they simply have some cleanup tasks to do once they're notified that a shutdown is requested.
So I think a better path would be to do the inhibitor checking ourselves in rpm-ostree, as @cgwalters mentioned, and only check for locks in block mode. If we detect that there exists locks in block mode, then rpm-ostree shouldn't even attempt the reboot, and return an error to the CLI so consumers (e.g. Zincati) could handle it and be in charge of retries.
Luckily, we don't need to handle locks in delay mode, since the behaviour for delay vs block is actually different. For block, the shutdown attempt immediately fails and returns 1, but for delay, the shutdown is queued by systemd and e.g. systemctl reboot returns 0 and waits for the delay lock to expire (defaults to 5 seconds) or for the delay lock to be released by the program that took it.

@cgwalters
Copy link
Member Author

Agree, we probably want systemctl reboot --check-inhibitors=block in systemd actually for the same reason.
But for now it should be OK to replicate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition when using --reboot flag
3 participants