daemon: Make actually initiating reboot asynchronous #2848

cgwalters · 2021-05-21T15:46:57Z

Alternative to #2845
which moved the reboot invocation into the client (which I think
still makes sense in some cases).

Previously we were starting the reboot before we're returned
a success reply to the client, so it could see the daemon killed
by SIGTERM and hence emit a spurious error.

(Really in this case any 3 of the calling process, the client
binary or the daemon could be killed in any order, so it's messy
but this just closes one race)

For cleanliness we reject starting any new transactions after the daemon has
started a reboot.

Closes: #1348

Alternative to coreos#2845 which moved the `reboot` invocation into the client (which I think still makes sense in some cases). Previously we were starting the reboot before we're returned a success reply to the client, so it could see the daemon killed by `SIGTERM` and hence emit a spurious error. (Really in this case any 3 of the calling process, the client binary or the daemon could be killed in any order, so it's messy but this just closes one race) For cleanliness we reject starting any new transactions after the daemon has started a reboot. Closes: coreos#1348

jlebon

Nice!

jlebon · 2021-05-21T17:28:50Z

Leaving open for @lucab or @kelvinfan001 since they weighed in on the first iteration.

kelvinfan001 · 2021-05-22T03:01:27Z

src/daemon/rpmostreed-daemon.cxx

+   */
+  const char *child_argv[] = { "systemctl", "reboot", NULL };
+  g_autoptr(GError) local_error = NULL;
+  if (!g_spawn_sync (NULL, (char**)child_argv, NULL, (GSpawnFlags)(G_SPAWN_CHILD_INHERITS_STDIN | G_SPAWN_SEARCH_PATH),


What happens if there is a systemd inhibitor lock that is blocking shutdown? Would the systemctl reboot command spawned this function fail later when the reboot request is attempted, or error immediately and the daemon exits?
If the former, when used in finalize_deployment(), would we have an unlocked staged deployment since we'd unlink() _OSTREE_SYSROOT_RUNSTATE_STAGED_LOCKED but not actually reboot and return a success message to the client even if we later fail to actually reboot?
So it seems like both of these situations are not ideal. (I may also just have misunderstood systemd-inhibit and these bits of code entirely)

Good point! So...Per this point I think really the only correct thing here is for e.g. zincati to not try to finalize if there are inhibitors. I can't think of a way to solve this without adding some sort of new inhibitor API to do something more read/write lock like so that one could do "grab an exclusive lock only when the last (read) locker has dropped".

That said honestly I was just playing with this locally and I assume I'm doing something wrong, but I can't get systemd-inhibit to block reboots. I'm doing systemd-inhibit sleep 1h in one shell, and systemctl reboot in another instantly reboots.

(OK some investigation shows I need to use systemctl reboot --check-inhibitors=yes which...seems like a bug, the man page claims auto will do that on a tty, but that's not what I'm seeing)

Hmm, so the --check-inhibitors option was made available in a recent-ish commit, and is not in FCOS yet; but even so, reboots should be inhibited successfully if called on a tty. I tried locally with systemd-inhibit sleep 1h &, then if I do systemctl reboot, it is indeed inhibited.

systemctl poweroff Operation inhibited by "sleep 1h" (PID 2653 "systemd-inhibit", user root), reason is "Unknown reason". Please retry operation after closing inhibitors and logging out other users. Alternatively, ignore inhibitors and users with 'systemctl poweroff -i'.

However, another issue I see right now regarding systemd-inhibit is that I haven't been able to get my inhibitor locks to inhibit rpm-ostree-driven reboots, even if I set the lock as root. I assume this is because the rpm-ostree daemon is privileged and the systemctl reboot called by the daemon is non-interactive, and so in e.g. systemd 246 (before systemd/systemd#18210), any inhibitors are ignored by rpm-ostree. So IIUC, systemd-inhibitors won't really work with rpm-ostree today, at least not until systemd 248, after which we can add the --check-inhibitors=yes option?

OK yeah if you follow issue links there's a huge amount of debate about this.

kelvinfan001

LGTM for solving the daemon/client race!

cgwalters

Hum. Honestly, it seems to me that inhibitors were pretty "desktop focused" before --check-inhibitors=yes existed.

That said, we can actually just check the inhibitors in our process even for earlier systemd versions. That's pretty easy to do, it's just a DBus call. Now one might argue that that's racy, but it's also racy to do it in systemctl reboot so that's fine.

cgwalters · 2021-05-25T16:58:35Z

OK merging since I think this will help, and the inhibitor behavior won't be regressed by it (inhibitors need more + orthogonal work).

kelvinfan001 · 2021-05-25T17:52:22Z

Looks like --check-inhibitors isn't what we want either, since it also checks for inhibitor locks in delay mode, which to me sounds like overkill, since programs that take the delay lock probably don't want to actually totally inhibit shutdowns, as they simply have some cleanup tasks to do once they're notified that a shutdown is requested.
So I think a better path would be to do the inhibitor checking ourselves in rpm-ostree, as @cgwalters mentioned, and only check for locks in block mode. If we detect that there exists locks in block mode, then rpm-ostree shouldn't even attempt the reboot, and return an error to the CLI so consumers (e.g. Zincati) could handle it and be in charge of retries.
Luckily, we don't need to handle locks in delay mode, since the behaviour for delay vs block is actually different. For block, the shutdown attempt immediately fails and returns 1, but for delay, the shutdown is queued by systemd and e.g. systemctl reboot returns 0 and waits for the delay lock to expire (defaults to 5 seconds) or for the delay lock to be released by the program that took it.

cgwalters · 2021-05-25T18:21:00Z

Agree, we probably want systemctl reboot --check-inhibitors=block in systemd actually for the same reason.
But for now it should be OK to replicate it.

cgwalters mentioned this pull request May 21, 2021

Move reboot handling to client #2845

Closed

jlebon approved these changes May 21, 2021

View reviewed changes

kelvinfan001 reviewed May 22, 2021

View reviewed changes

kelvinfan001 approved these changes May 25, 2021

View reviewed changes

cgwalters commented May 25, 2021

View reviewed changes

cgwalters merged commit 9244d7a into coreos:main May 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

daemon: Make actually initiating reboot asynchronous #2848

daemon: Make actually initiating reboot asynchronous #2848

cgwalters commented May 21, 2021

jlebon left a comment

jlebon commented May 21, 2021

kelvinfan001 May 22, 2021 •

edited

Loading

cgwalters May 24, 2021

cgwalters May 24, 2021

kelvinfan001 May 25, 2021

cgwalters May 25, 2021

kelvinfan001 left a comment

cgwalters left a comment

cgwalters commented May 25, 2021

kelvinfan001 commented May 25, 2021 •

edited

Loading

cgwalters commented May 25, 2021

daemon: Make actually initiating reboot asynchronous #2848

daemon: Make actually initiating reboot asynchronous #2848

Conversation

cgwalters commented May 21, 2021

jlebon left a comment

Choose a reason for hiding this comment

jlebon commented May 21, 2021

kelvinfan001 May 22, 2021 • edited Loading

Choose a reason for hiding this comment

cgwalters May 24, 2021

Choose a reason for hiding this comment

cgwalters May 24, 2021

Choose a reason for hiding this comment

kelvinfan001 May 25, 2021

Choose a reason for hiding this comment

cgwalters May 25, 2021

Choose a reason for hiding this comment

kelvinfan001 left a comment

Choose a reason for hiding this comment

cgwalters left a comment

Choose a reason for hiding this comment

cgwalters commented May 25, 2021

kelvinfan001 commented May 25, 2021 • edited Loading

cgwalters commented May 25, 2021

kelvinfan001 May 22, 2021 •

edited

Loading

kelvinfan001 commented May 25, 2021 •

edited

Loading