Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slight UX issue with broadcast interactive terminal notification about upgrade #554

Closed
dustymabe opened this issue May 14, 2021 · 10 comments · Fixed by #556
Closed

Slight UX issue with broadcast interactive terminal notification about upgrade #554

dustymabe opened this issue May 14, 2021 · 10 comments · Fixed by #556

Comments

@dustymabe
Copy link
Member

Bug Report

I was recently working in a VM (doing dev work) and I see this scroll
by on the interactive terminal:

[core@localhost config]$
Broadcast message from Zincati at Fri 2021-05-14 20:48:42 UTC:
New update 34.20210503.1.1 deployed.
Rebooting into this update in around 10 minutes (if permitted by update strategy).

This is awesome. So glad this exists to give the user some kind of
clue before their system auto-reboots.

However, what happened after this could use some improvement IMHO.

I thought to myself.. I don't want to wait 10 minutes for the reboot
to happen. I want it now and then I can move on with my life. So I:

[core@localhost config]$ sudo reboot
Connection to 192.168.122.197 closed by remote host.
Connection to 192.168.122.197 closed.
[dustymabe@media ~]$
[dustymabe@media ~]$
[dustymabe@media ~]$ ssh core@192.168.122.197
Warning: Permanently added '192.168.122.197' (ECDSA) to the list of known hosts.
Fedora CoreOS 34.20210503.1.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Last login: Fri May  7 20:05:08 2021 from 192.168.122.1
[systemd]
Failed Units: 1
  example.service
[core@localhost ~]$
Broadcast message from Zincati at Fri 2021-05-14 20:58:24 UTC:
New update 34.20210503.1.1 deployed.
Rebooting into this update in around 10 minutes (if permitted by update strategy).

[core@localhost ~]$
[core@localhost ~]$
[core@localhost ~]$ rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; update staged: 34.20210503.1.1; reboot delayed due to active user sessions
Deployments:
  ostree://fedora:fedora/x86_64/coreos/next
                   Version: 34.20210503.1.1 (2021-05-12T22:57:51Z)
                    Commit: c229af6df100607f73216ecda14dfc3a71e4c7094f4a7cdd260dc97a5c650a3e
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39
                      Diff: 2 upgraded

● ostree://fedora:fedora/x86_64/coreos/next
                   Version: 34.20210503.1.0 (2021-05-04T09:22:11Z)
                    Commit: 1b7e0d318f5e78013434236ad17edb8b65297f0e8de3e765562fdedc1a7e52ab
              GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39

So performing a reboot manually didn't give me an updated system. I
think this could use better UX. Either of these:

  1. The sudo reboot yields an updated system.
  2. The broadcast message tells the user how to initiate the upgrade+reboot now.

Environment

local libvirt VM

Expected Behavior

A reboot would yield me an updated system.

Actual Behavior

System still not updated after reboot.

@cgwalters
Copy link
Member

The sudo reboot yields an updated system

It's definitely intentional that reboots not from zincati don't trigger updates to ensure predictability.

The broadcast message tells the user how to initiate the upgrade+reboot now.

This seems like a good fix!

@kelvinfan001
Copy link
Member

I believe @cgwalters is right and extra care was actually taken so that manual reboots by the user don't trigger an update even when an update is staged by Zincati.

The broadcast message tells the user how to initiate the upgrade+reboot now.

Unfortunately today there is no way to do initiate the upgrade + reboot immediately 😅
Although there is currently a PR that does exactly that (related issue), there are some small kinks that need to be sorted out first.

For now though, if you log out, Zincati should allow the reboot within a minute, so perhaps the instruction could just be to tell the user to log out.

@lucab
Copy link
Contributor

lucab commented May 17, 2021

I think @kelvinfan001 reply is on the spot: the expectation is that the user finishes their work and logs out.
They are not expected to do anything manually, but they should take note that in 10 minutes the machine may reboot even if they are still logged in.
The broadcast message is not expected to direct them towards rebooting (because a random reboot does not apply any staged update) nor manual intervention (because finalization is overall gated by strategy configuration, not by user intervention).

I guess the current message subtly hinted at "An update is available, please reboot into it"; suggestions on how to reword it in order to (succinctly) avoid such effects and to convey the above points are welcome.

From my side, I guess an improved message could read as:

A new update '34.20210503.1.1' is available and will be automatically applied.
Finalization delayed until all interactive users have logged out (or at most up to 10 minutes).
Please log out in order to let the auto-update process continue.

@cgwalters
Copy link
Member

Suggested text LGTM

@dustymabe
Copy link
Member Author

dustymabe commented May 17, 2021

LGTM. Maybe a log out all active sessions (in case there is more than one).

@dustymabe
Copy link
Member Author

Another option would be to tell the user touch /run/zincati/continue or something.

@kelvinfan001
Copy link
Member

kelvinfan001 commented May 17, 2021

Another option would be to tell the user touch /run/zincati/continue or something.

I think this API would overlap a lot with #498 and #540. #540 attempts to solve the issue around manually telling Zincati to update without being too interactive, but I believe at some point we will still need to implement #498 to fill in the gaps. Am I understanding correctly that #498 would be the equivalent of touch /run/zincati/continue?

@dustymabe
Copy link
Member Author

Am I understanding correctly that #498 would be the equivalent of touch /run/zincati/continue?

not exactly, but it should satisfy the need. Let me explain:

What I'm considering is a user who doesn't want to wait any amount of time. Logging out and waiting a minute is fine, but it's not determinate and slightly annoying. What if I try to log back in too fast and the system hasn't finalized and started to reboot yet? Then I caused the process to start over (waiting). What I want is some instruction to tell the user how to achieve the goal now. #498 would indeed be exactly that because we could just say run zincatictl finalize-update (which would reboot the machine).

To be clear. I'm good with #498 being the answer here. We don't need touch /run/zincati/continue. Your PR over in #556 satisfies the request from this issue.

@lucab
Copy link
Contributor

lucab commented May 18, 2021

Generally speaking, we still aim for a zero-touch upgrade process.

If the current 60s period between retries is still causing friction we can either 1) make them more frequent (naive approach, some delay still exists) or, 2) start watching for logind SessionRemoved dbus signal (smarter approach, no delay involved, but the code is more complex and racing/missing a signal notification incurs larger penalty).

The "finalize now!" verb hinted at above is a large footgun which I personally won't introduce lightheartedly. In particular, if we want to pursue that in the future, we need to zoom out from this simple case (immediate strategy, one interactive user session, all good), and reason about behaviors related to:

  • finalization strategy (e.g. should this verb bypass configured calendar/locks? Or should it maybe error out on non-immediate strategies?)
  • other users logged in (e.g. should this verb ignore all logged in users? and for how long?)
  • error handling (e.g. what to do if rpm-ostree fails to finalize for some reason? What if the agent panic-restart, or the node reboot for some other reason?)

Even if we are not going to introduce that verb now, I think we can sit down and draft a larger design to foresee how the UX of this may look like.

@dustymabe
Copy link
Member Author

Even if we are not going to introduce that verb now, I think we can sit down and draft a larger design to foresee how the UX of this may look like.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants