
3.2-rc3 issue with autostarting HVMs at boot. #2302

Closed
JoeThielen opened this issue Sep 8, 2016 · 10 comments
Labels: C: core, T: bug

Comments

@JoeThielen

Qubes OS version (e.g., R3.1):

R3.2-rc3

Affected TemplateVMs (e.g., fedora-23, if applicable):

N/A


Expected behavior:

After setting the autostart preference to true (and verifying it was set), upon reboot that HVM should boot.

Actual behavior:

Per the logs it tries to boot, but sometimes fails. Sometimes it starts, but it looks like it starts before the XFCE GUI actually starts, causing it to be lost somewhere. I can SSH to it, but it's nowhere (graphically) to be found on the screen. If it does start like this, it shows up in the Qubes VM Manager with a yellow state instead of green.

Steps to reproduce the behavior:

  • Set the autostart preference to true (and verify that it was set).
  • Reboot

General notes:

I have three HVMs set to boot on startup, in addition to sys-net and sys-firewall. Sometimes one of them will start just fine. But I've not seen them all start normally since installing 3.2-rc3. I do not recall if I had this issue with 3.2-rc2 as I didn't get that far in testing. I did not have this issue with 3.1 to my recollection.

Attached is the output of "journalctl --system". At about 16:24:41 you can see where it tries to start all the VMs. In this particular boot, the tpf-keycloak HVM started, but in the background (yellow state in the Qubes VM Manager). The tpf-proxy HVM fails to start. tpf-firewall is a ServiceVM (a copy of sys-firewall); I don't recall ever having problems with it.


Related issues:

qubes-journalctl-log.txt

@andrewdavidwong andrewdavidwong added the T: bug and C: core labels Sep 8, 2016
@andrewdavidwong andrewdavidwong added this to the Release 3.2 milestone Sep 8, 2016
@JoeThielen
Author

I forgot to mention, but these are CentOS 7 (1511 Minimal) HVMs.

Also, if I have them start using System Tools -> Session and Startup -> Application Autostart they start just fine on bootup.

@JoeThielen
Author

If I set up an HVM to start via the XFCE Application Autostart, then when I shut down or reboot Qubes, it hangs in some kind of systemd loop. I am not well versed in systemd at all. Basically, the machine will not actually shut down or reboot; it finally ends up with a bunch of repeated errors like this:

device-mapper: remove ioctl on qubes_dom0-root failed: Device or resource busy.
Command failed

Then finally:

Rebooting.

But it doesn't actually shut down or reboot.

@JoeThielen
Author

After diving into systemd (something I'm not at all well versed in), I've developed a systemd service that seems to be working well, for both bootup and shutdown. However, after learning a little more, I've just discovered the systemd "system" versus "user" service units. Mine is a system unit, which may only work because my system has autologin enabled. I wonder whether this should be a "user" service unit instead?

Anyway, I guess I'm very confused. The initial issue I reported was that setting autostart to true caused a problem because the HVM started before the GUI. Is there some other way I should be doing this that will reliably start the service on user login and shut down the HVM properly (before user logout! otherwise the above device-mapper error occurs)?

@marmarek
Member

marmarek commented Sep 9, 2016

Starting a VM before user login should be fine - GUI should be automatically connected as soon as you login.
This is done as a side effect of qvm-run --all true, but qvm-run will refuse to operate on a VM without qrexec installed. I guess your HVMs do not have it, right?
In such a case, we need something better to handle post-login GUI reconnection...
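The gap described here can be modeled in a few lines. This is a toy Python sketch, not Qubes code: `VM` and `plan_gui_reconnect` are hypothetical names, and the model simply assumes, as stated above, that the post-login call only reaches running VMs that have qrexec.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VM:
    """Hypothetical stand-in for a Qubes domain (illustration only)."""
    name: str
    running: bool
    has_qrexec: bool


def plan_gui_reconnect(vms: List[VM]) -> Tuple[List[str], List[str]]:
    """Model a post-login pass that reaches VMs only through qrexec.

    Returns (reached, skipped): running VMs whose GUI would be reconnected
    as a side effect of the qrexec call, and running VMs that are skipped
    because they have no qrexec agent.
    """
    reached: List[str] = []
    skipped: List[str] = []
    for vm in vms:
        if not vm.running:
            continue  # only running VMs need a GUI reconnection
        if vm.has_qrexec:
            reached.append(vm.name)
        else:
            skipped.append(vm.name)
    return reached, skipped


if __name__ == "__main__":
    vms = [
        VM("sys-firewall", running=True, has_qrexec=True),
        VM("tpf-proxy", running=True, has_qrexec=False),  # CentOS HVM, no qrexec
        VM("tpf-keycloak", running=True, has_qrexec=False),
    ]
    reached, skipped = plan_gui_reconnect(vms)
    print("GUI reconnected:", reached)
    print("left without GUI:", skipped)
```

With the CentOS HVMs from this thread marked as lacking qrexec, both land in the skipped list, which matches the "started but nowhere on screen" symptom in the report.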

@JoeThielen
Author

JoeThielen commented Sep 9, 2016

You are correct, the HVM does not have qrexec installed.

Maybe having a new feature with a separate checkbox / preference setting would be helpful here? That would have it create a different systemd file so it loads / shuts down appropriately.

Attached is a systemd file I created for an HVM named tpf-proxy. I put this in /usr/lib/systemd/system then enabled it using systemctl enable tpf-proxy-hvm.

This is also set up to restart the HVM in the event it crashes. I'm not sure if it works without user auto-login, as my system automatically logs me in.

As I noted, I'm not very familiar with systemd, so I don't know if it would be more appropriate to create a user systemd file versus a system systemd file. Mine seems to work at the moment, so I will leave it unless someone more knowledgeable than me in this area can answer that question.

tpf-proxy-hvm.service.txt

@JoeThielen
Author

I went round and round with this. The previous systemd service script was not perfect and did not always work, especially on system shutdown.

I'm currently exploring separate avenues for reliable HVM startup and shutdown for a system with automatic user login via lightdm.

There are two, separate, systemd scripts:
#1: For system startup there is a "system" systemd script. This script requires "user-1000.slice" and must be started after it, so that it starts only after a user is logged in and the GUI is running. This script also monitors the HVM (via a PID file) and will restart it if the HVM shuts down or crashes. The restart delay is fairly long (45 seconds) so that, during system shutdown, it doesn't try to restart the HVM again!
NOTES:

  • It may be possible for this script to handle shutting down the VM too. However, my highest concern here is that, in addition to reliable startup, system shutdown be reliable. If an HVM is running during system reboot/shutdown, 3.2-rc3 shutdown crashes and the machine will not reboot or shut down fully. So if you manually stop the systemd service (which could stop the HVM) but then manually start the HVM, the service will not be "Active" and will not reliably shut down the HVM!
  • I originally tried to have this run as a systemd "user" service. However, I found issues with networking rules being automatically applied in this situation!
  • This script gets placed in /usr/lib/systemd/system and enabled with systemctl enable tpf-proxy-hvm.service.
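The actual unit file is only attached, not shown, so the following is a rough Python sketch of what a startup unit along the lines of #1 might contain. `render_unit` and `startup_unit` are hypothetical helpers that only build unit-file text. The `/usr/bin/qvm-start` path, `Type=forking`, `Restart=always` and `WantedBy=graphical.target` are assumptions, and the PID file path is a placeholder because the thread does not say which process's PID file was monitored.

```python
def render_unit(sections):
    """Render an ordered list of (section, [(key, value), ...]) as unit-file text."""
    out = []
    for name, items in sections:
        out.append(f"[{name}]")
        out.extend(f"{key}={value}" for key, value in items)
        out.append("")  # blank line between sections
    return "\n".join(out)


def startup_unit(vm, pid_file, uid=1000, restart_sec=45):
    """Sketch of a system unit that starts `vm` after the user's slice exists."""
    user_slice = f"user-{uid}.slice"
    return render_unit([
        ("Unit", [
            ("Description", f"Autostart HVM {vm} after user login"),
            # Start only once the logged-in user's slice exists (GUI is up).
            ("Requires", user_slice),
            ("After", user_slice),
        ]),
        ("Service", [
            # Assumption: systemd tracks the process whose PID is in pid_file
            # (placeholder path; the thread does not give the real one).
            ("Type", "forking"),
            ("PIDFile", pid_file),
            ("ExecStart", f"/usr/bin/qvm-start {vm}"),
            # Restart after a shutdown or crash, waiting 45 s as in the thread.
            ("Restart", "always"),
            ("RestartSec", str(restart_sec)),
        ]),
        ("Install", [
            ("WantedBy", "graphical.target"),  # assumption
        ]),
    ])


if __name__ == "__main__":
    print(startup_unit("tpf-proxy", "/run/example/tpf-proxy.pid"))
```

The printed text would go into /usr/lib/systemd/system/tpf-proxy-hvm.service and be enabled with systemctl enable tpf-proxy-hvm.service, as described in the note above.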

#2: For reliable HVM shutdown I created a separate systemd "user" script/service. I found that in order for this script to run 100% of the time on user logout/system shutdown, I needed to run it as a systemd user service that is ordered to run before targets such as exit.target, reboot.target, shutdown.target, or halt.target.
NOTES:

  • This service must be enabled from a dom0 terminal as the regular user (i.e., NOT via sudo or su to root), with a command like systemctl --user enable tpf-proxy-hvm-shutdown.service, once the script is placed in /usr/lib/systemd/user.
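A common systemd pattern for "run something when the session ends" is a oneshot unit that stays active (`RemainAfterExit=yes`) and does its work in `ExecStop=`, which runs when the user manager stops it at logout or shutdown. The Python sketch below renders such a user unit; it is an assumption about what #2 might look like, not the attached file. `render_unit` and `shutdown_user_unit` are hypothetical helpers, and `/usr/bin/qvm-shutdown --wait`, `TimeoutStopSec=120` and `WantedBy=default.target` are assumptions. The `Before=` targets are the ones named in #2; reboot.target and halt.target are system-level targets, so in a user manager only the others are likely to matter.

```python
def render_unit(sections):
    """Render an ordered list of (section, [(key, value), ...]) as unit-file text."""
    out = []
    for name, items in sections:
        out.append(f"[{name}]")
        out.extend(f"{key}={value}" for key, value in items)
        out.append("")  # blank line between sections
    return "\n".join(out)


def shutdown_user_unit(vm, stop_timeout=120):
    """Sketch of a user unit whose stop action shuts down `vm`."""
    return render_unit([
        ("Unit", [
            ("Description", f"Shut down HVM {vm} before the user session ends"),
            # Targets as listed in the thread.
            ("Before", "exit.target reboot.target shutdown.target halt.target"),
        ]),
        ("Service", [
            # Do nothing at start, but stay "active" so ExecStop runs on stop.
            ("Type", "oneshot"),
            ("RemainAfterExit", "yes"),
            ("ExecStart", "/bin/true"),
            ("ExecStop", f"/usr/bin/qvm-shutdown --wait {vm}"),
            # Give the HVM time to shut down cleanly (value is an assumption).
            ("TimeoutStopSec", str(stop_timeout)),
        ]),
        ("Install", [
            ("WantedBy", "default.target"),
        ]),
    ])


if __name__ == "__main__":
    print(shutdown_user_unit("tpf-proxy"))
```

Saved as /usr/lib/systemd/user/tpf-proxy-hvm-shutdown.service, it would be enabled with the systemctl --user command from the note above.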

So far this seems reliable. I will continue to test it. I'm also wondering if the shutdown systemd user script could actually be a systemd system script instead and just require user-1000.slice, so that it might still shut down the HVM before the user is logged out. I will experiment with this.

If anyone else has any hints here I'd be happy to hear them. I am not a systemd expert by any means and am just fudging around until I get a working solution. If there is a better, "proper" way to do it, I'd be more than happy to hear it. But it must be reliable!

Attached are my systemd scripts.

tpf-proxy-hvm.service.txt
tpf-proxy-hvm-shutdown.service.txt

@JoeThielen
Author

I tried to make my shutdown service a systemd "system" service instead of a "user" service, and it did not work correctly. I set it to require user-1000.slice, but apparently that wasn't enough: the system did not shut down/reboot fully. So I'm back to having one script run as a "system" service and the other as a "user" service. That setup appears to be reliable so far.

@petertyyseng

Same situation when autostarting a Windows HVM and a standalone HVM. I have to kill them and start them again.

@JoeThielen
Author

@petertyyseng my systemd scripts above have been very reliable in my testing. It would be interesting to see if they functioned for Windows HVMs as well.

@andrewdavidwong
Member

This issue is being closed because:

If anyone believes that this issue should be reopened, please let us know in a comment here.
