wsl executes any command specified/starts shell before systemd has initialized, when starting instance #8886

Open
cerebrate opened this issue Sep 25, 2022 · 7 comments

@cerebrate

Is your feature request related to a problem? Please describe.
When a WSL instance with systemd support enabled is started (either with wsl or wsl <command>), the shell inside the instance is started immediately, without waiting for systemd to have initialized (reached the running state) and indeed without even waiting for the system dbus to be available. This is far earlier than native Linux provides a login prompt (getty.target and systemd-logind.service are both pulled in by multi-user.target).

This means that if you are using systemd support in a way that involves heavy use of systemd services (and commands relating to them), there is a good chance that initial commands will fail. To pick a trivial example, wsl systemctl is-system-running usually errors out unless the WSL instance has already started.

Describe the solution you'd like
When systemd support is enabled, by default, wsl.exe should hold any further actions (executing commands or starting shells) until systemd has reached the running or degraded state (as described by systemctl is-system-running) to ensure commands behave as they would on native Linux and novel failure modes aren't introduced.

(This should be overridable for debugging purposes, ideally with a wsl.exe command-line option; a suitably lengthy or configurable timeout might also be useful.)
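Until something like this exists in wsl.exe, the requested behavior can be approximated from inside the instance. The following is a minimal sketch, not anything WSL ships: `wait_for_systemd` is a hypothetical helper name, and the 60-second default timeout is an arbitrary assumption. It polls `systemctl is-system-running` and accepts either running or degraded, as the request describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WSL or systemd): block until systemd
# reports "running" or "degraded", or until a timeout expires.
wait_for_systemd() {
    local timeout="${1:-60}"   # seconds to wait; the default is an assumption
    local elapsed=0
    local state
    while [ "$elapsed" -lt "$timeout" ]; do
        # systemctl can fail outright while the bus is not reachable yet,
        # and it exits non-zero for "degraded", so ignore the exit status
        # and look only at the printed state.
        state="$(systemctl is-system-running 2>/dev/null)" || true
        case "$state" in
            running|degraded) return 0 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "systemd did not settle within ${timeout}s (last state: ${state:-unknown})" >&2
    return 1
}
```

A command that depends on systemd services could then be guarded with `wait_for_systemd 120 && systemctl status`. Recent systemd releases also accept `systemctl is-system-running --wait`, which blocks until boot has completed; the loop above is used instead because it keeps retrying when systemctl cannot yet talk to systemd at all.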

Describe alternatives you've considered
For compatibility with WSL as-it-is, this could be an option to be enabled in /etc/wsl.conf.

@cerebrate
Author

cerebrate commented Sep 29, 2022

Still happens under 0.68.2.

@tusharsnx

tusharsnx commented Oct 1, 2022

IMO this might lead to undesirably long boot times. Some systemd services can hold the distro from coming up for a really long time. For example, systemd-networkd holds the system from becoming 'ready' until the default interface gets an IP address from the DHCP server.

You might want apps that do not depend on systemd (e.g. VS Code, Docker) to just work like before, without waiting for is-system-running.

A better solution could be to introduce a new flag, maybe --wait-for-systemd or --after-systemd, to tell wsl to run the command after systemd has completed its job.

$ wsl -d <distro> --wait-for-systemd <command>

We also have to think about what happens when systemd is not enabled within the distro, should wsl let the command run or give a friendly error saying systemd is disabled?

@cerebrate
Author

Hrm. I'm not sure that's sufficient, because on one hand, there are apps that optionally depend on systemd and will behave differently with a partial-systemd (and bits of other services, dbus &c.) than they would with no systemd; and on the other hand, even apps that don't care about systemd per se may depend on things that systemd would then change out from underneath them, such as binfmts, sysctl parameters, securityfs rules, etc. ad naus.

For example, systemd does a whole bunch of filesystem mounting and altering on the assumption that non-startup things aren't running at the same time - even an app that doesn't know anything about systemd may protest if /tmp suddenly switches out from beneath it in mid-execution, for example. This is also likely to include things like WSLg (per however #8888 shakes out), the state of the user runtime directory and overall user session (per however #8842 and #8918 work out), and so forth.

So while I do recognize your valid point here, I think that on the whole allowing these early-running commands (certainly by default - I might propose the opposite flag --no-wait-for-systemd instead, complete with disclaimer) would create untold opportunities for obscure, hard-to-diagnose race conditions.

On the other hand, this has interesting synergies with #8854 . I think we can probably agree that the two good options for just throwing out wsl <command> rather than starting a shell are either no systemd or always-running/long-running systemd; initializing a whole system (fully) just to run one command and shut down again is terribly wasteful, and part-initializing it just to kill it again before it's even finished starting is even worse.

> We also have to think about what happens when systemd is not enabled within the distro, should wsl let the command run or give a friendly error saying systemd is disabled?

On this, I'd say that all systemd-related flags, be they --wait-for-systemd or --no-wait-for-systemd, should fail if systemd is not enabled, on the grounds that specifying them when it isn't indicates that the situation isn't what the command-runner thinks it is, and there may be other assumptions they need to rethink as well.
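To make the "fail if systemd is not enabled" semantics concrete, here is a rough sketch of a wrapper that could be run inside the instance. Everything in it is an assumption for illustration: `run_after_systemd`, `systemd_is_pid1`, and the `PID1_COMM_FILE` and `WAIT_TRIES` variables are hypothetical names, and checking `/proc/1/comm` is just one plausible way to detect that systemd is running as PID 1.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "wait for systemd" semantics discussed above.
# PID1_COMM_FILE and WAIT_TRIES are illustrative overrides, not real settings.

# Succeeds only if PID 1 is systemd.
systemd_is_pid1() {
    local comm_file="${PID1_COMM_FILE:-/proc/1/comm}"
    [ -r "$comm_file" ] && [ "$(cat "$comm_file")" = "systemd" ]
}

# Run "$@" once systemd is running or degraded; refuse if systemd is absent.
run_after_systemd() {
    if ! systemd_is_pid1; then
        echo "error: systemd is not enabled in this instance" >&2
        return 2
    fi
    local tries="${WAIT_TRIES:-60}"   # one attempt per second
    local state
    while [ "$tries" -gt 0 ]; do
        # is-system-running exits non-zero for "degraded", so compare the
        # printed state rather than the exit status.
        state="$(systemctl is-system-running 2>/dev/null)" || true
        case "$state" in
            running|degraded) "$@"; return ;;
        esac
        sleep 1
        tries=$((tries - 1))
    done
    echo "error: systemd did not reach running/degraded" >&2
    return 1
}
```

Returning a distinct exit code (2 here, chosen arbitrarily) for the "systemd not enabled" case lets a caller tell a wrong assumption about the instance apart from a slow or stuck boot.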

@tusharsnx

tusharsnx commented Oct 2, 2022

I see, we should keep things intuitive here. The user would expect systemd services to have completed their job before a shell is provided to them. If booting is terribly slow, they should be able to use systemd-analyze blame or systemd-analyze critical-chain to find what caused the delay. This way wsl does not have to work around trying to stay 'fast'.
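For reference, the two systemd-analyze commands mentioned above can be combined into a quick report. This is only a sketch: `slow_boot_report` is a hypothetical helper name, and `SYSTEMD_ANALYZE` is an illustrative override so the binary can be swapped out. Note that systemd-analyze generally cannot report timings until boot has actually finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the slowest units and the critical chain.
# SYSTEMD_ANALYZE is an illustrative override for the binary to call.
slow_boot_report() {
    local count="${1:-10}"   # how many of the slowest units to list
    local analyze="${SYSTEMD_ANALYZE:-systemd-analyze}"
    echo "== Slowest units (blame) =="
    # blame lists units ordered by initialization time, slowest first.
    "$analyze" --no-pager blame | head -n "$count"
    echo "== Critical chain =="
    # critical-chain shows the time-critical chain of units for the
    # default target.
    "$analyze" --no-pager critical-chain
}
```

Calling `slow_boot_report 5` inside a slow-starting instance would list the five units that took longest to initialize, followed by the chain that gated boot completion.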

The race conditions have already caused a lot of problems that, at first glance, do not look like they share the same underlying issue, which makes them hard to debug.

We should not let systemd and /init both keep manipulating directories and files at the same time. This also means that we should not have services running that manipulate each other's files and directories in the first place (I might be wrong here).

Should we ask users to provide the details about mount/services on their system with debug logs (if systemd is enabled)?

@cerebrate
Author

> Should we ask users to provide the details about mount/services on their system with debug logs (if systemd is enabled)?

For my systemd-enabling hacks, I have generally been asking for the output of a systemctl status to help in debugging. It probably wouldn't hurt to add that to the list.
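A small collection script along these lines could gather that output in one file for attaching to a report. It is a sketch under assumptions: `collect_systemd_debug` and the default file name are hypothetical, and the selection of commands (overall state, `systemctl status`, failed units, current mounts) is just one reasonable guess at what would be useful.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump systemd and mount state into a single file.
collect_systemd_debug() {
    local out="${1:-systemd-debug.txt}"   # output path; default is arbitrary
    {
        echo "== systemctl is-system-running =="
        systemctl is-system-running 2>&1
        echo "== systemctl status =="
        systemctl status --no-pager 2>&1
        echo "== failed units =="
        systemctl list-units --failed --no-pager 2>&1
        echo "== mounts =="
        cat /proc/mounts 2>&1
    } > "$out"
    echo "wrote $out"
}
```

Errors are captured into the file as well (`2>&1`), since a "failed to connect to bus" message is itself useful evidence when the commands are run too early.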

@razamatan

related: #11078
