Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/switch-to-configuration: add new implementation #308801

Merged
merged 1 commit into from
May 13, 2024

Conversation

jmbaur
Copy link
Contributor

@jmbaur jmbaur commented May 3, 2024

Description of changes

This adds an implementation of switch-to-configuration that allows for closer interaction with the lifecycle of systemd units by using DBus APIs directly instead of using systemctl. It is disabled by default, but can be enabled by specifying { system.switch = { enable = false; enableNg = true; }; }.

The goal of this change is to provide an implementation of switch-to-configuration that is compatible (bug-for-bug, feature-for-feature) with the current perl script. This means that it is expected that all current NixOS VM tests under ./nixos/tests will pass without any of the test scripts being modified. Assuming the changes here land, hopefully this implementation is found to be easier to maintain and more testable, allowing for easier modification of it in the future and eventually deprecation of the perl script.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels May 3, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels May 3, 2024
@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch 2 times, most recently from 211f98e to 36f7214 Compare May 3, 2024 14:25
@jmbaur jmbaur marked this pull request as ready for review May 3, 2024 14:28
@jmbaur jmbaur requested a review from dasJ as a code owner May 3, 2024 14:29
@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch 3 times, most recently from 6e2c9bb to deee302 Compare May 3, 2024 14:48
@jmbaur
Copy link
Contributor Author

jmbaur commented May 3, 2024

@ofborg test switchTestNg

@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch from deee302 to 8e920d6 Compare May 3, 2024 16:46
@jmbaur
Copy link
Contributor Author

jmbaur commented May 3, 2024

@ofborg test switchTestNg

@dasJ
Copy link
Member

dasJ commented May 3, 2024

Stellar work, thank you! I haven't looked at the Rust code in detail yet but scrolling over it quickly looked great. (Edit: The lack of unwrap() gives me a good feeling about the code :)

One thing to keep in mind is the existence of #307562.

Also, pretty much the last thing I wanted to do with Perl STC (apart from getting the sockets to work) was to port the fstab handling to the same logic the .mount unit uses.
The reason is that the .mount handling is newer and uses more sophisticated (if you can call it that) logic to decide on whether to reload or restart.
While this would make both implementations easier (probably only the Rust one since I don't think I will find the energy to factor out a new subroutine in Perl), this would also get my unfinished test cases for fstab handling to nixpkgs. This way, we could make sure that this implementation doesn't introduce regressions and also that it works the exact same as the Perl one.
What do you think about that? The only obvious downside I can see is that this would change the existing stc and I don't think we should rush this before 24.05 branch-off.

@jmbaur
Copy link
Contributor Author

jmbaur commented May 3, 2024

One thing to keep in mind is the existence of #307562.

I did see that issue. I haven't tested against #307562 (comment) yet, but I'll do so shortly and paste the results. I started this PR as a 1-to-1 port of the perl code, so I'm "hoping" that this issue is also present in this PR, then they can be addressed together.

port the fstab handling to the same logic the .mount unit uses.

That sounds great, I believe the logic was if any of the mount options change (and nothing else changed), reload the unit, otherwise restart it?

I don't think we should rush this before 24.05 branch-off.

Definitely no rush

nixos/modules/system/activation/switchable-system.nix Outdated Show resolved Hide resolved
nixos/modules/system/activation/switchable-system.nix Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely split this file into multiple smaller ones.

@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch from 8e920d6 to 6c333a3 Compare May 9, 2024 21:32
@ofborg ofborg bot added 8.has: package (new) This PR adds a new package 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin labels May 9, 2024
@jmbaur
Copy link
Contributor Author

jmbaur commented May 9, 2024

@ofborg test switchTestNg

@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch from 6c333a3 to 1391794 Compare May 10, 2024 03:41
@alyssais
Copy link
Member

We need to be very careful about rewriting fundamental parts of NixOS in Rust, because we're going to break NixOS on platforms that Rust doesn't support, but do have some committed NixOS users, like MIPS, which had a lot of work put into it over the last couple of years. (Most scripting languages don't have this problem, so something like Python might be a better fit, but compiled languages, especially those using LLVM, will.)

This adds an implementation of switch-to-configuration that allows for
closer interaction with the lifecycle of systemd units by using DBus
APIs directly instead of using systemctl. It is disabled by default, but
can be enabled by specifying `{ system.switch = { enable = false; enableNg = true; }; }`.
@jmbaur jmbaur force-pushed the switch-to-configuration-rs branch from 1391794 to 32bf051 Compare May 10, 2024 23:33
@jmbaur
Copy link
Contributor Author

jmbaur commented May 10, 2024

@alyssais totally agree, though I would assume LLVM would give us great platform support? After fixing the derivation (using dbus interface definitions from buildPackages.systemd instead of systemd), I was able to cross-compile this implementation using pkgsCross.mips64el-linux-gnuabi64.switch-to-configuration-ng, were there other platforms of concern?

@alyssais
Copy link
Member

alyssais commented May 11, 2024

Rust supports cross-compilation to more platforms than it supports running the compiler on, but the latter is what would be important for switch-to-configuration. Otherwise, we'd be completely removing the ability to switch configurations on those systems, since that system wouldn't be cross-compiled.

So systems supported by Nixpkgs that would potentially lose out on being able to self-host NixOS from this change would be:

nix-repl> lib.subtractLists rustc.meta.platforms lib.platforms.linux
[
  "armv5tel-linux"
  "armv7a-linux"
  "m68k-linux"
  "microblaze-linux"
  "microblazeel-linux"
  "mips-linux"
  "mips64-linux"
  "mips64el-linux"
  "mipsel-linux"
  "riscv32-linux"
  "s390-linux"
]

Some of these, like microblaze and s390, likely can't even build all of NixOS anyway. MIPS definitely has actual users (one use case I think is NixOS on a router, but I don't really know) and I'm pretty sure armv7 does too, although that might be armv7l and it wouldn't surprise me if most of those systems were cross-compiled.

@jmbaur
Copy link
Contributor Author

jmbaur commented May 11, 2024

MIPS definitely has actual users (one use case I think is NixOS on a router, but I don't really know) and I'm pretty sure armv7 does too, although that might be armv7l and it wouldn't surprise me if most of those systems were cross-compiled.

I run nixos on my armv7 router :) though I will admit I cross compile to it, as just doing an eval on it is too expensive.

I guess it's important to narrow down the focus to the list of users that run on the platforms listed above that don't cross compile their systems, since users that do cross-compile would be fine. While I would personally be surprised if users were doing native builds on many of those platforms listed above, I don't have much knowledge outside of the arm and x86 space and I definitely don't want to make any arbitrary limits on the platforms nixos can be built from.

@alyssais
Copy link
Member

I think MIPS64 is the main one to worry about.

@dasJ
Copy link
Member

dasJ commented May 12, 2024

Is this even something to worry about? Afaik, NixOS has no official MIPS64 support and for these platforms I honestly think it would be okay to require cross-compilation.

Also, while the new implementation is there now, it's still opt-in and I don't see why we wouldn't keep the old variant around for a bit after it has be come the default. This would also give the LLVM people more time (years probably) to fix the MIPS targets if they even intend doing so.

@alyssais
Copy link
Member

It depends whether this would break anything for anybody in practice. I've drawn attention to this PR in places where users using those platforms are likely to be reachable (#exotic:nixos.org, #networking:nixos.org), and nobody has come forward, so it seems like that's a no, and so that's fine. It's just important to make sure of that when making this kind of decision.

@flokli
Copy link
Contributor

flokli commented May 13, 2024

As written a bunch of times above, it's probably fine to introduce this additionally, even flip the default at some point. rustc platform support is gonna catch up at some point, and we can still keep the perl script around for these usecases - if people for these architectures really need it, they at some point need to step up with maintainership of the perl flavour.

Having the activation script in something not-perl is such a long-term quality of life improvement, and empowers more people to fix it, so I'm heavily in favor of this. Thanks ❤️

Copy link
Contributor

@pluiedev pluiedev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less Perl in NixOS is definitely the way to go forward — following the footsteps of #270727 — and I'm excited to see how this will turn out!

@flokli
Copy link
Contributor

flokli commented May 13, 2024

Let's get this in - this is behind an explicit feature flag, so a no-op for anyone not explicitly enabling it.

@flokli flokli merged commit 2a2f796 into NixOS:master May 13, 2024
21 checks passed
@stigtsp
Copy link
Member

stigtsp commented May 13, 2024

Less Perl in NixOS is definitely the way to go forward [..]

Fewer interpreters to reduce closure size, I hope you mean? :)

Anyway, good effort on the rewrite @jmbaur!

@jmbaur jmbaur deleted the switch-to-configuration-rs branch May 13, 2024 14:32
@nyabinary
Copy link
Contributor

Using networking.useNetworkd with this makes the switch hang up on systemd-networkd-wait-online.service.

restarting sysinit-reactivation.target
the following new units were started: sysinit-reactivation.target, systemd-tmpfiles-resetup.service
warning: the following units failed: systemd-networkd-wait-online.service
× systemd-networkd-wait-online.service - Wait for Network to be Configured
     Loaded: loaded (/etc/systemd/system/systemd-networkd-wait-online.service; enabled; preset: enabled)
    Drop-In: /nix/store/nlj3x6lpwgvkmnpj5zd8k7z6ynrgr5q1-system-units/systemd-networkd-wait-online.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Thu 2024-05-16 09:35:39 EDT; 266ms ago
       Docs: man:systemd-networkd-wait-online.service(8)
    Process: 5664 ExecStart=/nix/store/9cxd17xnmw0bi8n4nf722ysqj2bjlh8s-systemd-255.4/lib/systemd/systemd-networkd-wait-online --timeout=120 (code=exited, status=1/FAILURE)
   Main PID: 5664 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 4ms

May 16 09:33:39 nyan systemd[1]: Starting Wait for Network to be Configured...
May 16 09:35:39 nyan systemd-networkd-wait-online[5664]: Timeout occurred while waiting for network connectivity.
May 16 09:35:39 nyan systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
May 16 09:35:39 nyan systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
May 16 09:35:39 nyan systemd[1]: Failed to start Wait for Network to be Configured.
warning: error(s) occurred while switching to the new configuration

@samhug
Copy link
Contributor

samhug commented May 16, 2024

Using networking.useNetworkd with this makes the switch hang up on systemd-networkd-wait-online.service.

I think the perl implementation also hangs if it has to wait for systemd-networkd-wait-online.service to timeout?

I include the following in my config to make the service log which interface it's waiting on:

systemd.services.systemd-networkd-wait-online.environment.SYSTEMD_LOG_LEVEL = "debug";

@nyabinary
Copy link
Contributor

Using networking.useNetworkd with this makes the switch hang up on systemd-networkd-wait-online.service.

I think the perl implementation also hangs if it has to wait for systemd-networkd-wait-online.service to timeout?

I include the following in my config to make the service log which interface it's waiting on:

systemd.services.systemd-networkd-wait-online.environment.SYSTEMD_LOG_LEVEL = "debug";

Not for me, I tested it on both and the switch happened on the Perl implementation but hangs on this implementation.

@jmbaur
Copy link
Contributor Author

jmbaur commented May 16, 2024

Made an issue here

@nh2
Copy link
Contributor

nh2 commented May 29, 2024

@jmbaur

First: Great. I think the Perl-written key parts of NixOS are some of the weakest parts of it: Very difficult to read/understand, very easy to footgun.

Now just install-grub.pl has to go!

Next:

I recommend to make the following improvements to make this whole thing more likely to succeed:

  • Add a README.md to pkgs/by-name/sw/switch-to-configuration-ng/ that explains the project's goals, design, and usage instructions. I had to dig up this PR to figure out what the hell switch-to-configuration-ng is that I found mentioned in some newer issue.

  • Create a NixOS maintainers team for people who are interested in this, which can be pinged on issues.

  • Add to the beginning of switch-to-configuration.pl a comment:

    # Note there is a new alternative implementation "switch-to-configuration-ng"
    # that aims to be compatible with this one for now.
    # If you make changes here, consider making them there as well or
    # let the maintainers (INSERT TEAM NAME CREATED ABOVE), so they can mirror them.
    # While that's not required, it would certainly be helpful.

@pluiedev
Copy link
Contributor

pluiedev commented Jun 2, 2024

Now just install-grub.pl has to go!

Well, I just made a WIP port of install-grub.pl (also in Rust) in one (very productive) Saturday: https://github.com/pluiedev/install-grub
I'll try to figure out if I can get some inputs to test this script with tomorrow, which could be a bit challenging since I don't use GRUB myself ;)

@jmbaur
Copy link
Contributor Author

jmbaur commented Jun 6, 2024

@nh2 Thanks for the guidance! Made a follow up PR here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10 11.by: package-maintainer This PR was created by the maintainer of the package it changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.