PSA: systemd in initramfs #72401

Closed
kirelagin opened this issue Nov 1, 2019 · 42 comments

@kirelagin
Member

This is something like an RFC, but more of a heads-up: we need systemd in initramfs. The advantages are:

  • Smoother transition into stage-2 from stage-1 with a lot of interesting state passed.
  • systemd knows some cool tricks we could use, such as proper unmounting or using a tmpfs/overlayfs to make the system stateless.
  • Some people say it’s faster, because it’s systemd: C, parallel execution, socket activation, etc.
  • Apparently there are some open issues that will be more-or-less trivially resolved by having a systemd-based initramfs (things like networking, I’m not sure, but I definitely saw some).
  • Likely, a lot of other good things, I’m not sure, suggestions welcome.

Therefore I started nixos-init. It is more of a skeleton rather than something working, however it already really starts systemd in initramfs, and this systemd even tries to mount root (if you add root= to the command line), so, I suppose, this thing will actually boot a system (not that I tried) if mounting root does not involve anything “interesting” (such as encryption or LVM). Once again, at this point it is just a prototype, but it already has a logo, so you can be sure it’s legit.

The plan is roughly the following (in no particular order):

  • Polish rootfs detection. It seems that the fstab generator is stupid enough to be unable to generate sensible units for the root filesystem from anything other than the kernel command line arguments, and I am not very happy with it, so I was hoping I’d be able to find a workaround that will not involve writing the .mount unit by hand.
  • Have a “minimalistic” version of systemd without 100 MB of useless libs in its closure (should be fairly straightforward but will require tweaking the expression, which is currently too rigid).
  • Make the thing boot mobile-nixos instead of its current boring stage-1 shell script.
  • Achieve feature parity with current stage-1. I don’t even know what this involves, but, I suspect, it’s basically just LVM (add the unit file?), cryptsetup (uncomment cryptsetup in the systemd expr?), and then something about the iso (pass systemd.volatile= on the command line?).

Let me know what you think, and feel free to join the fun.
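
For readers wondering what "writing the .mount unit by hand" from the plan above would involve, here is a minimal sketch. It assumes the usual systemd initrd convention that the real root is mounted at /sysroot by a unit named sysroot.mount and ordered before initrd-root-fs.target. The helper name `gen_sysroot_mount` and its arguments are hypothetical and are not part of nixos-init.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal sysroot.mount unit on stdout.
#   $1 = device to mount (for example /dev/disk/by-label/nixos)
#   $2 = filesystem type (for example ext4)
#   $3 = mount options (optional, defaults to "defaults")
gen_sysroot_mount() {
    what=$1
    fstype=$2
    options=${3:-defaults}
    cat <<EOF
[Unit]
Description=Root file system (hand-written sketch)
Before=initrd-root-fs.target

[Mount]
What=$what
Where=/sysroot
Type=$fstype
Options=$options
EOF
}
```

Something like `gen_sysroot_mount /dev/disk/by-label/nixos ext4 > sysroot.mount` would then produce the unit file. The file name has to match the mount point, which is why it is sysroot.mount for /sysroot.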

@kirelagin kirelagin added the 0.kind: bug Something is broken label Nov 1, 2019
@kirelagin
Member Author

Ok, lol, it is not a bug, it is a feature, but I don’t think there is a template for them 🤔.

@kirelagin
Member Author

cc @lheckemann @samueldr

@lheckemann lheckemann removed the 0.kind: bug Something is broken label Nov 1, 2019
@lovesegfault
Member

Why not just use something like Dracut?

@kirelagin
Member Author

kirelagin commented Nov 3, 2019

@lovesegfault The short answer is, basically, “for the same reasons why we do not just use Portage”.

Dracut is essentially a “build system” written in Shell that takes a bunch of modules and combines them. Nix does the same and does it better.
I considered wrapping a NixOS-like module system around Dracut, but that just felt weird in the end. I also want it to be very flexible, build packages differently depending on settings (to minimise the closure size), and also, as I mentioned in the first message, have it working on my phone – I am not very closely familiar with Dracut, but I imagine that could be tricky.

@kirelagin
Member Author

kirelagin commented Nov 3, 2019

And since I am already writing in this issue, here is a small status update: I tweaked the systemd expression to make optional dependencies really optional. This brought systemd’s closure size down to 68 MB, 27 of which is glibc. I am now trying to build it with musl, and I am almost there: all the dependencies cross-compiled successfully, and now I am just staring at systemd’s compilation errors that result from differences between glibc and musl.

@kirelagin
Member Author

Current closure:

36K     /nix/store/17xr28nkr5ag10w386pxykvxy9kfq8wl-libcap-2.27-lib
84K     /nix/store/dhvf0bhjynghdpay0b6lpds1msxal4z8-bzip2-1.0.6.0.1-bin
92K     /nix/store/xvjxwc2hqddhk6g33m96dv66z7ch2mad-bzip2-1.0.6.0.1
152K    /nix/store/m1qb538b5ic2j822zyjyj5h3i04lyca0-zlib-1.2.11
180K    /nix/store/ryq3kb1a508yxy9z4jpqj5n55l78jplz-gzip-1.10
404K    /nix/store/4raxvk6vj57q9h5z4zzvq9ildayvrp1n-kmod-26
412K    /nix/store/j02gkp91nnjlbp6j58lgaj5gx41f4cci-xz-5.2.4
1004K   /nix/store/y0czvd9pkrng1b45fd4hl4qqpy9yb0ka-busybox-1.30.1
1.4M    /nix/store/zavn4np1jvm79f0rafkv0p1mrag09qkz-bash-4.4-p23
1.7M    /nix/store/cybq1magn5ij8r9nvcb6sqy3i7lpgwcs-util-linux-2.33.2
1.8M    /nix/store/hmm0k2pqs5fkff3p6gzraigxg0g3xlkm-systemd-243-lib
1.8M    /nix/store/vnxng3k0f7wrs297x9hxh9csvbinh1qf-coreutils-8.31
2.6M    /nix/store/9ixhrn06mq3hxsqq9hwvv2kjah4fzx99-glibc-2.27-bin
4.2M    /nix/store/b1i93xpjmajq42azv5jzpmkhqbgh7mkn-shadow-4.7
4.2M    /nix/store/x8ahicv59dnlybiasb7676hcvz63fmz7-kbd-2.0.4
5.4M    /nix/store/bgr5kldcqanf7bhgh6n42mr5b4vv6cxs-util-linux-2.33.2-bin
11M     /nix/store/wbwy0z8vmrl7ysfh4wjx4aaq9p0ld7x9-systemd-243
29M     /nix/store/qn76sklvyalzw9ilnxz6sh0020gl2qn6-glibc-2.27
65M     total

I think it’s good enough for now; I am a little tired of optimising this one, to be honest. If someone wants to help, you are welcome.

  • The obvious next step is to get rid of Bash and replace it with ash from busybox. It appears there because the /bin/bash is being substituted with it in shebangs and error messages (!) in a bunch of packages. I haven’t looked yet into possible options here.
  • Would be nice to shrink util-linux further (here it is already utillinuxMinimal). I think they only need (u)mount, libmount, and libblkid from it (while the fattest thing in there is fdisk), but I am not sure.
  • kbd is a candidate for a surgery that will remove all keymaps but qwerty, because who needs them in initramfs anyway.
  • coreutils has to be replaced with busybox. At the moment it doesn’t work because their ln does not support --relative, but there are systemd patches on the internet for this.

Regarding glibc vs. musl, there are two possible approaches (and both of them have already been implemented by people on the internet):

  1. Patch musl to make it more compatible with glibc (1).
  2. Patch systemd to make it compatible with musl (2).

IIUC, systemd developers have been hostile to the idea of supporting other libc’s, but, on the other hand, I’m pretty sure I saw some musl-related patches being applied on the mailing list.

musl developers are also not huge fans of improving glibc compatibility. Their position is that they want to be compliant with the standards, so they will merge something only if it does not contradict the standards (yes, apparently, there are things in glibc that are not in line with POSIX, but do not quote me on this), and if it looks like it might become the standard soon. So, as one could imagine, getting something merged into libc is tricky as well.

I am not sure what move would be right here, as both sides have valid arguments. I think the best approach would be a mix of the two: contribute to systemd changes that revert it to officially standardised APIs, where it makes sense; contribute to musl where it does not contradict existing standards; provide a compatibility layer in all other cases. Maybe I’ll get back to it one day, but for now I am not entirely convinced that it is a good idea.

@lheckemann
Member

kbd is a candidate for a surgery that will remove all keymaps but qwerty, because who needs them in initramfs anyway.

People who want to type their encryption passphrase on their familiar layout. I think this is worth keeping, or at least keeping the user's configured consoleKeymap.

That aside, great work!

@arianvp
Member

arianvp commented Nov 4, 2019

Hey @kirelagin I haven't looked at this stuff yet, but it sounds promising!

Me and @flokli have been eyeing taking a stab at this for quite some time already. We did some preliminaries recently, like getting systemd built with cryptsetup support (#66856).

We hang around a lot on #nixos-systemd in freenode. Please feel free to join that channel. It's populated by people who usually do systemd maintenance work for NixOS. I'm pretty sure everybody there is very willing to collaborate and brainstorm on this!

@kirelagin
Member Author

@arianvp Hey, that’s cool! Tbh I haven’t been doing a lot of IRC recently, but that channel sounds like something I should be watching now.

@arianvp
Member

arianvp commented Nov 4, 2019

I can recommend checking out https://matrix.org which has a bridge to Freenode if you are rusty around IRC. It has a more friendly user interface :)

e.g. you can join through: https://riot.im/app/#/room/#freenode_#nixos-systemd:matrix.org

@Lassulus
Member

Lassulus commented Nov 5, 2019

Hey, I'm very interested in this. What about features like copytoram or findiso? Does systemd support them natively, or do we have to reimplement them?

@kirelagin
Member Author

kirelagin commented Nov 5, 2019

Sorry, I am not really familiar with how the live cd works, I plan to look into it slightly later. But as I mentioned in my first comment, systemd natively knows how to assemble a stateless system from the underlying root mountpoint and overlayfs (which can be in RAM) on top of it – I haven’t looked into the livecd stuff, so I don’t know, but isn’t it better than copytoram?

I have no idea what findiso is, but if it is something about discovering the rootfs, then I don’t think systemd does anything like that.

@Lassulus
Member

Lassulus commented Nov 5, 2019

ah, that sounds indeed sufficient. We can check it later.
findiso finds the iso from the initramfs (you give it the path on the grub kernel-line and it finds the iso from that). The implementation is here: https://github.com/NixOS/nixpkgs/pull/69214/files#diff-7f6db71d0673ff55ffb22b0d8d86bf78R451

@arianvp
Member

arianvp commented Nov 5, 2019

I guess findiso= is similar to the root= kernel parameter, is it not? What is different about findiso= compared to it?

copytoram is to be handled by the systemd.volatile= parameter

@arianvp
Member

arianvp commented Nov 5, 2019

Actually, from further reading the specs of systemd.volatile, it does something slightly different than I thought and seems slightly useless in the NixOS use case :'( as it assumes /usr is where the system resides

directory, with only /usr mounted into it from the configured root file system, in read-only mode. This way the system operates in fully stateless mode, with all configuration and state reset at boot and lost at shutdown, as /etc and /var will be served from the (initially unpopulated) volatile memory file system.

Anyhow, we can for sure set up mount rules in initrd that mount /sysroot as a tmpfs and copy the contents from the ISO. I don't think it's much work

@kirelagin
Member Author

Um, my plan was to set it to overlay?

@kirelagin
Member Author

I guess findiso= is similar to the root= kernel parameter, is it not? What is different about findiso= compared to it?

Oh, judging from the code, it takes only the name of the file, and then it searches for it on all devices that it can find and mount. I imagine, there are better ways to do this through integration with udev or something like this, to only search removable drives or I dunno.

@arianvp
Member

arianvp commented Nov 5, 2019

Ah you're right about systemd.volatile=overlay. That will indeed work.
I will think about the copytoram stuff a bit more. The benefit of copytoram is that you can remove the ISO after bootup, which you can't do with systemd.volatile=overlay

After reading the git history of @Lassulus I now better understand the purpose of findiso. It's used to implement this feature: https://www.supergrubdisk.org/wiki/Loopback.cfg

I think in systemd initrd it can be implemented using the following kernel params, if iso_path is an absolute path to the ISO file.

root=${iso_path} rootflags=lo rootfstype=iso9660

From what I understand from https://www.supergrubdisk.org/wiki/Loopback.cfg the iso_path should be an absolute path to /boot (which is already mounted in the initrd), so I'm a bit confused why we're mounting each and every device in blkid to find the ISO file in question. What is the reason for that @Lassulus ? Is that actually the expected behaviour? To me it sounds sane to limit the scope of this feature to iso images that reside on the /boot partition

If the search behaviour is desired, then we should keep the shell script around for finding the ISO image and then mounting it as a systemd oneshot service perhaps.

@Lassulus
Member

Lassulus commented Nov 6, 2019

the /boot device is not mounted in the initrd anymore (at least not in my tests) so I had to mount every partition to find the iso which we actually booted from. If there is a better way to find the original iso I'm very happy with the alternative

@flokli
Contributor

flokli commented Nov 8, 2019

the /boot device is not mounted in the initrd anymore

How should it? The kernel doesn't know how the on-cd bootloader previously read from the boot file system, so looping over each available file system in the initrd is the only thing it can do. This could be made a bit smarter by having the bootloader obtain the uuid of that filesystem, and passing it in too, but findiso= is made the way it is, so it's probably not on us to change that.

@flokli
Contributor

flokli commented Nov 8, 2019

Ah, and @arianvp, that cmdline is wrong. findiso= would point to a path relative to the inside of the boot disk partition, so something like isos/nixos-….iso, whereas root= is a block device of the real system's root.
The bash script inside stage 1 looks for the isos/nixos-….iso file inside all partitions, and symlinks it to /dev/root, which is the default for root=.
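
The search described in this comment can be sketched as a loop over candidate directories. This is only an illustration of the idea and not the stage-1 code: it assumes the partitions are already mounted at the directories passed in (the real script has to mount each block device itself), and the function name `find_iso` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: find_iso REL_PATH DIR...
# Prints the full path of the first DIR/REL_PATH that is a regular file.
# Returns non-zero if none of the candidate directories contains the file.
find_iso() {
    rel=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
            return 0
        fi
    done
    return 1
}
```

The caller would then do whatever stage 1 does with the result, which according to the comment above is symlinking it to /dev/root.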

@schmittlauch
Member

schmittlauch commented Dec 9, 2019

As the top post talks about an initramfs while the now proposed PR #74842 modifies the current initrd of NixOS: Are people aware that initramfs and initrd are different approaches to the same problem but not the same thing?

An initrd is a virtual compressed whole file system, while an initramfs is a compressed cpio archive.

And are there any strong preferences for or against one of them? I just noticed that most of the other distributions I use (openSUSE, Arch, Gentoo) have switched to using an initramfs.

Edit:

$ zcat /boot/kernels/527b7y8rfds7r6qkmsg1bn4q2k4iqyyp-initrd-linux-4.19.87-initrd |file -
/dev/stdin: ASCII cpio archive (SVR4 with no CRC)

So it turns out that NixOS' current initrd already is an initramfs and everyone is using the wrong terminology?!
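
The same check can be done without `file` by looking at the first bytes of the decompressed image. A minimal sketch, assuming the image is already decompressed on stdin and that only the "newc" cpio variants matter, whose headers start with the ASCII magic 070701 (no CRC) or 070702 (with CRC). The function name `is_newc_cpio` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin starts with a newc cpio magic number.
# Reads exactly six bytes and compares them with the two known ASCII magics.
is_newc_cpio() {
    magic=$(dd bs=1 count=6 2>/dev/null)
    case $magic in
        070701|070702) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would look like `zcat /path/to/initrd | is_newc_cpio && echo "this is an initramfs"`. An image compressed with something other than gzip needs the matching decompressor in place of zcat.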

@bjornfor
Contributor

bjornfor commented Dec 9, 2019

So it turns out that NixOS' current initrd already is an initramfs and everyone is using the wrong terminology?!

Yes.

Even the NixOS options say 'initrd' and not 'initramfs' (e.g. boot.initrd.availableKernelModules).

@stale

stale bot commented Jun 6, 2020

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 6, 2020
@flokli
Contributor

flokli commented Jun 6, 2020

This is still being worked on, it's just a lot of work.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 6, 2020
@flokli
Contributor

flokli commented Jul 14, 2020

Heads up to everyone interested in here - the necessary lvm derivation refactoring was merged, and #66856 should add the cryptsetup bits. Help welcome in proposing a test for #66856 (comment) ;-)

@TLATER
Contributor

TLATER commented Jan 12, 2021

What are the current outstanding tasks on this? It looks like the preliminary changes to the systemd package were merged, does that mean "only" mapping the existing boot options to systemd is left?

@flokli
Contributor

flokli commented Jan 12, 2021

@TLATER systemd cryptsetup support did land, so in theory, we should be able to make use of systemd-ask-password and friends to ask for crypto volumes.

However, we'd also need to take a look at the boot.initrd options (and modules setting them), and what stage 1/2 would look like with systemd-in-initramfs.

If you want, you can start giving this a try. I propose joining #nixos-systemd, where a lot of the systemd efforts are discussed :-)

@weilbith

Any updates on this? 🙃

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/removing-persistent-boot-messages-for-a-silent-boot/14835/1

@TLATER
Contributor

TLATER commented Sep 3, 2021

freenode happened, so I think #nixos-systemd is probably gone, and I never managed to get around to it. I'd love to know if the conversation has moved somewhere else.

@arianvp
Member

arianvp commented Sep 3, 2021

It moved to a similarly named channel in the NixOS matrix instance https://matrix.to/#/#community:nixos.org

@flokli
Contributor

flokli commented Sep 3, 2021 via email

@arianvp
Member

arianvp commented Sep 3, 2021

Are they bridged? If not can we get them bridged?

@flokli
Contributor

flokli commented Sep 3, 2021 via email

@TLATER
Contributor

TLATER commented Sep 4, 2021

There is a general bridge for libera.chat, though it'd be nice to get one of those fancy matrix-irc cross servers. IRC through matrix is a faff.

@kirelagin
Member Author

Hey, everyone! So, back then, this work required changes to systemd, and they conflicted with some other changes to the systemd expression that were happening at that time. As a result, it all got stuck and I moved on to other things. However, I am still very interested in this project (except that now that I got a Dell XPS 13 and realised how horrible everything is, I think I might need to start fixing things at an even lower level) and I hope to get back to it soon.

To prove my commitment (to myself, in the first place), I went ahead and created a Matrix channel #sensible-initramfs:matrix.org (I don’t believe in IRC anymore). To be clear: I am not actively looking for collaborators right now since nothing is clear yet, but I will be very happy to discuss anything with anyone interested.

@samueldr
Member

samueldr commented Sep 6, 2021

cc @ElvishJerricco

@kirelagin if you hadn't seen, #120015 exists.

Hopefully you can collaborate :).

@arianvp
Member

arianvp commented Apr 3, 2022

#164943 has been merged. Can this be closed?

@TLATER
Contributor

TLATER commented Apr 3, 2022

#164943 has been merged. Can this be closed?

That PR implements this as an opt-in, and doesn't fully implement everything the old initramfs is capable of yet. It'll take more work to get to an actual systemd-in-initramfs, but it's a very, very good first step :)

Not sure if this can be closed, but I think it should perhaps be replaced. I feel like there should be a tracking issue for an actual full conversion that also includes a list of things that are still missing. This issue has become a bit chaotic over the years, and hardly tracks the original work anymore.

@lovesegfault
Member

IMHO we'd be better served by closing this issue and opening a new tracking issue for the features missing in the merged effort.

The final feature is making it the default.

@arianvp
Member

arianvp commented Apr 6, 2022

We have a project tracking progress here: https://github.com/NixOS/nixpkgs/projects/51

@arianvp arianvp closed this as completed Apr 6, 2022