RFC: Redesign of grml-live #267

Open
mika opened this issue Jan 2, 2025 · 11 comments

Comments

@mika
Member

mika commented Jan 2, 2025

Redesign of grml-live

Main tasks of grml-live

grml-live is a build system for creating a Grml and Debian based Linux Live system.

grml-live includes the following main stages:

  • FAI dirinstall: FAI (Fully Automatic Installation) does the chroot related work with hooks/scripts provided by grml-live (related to grml_chroot directory)
  • mksquashfs: generates the compressed squashfs file which provides the chroot in compressed format (related to grml_cd directory)
  • xorriso: generates the ISO itself, based on the output of mksquashfs and some further bootmanager related stuff handled by grml-live (related to grml_iso directory)

We don't plan to replace either mksquashfs or xorriso, but we are considering getting rid of FAI (with its dirinstall feature and its fcopy tool that we currently use within grml-live).
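For illustration, here is a minimal Python sketch of the two stages we intend to keep, expressed as argument lists that a driver could pass to subprocess.run(). The squashfs path below grml_cd and the exact mksquashfs/xorriso options are assumptions, not what grml-live currently passes. The sketch also assumes the chroot already exists in grml_chroot.

```python
from pathlib import Path


def build_stage_commands(output_dir: str, iso_name: str) -> list[list[str]]:
    """Return argv lists for the mksquashfs and xorriso stages.

    Assumes stage 1 (the chroot) already exists in <output_dir>/grml_chroot.
    The squashfs location inside grml_cd is a hypothetical placeholder.
    """
    out = Path(output_dir)
    chroot = out / "grml_chroot"
    cd_tree = out / "grml_cd"
    squashfs = cd_tree / "live" / "grml.squashfs"  # hypothetical file name
    iso = out / "grml_iso" / iso_name
    return [
        # Stage 2: compress the chroot, replacing any existing image.
        ["mksquashfs", str(chroot), str(squashfs), "-comp", "xz", "-noappend"],
        # Stage 3: build the ISO from the grml_cd tree (boot setup omitted).
        ["xorriso", "-as", "mkisofs", "-o", str(iso), str(cd_tree)],
    ]
```

A driver would then run each entry with subprocess.run(cmd, check=True), so that a failing stage aborts the build.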

Rationale for getting rid of FAI

FAI is a tool for unattended mass deployment of Linux that has been available since 1999. Its class system in particular was a compelling reason to build grml-live on top of FAI, so grml-live has been based on FAI since its beginnings in 2007.

grml-live was designed for building Debian based live systems and relied on FAI's dirinstall feature underneath. Building a live system might look similar to a disk based installation, but with grml-live we have specific needs for our release process that FAI was never designed for. We also provide ISO specific customization options and need to comply with GPL requirements. FAI fulfills none of these requirements.

We also had to deal with behavior changes within FAI (e.g. shell.log vs. scripts.log: 74c4bce, make-fai-nfsroot.conf: 75e65f3, ignoring packages: 121b348).

As with any software, we also hit bugs in FAI, e.g. #760133 ('mount --make-runbindable' breaks underlying device for unrelated operations), #989547 (ROOTCMD relies on specific unshare features), and #1056151 (which makes /usr/sbin/init vanish in /usr-move conditions).

For certain issues we implemented our own workarounds (e.g. 5c3e795, 4e93b8e, e5e4578, ef74e87, 8352df0 AKA https://bugs.debian.org/928981, e9da330, 4e027c7, 8cfed16, 4bc598a).

Usage of FAI within containers also isn't straightforward; e.g. we need to patch hardcoded behavior in /usr/lib/fai/subroutines (see def fixup_fai()).

We also hit problems with FAI's base tar files approach (see #143 (comment) + https://bugs.debian.org/881829), which we relied on for performance reasons (as bootstrapping a Debian system via https://wiki.debian.org/Debootstrap is quite slow).
Over the last few years mmdebstrap has become the go-to implementation for bootstrapping Debian systems. It is a very fast and customizable tool, which we want to utilise for our needs, and it also avoids the need for base files.
FAI hardcodes the usage of debootstrap within fai-make-nfsroot (also see e.g. f365415), which prevents us from switching to mmdebstrap.
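As a rough sketch, and assuming mmdebstrap's --architectures, --include and --customize-hook options, the debootstrap call FAI makes today (debootstrap --include=aptitude --arch amd64 trixie <target> http://deb.debian.org/debian) could map to an mmdebstrap invocation like the one built below. The helper name mmdebstrap_argv is hypothetical, and debootstrap's --exclude list is deliberately not translated.

```python
def mmdebstrap_argv(suite, arch, target,
                    mirror="http://deb.debian.org/debian",
                    include=(), customize_hooks=()):
    """Build an mmdebstrap command line (hypothetical helper, not part of grml-live)."""
    argv = ["mmdebstrap", f"--architectures={arch}"]
    if include:
        # Extra packages to install, comma separated like debootstrap's --include.
        argv.append("--include=" + ",".join(include))
    for hook in customize_hooks:
        # Runs after all packages are installed; the chroot path is passed as $1.
        argv.append(f"--customize-hook={hook}")
    # Positional arguments: SUITE TARGET MIRROR
    argv += [suite, target, mirror]
    return argv
```

Customize hooks are one place where chroot scripts could run. A hook string containing shell metacharacters is executed via sh with the chroot directory as $1, e.g. 'chroot "$1" apt-get clean'.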

Last but not least, we also have several workarounds in place for dealing with the way FAI handles logfiles and exit codes, that we would like to get rid of.

Redesign goals

  • Minimize dependencies on third-party tools, most notably FAI
  • Build grml-live on top of mmdebstrap
  • Get rid of deprecated workarounds and backwards compatibility for anything older than oldstable
  • Use established and common Debian practices as much as possible
  • Keep FAI's class approach (GRMLBASE, GRML_FULL, GRML_SMALL,...) where useful
  • Remain as backwards-compatible as possible for grml-live users, but get rid of outdated customs and practices where necessary and sensible

What's going on underneath: FAI usage within grml-live

We invoke FAI from within grml-live via:

BUILD_ONLY="$BUILD_ONLY" BOOTSTRAP_ONLY="$BOOTSTRAP_ONLY" GRML_LIVE_CONFIG="$CONFIGDUMP" fai $VERBOSE \
              -C "$FAI_CONF_DIR" -s "file:///$GRML_FAI_CONFIG" -c"$CLASSES" \
              -u "$HOSTNAME" "$FAI_ACTION" "$CHROOT_OUTPUT" $FAI_ARGS

With a grml-live command like this:

 ./grml-live -s trixie -a amd64 -c GRMLBASE,GRML_SMALL,AMD64 -o /home/mika/build/grml-live-2025-01 -v 2025.01-rc0

The resulting fai dirinstall command line then looks like this:

BUILD_ONLY= BOOTSTRAP_ONLY= GRML_LIVE_CONFIG=/tmp/tmp.rxcvirnXxm WAYBACK_DATE= \
  fai -C /tmp/tmp.8tMsaOEkbQ \
      -s file:////home/mika/src/grml/grml-live/config \
      -cDEBIAN_TRIXIE,GRMLBASE,GRML_SMALL,AMD64 \
      -u grml \
      dirinstall /home/mika/build/grml-live-2025-01/grml_chroot
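The mapping from grml-live options to the FAI command line can be sketched as follows. Judging from the output above, grml-live prepends a DEBIAN_<SUITE> class to the classes given via -c. The function name and its parameters are hypothetical, and environment variables such as GRML_LIVE_CONFIG are left out.

```python
def fai_dirinstall_argv(suite, classes, output_dir, conf_dir, config_dir,
                        hostname="grml"):
    """Rebuild the `fai dirinstall` argv shown above (hypothetical helper)."""
    # grml-live adds a DEBIAN_<SUITE> class in front of the user's -c classes.
    all_classes = [f"DEBIAN_{suite.upper()}"] + classes.split(",")
    return [
        "fai",
        "-C", conf_dir,                   # FAI configuration directory
        "-s", f"file:///{config_dir}",    # config space (grml-live's config/)
        "-c" + ",".join(all_classes),     # classes, in precedence order
        "-u", hostname,
        "dirinstall", f"{output_dir}/grml_chroot",
    ]
```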

grml-live with FAI then underneath invokes:

Calling task_confdir
Calling task_setup
Calling task_defclass
Calling task_defvar
Calling task_action
Calling task_dirinstall
Calling task_extrbase
  -> Calling debootstrap --exclude=info,tasksel,tasksel-data,isc-dhcp-client,isc-dhcp-common --include=aptitude --arch amd64 trixie /home/mika/build/grml-live-2025-01/grml_chroot http://deb.debian.org/debian
Calling task_debconf
Calling task_repository
Calling hook: updatebase.GRMLBASE
Calling task_updatebase
Calling hook: instsoft.GRMLBASE
Action dirinstall of FAI (hooks/instsoft.GRMLBASE) via grml-live running
Diverting update-grub executable
Adding 'local diversion of /usr/sbin/update-grub to /usr/sbin/update-grub.distrib'
Diverting grub-probe executable
Adding 'local diversion of /usr/sbin/grub-probe to /usr/sbin/grub-probe.distrib'
instsoft.GRMLBASE    OK.
Calling task_instsoft
Calling task_configure
GRMLBASE/01-packages OK.
GRMLBASE/02-run      OK.
GRMLBASE/03-get-sources OK.
GRMLBASE/05-hostname OK.
[...]
GRMLBASE/99-finish-grml-build OK.
GRML_SMALL/90-update-alternatives OK.
Calling task_tests
Calling task_finish
Calling task_savelog
Calling task_faiend

grml-live with -b (build only) invokes:

Calling task_confdir
Calling task_setup
Calling task_defclass
Calling task_defvar
Calling task_action
FAI_ACTION: softupdate
Calling task_softupdate
Performing FAI system update. All data may be overwritten!
Calling task_debconf
[...]
Calling task_repository
[...]
Calling hook: updatebase.GRMLBASE
Action softupdate of FAI (hooks/updatebase.GRMLBASE) via grml-live running
fcopy: destination etc/apt/sources.list.d/debian.list remains unchanged
fcopy: destination etc/apt/sources.list remains unchanged
[...]
updatebase.GRMLBASE  OK.
Calling task_configure
GRMLBASE/01-packages OK.
GRMLBASE/02-run      OK.
GRMLBASE/03-get-sources OK.
GRMLBASE/05-hostname OK.
[...]
GRMLBASE/99-finish-grml-build OK.
GRML_SMALL/90-update-alternatives OK.
Calling task_tests
Calling task_finish
Calling task_savelog
Calling task_faiend

grml-live with -u (update) invokes:

Calling task_confdir
Calling task_setup
Calling task_defclass
Calling task_defvar
Calling task_action
FAI_ACTION: softupdate
Calling task_softupdate
Calling task_debconf
Calling task_repository
Calling hook: updatebase.GRMLBASE
Action softupdate of FAI (hooks/updatebase.GRMLBASE) via grml-live running
[...]
updatebase.GRMLBASE  OK.
Calling hook: instsoft.GRMLBASE
Action softupdate of FAI (hooks/instsoft.GRMLBASE) via grml-live running
Diverting update-grub executable
Adding 'local diversion of /usr/sbin/grub-probe to /usr/sbin/grub-probe.distrib'
Hit:1 http://security.debian.org/debian-security trixie-security InRelease
Get:2 http://deb.debian.org/debian trixie InRelease [175 kB]
Get:3 http://deb.debian.org/debian trixie/main Sources [10.3 MB]
Hit:4 http://deb.grml.org grml-live InRelease
[...]
instsoft.GRMLBASE    OK.
Calling task_instsoft
Calling task_configure
GRMLBASE/01-packages OK.
GRMLBASE/02-run      OK.
GRMLBASE/03-get-sources OK.
GRMLBASE/05-hostname OK.
[...]
GRMLBASE/99-finish-grml-build OK.
GRML_SMALL/90-update-alternatives OK.
Calling task_tests
Calling task_finish
Calling task_savelog
Calling task_faiend
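To compare the three modes, the task and hook sequences from the abridged logs above can be transcribed into lists and diffed. This is a hypothetical helper for analysis only. Because the logs are elided with [...], the lists may be incomplete.

```python
# Steps transcribed from the abridged logs above (setup tasks omitted).
DIRINSTALL = ["task_extrbase", "task_debconf", "task_repository",
              "hook:updatebase.GRMLBASE", "task_updatebase",
              "hook:instsoft.GRMLBASE", "task_instsoft", "task_configure"]
BUILD_ONLY = ["task_softupdate", "task_debconf", "task_repository",
              "hook:updatebase.GRMLBASE", "task_configure"]          # -b
UPDATE = ["task_softupdate", "task_debconf", "task_repository",
          "hook:updatebase.GRMLBASE", "hook:instsoft.GRMLBASE",
          "task_instsoft", "task_configure"]                         # -u


def missing_steps(reference, mode):
    """Return the steps of `reference` that `mode` does not run, in order."""
    present = set(mode)
    return [step for step in reference if step not in present]


for name, mode in [("-b", BUILD_ONLY), ("-u", UPDATE)]:
    print(name, "skips", missing_steps(DIRINSTALL, mode))
```

According to these lists, -b skips bootstrapping (task_extrbase), task_updatebase and the whole instsoft stage. -u only skips task_extrbase and task_updatebase.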

Migration plan

Currently existing FAI related files + scripts in grml-live and their purpose:

  • config/debconf/GRMLBASE: debconf preseeding (idea: provide script which invokes debconf-set-selections within mmdebstrap?)
  • config/class/GRMLBASE.var: setting environment variables (idea: should be set by default by grml-live and be customizable via grml-live.conf?)
  • config/files: system specific configuration for e.g. apt, initramfs-tools + systemd
  • config/grml/squashfs-excludes: grml-live specific configuration to exclude files during mksquashfs execution (idea: move to etc/grml/?)
  • config/hooks/instsoft.ZFS: build zfs modules (idea: convert into a script for usage within mmdebstrap?)
  • config/hooks/updatebase.GRMLBASE: mostly workarounds to handle FAI in different build modes (idea: most shouldn't be relevant without FAI, relevant steps could be handled directly within grml-live or mmdebstrap scripts)
  • config/hooks/instsoft.GRMLBASE: in build-only mode (-b) doesn't do anything, otherwise only legacy actions or FAI workarounds take place (idea: most shouldn't be relevant without FAI, relevant steps could be handled directly within grml-live or mmdebstrap scripts)
  • config/hooks/savelog.LAST.source: FAI specific for log files related to error.log (idea: shouldn't be relevant with deprecation of FAI?)
  • config/package_config/*: which Debian packages should be installed in which grml flavour, also depending on Debian release (idea: investigate whether to keep as is or reconsider the file format; also research: what options do we have with apt/aptitude to easily ignore packages on demand? How could we implement the class approach on our own?)
  • config/scripts: executing tools to apply configuration changes, updates etc (idea: drop scripts that only invoke FAI's fcopy to install configuration files, provide $ROOTCMD + $target via grml-live to keep scripts functional as-is)
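If config/scripts is kept, grml-live itself would need to provide $target and $ROOTCMD and execute the scripts per class. Here is a minimal sketch under a few assumptions: scripts live in config/scripts/<CLASS>/, they run in class order and then alphabetically within a class (as FAI does), and ROOTCMD is a plain "chroot $target". FAI may set ROOTCMD differently, e.g. with unshare. plan_class_scripts is a hypothetical name.

```python
import os
from pathlib import Path


def plan_class_scripts(config_dir, classes, target):
    """Return (script, env) pairs in execution order (hypothetical helper).

    Scripts are taken from <config_dir>/scripts/<CLASS>/, class by class in
    the given order and alphabetically within a class; non-executable files
    are skipped.
    """
    # Assumption: a plain chroot; FAI itself may construct ROOTCMD differently.
    env = dict(os.environ, target=target, ROOTCMD=f"chroot {target}")
    plan = []
    for cls in classes:
        class_dir = Path(config_dir) / "scripts" / cls
        if not class_dir.is_dir():
            continue  # classes without scripts are fine
        for script in sorted(class_dir.iterdir()):
            if script.is_file() and os.access(script, os.X_OK):
                plan.append((script, env))
    return plan
```

To run the plan, call subprocess.run([str(script)], env=env, check=True) for each entry. The build then aborts on the first failing script.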

Feedback

Constructive comments, feedback and suggestions highly welcome!

@zeha
Member

zeha commented Jan 2, 2025

As stated before, I'd like to explore using a "config .deb" approach for most of the work that FAI previously did.
In various PRs I've tried to cut down the scripts to a) only do things that are still needed and b) try to use approaches that are easy to maintain and hopefully easy to migrate into such an approach. Example: where possible stop overwriting files but install into conf.d/ directories instead.
Additionally such an approach should investigate installing relevant conf.d/ files into /usr instead, to avoid installing dpkg conffiles.

@zeha
Member

zeha commented Jan 2, 2025

Thanks for the write-up on what grml-live does today, but I'd like to ask: what should it do in the future?

I think it's obvious that it needs to continue to create an installed ISO from scratch.

But do we need to keep the update feature -u and build-only features -b? Do we want a new "release mode" instead, which would be some form of "build-only" and "update select packages, ignoring classes etc"?

@mika
Member Author

mika commented Jan 2, 2025

> Thanks for the write-up on what grml-live does today, but I'd like to ask: what should it do in the future?

ACK, that's the reason for RFC! 😄

> I think it's obvious that it needs to continue to create an installed ISO from scratch.

ACK

> But do we need to keep the update feature -u and build-only features -b?

So here's what I see as use cases, and what I'm using them for myself:

  • -b - to build the ISO without updating the chroot -> useful if you had a failing build, managed to fix things™ manually within grml_chroot and finally want to get a resulting ISO, OR if you play around with templates (for isolinux/grub/branding/...) and don't need much grml_chroot related actions
  • -B - to build the ISO without touching the chroot (skips cleanup) -> especially useful for playing around with templates (for isolinux/grub/branding/...) and don't need much grml_chroot related actions (this is mainly about speedups though, not as important as -b to me AFAICS)
  • -u - to update an existing chroot instead of rebuilding it from scratch -> you have a working base (as in: grml_chroot) and only want to apply updates to what's already there (though that's related to the release mode you mentioned, see below!)

It feels to me that -u could be similar to the new release mode we have in mind and might be replaced by it then?

The -b use case is something similar to -e (extract ISO and squashfs contents from ISO), and I think it's nice to have something like that when working on new Grml releases (also e.g. when you want VirtualBox, ZFS, ... or other tricky stuff as part of your ISO, which might not always be easy or fully automatable).

I'm not 100% sure whether the -B one is really needed, especially once we know what the release mode could look like.

> Do we want a new "release mode" instead, which would be some form of "build-only" and "update select packages, ignoring classes etc"?

Yes! This new release mode is definitely related to my Python based release script that you saw in a paste of mine, and it's something I'm looking forward to. :)

@zeha
Member

zeha commented Jan 2, 2025

I think supporting -u is going to be a lot of work; if we could drop it, that'd be nice.

Dunno about -b/-B, when I did the template updates recreating the chroot was fast enough 😅 Maybe we can see what a refactor would look like and then decide how easy it is to keep them.

@akorn
Contributor

akorn commented Jan 8, 2025

FWIW, I had also been thinking about a simplified FAI replacement (that would only be able to deploy files, not install Debian packages, but support a class system like FAI's).

Specification of a FAI replacement

Goals:

  • retain the notion of a "class" system
    • nice to have: support hierarchical classes
  • simple things should be simple and intuitive
  • complex things should be possible
  • diagnostic output should be concise but easy to understand
  • should abort on errors immediately
  • should be modular enough to be easy to extend or partially override
  • to the extent possible, rely on the filesystem itself; no database or meta-configfiles should be needed in most cases
  • the base feature set should be easy to implement; it should then be possible to add more niche features on top

Specific features wanted

  • must have: "deploy" files, directories etc. based on a hierarchical class system
    • must have: copy node (file, directory, fifo, device, socket) from class template to destination
    • should have: delete node
      • can be approximated by replacing files with empty ones or symlinks to /dev/null, but let's aim for expressiveness
    • nice to have: deploy by symlinking (see below for why)
    • nice to have: deploy by hardlinking
      • can include creating a hardlink to an existing node elsewhere, which may exist independently of us
    • nice to have: don't overwrite destination, just change its properties (owner, permissions, ACLs, timestamps, attributes, xattrs)
    • nice to have: run script and deploy its output as if it had been a source file
    • nice to have: append to file (so class1 can deploy something and class2 can append to it)
    • nice to have: support templating (e.g. Jinja2)
      • however, if you want/need that, maybe use ansible instead?
  • nice to have: check differences between what exists and what would be deployed
  • nice to have: create class from current system
    • based on index of "interesting" files
      • perhaps informed by dpkg, so it can omit unmodified configfiles?
    • finding classes that would install what is currently installed where possible
    • create new class from files that don't occur with this content in any existing class
    • challenge: make the new class minimal by finding what order existing classes have to be deployed in
      • hierarchical classes could be leveraged to facilitate this (just assume the hierarchy goes from least specific to most specific; don't try to optimize for new class "size" beyond this)

Why not just use ansible?

  • We have a fairly specific use-case whose scope is much smaller than what ansible can do
  • We want to have a system that is more intuitive, simpler and faster than ansible

Why support symlinking

This enables a use-case where you keep a (possibly sparse) checkout of the class hierarchy in e.g. /etc/_config and actual configfiles like /etc/hosts or /etc/apt/sources.list are just symlinks pointing into this checkout.
All files that this particular system needs to have customized would be in the checkout, allowing them to be revision tracked without trying to revision track the entirety of /etc (and thus tracking upstream changes to locally unmodified default configfiles as well).
This use case is not relevant for grml-live, but would make the tool useful for a wider audience (it's the use I have in mind).

Template structure

Naïve example with nonhierarchical classes, to demonstrate an issue:

etc/
	hosts/
		BASE
		CLASS1
		CLASS2
	resolv.conf/
		BASE
		CLASS1
		CLASS2
		CLASS3
	apt/
		sources.list.d/
			BASE/	# empty directory
			debian-unstable.list/
				SID

Problem: how do we know whether the BASE in etc/apt/sources.list.d/BASE is the name of a class, in which case we need to deploy etc/apt/sources.list.d/ as an empty directory when using the BASE class, or a file inside sources.list.d/ that some class would ship? FAI seems to avoid this problem by only being able to deploy files, not directories.

Possible approaches:

  • Have a separate template hierarchy for files and directories
    • Not intuitive and straightforward
  • Have an index or manifest of all stuff each class deploys, similar to tmpfiles.d(5) (perhaps even the same format)
    • ignoring the "Age" column,
    • adding a "Type" letter for hardlinking and upstreaming it; then we can just use systemd-tmpfiles for the deployment
      • too bad we can't completely auto-generate the tmpfiles.d-formatted manifest
        • also: when deploying as a symlink farm, symlink targets should be able to be relative AND point into the checkout, which could be anywhere
      • the diff and class generation features would still need to be implemented separately
  • add metadata to the individual entries (e.g. as xattrs or svn properties or whatever)
    • we could have separate metadata backends/plugins to support different solutions, if necessary
    • con: not straightforward (you have to examine each file separately to see what attributes it has)
  • add separate hierarchy levels to the template directory to differentiate classes from contents, like this:
etc/
	apt/
		sources.list.d/
			CLASSES/
				BASE/	# now it's clear that in the BASE class, sources.list.d should be deployed as an empty directory
			debian-unstable.list/
				CLASSES/
					SID
    • con: it gets confusing if we want to deploy an entry literally called CLASSES.
      • not such a big deal? Also, the name could be different, or configurable, or xattrs used for disambiguation in the rare cases the problem arises
    • pro: based on such a hierarchy, we could auto-generate a tmpfiles.d config that would then deploy everything the way we want
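To illustrate the auto-generation idea, here is a sketch that walks a template tree using the CLASSES/ marker directories and emits tmpfiles.d(5) lines. Entries whose winning class is a directory become d lines. Files become L+ lines (create a symlink, removing whatever exists), which matches the symlink-farm use case. The sketch assumes that the last class in the given order wins, that paths are deployed relative to /, and that mode/ownership stay at the defaults (-). tmpfiles_lines is a hypothetical name.

```python
from pathlib import Path


def tmpfiles_lines(template_root, classes, marker="CLASSES"):
    """Emit tmpfiles.d lines for a template tree using <entry>/CLASSES/<CLASS>.

    Hypothetical helper: the last class in `classes` that provides an entry
    wins; directories become 'd' lines, files become 'L+' symlinks into the
    template (symlink-farm style). Mode, owner and age are left as '-'.
    """
    root = Path(template_root)
    lines = []

    def walk(rel):
        here = root / rel
        for name in sorted(p.name for p in here.iterdir()):
            if name == marker:
                provided = [c for c in classes if (here / marker / c).exists()]
                if not provided:
                    continue
                src = here / marker / provided[-1]
                dest = "/" + rel.as_posix()
                if src.is_dir():
                    lines.append(f"d {dest} - - - -")
                else:
                    lines.append(f"L+ {dest} - - - - {src.resolve()}")
            elif (here / name).is_dir():
                walk(rel / name)  # descend into non-marker directories only

    walk(Path())
    return lines
```

For the sources.list.d example, this yields a d line for /etc/apt/sources.list.d (BASE) and an L+ line for debian-unstable.list (SID).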

Alternative layout, to better support sparse checkouts at the cost of duplicating subtrees:

CLASSES/
	BASE/
		etc/
			apt/
				sources.list.d/	# empty directory
			resolv.conf		# file
	SID/
		etc/
			apt/
				sources.list.d/
					debian-unstable.list	# file
	CLASS1/
		etc/
			hosts			# file
			resolv.conf
	CLASS2/
		etc/
			hosts			# file
			resolv.conf
  • pro: much more intuitive (files are files, directories are directories)
  • con: different from the FAI approach
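Here is a minimal sketch of deployment with the alternative layout. It assumes plain copying and that later classes override earlier ones. deploy_plan and deploy are hypothetical names. Metadata such as append or delete is not handled.

```python
import shutil
from pathlib import Path


def deploy_plan(classes_root, classes):
    """Map each relative destination to its winning source (last class wins)."""
    plan = {}
    for cls in classes:
        class_dir = Path(classes_root) / cls
        if not class_dir.is_dir():
            continue
        for src in sorted(class_dir.rglob("*")):
            plan[src.relative_to(class_dir)] = src  # later classes overwrite
    return plan


def deploy(plan, target):
    """Create directories and copy files; parents sort before their children."""
    for rel, src in sorted(plan.items()):
        dest = Path(target) / rel
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
```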

Metadata

It seems to me that attempting to rely solely on the filesystem for instructions like "append", "delete", "symlink" or "hardlink" would require unintuitive, complex contortions (e.g. "magical" filename extensions like "resolv.conf.append" that the logic then strips, except when they're escaped).

Instead, it seems better to handle non-default cases (where files don't just need to be copied to the destination, overwriting whatever is there) using metadata.

I'm deliberately not specifying the metadata backend at this point, only the structure of the data itself; I only assume that whatever way is chosen to store the metadata will support storing arbitrary key-value pairs.

Ideally, it should be possible to leverage metadata inheritance if it's supported by the backend (set something on a directory, perhaps at a class level, and all child objects under that hierarchy inherit the value).

Inheritable metadata should also be settable on the command line, both as a default that individual entries can override, and as a forced value that overrides entry metadata.
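Here is a sketch of how inheritance, command-line defaults and forced values could combine, assuming metadata is keyed by template path. Defaults are applied first, then each ancestor from the outermost inward, then the entry itself, and forced values last. effective_metadata is a hypothetical name.

```python
from pathlib import PurePosixPath


def effective_metadata(path, entry_metadata, defaults=None, forced=None):
    """Merge: defaults < ancestors (outermost first) < entry < forced."""
    result = dict(defaults or {})
    p = PurePosixPath(path)
    for ancestor in reversed(p.parents):  # e.g. ".", "CLASSES", "CLASSES/BASE"
        result.update(entry_metadata.get(str(ancestor), {}))
    result.update(entry_metadata.get(str(p), {}))
    result.update(forced or {})
    return result
```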

Special instructions and their metadata encoding:

  • append:<0|1|2> -- if 0, overwrite destination; if 1, append to destination; if 2, append only if the last lines of the destination file don't exactly match the lines we would be appending
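The three append modes could be implemented roughly like this. The sketch assumes text files and a line-based comparison for mode 2. Adding a missing trailing newline to the destination before appending is our own assumption.

```python
def _join(dest_text, src_text):
    # Assumption: make sure appended content starts on a new line.
    if dest_text and not dest_text.endswith("\n"):
        dest_text += "\n"
    return dest_text + src_text


def apply_append(dest_text, src_text, mode):
    """Combine existing destination content with template content."""
    if mode == 0:  # overwrite
        return src_text
    if mode == 1:  # always append
        return _join(dest_text, src_text)
    if mode == 2:  # append unless the destination already ends with these lines
        new_lines = src_text.splitlines()
        tail = dest_text.splitlines()[-len(new_lines):] if new_lines else []
        return dest_text if tail == new_lines else _join(dest_text, src_text)
    raise ValueError(f"unsupported append mode: {mode}")
```

Mode 2 makes repeated deployments idempotent: running the same deployment twice does not append the lines a second time.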

Possible metadata backends

  • filesystem: a separate "METADATA" subtree under each class, where the leaf entries are plain files that contain metadata key-value pairs that correspond to this leaf entry in the template hierarchy. E.g. CLASSES/BASE/METADATA/etc/resolv.conf could contain append:1.
  • xattrs: every entry can have attributes like user.org.grml.live.append=1. (Challenge: xattrs are only supported on files and directories, not e.g. symlinks and device nodes.)
  • Subversion: every entry can have attributes like org.grml.live.append:1.
  • single file: a single METADATA file (name to be decided) contains the full relative path of template directory structure entries, followed by TAB, followed by a key:value pair. No escaping is necessary; the last TAB separates the entry name from the key:value pair.
    • this would be easy to extend to a METADATA.d directory structure with files that are concatenated to form the final list of metadata entries.
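Here is a sketch of a parser for the single-file backend. It splits each line at the last TAB, as specified. It splits the key:value pair at the first colon, which is an assumption, since the spec doesn't say whether keys or values may contain colons. For METADATA.d, the files are read in sorted order and concatenated, so later files override earlier ones for the same path and key. The function names are hypothetical.

```python
from pathlib import Path


def parse_metadata(text):
    """Parse '<path>TAB<key>:<value>' lines into {path: {key: value}}."""
    entries = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. between concatenated files
        path, tab, pair = line.rpartition("\t")  # the LAST tab ends the path
        key, colon, value = pair.partition(":")
        if not tab or not colon:
            raise ValueError(f"line {lineno}: expected <path> TAB <key>:<value>")
        entries.setdefault(path, {})[key] = value
    return entries


def load_metadata_dir(directory):
    """Concatenate the files in a METADATA.d-style directory in sorted order."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return parse_metadata("\n".join(p.read_text() for p in files))
```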

Actual implementations of this specification SHOULD support at least one of these mechanisms, and if they support more than one, SHOULD support converting between them as well as using all at the same time, with some order of precedence.

EDITS:

  • 20250112: added an alternative (and IMO better) directory layout
  • 20250115: added a bit about metadata handling

@zeha
Member

zeha commented Jan 8, 2025

> although I realize that with #281 having progressed quite a ways, this train has likely sailed.

I want to clarify that #281 is an interim step, achieving a number of things:

  • unties grml-live from the fai packages now
  • constrains the feature set used from fai, so a later migration becomes easier
  • reduces accidental complexity in grml-live that was there to work around (legacy) fai behaviours

Notably it doesn't change any of the basic concepts. This redesign RFC however probably wants to change the basic concepts :)

@akorn
Contributor

akorn commented Jan 12, 2025

I updated my comment to add a different directory layout that I think would be better, even though it differs from what FAI uses.

@zeha
Member

zeha commented Jan 15, 2025

> I updated my comment to add a different directory layout that I think would be better, even though it differs from what FAI uses.

I like the "alternate layout" a lot.
Does anyone know why FAI did not use it?
From my fcopy implementation experience it would seem a lot simpler to implement.

@mika
Member Author

mika commented Jan 16, 2025

> I updated my comment to add a different directory layout that I think would be better, even though it differs from what FAI uses.

> I like the "alternate layout" a lot.

I also think that this is a good and worthwhile idea.

> Does anyone know why FAI did not use it? From my fcopy implementation experience it would seem a lot simpler to implement.

This might have been influenced by cfengine (see its class docs at https://docs.cfengine.com/docs/3.25/reference-language-concepts-classes.html), which FAI targeted from its very beginnings.

From a user perspective it could be easier to have:

etc/apt/sources.list.d/debian.list/

and list it then, looking like:

etc/apt/sources.list.d/debian.list/DEBIAN_UNSTABLE
etc/apt/sources.list.d/debian.list/DEBIAN_STABLE
etc/apt/sources.list.d/debian.list/DEBIAN_BOOKWORM
etc/apt/sources.list.d/debian.list/DEBIAN_TESTING
etc/apt/sources.list.d/debian.list/DEBIAN_BULLSEYE

Whereas with the suggested layout we'd get:

DEBIAN_UNSTABLE/etc/apt/sources.list.d/debian.list
DEBIAN_STABLE/etc/apt/sources.list.d/debian.list
DEBIAN_BOOKWORM/etc/apt/sources.list.d/debian.list
DEBIAN_TESTING/etc/apt/sources.list.d/debian.list
DEBIAN_BULLSEYE/etc/apt/sources.list.d/debian.list

With the present and expected class precedence behavior (later classes take precedence over earlier ones), I'm not sure which approach is easier to follow (as in: finding out which files/classes are relevant and what the end result is).
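To make the precedence question concrete, here is a sketch that resolves one file in both layouts. It assumes that "later classes take precedence" means the last class in the list that provides the file wins. Both helper names are hypothetical. The lookup logic is the same in both layouts; only the path composition differs.

```python
from pathlib import Path


def resolve_fai_layout(files_root, rel_path, classes):
    """FAI layout: <files_root>/<rel_path>/<CLASS>; the last matching class wins."""
    entry = Path(files_root) / rel_path
    for cls in reversed(classes):
        if (entry / cls).is_file():
            return entry / cls
    return None


def resolve_class_layout(files_root, rel_path, classes):
    """Class-first layout: <files_root>/<CLASS>/<rel_path>; same precedence."""
    for cls in reversed(classes):
        candidate = Path(files_root) / cls / rel_path
        if candidate.is_file():
            return candidate
    return None
```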

@akorn
Contributor

akorn commented Jan 16, 2025

I think the tool should have a dry run feature that tells the user what would be deployed from where, and how.
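A dry run could compare what would be deployed against the target without writing anything. This sketch assumes a plan that maps destination paths (relative to the target) to source paths. It compares file contents byte by byte via filecmp.cmp(..., shallow=False) and reports each entry as new, changed or unchanged (directories as mkdir or exists). dry_run is a hypothetical name.

```python
import filecmp
from pathlib import Path


def dry_run(plan, target):
    """Describe what deploying `plan` into `target` would do, without writing.

    `plan` maps destination paths (relative to `target`) to source paths.
    """
    report = []
    for rel, src in sorted(plan.items()):
        src, dest = Path(src), Path(target) / rel
        if src.is_dir():
            state = "exists" if dest.is_dir() else "mkdir"
        elif not dest.exists():
            state = "new"
        elif dest.is_file() and filecmp.cmp(src, dest, shallow=False):
            state = "unchanged"
        else:
            state = "changed"
        report.append(f"{state:9} /{rel} <- {src}")
    return report
```

Because each line names the winning source, this also answers "what would be deployed from where".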

@mika
Member Author

mika commented Jan 29, 2025

Now that #281 has been merged, should we think about which further redesign steps we'd like to take care of? :)

mika added a commit that referenced this issue Jan 29, 2025
Now with the switch to minifai we no longer plan
to follow those redesign steps from docs/design.txt.
Any further redesign decisions are discussed as of
#267

Related to commit 0bf6821
mika added a commit that referenced this issue Jan 30, 2025
Executing it converts our existing config/* from FAI layout (e.g.
config/files/$FILENAME/$CLASS) into the new minifai layout (e.g.
config/files/$CLASS/$FILENAME).

See #267
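The commit above converts config/* from the FAI layout into the minifai layout. Here is a minimal sketch of such a conversion for a files tree. It assumes that every regular file's name is the class and that its parent directory path is the file name. This is not the script from that commit, and it ignores any files that fcopy may treat specially.

```python
import shutil
from pathlib import Path


def convert_fai_to_class_layout(src_root, dst_root):
    """Copy <src_root>/<FILENAME>/<CLASS> to <dst_root>/<CLASS>/<FILENAME>."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    converted = []
    for leaf in sorted(p for p in src_root.rglob("*") if p.is_file()):
        if leaf.parent == src_root:
            continue  # no <FILENAME> directory, so not a FAI-style entry
        cls = leaf.name
        filename = leaf.parent.relative_to(src_root)
        dest = dst_root / cls / filename
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(leaf, dest)
        converted.append((leaf.relative_to(src_root), dest.relative_to(dst_root)))
    return converted
```

Writing into a separate destination tree keeps the original FAI layout intact, so the conversion can be checked (and repeated) before switching over.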