
Add support for oci-dir output (fixes #1865) #2351

Merged
2 commits merged into systemd:main from septatrix:feature/oci-output-format on Apr 5, 2024

Conversation

@septatrix (Contributor) commented Feb 7, 2024

A first draft for rudimentary OCI image outputs

PS: I intend to squash this once more questions are answered/resolved
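For context, the oci-dir output discussed in this PR follows the on-disk layout defined by the OCI image layout specification. A minimal sketch of that layout, using only coreutils (the file contents are illustrative; a real image has manifest, config and layer blobs under blobs/sha256):

```shell
# Sketch of a minimal OCI image layout directory. Contents are
# illustrative: a real image lists manifests in index.json and stores
# config/manifest/layer blobs under blobs/sha256.
set -eu
img=$(mktemp -d)

# Required marker file declaring the layout version.
printf '{"imageLayoutVersion": "1.0.0"}\n' > "$img/oci-layout"

# Top-level index; real images reference their manifests here.
printf '{"schemaVersion": 2, "manifests": []}\n' > "$img/index.json"

# Content-addressed blob store for config, manifest and layer tarballs.
mkdir -p "$img/blobs/sha256"

ls "$img"   # → blobs  index.json  oci-layout
```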

@septatrix (Contributor, Author) left a comment:

Please also note the several inline comments I left

mkosi/archive.py (review comment, outdated, resolved)
mkosi/config.py (review comment, resolved)
@septatrix septatrix marked this pull request as draft February 7, 2024 18:52
mkosi/util.py (review comment, outdated, resolved)
mkosi/util.py (review comment, outdated, resolved)
mkosi/config.py (review comment, outdated, resolved)
mkosi/config.py (review comment, outdated, resolved)
mkosi/archive.py (review comment, outdated, resolved)
mkosi/archive.py (review comment, outdated, resolved)
mkosi/archive.py (review comment, outdated, resolved)
@septatrix (Contributor, Author):

Should I add something like podman to the CI to test the OCI output? A runtime spec for systemd-nspawn --oci-bundle should also be possible, but that would be a bit more complex.

@behrmann (Contributor) commented Feb 7, 2024:

I guess. If we are to add this, it should be covered by a test.

@septatrix (Contributor, Author):

Thinking about the tests: unpacking the OCI bundle and creating the runtime spec is rather tedious, and umoci, which would be great for this, is not packaged in most places. So while using sd-nspawn --oci-bundle would be neat, unpacking the OCI image and creating the runtime spec on our own would add places where we could hide our own bugs (apart from being tedious).

As a result I think using podman to test this is likely the simplest way for now. However, installing podman in the tools tree also has its problems, as the tools tree is inaccessible from within the pytest tests; the proper config parsing etc. is only done within mkosi itself. So using the tools tree version would only be possible if we add podman support to the boot/shell verbs. I am not sure if this is desired, as this would also likely behave differently regarding things like RuntimeTree, network namespaces, etc.

Alternatively we could install podman outside of the tools tree.

Which variant do you prefer?
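For reference, the sd-nspawn route weighed above would look roughly like the sketch below. The ./image and ./bundle paths are hypothetical, and the sketch deliberately does nothing when the tools or the image are unavailable:

```shell
# Illustrative only: unpack an OCI image into a runtime bundle with umoci
# and hand it to systemd-nspawn. ./image and ./bundle are made-up paths.
set -eu
run=yes
command -v umoci >/dev/null 2>&1 || run=no
command -v systemd-nspawn >/dev/null 2>&1 || run=no
[ -d ./image ] || run=no

if [ "$run" = yes ]; then
    # umoci unpacks the rootfs plus a minimal config.json runtime spec --
    # the part that would still need extra mounts/env vars for systemd.
    umoci unpack --image ./image:latest ./bundle
    # systemd-nspawn consumes the resulting bundle directly.
    systemd-nspawn --oci-bundle=./bundle
else
    echo "skipping: umoci/systemd-nspawn or ./image not available"
fi
```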

@septatrix (Contributor, Author):

I have added a basic test using podman.

@septatrix (Contributor, Author) left a comment:

The CI failures for Arch and Debian testing seem to be due to systemd/systemd#29860. No clue why CentOS fails and I did not find the motivation to look through all the logs yet.

Regardless: this is not a problem with the way the OCI images are constructed but a bug/misconfiguration in either podman or systemd; otherwise all CI runs would fail. I am actually inclined to add skips for the failing scenarios. People can still use the OCI images generated by mkosi and start the tools directly instead of going via the init system. Merging this functionality in mkosi could even be beneficial for podman/systemd to better diagnose who exactly is at fault here.

tests/__init__.py (review comment, outdated, resolved)
tests/test_boot.py (review comment, outdated, resolved)
@septatrix (Contributor, Author):

> No clue why CentOS fails and I did not find the motivation to look through all the logs yet.

It's an endless loop due to journald not starting properly (which is also why the timeout after 1h kicked in). The repeating log section is attached. I have found a few systemd issues which seem to reference what I would guess is the root cause (PR_SET_MM_ARG_START failed...). As this does not happen on the other distros, I would assume it has since been fixed in upstream systemd, further making me lean towards skipping the OCI testing for CentOS.

Log extract
         Starting Journal Service...
systemd-tmpfiles-setup.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Failed to add invocation ID to keyring, ignoring: Operation not permitted
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Restricting namespace to: n/a.
Operating on architecture: x86
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x32
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x86-64
Blocking cgroup.
Blocking ipc.
Blocking net.
Blocking mnt.
Blocking pid.
Blocking user.
Blocking uts.
Blocking time.
Operating on architecture: x86
Operating on architecture: x32
Operating on architecture: x86-64
Operating on architecture: x86-64
[**    ] (2 of 2) A start job is running for…odes in /dev (6min 41s / no limit)
[*     ] (2 of 2) A start job is running for…odes in /dev (6min 42s / no limit)
[**    ] (2 of 2) A start job is running for…odes in /dev (6min 42s / no limit)
[***   ] (1 of 2) A start job is running for Journal Service (3s / no limit)
[ ***  ] (1 of 2) A start job is running for Journal Service (4s / no limit)
[  *** ] (1 of 2) A start job is running for Journal Service (4s / no limit)
[   ***] (2 of 2) A start job is running for…odes in /dev (6min 44s / no limit)
[    **] (2 of 2) A start job is running for…odes in /dev (6min 45s / no limit)
[     *] (2 of 2) A start job is running for…odes in /dev (6min 45s / no limit)
[    **] (1 of 2) A start job is running for Journal Service (6s / no limit)
[   ***] (1 of 2) A start job is running for Journal Service (7s / no limit)
[  *** ] (1 of 2) A start job is running for Journal Service (7s / no limit)
[ ***  ] (2 of 2) A start job is running for…odes in /dev (6min 47s / no limit)
[***   ] (2 of 2) A start job is running for…odes in /dev (6min 48s / no limit)
[**    ] (2 of 2) A start job is running for…odes in /dev (6min 48s / no limit)
[*     ] (1 of 2) A start job is running for Journal Service (9s / no limit)
[**    ] (1 of 2) A start job is running for Journal Service (10s / no limit)
Received SIGCHLD from PID 64 (systemd-journal).
Child 64 (systemd-journal) died (code=exited, status=1/FAILURE)
systemd-journald.service: Child 64 belongs to systemd-journald.service.
systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
systemd-journald.service: Failed with result 'exit-code'.
systemd-journald.service: Service will restart (restart setting)
systemd-journald.service: Changed start -> failed
systemd-journald.service: Job 293 systemd-journald.service/start finished, result=failed
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
systemd-journald.socket: Changed running -> listening
systemd-journald.service: Unit entered failed state.
systemd-journald.service: Consumed 15ms CPU time.
systemd-journald.service: Changed failed -> auto-restart
systemd-journald.service: Control group is empty.
systemd-journald.service: Service has no hold-off time (RestartSec=0), scheduling restart.
systemd-journald.service: Trying to enqueue job systemd-journald.service/restart/replace
systemd-journald.service: Installed new job systemd-journald.service/restart as 298
systemd-journald.service: Enqueued job systemd-journald.service/restart as 298
systemd-journald.service: Scheduled restart job, restart counter is at 42.
systemd-journald.socket: Incoming traffic
systemd-journald.socket: Changed listening -> running
sysinit.target: starting held back, waiting for: systemd-resolved.service
systemd-journal-flush.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Changed auto-restart -> dead
systemd-journald.service: Job 298 systemd-journald.service/restart finished, result=done
[  OK  ] Stopped Journal Service.
systemd-journald.service: Converting job systemd-journald.service/restart -> systemd-journald.service/start
systemd-journald.service: Consumed 15ms CPU time.
sysinit.target: starting held back, waiting for: systemd-resolved.service
systemd-journal-flush.service: starting held back, waiting for: systemd-journald.service
systemd-journald.service: Will spawn child (service_enter_start): /usr/lib/systemd/systemd-journald
systemd-journald.service: Failed to set 'trusted.invocation_id' xattr on control group /system.slice/systemd-journald.service, ignoring: Operation not permitted
systemd-journald.service: Failed to remove 'trusted.delegate' xattr flag on control group /system.slice/systemd-journald.service, ignoring: Operation not permitted
systemd-journald.service: Passing 3 fds to service
systemd-journald.service: About to execute /usr/lib/systemd/systemd-journald
systemd-journald.service: Forked /usr/lib/systemd/systemd-journald as 65
PR_SET_MM_ARG_START failed: Operation not permitted
systemd-journald.service: Changed dead -> start

@DaanDeMeyer (Contributor):

We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.

@septatrix (Contributor, Author):

That would still not solve the problem for the other distros

@DaanDeMeyer (Contributor):

@septatrix Looking at https://pkgs.org/download/umoci, it seems that only centos/fedora are missing and the rest is covered already. Are there any particular distributions you have in mind?

@septatrix (Contributor, Author):

Oh in that case I guess not. I was looking at their readme but I guess that's quite outdated: https://github.com/opencontainers/umoci?tab=readme-ov-file#install

@septatrix septatrix marked this pull request as ready for review February 13, 2024 01:41
tests/test_boot.py (review comment, outdated, resolved)
mkosi/resources/mkosi.md (review comment, resolved)
@septatrix septatrix force-pushed the feature/oci-output-format branch 2 times, most recently from aa67d81 to 4a5fcef Compare February 13, 2024 03:37
@septatrix (Contributor, Author) commented Feb 19, 2024:

Is there anything else that has to be addressed? Should I try to rebase/squash the commits into more reasonable groupings (or maybe even a single, large one)?

The failing test was flaky at the time of my last push. I'll rebase and push, which should resolve that.

@keszybz keszybz mentioned this pull request Feb 20, 2024
@keszybz (Member) left a comment:

I pulled out the two small fixes to #2409 so that they get applied quickly.

This looks reasonable… Please rebase & squash; it's harder to review with such a messy history.

mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
mkosi/config.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
@keszybz (Member) commented Feb 20, 2024:

We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.

Link?

tests/test_boot.py (review comment, outdated, resolved)
@DaanDeMeyer (Contributor):

We're trying to get umoci packaged in Fedora so then it should be OK to depend on it.

> Link?

No link, Michel Lind has been working on it.

I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman. I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.

@septatrix (Contributor, Author):

> I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman.

Unless we want to support umoci -> systemd-nspawn for running OCI images under a (new) verb inside mkosi itself, we do not have to install it inside the tools tree. Currently podman is also only installed on the CI runner. As that is using Ubuntu, we could already try out umoci there.

> I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.

The problem is that, unlike podman with its systemd detection, umoci only generates a very basic runtime bundle and does not configure mounts/tmpdirs (e.g. cgroups, /run, etc.) or set environment variables (like $container_uuid) which systemd needs to function. And even all the stuff podman sets up for systemd does not seem to be enough (which is why I skipped some tests, and I assume that is also what is happening in the current Ubuntu tests, see #2351 (comment)).

We could also decide that we do not care about testing systemd inside the OCI image and instead just care about it being constructed correctly (or simply run /bin/true inside it). In that case something like skopeo inspect would be sufficient to check that we created a valid OCI image.
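The validity-only check mentioned here could be sketched roughly as follows. The image.oci path is hypothetical (it stands in for whatever directory mkosi produces), and the sketch does nothing when skopeo or the image is missing:

```shell
# Illustrative validity check: ask skopeo to inspect the generated OCI
# layout. image.oci is a hypothetical output directory name.
set -eu
out=image.oci
if command -v skopeo >/dev/null 2>&1 && [ -f "$out/index.json" ]; then
    # skopeo parses oci-layout, index.json and the manifest; an error
    # here means the directory is not a well-formed OCI image.
    skopeo inspect "oci:$out"
else
    echo "skipping: skopeo or $out not available"
fi
```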

@septatrix (Contributor, Author):

> I don't think we want to merge this until we figure out whether we can test this with systemd-nspawn instead of podman. I'd much rather get more coverage on systemd-nspawn's OCI support than having to figure out various podman failures when these tests inevitably start failing.

Another issue with using umoci would be that it only generates the minimal OCI runtime bundle and has no means to specify e.g. mounts. That means we would have to set up all the stuff required by systemd ourselves, likely by modifying the JSON spec generated by umoci. This introduces a new step where we could hide existing errors or add our own.

Comment on lines 3186 to 3188
# We do not want SELinux xattrs in OCI layers
# as they seldom make sense inside containers.
options=["--xattrs-exclude", "security.selinux"],
Contributor:

People make VMs now from OCI images, so I think excluding selinux is a bit presumptuous. I would prefer the selinux xattrs are stripped out when extracting the OCI image, not when creating it.

@septatrix (Contributor, Author):

umoci logged warnings when extracting an OCI image with SELinux labels, but I can keep them in if that is preferred.
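The alternative the reviewer suggests (keep the labels in the layer, strip them only on extraction) could be sketched with GNU tar's xattr options. Paths here are illustrative, and the sketch skips itself on non-GNU tar:

```shell
# Sketch: preserve xattrs when creating a layer, exclude only
# security.selinux when extracting it. Requires GNU tar.
set -eu
work=$(mktemp -d)
mkdir "$work/rootfs"
echo hello > "$work/rootfs/file"

if tar --version 2>/dev/null | grep -q GNU; then
    # Create the layer with xattrs preserved ...
    tar --create --xattrs -f "$work/layer.tar" -C "$work" rootfs
    # ... and drop only the SELinux label when unpacking it.
    mkdir "$work/out"
    tar --extract --xattrs --xattrs-exclude=security.selinux \
        -f "$work/layer.tar" -C "$work/out"
    cat "$work/out/rootfs/file"   # → hello
fi
```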

@DaanDeMeyer (Contributor) left a comment:

Can we drop the stuff for running OCI images with podman? I really don't think we should add that. I'm OK with the OCI output format though.

Also please squash commits, reviewing unsquashed commits is just pure pain.

@septatrix (Contributor, Author):

> Can we drop the stuff for running OCI images with podman? I really don't think we should add that. I'm OK with the OCI output format though.

Yes, I can drop that and also clean up the commit history, but I might only get around to it at the end of the month.

@septatrix septatrix force-pushed the feature/oci-output-format branch 2 times, most recently from 06755c2 to bde0a69 Compare March 27, 2024 19:55
@septatrix (Contributor, Author):

I rebased the branch and squashed some commits. I hope the splits are done in a way that makes review easier.

I also dropped the podman stuff. This means that only building is tested, not running, but until we find a better solution (e.g. umoci, sd-nspawn with a manually created OCI bundle, lxc import with oci-template) this should be fine. We can still revisit this at a later time (e.g. when umoci is packaged everywhere, and then include it in the tools tree).

mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
@DaanDeMeyer (Contributor) left a comment:

Looks great, but please make sure all output files and directories are created with explicit permissions. Also, please merge the second and third commit.

@septatrix septatrix force-pushed the feature/oci-output-format branch 2 times, most recently from 9d6d228 to 5f7c8d9 Compare March 29, 2024 18:16
@DaanDeMeyer (Contributor):

@septatrix This would be a separate PR, but given the "layers" stuff in OCI images, it'd be cool if we could integrate this properly with Overlay=yes and BaseTrees= somehow.

mkosi/__init__.py (review comment, outdated, resolved)
mkosi/config.py (review comment, outdated, resolved)
mkosi/__init__.py (review comment, outdated, resolved)
@DaanDeMeyer DaanDeMeyer merged commit 88c54ba into systemd:main Apr 5, 2024
23 of 30 checks passed