-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osbuild: fix virtiofs data loss; switch compression to off #3729
Conversation
Raw format is fine to use on systems that have reflinks for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that's a pretty nasty bug.
(FWIW bootc install
is very careful to ensure flushing filesystems)
OK I gave a run of this through our staging pipeline. x86_64 progressed without issue (this is where we were getting stuck before), but aarch64 appears to have had the same problem where GRUB was complaining about not being able to find the kernel. I logged into the builder (the exact container where the build was happening) and was able to run the qemu qcow2 and see the same results (failed GRUB). I recreated the qcow2 ( Something definitely is going on here that is a bit odd. |
Either way this is an improvement. Will merge this and continue investigation as issues come up. |
I think the problem is actually more that the file written out over virtiofs isn't fully copied out, but I could be wrong. Is there a virtiofs_freeze :) ? Here is what is happening inside the supermin VM: OSBuild builds, stores things in the cache2.qcow2 XFS filesystem. At the end it copies out the qemu.qcow2 into the ext2 supermin VM root filesystem (I didn't want to store this in the cache because we don't need to save it in the cache for anything later to use it). Then we copy (over virtiofs) that file into the correct place it's supposed to be where our COSA processes can continue processing it. |
I've been investigating why a seemingly innocuous change (changing compression on OSBuild generated qemu qcow2) would cause disk images to not boot [1]. I think I have found the issue. I was first trying to make sure 100% that the files got written out over the virtiofs mount before the VM got shutdown so I decided to add a `umount $workdir` to the process. But this ended up with a `umount: /srv/: target is busy.` error. When the supermin VM gets run we `cd "${workdir}"` at the end of supermin-init-prelude.sh. This has the effect of causing all spawned processes (including PID1/init) to have a cwd of /srv/. ``` bash-5.2# lsof /srv COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME init 1 root cwd DIR 0,26 4096 10485829 /srv kthreadd 2 root cwd DIR 0,26 4096 10485829 /srv pool_work 3 root cwd DIR 0,26 4096 10485829 /srv kworker/R 4 root cwd DIR 0,26 4096 10485829 /srv ... ... ``` Which means it's unlikely that the virtiofs mount ever gets cleanly unmounted. Let's rework things here so that actual work gets spawned in a subshell to prevent `init` from having a cwd on the virtiofs mount. We also add in an `umount` of the cache qcow2 (if exists) and the virtiofs mount to strengthen our chances of a clean unmount. [1] coreos#3728
We previously did this in a different way (2a8d1e6) but then had to revert it (39fdd61) because it caused images to not boot [1]. The root cause appears to have been the virtiofs mount not being unmounted cleanly from the supermin VM and that is now fixed so let's switch back to not compressing since we rely on our outer compression [2]. [1] coreos#3728 [2] coreos/fedora-coreos-tracker#1653 (comment)
Just in case this is what is causing issues with file consistency when copying out of the supermin VM.
Still trying to figure out data corruption from coreos#3728
a2dd41f
to
9ed1b14
Compare
ok I'm still investigating this. I'm going to close this PR while I iterate on this code and test things on various architectures. |
Also picked up a commit from walters in #3712 to allow running a VM from raw image.
See individual commit messages.