Partition Layout #18
Comments
sounds like a reasonable default.. just to be clear people would be able to add a separate mounted |
Yeah. |
it is but I think we've enabled (because people asked for it) to have |
Do you see us adopting the GPT generator and avoiding the need to have the OS mounts in |
Yeah, we support that; it was the primary rationale for the systemd fstab symlink patch. |
We could say that all the standard top level directories on I'm not a huge fan of the GPT generator. It's pretty limited since it only supports a few partitions and reeks of magic that users may not be aware of. Much better to have explicit mount units imho. |
Yeah; agree with the magic aspect. Though the specific thing I don't like about it is that if one plugs in a backup drive into a system, you might end up having the old
Having a "full installer" that auto-generates UUIDs for each partition and binds those in The only way I can think of to address this in a "dd install" is to generate the machine id at install time, walk the target disk post-install and change the GUID/mount units. Ug. I guess CL has gone a long time with the current approach, and while the "foreign partition problem" does occur in the real world, it's definitely possible to work around it. |
CL makes use of labels instead of uuids for partitions*. Ignition can bake in a specified uuid to your Ignition config but that means your machines will all have the same one. Since everyone is running off the same dd'd image anyway there's already duplicate uuids across machines (hasn't been a problem as far as I've seen). Labels actually work pretty well for this case. It means that while different machines will have partitions with different uuids, all of your config files and such (e.g. /etc/fstab, mount units, etc) are the same across machines and contain no machine-specific bits. We could use uuids if Ignition supported templating but that's a rabbit hole I really don't want to go down, especially since I see the "no machine specific bits" to be a feature. *for the root-on-raid case we actually have special type guids for raid devices containing the root. This allows us to know which devices to start in the initramfs. FWIW I'd like to avoid this for FCOS. Hell, stick a config file in /boot some tool in the initramfs knows how to read for all I care. |
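To make the "no machine-specific bits" point concrete, here is a minimal sketch that renders a systemd mount unit referring to a filesystem by label. The label `var`, the `/var` mount point, the `xfs` type, and the helper names are assumptions for illustration, not the CL or FCOS layout. It uses the udev path `/dev/disk/by-label/` (filesystem labels); GPT partition names would appear under `/dev/disk/by-partlabel/` instead.

```python
# Sketch only: label, mount point and filesystem type are hypothetical.

def unit_name(where: str) -> str:
    """Name of the .mount unit for a simple path (no characters needing escaping)."""
    stripped = where.strip("/")
    return (stripped.replace("/", "-") if stripped else "-") + ".mount"


def mount_unit(label: str, where: str, fstype: str) -> str:
    """Return the text of a mount unit that finds its device by filesystem label."""
    return "\n".join([
        "[Unit]",
        f"Description=Mount {label} at {where}",
        "",
        "[Mount]",
        f"What=/dev/disk/by-label/{label}",
        f"Where={where}",
        f"Type={fstype}",
        "",
        "[Install]",
        "WantedBy=local-fs.target",
        "",
    ])


# The rendered text depends only on the label, never on a per-machine UUID,
# so the same file can ship in one image and be dd'd to every machine.
print(unit_name("/var"))
print(mount_unit("var", "/var", "xfs"))
```

The trade-off raised in the thread still applies: a label is not unique, so a foreign disk carrying the same label can be picked up by mistake.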
I have to chime in on this portion. Soon I will be open sourcing some lvm encryption work done on Atomic LVM managed volumes. We store keys remotely basically. During that exercise the part which stopped the system startup and decryption from being more elegant and robust were UUIDs (we're on baremetal with a diverse set of disks). If I had a more generic interface like labels to work with, life would have been easier. |
One thing that's strongly implied by this is that Anaconda is probably not the primary path for bare metal installs; a whole goal of this is that "install" is just Side note: It'd be a nice twist to have the installer come as a privileged container. However, a whole interesting question is how we generate that "base disk image" - I'd probably at least initially start with using Anaconda for it personally, but there's also the libguestfs-style disk image creation. One question though; do we still support people using Anaconda (say that I want dm-crypt for everything, or XFS reflink=1, or...)? And there's a tension between Ignition for disk provisioning vs kickstart here. |
@cgwalters Off-topic for this issue, but I don't see any reason to support installing via Anaconda, even optionally. If you want dm-crypt or XFS or whatever, we should be supporting that through Ignition. @ajeddeloh Why don't you like the type GUID approach? It seems to work pretty well for CL. |
Re installer in a container: we could package it as such, but it should also be so simple you don't need to. The current one is just a few hundred lines of bash (most of which is an ASCII-armored pubkey). Re using anaconda bare metal installation: I really really really don't like this. Your Ignition config is the declaration of what your machine looks like; there shouldn't be anything else controlling that. It's also yet another thing to support. If we want to support dmcrypt or other filesystem/partition weirdness we should implement that in Ignition. @bgilbert It's a hack from when we were thinking we shouldn't use the boot partition. We're going to need to store some config on the boot partition anyway for supporting encryption. It'd be much cleaner to just store a "map" for mounting the root partition in a config file. That would also eliminate the need to do GPT on RAID on GPT on Disks (and instead do GPT on RAID on Disks) |
I'd say it's a distinct thread but the installer path is pretty intertwined with our default partitioning. |
Sounds like https://github.com/latchset/clevis ? |
I’ve modified https://github.com/HouzuoGuo/cryptctl to support LVM and stronger auditing with additional logging and events actions. It is deployed on all of our bare metal atomic nodes. |
Discussed in the meeting today. This is what we came up with:
Potential issues:
With 4k sectors we need to consider the GPT partition layout as well as the filesystem on top. related CL issue. We will address this issue when we hit it and call it out as a risk for now. I believe @bgilbert also had another item in open floor that was relevant to this ticket that was regarding a user filling up the root partition and not being able to receive updates any longer. |
How about
And we also make A big topic that crosses this though is whether or not we use LVM by default (and if we don't whether it can be configured via ignition). If neither, then that 3GB (or whatever) rootfs is going to feel less flexible. |
My question in the meeting was: in CL, we can in principle continue to update a machine whose root filesystem is full, because we're only touching @cgwalters I like that model. In that case, the flexibility of the small root is only an issue for distro maintainers, since the user shouldn't be putting any data there. It's a potential concern (c.f. Fedora increasing the size of |
As long as I can dmcrypt /var later, and add lvm to new disks which later I can manage to dmcrypt. Believe that for atomic now all users are mapped into that writable space in var. I think right now also var and sysroot are on a shared partition with atomic. |
I use ostree on a netbook with eMMC; a 3 GB root is too small, and sometimes I'm not able to upgrade... |
is |
/var on rootfs |
yeah. this is something colin is proposing we change to not be on rootfs by default, which would mean we would require less space for root. |
@dustymabe no i don't think that this layout change things: |
Regarding LVM and Ignition. I want that to happen. Much like the partitioning work, it's going to be tricky to implement, but imo it's 100% worth doing. That being said I don't think we should have our standard partition layout be LVM based. Keep it simple. Regarding moving /var to a separate partition: I'm in favor of this. Not only does it help (although not completely avoid) the issue of Re: 4k sectors and GPT. I think we can ship a disk that supports both. The GPT spec lets you move the partition table around and only the actual contents of the GPT header are included in the CRCs (not the entirety of the sector). The only fixed things are that the GPT header must be at LBA1 and the backup must be at the last LBA. Since the header is <= 512 bytes, we can have both where they want to be for the primary and backup. Here's what it would look like:
There's a few problems/risks:
cc @lucab for the GPT stuff. |
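The placement argument in the hybrid 512/4k proposal can be checked with a little arithmetic. The sketch below assumes a disk whose size is a multiple of 4096 bytes and reserves 512 bytes per header, as the comment does. It computes where the primary header (LBA 1) and backup header (last LBA) land for each sector size and confirms that the four byte ranges do not collide. The function names and the 8 GiB disk size are hypothetical. It says nothing about the partition entry arrays, the protective MBR, or whether firmware tolerates non-zero bytes after a header.

```python
# Sketch only: checks header placement, not a full GPT layout.

SECTOR_SIZES = (512, 4096)
HEADER_MAX = 512  # the thread's upper bound on the GPT header size


def header_ranges(disk_bytes: int) -> dict:
    """Map (sector_size, role) to the half-open byte range of that GPT header."""
    ranges = {}
    for sector in SECTOR_SIZES:
        last_lba = disk_bytes // sector - 1
        # Primary header lives at LBA 1 for the given sector size.
        ranges[(sector, "primary")] = (sector, sector + HEADER_MAX)
        # Backup header lives at the last LBA for the given sector size.
        start = last_lba * sector
        ranges[(sector, "backup")] = (start, start + HEADER_MAX)
    return ranges


def any_overlap(ranges: dict) -> bool:
    """True if any two byte ranges intersect."""
    spans = sorted(ranges.values())
    return any(prev[1] > cur[0] for prev, cur in zip(spans, spans[1:]))


layout = header_ranges(8 * 2**30)  # hypothetical 8 GiB disk
for key, span in sorted(layout.items()):
    print(key, span)
print("overlap:", any_overlap(layout))
```

With 512-byte sectors the primary header occupies bytes 512 to 1023, and with 4096-byte sectors it starts at byte 4096, which is why both can coexist on one image.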
closing this ticket as we've decided that a static partition layout is suitable for FCOS. Implementation details can be worked out later I believe. |
Writing down a couple things mentioned elsewhere about the 4k/512 hybrid plan for the record. My proposal technically breaks the GPT spec since the space in a sector after the GPT bits is defined as being all zeros. Whether anything cares is another question. It would also use just about every "feature" GPT has, and thus be at risk of not working on machines with poorly implemented EFIs (or very well implemented EFIs that check things are zeroed accordingly). |
Try to match the design in coreos/fedora-coreos-tracker#18 - no lvm - separate /var
This is part of coreos/fedora-coreos-tracker#18 For now, this just drops LVM to make it easier to use Ignition to both build images, and help enable ignition-disks. Note that I tried to use a separate `/var` but this currently does not work with our Ignition, which would need to learn how to mount `/var` in the initramfs. We add growpart logic adapted from projectatomic/container-storage-setup@d4994e6 (Probably at some point should teach growpart how to grow based on mount point paths...)
Having a split |
The fire alarm went off at the Westford office today and I happened to be standing near Vivek Goyal and Mike Snitzer (kernel filesystem/block people). Mike in particular said that supporting both 4k and 512b in one disk image couldn't be done because the filesystems rely on sector writes being atomic. It seems like the simplest plan is to just make two disk images? |
Yeah probably. SGTM. |
Right, sadly you cannot issue 512b IO to a native 4K device. A filesystem (e.g. XFS) that is formatted to use 512b assumes 512b is the atomic unit of IO. It'll fail to mount if the underlying device is actually a native 4K block device. You might think to go the other way and try to format the filesystem with a 4K blocksize and use that single FS image for both 512b and 4K devices. BUT, there is increased potential for a partial 4K write to a 512b device to leave the device with 512b IOs having been written (yet the larger 4K being incomplete) -- this is also known as "torn writes". |
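A toy model of the torn-write case described above, under simplifying assumptions: the device commits one 512-byte sector at a time, in order, and power is lost partway through a 4K filesystem block. Real devices need not write sectors in order, and the names here are hypothetical. The point is only that the block can end up neither old nor new.

```python
# Sketch only: an in-memory stand-in for a 512b-sector device.

SECTOR = 512
FS_BLOCK = 4096


def write_block(device: bytearray, offset: int, data: bytes,
                sectors_before_failure=None) -> int:
    """Write data one sector at a time; stop early to simulate power loss.

    Returns the number of sectors that reached the device.
    """
    assert len(data) % SECTOR == 0, "data must be whole sectors"
    total = len(data) // SECTOR
    limit = total if sectors_before_failure is None else min(total, sectors_before_failure)
    for i in range(limit):
        start = offset + i * SECTOR
        device[start:start + SECTOR] = data[i * SECTOR:(i + 1) * SECTOR]
    return limit


old = b"A" * FS_BLOCK
new = b"B" * FS_BLOCK
device = bytearray(old)

# Each 512b sector write is atomic, but the 4K block as a whole is not.
written = write_block(device, 0, new, sectors_before_failure=3)
torn = bytes(device) not in (old, new)
print(f"sectors written: {written}, torn: {torn}")
```

A filesystem formatted for 4K atomic writes has no way to tell such a half-updated block from a valid one without extra checksumming or journaling, which is the concern raised about sharing one image across both device types.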
We discussed possibly using a `var` partition for FCOS in coreos/fedora-coreos-tracker#18 I would like to do so for my own Silverblue install, and possibly for Silverblue by default. So let's mount that partition if it exists, which means the other code that cleans out what Anaconda did in `/var` will work.
I've been thinking about the dm-crypt aspect again. Some prior discussion is in coreos/ignition#577 One thing I'm wavering on a bit is how clunky it feels to rewrite all of the operating system files on boot. If we're in a cloud scenario, we don't have a lot of choice unless we provide people a tool for creating new snapshots (big implications there). On bare metal though, I think we could instead do a "minimal re-partitioner" (not quite an installer) that created dm-crypt on the target system, then took the raw disk image and mounted it, and did a filesystem-level copy. (Aside: I believe Android images encrypt on boot when you initialize them the first time, and this is probably a lot nicer since they switched to using ext4 encryption. Although I'm not sure the OS is ever encrypted, it's dm-verity.) |
(The reason I'm thinking about dm-verity is that there are definitely server-side uses for it, but I'd like Silverblue to inherit as much technology as possible from CoreOS, and dm-crypt is really quite important on client-side devices) |
I hope we're using LUKS not just dm-crypt unless there's a good reason not to. We want to ensure that Ignition remains the only "source of truth" for configuration. The Ignition config may not be known at install time, so we don't want to do anything special at install time. I instead wonder if we could add an "optimization" to Ignition/the initramfs to detect if we're recreating the root and save the repo to a tmpfs or something similar, so if we blow away the root we can repopulate it from a local source instead. Finally, what are the use cases for encrypting more than just |
Hmm; I had to look up the distinction layers here, I had always been using them interchangeably.
One tricky thing is a lot of use cases want |
Hmm, we'll have to think long and hard before choosing a size for ROOT. We don't want to realize down the line that e.g. the f31 -> f32 update won't fit. One issue too is that there might be more than just two commits in there. E.g. layered pkgs & pinned deployments (and I think there were discussions making the number of rollback deployments configurable?). So settling on the right size will be tricky no matter what. |
Agreed. We could publish guidelines saying "resize to X if you plan on doing a bunch of pinning or other things that take up a lot of space", but that's not ideal. It shouldn't be nearly as bad as it was with CL since ostree dedups across deployments. My guess is ~2x the size of a single deployment should be ok. |
(Happened to stumble across https://bugzilla.redhat.com/show_bug.cgi?id=1061478 ) |
Let's call what we have now with the FCOS preview release "phase 1". Phase 2 work: #94 |
In conversations we had recently, we think that FCOS should have a default partition layout, similar to how CL has a standard fs layout since it provides consistency across bare metal and clouds as well as making the image "dd-able" directly to a drive (which makes installation trivial). Any further disk modification should be done via Ignition.
What should that partition layout look like?
My (quick and not fully thought out) proposal:
Ideally we'd be able to move ROOT around using Ignition and re-deploying the OSTree to where we moved it to between the disks and files stages. If you're on an EFI system you could even wipe away BIOS-BOOT to make more room (not that it's terribly large). There's some tricky cases with that which we're still exploring, but it should be possible at the very least in simple cases.