Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differing behavior for aarch64 vs x86_64 disk images #855

Closed
dustymabe opened this issue Jun 8, 2021 · 25 comments · Fixed by coreos/coreos-assembler#2411 or coreos/coreos-assembler#2677
Assignees
Labels
jira for syncing to jira kind/enhancement

Comments

@dustymabe
Copy link
Member

On aarch64 we don't have a 1st partition.

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    252:0    0   10G  0 disk 
|-sda2 252:2    0  127M  0 part 
|-sda3 252:3    0  384M  0 part /boot
`-sda4 252:4    0  9.5G  0 part /sysroot

The goal here was to keep the partition numbers the same but omitting something taking up the first partition number means we get differing behavior if you configure disks using Ignition when you compare x86_64 and aarch64.

For example, the butane config from an ignition bug I just filed gives us this on x86_64:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   40G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0  127M  0 part
├─sda3   8:3    0  384M  0 part /boot
├─sda4   8:4    0  1.8G  0 part /sysroot
├─sda5   8:5    0 11.7G  0 part
└─sda6   8:6    0   10G  0 part /var/home

and this on aarch64:

[core@localhost ~]$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   40G  0 disk
├─sda1   8:1    0 11.7G  0 part
├─sda2   8:2    0  127M  0 part
├─sda3   8:3    0  384M  0 part /boot
├─sda4   8:4    0  1.8G  0 part /sysroot
└─sda5   8:5    0   10G  0 part /var/home

For consistency, should we just leave a sda1 on the aarch64 disk of the same size as the x86_64 disk? It looks like it's only 1M in size.

@dustymabe dustymabe added kind/enhancement meeting topics for meetings labels Jun 8, 2021
@bgilbert
Copy link
Contributor

bgilbert commented Jun 8, 2021

I'm inclined to say no, but I'd like to hear what others think.

If we care about matching both partition numbers and partition starting offsets across architectures, we'd need to waste 124 MB on ppc64le and 128 MB on s390x.

Even if we only care about matching partition numbers, s390x will be a problem. coreos-installer has code to translate the GPT to a DASD VTOC on ECKD DASDs, and the VTOC only supports three Linux partitions. We'd need to make code changes to discard the unneeded partitions (leaving one unused partition slot for user data), and then we'd still have an inconsistency on that architecture.

@dustymabe
Copy link
Member Author

I'm inclined to say no, but I'd like to hear what others think.

Indeed. Will try to bring it up in the meeting tomorrow!

If we care about matching both partition numbers and partition starting offsets across architectures, we'd need to waste 124 MB on ppc64le and 128 MB on s390x.

IMO the <130 MB hit is relatively cheap.

Even if we only care about matching partition numbers, s390x will be a problem.

Yeah it is quite a snowflake. As you mention it's going to continue to be a snowflake regardless of what we do. I don't think that should prevent us from trying across the other architectures. If we could ignore the goal on s390x (because of the limitations you mentioned) and get some uniformity for x86_64, aarch64, ppc64le (and eventually maybe riscv) then I think it's still a win.

@bgilbert
Copy link
Contributor

bgilbert commented Jun 9, 2021

IMO the <130 MB hit is relatively cheap.

Maybe not in small VMs. It's also 130 MB of additional disk I/O during each install.

@cgwalters
Copy link
Member

Without doing a deep dive into this, my intuition is to try to handle this (~= make it easy for the user) in either Butane or Ignition and not try to go to great lengths to have identical disk layout.

@bgilbert
Copy link
Contributor

bgilbert commented Jun 9, 2021

Backing up a bit: what problem are we trying to solve? What would a solution in Butane or Ignition look like?

@dustymabe
Copy link
Member Author

We discussed this in the community meeting today.

We didn't have nearly enough time for a complete discussion and we'll bring it up again next week. For now it seems like the sentiments range. Lot's were discussed including:

  • should we even try to gain uniformity
  • if we do try should we just try for partition number parity, or do we want partition offsets to be the same too?

Will continue discussion next week.

@dustymabe
Copy link
Member Author

dustymabe commented Jun 9, 2021

Backing up a bit: what problem are we trying to solve?

I'm approaching this from an "uneducated user" perspective. A lot of times I fit into this category when trying new things in FCOS.

So let's say the scenario is:

I'm a user of FCOS on AWS for my data crunching operations at $ORG. I like that it auto updates itself making my life easy. I have a crafted Ignition config to keep container storage separate from my OS partition because I heard that can save me some pain if I run out of space. I also hear that graviton instances save me 30% $$ for the same amount of compute resources. I flip on an AWS ARM instance (hopefully FCOS AMIs coming soon) and realize that my partition config looks different. Did I do something wrong?

Of course, the answer is "no" and everything should be fine, but I really like the principal of least surprise when we can achieve it. Sure we don't and can't offer promises that Ignition configs will work across architectures, but if they did when they can it would be nice.

Another scenario:

My name is Dusty and I help manage and support users who run FCOS. It's easier for me to remember that "all partitions match for every architecture but s390x" than it is to remember the differences for each one. Documentation would help, but it sure wouldn't be as easy to remember.

What would a solution in Butane or Ignition look like?

Not sure. I'm not really proposing any changes to Butane or Ignition I don't think. Are you asking "if we were to solve this a different way, how would that look?"

@cgwalters
Copy link
Member

What would a solution in Butane or Ignition look like?

I think we can take that to coreos/ignition#1219 right? Then this ticket is for the question of whether we add an empty partition or some other hack or so.

@bgilbert
Copy link
Contributor

bgilbert commented Jun 9, 2021

I flip on an AWS ARM instance (hopefully FCOS AMIs coming soon) and realize that my partition config looks different. Did I do something wrong?

Would it be sufficient to solve that with storage documentation? The partition config will look different but that shouldn't actually affect anything. Ideally users will refer to partitions by label rather than number, and if they care about the partition number, they should probably explicitly specify one.

It's easier for me to remember that "all partitions match for every architecture but s390x" than it is to remember the differences for each one.

They won't completely match anyway. For example, ppc64le has an equivalent to BIOS-BOOT with a different label and GUID. Basically I'm arguing that the differences described here are implementation details that shouldn't be considered user-facing.

What would a solution in Butane or Ignition look like?

I think we can take that to coreos/ignition#1219 right? Then this ticket is for the question of whether we add an empty partition or some other hack or so.

Sure, I'm not proposing a detailed discussion here, just a sketch of an idea. It's not clear to me what changes could be made to Ignition or Butane that would improve the situation described here.

@dustymabe
Copy link
Member Author

dustymabe commented Jun 10, 2021

I flip on an AWS ARM instance (hopefully FCOS AMIs coming soon) and realize that my partition config looks different. Did I do something wrong?

Would it be sufficient to solve that with storage documentation?

It's certainly better than not having the documentation :)

The partition config will look different but that shouldn't actually affect anything. Ideally users will refer to partitions by label rather than number, and if they care about the partition number, they should probably explicitly specify one.

You're totally right.. but, I still think that when it pops up and the partition a user created is at index 1 they are going to believe that they messed up (just as I did). It's definitely a common misconception that partitions are linear on the disk and thinking you somehow just re-ordered all the partitions on the disk is a bit alarming. Documentation will certainly help that.

Let's say I make two partitions, though. Now the first one is at sda1 and the second one I created is at sda5. It just feels weird. Now maybe I just "shouldn't look at the partitions and don't worry about it", but I'm a mediocre sysadmin and I know just enough to get myself into trouble. I look at the output of lsblk and think, oh man something weird is going on here, that must be the source of the <insert unrelated issue here> that I'm seeing, then I'm off down a rabbit hole.

It's easier for me to remember that "all partitions match for every architecture but s390x" than it is to remember the differences for each one.

They won't completely match anyway. For example, ppc64le has an equivalent to BIOS-BOOT with a different label and GUID. Basically I'm arguing that the differences described here are implementation details that shouldn't be considered user-facing.

Now I'll show you how much of a hypocrite I am :) - That particular difference I don't think matters that much so I think it's fine if the label and GUID don't match exactly.. Basically they are the same part index and size and have no bearing on the new partitions i'll create. I lump it into "boot stuff".

What would a solution in Butane or Ignition look like?

I think we can take that to coreos/ignition#1219 right? Then this ticket is for the question of whether we add an empty partition or some other hack or so.

Sure, I'm not proposing a detailed discussion here, just a sketch of an idea. It's not clear to me what changes could be made to Ignition or Butane that would improve the situation described here.

I guess one option could be to do away with number: 0. You read the docs, you tell it exactly what to do. No surprises.

That's an extreme, though. I still think we can achieve a similar goal (less surprises for the user) with the proposed partition number aligning (and optionally offset alignment).

@jlebon jlebon removed the meeting topics for meetings label Jun 16, 2021
@travier
Copy link
Member

travier commented Jun 23, 2021

Suggested during the meeting to add an empty sda1 partition only for aarch64 (if that's technically possible) so that we have parity for x86_64 & aarch64 which are the two main arches most people will currently use and add docs for other arches to document the behavior where we can not make changes easily.

@dustymabe
Copy link
Member Author

We discussed this in the community meeting today.

Unfortunately we ran out of time (again), but I do think we've boiled it down into 2ish options and we just need to vote at this point:

  • A - match partition numbers, enhance documentation
  • B - don't make any changes, enhance documentation

Neither of these options is wrong. There are good arguments for and against. Either way we'll get better documentation!

Option A has quite a few sub topics so probably worth not really digging into those until we've made the A or B decision, but here are a few:

  • Do we care about partition sizes?
  • Do we try to match on ppc64le?
    • We've already shipped on RHCOS and we'd need to make a change there.

Since we've discussed this several times already we'll just hold a vote in the next meeting and try to not eat up too much meeting time with it.

@jlebon jlebon added the meeting topics for meetings label Jun 30, 2021
@dustymabe
Copy link
Member Author

We discussed this in the community meeting today.

13:05:40      dustymabe | and then please vote here in chat with A or B
13:06:13      dustymabe | A => match partition numbers, enhance documentation
13:06:19      dustymabe | B => don't make any changes, enhance documentation
13:06:36       bgilbert | B
13:06:41        jaimelm | B
13:06:52     darkmuggle | B
13:06:54      dustymabe | A
13:06:59         jlebon | B
13:07:30        travier | A
13:07:59              * | dustymabe sets timer to 30 seconds
13:08:20       miabbott | B
13:08:36         jlebon | i'd be open to revisit if multiple users get sufficiently confused by
                        | this
13:08:55         +lucab | (I'm skipping as I don't have a good idea)
13:09:23      dustymabe | #agreed we will keep our paritioning set up for different architectures
                        | as it currently is and enhance our documentation for #855

Now the action item is to update the docs with the architecture specific partition layouts.

@dustymabe dustymabe added status/decided status/pending-action Needs action and removed meeting topics for meetings labels Jun 30, 2021
@travier
Copy link
Member

travier commented Jul 30, 2021

Another question: Should we keep some space at the beginning to allow non-EFI aarch64 to inject their own u-boot to get a UEFI bootloader?

@travier
Copy link
Member

travier commented Aug 2, 2021

@bgilbert bgilbert added the meeting topics for meetings label Aug 10, 2021
@bgilbert
Copy link
Contributor

bgilbert commented Aug 10, 2021

It turns out that the Butane templates for boot device mirroring don't include partition numbers, so we get something like this:

          {
            "label": "bios-1",
            "sizeMiB": 1,
            "typeGuid": "21686148-6449-6E6F-744E-656564454649"
          },
          {
            "label": "esp-1",
            "sizeMiB": 127,
            "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
          },
          {
            "label": "boot-1",
            "sizeMiB": 384
          },
          {
            "label": "root-1"
          }

Thus, with the aarch64 template we get [ESP, boot, root] in [1, 2, 3] instead of [2, 3, 4], and with ppc64le we get [PReP, boot, root] in [1, 2, 3] instead of [1, 3, 4].

We could change Butane to include partition numbers in the templates, but that would break compatibility with existing configs that override the root partition size because of coreos/ignition#1219. When performing config merging, Ignition matches partitions by label unless either the parent or child config specifies the partition number, in which case it matches by number instead. So that change would cause affected configs to create two root partitions per disk with identical labels. Assuming we want the same config to work on every architecture, we'd need to add partition numbers on x86_64 too, so that architecture would be affected.

Sooo the conclusion in #855 (comment) might be worth revisiting. Adding empty partitions would be less user-visible than breaking compatibility with either a) configs that override the root partition size (by adding partition numbers in Butane) or b) that use duplicate labels (by fixing coreos/ignition#1219).

It feels hacky though.

@dustymabe
Copy link
Member Author

I'm going to assume (reading from the commit message) you didn't mean to close this issue with that commit.

@dustymabe dustymabe reopened this Aug 11, 2021
@dustymabe dustymabe added status/decided jira for syncing to jira labels Aug 18, 2021
ravanelli pushed a commit to ravanelli/coreos-assembler that referenced this issue Aug 25, 2021
On those architectures, the root partition is placed on partition 3
instead of partition 4 due to a Butane bug that's difficult to fix:

coreos/fedora-coreos-tracker#855 (comment)
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Sep 3, 2021
We decided to try to keep the partition numbers in sync across our
x86_64, aarch64, and ppc64le platforms. Previously we skipped partition
numbers on some architectures (for example there was no sda1 by default
on aarch64) and this caused a few issues:

- user confusion
    - it is a common misconception that partition numbers need to start
      at one and increment
    - it can be surprising if the disk layout changes when you go from
      one architecture from the other and re-use the same ignition config
- tooling bugs
    - the butane templates for device boot mirroring didn't include
      partition numbers and ended up with inconsistent behavior on
      aarch64/ppc64le versus x86_64

Keeping the partition numbers in sync will help alleviate some of these
issues.

Fixes: coreos/fedora-coreos-tracker#855
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Sep 3, 2021
We decided to try to keep the partition numbers in sync across our
x86_64, aarch64, and ppc64le platforms. Previously we skipped partition
numbers on some architectures (for example there was no sda1 by default
on aarch64) and this caused a few issues:

- user confusion
    - it is a common misconception that partition numbers need to start
      at one and increment
    - it can be surprising if the disk layout changes when you go from
      one architecture from the other and re-use the same ignition config
- tooling bugs
    - the butane templates for device boot mirroring didn't include
      partition numbers and ended up with inconsistent behavior on
      aarch64/ppc64le versus x86_64

Keeping the partition numbers in sync will help alleviate some of these
issues.

Fixes: coreos/fedora-coreos-tracker#855
@bgilbert
Copy link
Contributor

bgilbert commented Sep 9, 2021

Reopening pending Butane change, Butane release, and test re-enablement.

@bgilbert
Copy link
Contributor

bgilbert commented Sep 9, 2021

Butane changes in coreos/butane#278.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/decided status/pending-action Needs action labels Sep 9, 2021
@dustymabe
Copy link
Member Author

Reopening pending Butane release, and test re-enablement.

@dustymabe dustymabe reopened this Sep 9, 2021
@dustymabe
Copy link
Member Author

Another question: Should we keep some space at the beginning to allow non-EFI aarch64 to inject their own u-boot to get a UEFI bootloader?

For completeness I wanted to post this here. @bgilbert and I had a chat a while back with pwhalen to see if there is anything we could or should do now with regards to arm/uboot. The full log of the discussion is reproduced below, but the outcome is/was:

2021-09-03 15:55:35  bgilbert  #proposed for aarch64 we will add a 1 MB reserved partition at the beginning
                               of the disk as partition 1.  we will document that boot_device.mirror is
                               unsupported for u-boot installs.
2021-09-03 13:55:56	@dustymabe	only remaining thing I think is deciding if we want to make the aarch64 one bigger than 1M to try to accomodate uboot stuff
2021-09-03 13:56:23	@dustymabe	i really have no idea on that one though.. we could try to match the first partition on fedora IoT but maybe we just leave it alone and don't do anything
2021-09-03 14:11:30	<--	hhlp (~hhlp@fedora/hhlp) has quit (Remote host closed the connection)
2021-09-03 14:39:11	@dustymabe	fedora iot has a 500M first partition
2021-09-03 14:39:36	@dustymabe	https://paste.centos.org/view/0e250bb3
2021-09-03 14:45:33	@dustymabe	bgilbert: do you think the size of this partition is critical to get right up front.. or do you think it would be OK to change the aarch64 reserved partition size in the future (to something >1M) once we know more about the u-boot possibilities? I guess I assume people wouldn't be happy with reserving 500M without us confirming it's usefulness first and that's probably not going to happen
2021-09-03 14:45:33	@dustymabe	right now. 
2021-09-03 14:46:42	bgilbert	it sure would be nice if the IoT folks would weigh in on this
2021-09-03 14:47:09	bgilbert	I agree that reserving 500M without knowing we need it is a bit aggressive
2021-09-03 14:48:23	bgilbert	hmm, I guess there's the question of what happens with u-boot and boot device RAID
2021-09-03 14:49:00	bgilbert	presumably transposefs should replicate u-boot to all mirrors, the same way we do with BIOS-BOOT right now?
2021-09-03 14:49:25	bgilbert	in that case, "reserved" doesn't really seem like the right partition type
2021-09-03 14:49:57	@dustymabe	right now we're not creating a filesystem there at all - so I'm not sure
2021-09-03 14:50:35	bgilbert	BIOS-BOOT doesn't have an FS either; we use `dd`
2021-09-03 14:50:52	@dustymabe	ahh - so we'd need to add a case for that in code? 
2021-09-03 14:51:02	bgilbert	(500 MB would increase the memory requirements of transposefs substantially, unless we did compression at runtime)
2021-09-03 14:51:23	@dustymabe	TBH I doubt anyone using the u-boot path would want raid
2021-09-03 14:51:43	@dustymabe	if they have a big enough server to have multiple disks, then they are probably well set up with UEFI already
2021-09-03 14:51:45	bgilbert	yeah, u-boot replication would only work if a user had a new enough FCOS _and_ a new enough Butane
2021-09-03 14:52:16	bgilbert	dustymabe: maybe so, but the failure mode is that everything appears to work until a disk failure
2021-09-03 14:52:30	bgilbert	we don't currently have carve-outs for certain envs we don't support
2021-09-03 14:53:16	bgilbert	hmm, we could use the actual BIOS-BOOT type code
2021-09-03 14:53:24	@dustymabe	true - i assume for u-boot we'd have explicit docs for people to follow 
2021-09-03 14:53:42	@dustymabe	and those docs could say something about boot mirroring not working or something maybe
2021-09-03 14:55:08	bgilbert	to answer your question, technically yes we can change it later
2021-09-03 14:55:14	bgilbert	however
2021-09-03 14:55:57	bgilbert	changing partition boundaries in existing configs seems like a gray area
2021-09-03 14:56:40	@dustymabe	i.e. if you re-provisioned in the future with newer FCOS but same ignition, you'd get a different result? 
2021-09-03 14:56:59	bgilbert	that too
2021-09-03 14:57:08	bgilbert	but also with the same FCOS + Butane config but newer Butane
2021-09-03 14:57:14	bgilbert	(in the RAID case)
2021-09-03 14:57:53	bgilbert	do you think it's feasible to nail down a plan in the next couple weeks?
2021-09-03 14:58:09	bgilbert	if we're calling the initial aarch64 releases 'preview', one more change seems okay
2021-09-03 14:58:13	bgilbert	vs. doing it months from now
2021-09-03 14:58:34	@dustymabe	yeah - so basically we could go out like it is and reserve the right to make some changes 
2021-09-03 14:58:56	bgilbert	yup
2021-09-03 14:59:01	@dustymabe	i think it should be possible to nail down a plan in the next couple weeks. and if we don't maybe it's not worth it anyway
2021-09-03 14:59:07	bgilbert	sgtm
2021-09-03 14:59:21	@dustymabe	we originally only ever targeted UEFI - then someone mentioned this might be a good opportunity to expand 
2021-09-03 14:59:26	bgilbert	right
2021-09-03 14:59:29	@dustymabe	if we don't expand, then I'm OK with that
2021-09-03 14:59:51	bgilbert	I see IoT uses "Microsoft basic data" as the partition type for the first partition
2021-09-03 14:59:53	bgilbert	which seems very strange
2021-09-03 15:00:20	@dustymabe	yeah.. im not sure why
2021-09-03 15:00:42	@dustymabe	https://pagure.io/fedora-kickstarts/blob/main/f/fedora-iot.ks#_18
2021-09-03 15:00:57	bgilbert	transposefs currently supports dd copies of the ppc64le PReP partition and BIOS-BOOT.  but in the BIOS-BOOT case it also copies the MBR boot code.
2021-09-03 15:00:58	@dustymabe	anaconda magic under the hood
2021-09-03 15:01:11	bgilbert	ah
2021-09-03 15:01:47	bgilbert	one option is to define a new partition type for this partition
2021-09-03 15:02:11	@dustymabe	yeah
2021-09-03 15:02:22	@dustymabe	i'm cool with that too
2021-09-03 15:02:33	@dustymabe	seems simple enough (we just create our own UUID right?) 
2021-09-03 15:02:36	bgilbert	yup
2021-09-03 15:02:43	@dustymabe	do we have to document it anywhere official? 
2021-09-03 15:02:52	bgilbert	the only caveat is that it'll show as "Unknown" in gdisk etc.
2021-09-03 15:03:02	bgilbert	we don't _have_ to, no
2021-09-03 15:03:12	bgilbert	maybe we'd put it on the FCOS storage page
2021-09-03 15:03:25	@dustymabe	:) - i mean - would if be bad to just make a policy of dd the reserved partitions (GUID)
2021-09-03 15:03:40	@dustymabe	and leaving it like it is? 
2021-09-03 15:03:48	@dustymabe	probably not preferrable 
2021-09-03 15:03:55	bgilbert	I think it's better to say what we mean
2021-09-03 15:04:15	bgilbert	I'd almost suggest labelling it BIOS-BOOT, which is pretty much what it is, but the side effect of copying MBR code is a bit odd here
2021-09-03 15:04:39	@dustymabe	is uboot bios?
2021-09-03 15:04:50	bgilbert	even if that copy is a no-op, it conflates two different behaviors
2021-09-03 15:05:08	bgilbert	BIOS-BOOT is where GRUB stores extra bootstrap code that doesn't fit in the MBR
2021-09-03 15:05:16	bgilbert	i.e., generic bootloader code storage
2021-09-03 15:06:14	@dustymabe	we could make the transposefs code just not do the MBR stuff on anything but x86_64 
2021-09-03 15:06:25	bgilbert	true
2021-09-03 15:08:27	pwhalen	dustymabe: the first partition type is because of the Raspberry Pi, some info here - https://bugzilla.redhat.com/show_bug.cgi?id=1975375
2021-09-03 15:10:42	@dustymabe	pwhalen: interesting 
2021-09-03 15:10:55	@dustymabe	pwhalen: got some time for some other questions :) ? 
2021-09-03 15:11:48	bgilbert	pwhalen: ahh, it gets mapped over from the MBR type code for FAT
2021-09-03 15:12:42	@dustymabe	yeah fdisk shows FAT16
2021-09-03 15:12:55	pwhalen	dustymabe: sure, not sure I'll have the answers :) , I think Christian provided a good summary of what we do and why
2021-09-03 15:13:39	@dustymabe	pwhalen: we're just mostly trying to figure out how this extra partition we're adding can be useful in other ways 
2021-09-03 15:14:12	@dustymabe	we're adding a blank partition at the beginning of the disk for "other reasons" and we figure maybe we can try to target more than just UEFI aarch64 boards in the future
2021-09-03 15:14:39	@dustymabe	so we're trying to come up with a reasonable size that might allow us to do that (hopefully not too large) 
2021-09-03 15:15:15	bgilbert	oof, I just noticed that lorbus's example https://pagure.io/arm-image-installer/blob/main/f/socs.d/Rockchips-ARMv8 writes u-boot data outside that partition
2021-09-03 15:15:46	@dustymabe	where does it write it? 
2021-09-03 15:16:55	bgilbert	partially in the first partition, partially in the gap before the first partition
2021-09-03 15:17:02	@dustymabe	oh, fun
2021-09-03 15:17:20	bgilbert	pwhalen: a complication is that we have generic code to optionally convert the boot disk to a mirrored RAID set at first boot
2021-09-03 15:17:29	bgilbert	and that code needs to know how to replicate bootloaders
2021-09-03 15:18:12	bgilbert	so there was some debate over whether it can just dd the first partition to all disks (assuming the second disk becomes the first disk after the real first disk fails)
2021-09-03 15:18:32	bgilbert	but if we might also write boot code into the gap... that's awkward
2021-09-03 15:18:45	@dustymabe	bgilbert: could we somehow make the code just error in this case so we didn't have to consider that complication ? 
2021-09-03 15:19:16	bgilbert	dustymabe: I don't know how transposefs would know it was in that case
2021-09-03 15:19:25	lorbus	there were definitely cases the fw partition where was too big and overwrite the beginning of the next one. The arm-image-installer script doesn't really have knowledge of where the partition starts or ends, and just dd's the fw image over what's on the install media
2021-09-03 15:20:12	bgilbert	pwhalen: how was the 500 MB number chosen?  arbitrary, or is there a known size requirement?
2021-09-03 15:21:55	bgilbert	dustymabe: I'm starting to feel like we should punt on the RAID problem
2021-09-03 15:21:55	pwhalen	I think it was arbitrary, not aware of a requirement for it
2021-09-03 15:23:52	@dustymabe	bgilbert: yeah, I think if a machine is big enough to have multiple disks and want RAID they probably also have UEFI and aren't using u-boot
2021-09-03 15:24:55	bgilbert	so maybe we just document it as a limitation
2021-09-03 15:25:02	<--	ksinny (~quassel@79.140.121.191) has quit (Remote host closed the connection)
2021-09-03 15:25:21	bgilbert	and if the part size is arbitrary, maybe we just go with 500 MB so we're compatibly arbitrary
2021-09-03 15:25:26	@dustymabe	+1 for me
2021-09-03 15:25:36	@dustymabe	I like it 
2021-09-03 15:25:37	pwhalen	dustymabe: right, uboot based systems are going to be small SBC's like the raspberry pis.  
2021-09-03 15:25:39	bgilbert	(unless we don't want to support the u-boot case, and stick with 1 MB)
2021-09-03 15:25:48	bgilbert	pwhalen: +1
2021-09-03 15:26:30	@dustymabe	yeah, i'm torn, but I would like to enable it if we can
2021-09-03 15:26:42	@dustymabe	since I'm sure we're going to get questions about it 
2021-09-03 15:27:04	bgilbert	yeah
2021-09-03 15:27:19	bgilbert	and I'd like to reduce the barrier to Pi support at least
2021-09-03 15:28:13	@dustymabe	so 500M 
2021-09-03 15:28:36	@dustymabe	does this change anything from a partition label or GUID perspective? 
2021-09-03 15:30:14	lorbus	How would one get the device-specifc boot partition into the FCOS image? If that's still a manual step, couldn't the partition be expanded to the required size at that point, without having everybody on a SBBR compliant system waste the 500MB?
2021-09-03 15:31:08	pwhalen	the first partition we use 7% of it mostly for the rpi firmware and uboot 
2021-09-03 15:31:23	@dustymabe	i guess we might need consider RHCOS here and the "waste 500MB" bit 
2021-09-03 15:32:00	pwhalen	 /dev/mmcblk0p1 vfat      501M   31M  471M   7% /boot/efi
2021-09-03 15:32:10	bgilbert	lorbus: depends. if we're using something like the existing shell script, we'd be giving it a much larger problem: it'd need to construct a new GPT image from scratch and copy over the partitions.
2021-09-03 15:32:20	bgilbert	rather than just dd things or mount/copy/umount things
2021-09-03 15:32:56	@dustymabe	pwhalen: maybe we could get by with a smaller partition
2021-09-03 15:33:19	bgilbert	lorbus: (which is doable, and actually coreos-installer already has most of the code to perform that operation)
2021-09-03 15:34:03	lorbus	right - but making arm-image-installer aware of where it writes instead of blindly dd'ing might be desirable anyway
2021-09-03 15:34:16	@dustymabe	i guess what c-i doesn't have is all the u-boot knowledge 
2021-09-03 15:34:18	bgilbert	pwhalen: wait, that first partition is the ESP?
2021-09-03 15:34:45	bgilbert	dustymabe: not saying c-i should understand u-boot, just that it could hypothetically provide a tool to regenerate the disk image
2021-09-03 15:34:53	@dustymabe	+1
2021-09-03 15:35:33	bgilbert	we already ship a 127 MB ESP
2021-09-03 15:36:13	bgilbert	I guess the BZ said that, and I missed it
2021-09-03 15:36:42	pwhalen	bgilbert: it is
2021-09-03 15:36:48	@dustymabe	does that mean it's OK to just dd directly into the ESP?
2021-09-03 15:37:06	bgilbert	dustymabe: in the u-boot case it presumably doesn't need to be an ESP anymore
2021-09-03 15:37:19	@dustymabe	true
2021-09-03 15:37:21	bgilbert	dustymabe: and FCOS no longer automounts the ESP so the only thing that would care is bootupd
2021-09-03 15:37:58	lorbus	bgilbert: +1. When I looked into this (last  year I think?) my thinking was that c-i would probably be the place to add this ability, i.e. rewriting the partition table for arbitrary, custom primary partition sizes (no need for actually understanding u-boot I think) 
2021-09-03 15:38:31	@dustymabe	bgilbert: so.. what i'm hearing is we move the ESP to partition 1 - make it slightly larger - and put the reserved 1M partition at slot 2 ? 
2021-09-03 15:40:17	bgilbert	lorbus: the template would be the s390x DASD code, which does a streaming conversion of GPT to DASD VTOC and realigns the partitions.  and c-i already uses a GPT library.
2021-09-03 15:41:22	bgilbert	pwhalen: does anything require this partition to be partition number 1?
2021-09-03 15:41:44	bgilbert	dustymabe: ^ if not, it'd be nice to keep it at 2 for consistency with x86_64
2021-09-03 15:44:24	pwhalen	bgilbert: yes, the esp partition is where the raspberry pi firmware+uboot & dtb files are also located
2021-09-03 15:45:00	bgilbert	and the Pi requires partition 1?
2021-09-03 15:46:34	bgilbert	....are we going to need to add a hybrid MBR?
2021-09-03 15:49:59	bgilbert	from the large NOTE box in https://www.raspberrypi.org/documentation/computers/raspberry-pi.html#boot-sequence, it looks as though the Pi no longer requires MBR or partition 1
2021-09-03 15:52:00	@dustymabe	`It is no longer necessary for the first partition to be the FAT partition, as the MSD boot will continue to search for a FAT partition beyond the first one.` 
2021-09-03 15:53:32	bgilbert	and "The boot ROM also now supports GUID partitioning"
2021-09-03 15:53:51	bgilbert	dustymabe: if the ESP has been clobbered by u-boot, transposefs would fail anyway
2021-09-03 15:54:23	@dustymabe	so at least it fails and doesn't appear to work?
2021-09-03 15:54:49	bgilbert	...I guess actually that depends on what part of the partition gets clobbered
2021-09-03 15:55:35	bgilbert	#proposed for aarch64 we will add a 1 MB reserved partition at the beginning of the disk as partition 1.  we will document that boot_device.mirror is unsupported for u-boot installs.
2021-09-03 15:56:12	@dustymabe	anything else? 
2021-09-03 15:56:29	bgilbert	that's it
2021-09-03 15:56:37	@dustymabe	should we make our ESP bigger ? 
2021-09-03 15:56:42	bgilbert	128 MB is smaller than IoT's 512, but hopefully good enough for a lot of cases
2021-09-03 15:57:02	bgilbert	we could sweep through the arm-image-installer repo and see what those boards need
2021-09-03 15:57:02	@dustymabe	also we just looked into the pi, do others require it to be different (i.e. first partition?) 
2021-09-03 15:57:08	@dustymabe	right
2021-09-03 15:57:25	@dustymabe	additionally - with the extra 1M partition would we need to teach arm-image-installer about that? 
2021-09-03 15:57:43	bgilbert	oh, that I don't know
2021-09-03 15:57:49	bgilbert	cc lorbus pwhalen for thoughts on proposed ^
2021-09-03 16:00:20	bgilbert	(and thanks for the input, both of you!)
2021-09-03 16:01:10	lorbus	sgtm!
2021-09-03 16:02:54	pwhalen	sounds reasonable to me as well, thanks for the link on the pi- wasnt aware of the change
2021-09-03 16:03:28	@dustymabe	pwhalen: you include the rpi4 files in your images by default? those don't get written by arm-image-installer ? 
2021-09-03 16:04:16	pwhalen	dustymabe: right, all included in the image. the a-i-i just dd's the uboot files when needed 
2021-09-03 16:04:17	lorbus	I think in order to support FCOS in arm-image-installer there are going to be some changes to the script needed anyway - it might also be worth investigating adding that functionality to coreos-installer, since that is also being used by IoT/RHEL for Edge, and might make for a all in all cleaner solution that everybody can then use
2021-09-03 16:06:37	@dustymabe	pwhalen: off the top - do you know which of the boards here are aarch64? 
2021-09-03 16:06:41	@dustymabe	https://pagure.io/arm-image-installer/blob/main/f/boards.d 
2021-09-03 16:07:31	lorbus	dustymabe: https://src.fedoraproject.org/rpms/uboot-tools/blob/rawhide/f/aarch64-boards
2021-09-03 16:09:33	lorbus	however, not all boards that the uboot fw image is built for are supported in a-i-i
2021-09-03 16:09:58	lorbus	(I think, at least last I checked)
2021-09-03 16:10:03	@dustymabe	yeah - i was just going to try to scan and see what images have what size uboot stuff
2021-09-03 16:10:10	@dustymabe	sanity checking the 127M number 
2021-09-03 16:14:48	lorbus	I've installed the uboot-images-armv8 RPM locally, and `tree /usr/share/uboot --du -h` doesn't show any dir bigger than 10MB
2021-09-03 16:18:58	@dustymabe	indeed 
2021-09-03 16:19:14	@dustymabe	ok
2021-09-03 16:19:26	lorbus	however there's also the offset that has to be taken into account. Not sure how big that is and how much that varies
2021-09-03 16:20:48	@dustymabe	bgilbert: that means we should be good for review on https://github.com/coreos/fedora-coreos-tracker/issues/855 - i have some more tests I want to run (mostly actual bare metal install) but I can do that this weekend. Also waiting to hear back from ravanelli on ppc64le
2021-09-03 16:21:49	@dustymabe	really the only thing that changed in the PR from yesterday was added comments (requested by you) 
2021-09-03 16:21:49	lorbus	the offset is e.g. the seek=16384 in https://pagure.io/arm-image-installer/blob/main/f/socs.d/Rockchips-ARMv8#_5
2021-09-03 16:21:50	ravanelli	dustymabe: I'm trying to setup the virt-manager test here now. 
2021-09-03 16:22:00	@dustymabe	ravanelli: +1 - let me know if you need help
2021-09-03 16:22:06	@dustymabe	I can share my virt-install command if that would help 
2021-09-03 16:22:20	ravanelli	dustymabe: That would be great =) 
2021-09-03 16:23:51	@dustymabe	lorbus: what is that - 8M ? 
2021-09-03 16:24:07	@dustymabe	512*16384/1024/1024 
2021-09-03 16:26:27	lorbus	yeah that sounds right to me. I definitely wouldn't expect it to be much bigger than that. 
2021-09-03 16:26:32	@dustymabe	ravanelli: something like https://paste.centos.org/view/6d17c9fe
2021-09-03 16:26:43	@dustymabe	you'll have to tweak it
2021-09-03 16:27:04	@dustymabe	i.e. I ran with `--arch aarch64 --boot uefi` because I was testing aarch64 
2021-09-03 16:30:38	lorbus	16384 seems to be the biggest offset any of the SoCs has, too. There is however this SoC that does something else entirely: https://pagure.io/arm-image-installer/blob/main/f/socs.d/st#_25-36 (is that even an aarch64 SoC tho?)
2021-09-03 16:31:08	@dustymabe	no idea.. obviously we won't support everything 
2021-09-03 16:32:37	lorbus	# u-boot shall be loaded 10MB before the begining of the last 32MB of the DDR.
2021-09-03 16:32:47	lorbus	wondering if that is always the case? ^
2021-09-03 16:33:00	lorbus	s/#/`/, s/./.`/
2021-09-03 16:34:04	lorbus	I'm signing out. Have a great weekend y'all! + Labor Day for all US folks :) 
2021-09-03 16:38:25	@dustymabe	bye lorbus 

@dustymabe
Copy link
Member Author

The aarch64 disk images with the updated partition table were released.

The fix for this went into stable stream release 34.20210821.3.0.

@bgilbert
Copy link
Contributor

Butane 0.14.0 includes the updated boot_device.mirror partition layouts for aarch64 and ppc64le. coreos/coreos-assembler#2677 re-enables the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira for syncing to jira kind/enhancement
Projects
None yet
5 participants