aleph file extra attributes gives bootupctl status error #1724

Closed
dustymabe opened this issue May 6, 2024 · 5 comments · Fixed by coreos/fedora-coreos-config#3042

@dustymabe
Member

There is some history here, but for one specific release of Fedora CoreOS on the testing and next streams, the aleph file had an extra field:

$ cat /sysroot/.coreos-aleph-version.json
{
        "build": "39.20231204.1.0",
        "version": "39.20231204.1.0",
        "ref": "fedora/x86_64/coreos/next",
        "ostree-commit": "3b5484230c1f1299bdab9d52b3663468db482dc37d80bc511bd1866c8b88fac8",
        "imgid": "fedora-coreos-39.20231204.1.0-qemu.x86_64.qcow2"
}
$ sudo bootupctl status
 Component EFI
   Installed: grub2-efi-x64-1:2.06-109.fc39.x86_64,shim-x64-15.6-2.x86_64
   Update: Available: grub2-efi-x64-1:2.06-119.fc40.x86_64,shim-x64-15.8-3.x86_64
 No components are adoptable.
 error: duplicate field `version` at line 3 column 11

As far as I can tell this only affected one release of next and testing:

  • next
    • 39.20231119.1.0 -> good
    • 39.20231204.1.0 -> bad
    • 39.20240104.1.0 -> good
  • testing
    • 39.20231119.2.0 -> good
    • 39.20231204.2.1 -> bad
    • 39.20240104.2.0 -> good
  • stable
    • 39.20231101.3.0 -> good
    • 39.20231119.3.0 -> good
    • 39.20231204.3.3 -> good
    • 39.20240104.3.0 -> good
    • 39.20240112.3.0 -> good
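
For anyone who wants to check a node against the list above, here is a minimal sketch. It assumes the third version component identifies the stream (1 = next, 2 = testing, 3 = stable), which is inferred only from the versions listed in this issue, and that the install-time version can be read from either the `version` or the `build` key of the aleph. The function and constant names are hypothetical.

```python
# Versions reported as affected in this issue (taken from the list above).
AFFECTED_VERSIONS = {"39.20231204.1.0", "39.20231204.2.1"}

# Third version component -> stream name. Assumption: inferred from the
# versions listed in this issue, not from an official specification.
STREAM_BY_ID = {"1": "next", "2": "testing", "3": "stable"}


def stream_of(version: str) -> str:
    """Return the stream name encoded in a version string like 39.20231204.1.0."""
    parts = version.split(".")
    if len(parts) != 4:
        raise ValueError(f"unexpected version format: {version!r}")
    return STREAM_BY_ID.get(parts[2], "unknown")


def installed_from_affected_release(aleph: dict) -> bool:
    """True if the parsed aleph records one of the two affected releases."""
    # Assumption: fall back to `build` when `version` is absent.
    version = aleph.get("version") or aleph.get("build")
    return version in AFFECTED_VERSIONS
```

The aleph dictionary would come from parsing /sysroot/.coreos-aleph-version.json with a JSON parser that tolerates both keys, such as Python's `json.load`.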

We halted the 20231217 releases because we detected this problem:

We reverted the original change that caused this behavior in coreos/coreos-assembler#3686

I guess I had thought that the problematic commit hadn't actually gone into any releases but apparently it had gone out in the previous set of testing and next releases (39.20231204.2.1 and 39.20231204.1.0).

@jlebon
Member

jlebon commented May 7, 2024

Ouch, good catch. (How did you notice this BTW?)

I was confused at first by the bootupd error here (there's only one version field), but I think it's because it treats the build field as an alias: https://github.com/coreos/bootupd/blob/48fc47b99f263e561759b2626ee413f41cd4368b/src/coreos.rs#L19.
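
To illustrate why an alias can produce a "duplicate field" error even though `version` appears only once, here is a rough Python sketch. It is not bootupd's code (bootupd is written in Rust); it only mimics the behaviour described above, on the assumption that `build` is accepted as another name for `version`. The names `ALIASES` and `parse_aleph_version` are hypothetical.

```python
import json

# Assumption for illustration: `build` is treated as an alias of `version`.
ALIASES = {"build": "version"}


def parse_aleph_version(text: str) -> str:
    """Parse a flat aleph JSON object, rejecting repeated canonical fields."""
    seen = {}
    # object_pairs_hook=list yields (key, value) pairs in document order,
    # so keys that collide after alias mapping can be detected.
    for key, value in json.loads(text, object_pairs_hook=list):
        canonical = ALIASES.get(key, key)
        if canonical in seen:
            raise ValueError(f"duplicate field `{canonical}`")
        seen[canonical] = value
    return seen["version"]
```

With the aleph shown in the issue description, `build` is mapped to `version` first, so the real `version` key on the following line is the one reported as a duplicate.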

So basically this means that anyone who installed on those specific versions will not be able to use bootupd to update their bootloader. We could ship a systemd unit to fix those nodes, though OTOH the number of nodes that installed on any specific version is quite low and the workaround is trivial. But definitely not opposed.

@dustymabe
Member Author

> Ouch, good catch. (How did you notice this BTW?)

I recently logged in to our x86_64 builder to see how old it was while working on coreos/fedora-coreos-pipeline#986, and ran sudo bootupctl status since I knew that would tell us the aleph version.

Mostly just chance.

> I was confused at first by the bootupd error here (there's only one version field), but I think it's because it treats the build field as an alias: https://github.com/coreos/bootupd/blob/48fc47b99f263e561759b2626ee413f41cd4368b/src/coreos.rs#L19.

Yep. I'm responsible for that code :)

> So basically this means that anyone who installed on those specific versions will not be able to use bootupd to update their bootloader. We could ship a systemd unit to fix those nodes, though OTOH the number of nodes that installed on any specific version is quite low and the workaround is trivial. But definitely not opposed.

Correct. Those versions were available as "latest" for a month because we skipped the mid december release.

I think we should ship a systemd unit to fix it for a few reasons:

  1. The affected versions were available for ~twice as long as a typical release.
  2. It should be trivial
  3. It should be limited to testing/next
  4. It could be a learning experience for someone on our teams who isn't familiar with how the mechanics of barriers + shipping a fix work.
  5. Any future fixes we may want to apply via bootupctl might be inhibited by this.

@jlebon
Member

jlebon commented May 7, 2024

> Any future fixes we may want to apply via bootupctl might be inhibited by this.

Good point, agreed.

@dustymabe dustymabe added the jira for syncing to jira label May 8, 2024
jbtrystram added a commit to jbtrystram/fedora-coreos-config that referenced this issue Jun 27, 2024
This causes bootupctl to fail while parsing the file.
The extra field was introduced in coreos/coreos-assembler@c2d37f4
then quickly reverted in coreos/coreos-assembler#3686

Still, a couple of builds (39.20231204.1.0 and 39.20231204.2.1) went out with the change.
Fixing this will allow bootupctl to function properly on nodes deployed with this version.
This jq filter is idempotent so it's safe to run on all nodes.

This should be removed after the next barrier release.

Fixes coreos/fedora-coreos-tracker#1724
jlebon pushed a commit to jbtrystram/fedora-coreos-config that referenced this issue Jun 27, 2024
Due to an ordering mishap, some builds have both a `version` and a
`build` field. This causes bootupctl to fail while parsing the file.

Detect this case, and fix the aleph if necessary by removing the `build`
field.

This should be removed after the next barrier release.

Fixes: coreos/fedora-coreos-tracker#1724
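
The detect-and-fix logic described in the commit message above can be sketched as follows. This is not the shipped fix (the earlier commit message mentions a jq filter, which is not reproduced in this thread); it is a hypothetical Python equivalent operating on an already-parsed aleph, and `fix_aleph` is an illustrative name.

```python
def fix_aleph(aleph: dict) -> dict:
    """Return a copy of the aleph with `build` dropped, but only when
    `version` is also present. Any other aleph is returned unchanged."""
    if "build" in aleph and "version" in aleph:
        return {k: v for k, v in aleph.items() if k != "build"}
    return dict(aleph)
```

Because the function changes nothing once `build` is gone, applying it a second time is a no-op, which is the property that makes such a fix safe to run on every node rather than only on the affected ones.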
jlebon added a commit to jbtrystram/fedora-coreos-config that referenced this issue Jun 27, 2024
Due to an ordering mishap, some builds have both a `version` and a
`build` field. This causes bootupctl to fail while parsing the file.

Detect this case, and fix the aleph if necessary by removing the `build`
field.

This should be removed after the next barrier release.

Fixes: coreos/fedora-coreos-tracker#1724

Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
@marmijo marmijo added status/pending-testing-release Fixed upstream. Waiting on a testing release. status/pending-next-release Fixed upstream. Waiting on a next release. labels Jun 27, 2024
@marmijo
Member

marmijo commented Jul 8, 2024

The fix for this went into testing stream release 40.20240701.1.0. Please try out the new release and report issues.

@marmijo marmijo added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. status/pending-next-release Fixed upstream. Waiting on a next release. labels Jul 8, 2024
@marmijo
Member

marmijo commented Jul 19, 2024

The fix for this went into stable stream release 40.20240701.3.0.

@marmijo marmijo removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Jul 19, 2024