
Creating partitions with Ignition bootstrap format fails when Ignition 3.1 is used. #7679

Closed
invidian opened this issue Dec 2, 2022 · 13 comments
Labels: kind/bug · needs-triage · priority/important-longterm

Comments

@invidian (Member) commented Dec 2, 2022

What steps did you take and what happened:

The following fragment of a KubeadmControlPlane spec:

...
    diskSetup:
      filesystems:
      - device: /dev/disk/azure/scsi1/lun0
        extraOpts:
        - -E
        - lazy_itable_init=1,lazy_journal_init=1
        filesystem: ext4
        label: etcd_disk
        overwrite: false
      partitions:
      - device: /dev/disk/azure/scsi1/lun0
        layout: true
        overwrite: false

This creates the following Ignition fragment:

[
  {
    "device": "/dev/disk/azure/scsi1/lun0",
    "partitions": [
      {}
    ]
  }
]

With the latest Flatcar stable (3374.2.0), which uses Ignition 3.1, this produces the following error and the machine fails to boot:

[    3.847409] ignition[548]: failed to fetch config: unable to translate 2.x ignition to 3.1: error at $.storage.disks.0.partitions.0: a partition number >= 1 or a label must be specified
[    3.853019] ignition[548]: failed to acquire config: unable to translate 2.x ignition to 3.1: error at $.storage.disks.0.partitions.0: a partition number >= 1 or a label must be specified
[    3.853914] systemd[1]: ignition-fetch.service: Main process exited, code=exited, status=1/FAILURE
[    3.854335] ignition[548]: Ignition failed: unable to translate 2.x ignition to 3.1: error at $.storage.disks.0.partitions.0: a partition number >= 1 or a label must be specified
[    3.854686] systemd[1]: ignition-fetch.service: Failed with result 'exit-code'.

What did you expect to happen:

Machine to boot successfully.

Anything else you would like to add:

While migrating to Ignition 3.1 natively would be the best thing to do, I think we can just add a `number: N` for each partition we create, which should solve the issue (see the sketch below).
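For illustration, the generated fragment above would then look like this (a sketch of the proposed fix, assuming numbers are assigned sequentially starting at 1):

[
  {
    "device": "/dev/disk/azure/scsi1/lun0",
    "partitions": [
      {
        "number": 1
      }
    ]
  }
]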

As a workaround, one can set `number: N` themselves using `ignition.containerLinuxConfig.additionalConfig` with a fragment like this:

          storage:
            disks:
            - device: /dev/disk/azure/scsi1/lun0
              partitions:
              - number: 1
            filesystems:
            - name: etcd_disk
              mount:
                device: /dev/disk/azure/scsi1/lun0
                format: ext4
                label: etcd_disk
                options:
                - -E
                - lazy_itable_init=1,lazy_journal_init=1

Environment:

  • Cluster-api version: v1.3.0
  • Kubernetes version (use kubectl version): v1.23.13
  • OS (e.g. from /etc/os-release): latest (3374.2.0) Flatcar stable

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 2, 2022
@sbueringer (Member)
Sounds okay to me

@sbueringer (Member)
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 2, 2022
@primeroz
I have been trying to use this snippet, taken from the example at https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/templates/cluster-template-flatcar.yaml.

The issue I am having is that the disk is properly formatted and labeled, but it is not mounted:

➜ kubectl get kubeadmcontrolplane fctest1 -o json | jq .spec.kubeadmConfigSpec.ignition.containerLinuxConfig.additionalConfig -r 
systemd:
  units:
  - name: kubeadm.service
    dropins:
    - name: 10-flatcar.conf
      contents: |
        [Unit]
        After=oem-cloudinit.service
# Workaround for https://github.com/kubernetes-sigs/cluster-api/issues/7679.
storage:
  disks:
  - device: /dev/disk/azure/scsi1/lun0
    partitions:
    - number: 1
  filesystems:
  - name: etcd_disk
    mount:
      device: /dev/disk/azure/scsi1/lun0
      format: ext4
      label: etcd_disk
      path: /var/lib/etcddisk
      options:
      - -E
      - lazy_itable_init=1,lazy_journal_init=1
➜ kubectl view-secret fctest1-tf2lh value | jq -r .storage.filesystems                                                          
[
  {
    "mount": {
      "device": "/dev/disk/azure/scsi1/lun0",
      "format": "ext4",
      "label": "etcd_disk",
      "options": [
        "-E",
        "lazy_itable_init=1,lazy_journal_init=1"
      ]
    },
    "name": "etcd_disk"
  }
]

Should we define the disks both in spec.diskSetup and in ignition.additionalConfig?

@primeroz commented Feb 22, 2023

Just to follow up: I was able to get around the issue reported here and also get the disks mounted by mixing .diskSetup and ignition.additionalConfig, like this:

  • define the partition in ignition.AdditionalConfig
➜ kubectl get kubeadmcontrolplane fctest1 -o json | jq .spec.kubeadmConfigSpec.ignition.containerLinuxConfig.additionalConfig -r

systemd:
  units:
  - name: kubeadm.service
    dropins:
    - name: 10-flatcar.conf
      contents: |
        [Unit]
        After=oem-cloudinit.service
# Workaround for https://github.com/kubernetes-sigs/cluster-api/issues/7679.
storage:
  disks:
  - device: /dev/disk/azure/scsi1/lun0
    partitions:
    - number: 1
  • define filesystems and mounts in kubeadmConfigSpec
➜ kubectl get kubeadmcontrolplane fctest1 -o json | jq .spec.kubeadmConfigSpec.diskSetup                                         
{
  "filesystems": [
    {
      "device": "/dev/disk/azure/scsi1/lun0",
      "extraOpts": [
        "-E",
        "lazy_itable_init=1,lazy_journal_init=1"
      ],
      "filesystem": "ext4",
      "label": "etcd_disk",
      "overwrite": false
    }
  ]
}

➜ kubectl get kubeadmcontrolplane fctest1 -o json | jq .spec.kubeadmConfigSpec.mounts   
[
  [
    "etcd_disk",
    "/var/lib/etcddisk"
  ]
]

which renders to

➜ kubectl view-secret fctest1-cdqgf value | jq -r .storage.disks                                                 
[
  {
    "device": "/dev/disk/azure/scsi1/lun0",
    "partitions": [
      {
        "number": 1
      }
    ]
  }
]


➜ kubectl view-secret fctest1-cdqgf value | jq -r .storage.filesystems                                           
[
  {
    "mount": {
      "device": "/dev/disk/azure/scsi1/lun0",
      "format": "ext4",
      "label": "etcd_disk",
      "options": [
        "-E",
        "lazy_itable_init=1,lazy_journal_init=1"
      ]
    },
    "name": "etcd_disk"
  }
]


➜ kubectl view-secret fctest1-cdqgf value | jq -r '.systemd.units[] | select (.name == "var-lib-etcddisk.mount")'
{
  "contents": "[Unit]\nDescription = Mount etcd_disk\n\n[Mount]\nWhat=/dev/disk/azure/scsi1/lun0\nWhere=/var/lib/etcddisk\nOptions=\n\n[Install]\nWantedBy=multi-user.target\n",
  "enabled": true,
  "name": "var-lib-etcddisk.mount"
}
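
For readability, the escaped contents field of that unit decodes to:

[Unit]
Description = Mount etcd_disk

[Mount]
What=/dev/disk/azure/scsi1/lun0
Where=/var/lib/etcddisk
Options=

[Install]
WantedBy=multi-user.target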

and on node

fctest1-control-plane-e95df458-9xvqn / # blkid | grep sda
/dev/sda: LABEL="etcd_disk" UUID="0d6a6b3e-00b9-4fd6-9e20-86cc2fd3cb90" BLOCK_SIZE="4096" TYPE="ext4"

fctest1-control-plane-e95df458-9xvqn / # df -h /var/lib/etcddisk/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda        9.8G  125M  9.2G   2% /var/lib/etcddisk

fctest1-control-plane-e95df458-9xvqn / # ls -la /var/lib/etcddisk/etcd/member/
total 16
drwx------. 4 root root 4096 Feb 22 08:41 .
drwx------. 3 root root 4096 Feb 22 08:41 ..
drwx------. 2 root root 4096 Feb 22 08:41 snap
drwx------. 2 root root 4096 Feb 22 08:41 wal

This is not very ideal though. I wonder if:

  • it would be better to define everything in ignition.AdditionalConfig and just set the mount unit there as well as a file, unless we think the fix to actually use only KubeadmConfigSpec is doable?
  • we should change the mount rendering section for Ignition to use the label defined rather than looking up the actual device in https://github.com/kubernetes-sigs/cluster-api/pull/4172/files#diff-70d30bf6a3b0746676cf485f5e1b66bb52ce9bd5c8276539643aed9d278c1606R112 (so that we can have the storage setup in ignition.AdditionalConfig but still get mount units)

What do you think?

(Sorry for interjecting in this issue, by the way; I just really want to move away from Ubuntu to Flatcar :))

@invidian (Member, Author)

Hmm, thanks for your investigation @primeroz. I indeed didn't verify that the disk is mounted there. That is a serious issue, I think. I'll try to investigate and come up with a proper fix next week.

@primeroz
👍 Thanks a lot for looking into this; let me know if I can help in any way.

@invidian (Member, Author)

it would be better to define everything in ignition.AdditionalConfig and just set the mount unit there as well as a file, unless we think the fix to actually use only KubeadmConfigSpec is doable?

The fix is definitely doable; it just requires touching CABPK, getting it released, etc., so it may take some time.

it would be better to define everything in ignition.AdditionalConfig and just set the mount unit there as well as a file

We could do that; it's just another variation of the workaround. In any case, some form of it should be sent to the CAPZ templates. But I think keeping diskSetup populated is better, as it stays consistent with the other templates, even though it is not sufficient on its own to get things working. This is only temporary, of course: once CABPK is fixed, we can then simply remove the ignition.AdditionalConfig addition/override/workaround.

Change the mount rendering section for Ignition to use the label defined rather than looking up the actual device in https://github.com/kubernetes-sigs/cluster-api/pull/4172/files#diff-70d30bf6a3b0746676cf485f5e1b66bb52ce9bd5c8276539643aed9d278c1606R112 (so that we can have the storage setup in ignition.AdditionalConfig but still get mount units)

What do you suggest doing, exactly? I think as long as mounting disks via the mounts/filesystems/partitions combination remains functional, we should be able to change the implementation, though I guess people may rely on it to a certain degree.

I'm testing changes to the templates right now to get it fixed. I'll open CAPZ PR once I'm done. Sorry for the delay.

@invidian (Member, Author) commented Mar 10, 2023

What do you suggest doing, exactly? I think as long as mounting disks via the mounts/filesystems/partitions combination remains functional, we should be able to change the implementation, though I guess people may rely on it to a certain degree.

Hm, perhaps we could use /dev/disk/by-label/ in CABPK to avoid mapping to block devices directly.

EDIT: I've opened kubernetes-sigs/cluster-api-provider-azure#3267 to fix CAPZ.

@primeroz
Change the mount rendering section for Ignition to use the label defined rather than looking up the actual device in https://github.com/kubernetes-sigs/cluster-api/pull/4172/files#diff-70d30bf6a3b0746676cf485f5e1b66bb52ce9bd5c8276539643aed9d278c1606R112 (so that we can have the storage setup in ignition.AdditionalConfig but still get mount units)

What do you suggest doing, exactly? I think as long as mounting disks via the mounts/filesystems/partitions combination remains functional, we should be able to change the implementation, though I guess people may rely on it to a certain degree.

Hm, perhaps we could use /dev/disk/by-label/ in CABPK to avoid mapping to block devices directly.

Sorry, my statement was not very clear :)

At the moment we require that the filesystems are defined in the kubeadmConfigSpec because of this block, in particular `{{- $disk := index $.FilesystemDevicesByLabel $label }}`:

    {{- $disk := index $.FilesystemDevicesByLabel $label }}
    {{- $mountOptions := slice . 2 }}
    - name: {{ $mountpoint | MountpointName }}.mount
      enabled: true
      contents: |
        [Unit]
        Description = Mount {{ $label }}
        [Mount]
        What={{ $disk }}
        Where={{ $mountpoint }}
        Options={{ Join $mountOptions "," }}
        [Install]
        WantedBy=multi-user.target
    {{- end }}

If we changed it to something like:

    {{- $mountOptions := slice . 2 }}
    - name: {{ $mountpoint | MountpointName }}.mount
      enabled: true
      contents: |
        [Unit]
        Description = Mount {{ $label }}
        [Mount]
        What=/dev/disk/by-label/{{ $label }}
        Where={{ $mountpoint }}
        Options={{ Join $mountOptions "," }}
        [Install]
        WantedBy=multi-user.target
    {{- end }}
  • it should work just fine
  • it would, at least, let us configure the whole storage through ignition.AdditionalConfig instead of splitting it between KubeadmConfigSpec and ignition.AdditionalConfig (a rendered example is sketched below)
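
For illustration, with label etcd_disk and mountpoint /var/lib/etcddisk as in the setup above, the changed template would render a unit along these lines (a sketch, assuming the same empty mount options as in the earlier rendered unit):

[Unit]
Description = Mount etcd_disk

[Mount]
What=/dev/disk/by-label/etcd_disk
Where=/var/lib/etcddisk
Options=

[Install]
WantedBy=multi-user.target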

k8s-infra-cherrypick-robot pushed a commit to k8s-infra-cherrypick-robot/cluster-api-provider-azure that referenced this issue Mar 15, 2023
A workaround for a CABPK bug with conversion of Ignition v2 to v3, as
described in kubernetes-sigs/cluster-api#7679,
was added while the Flatcar templates were still in development.

While it boots and does not produce any errors, it turns out that the
disk does not actually get mounted. This is because, at the moment, the 'mounts'
CABPK field requires 'diskSetup.filesystems' to be populated for mapping
disk names to device paths.

This commit fixes the missing mount.

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
@k8s-triage-robot
This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 12, 2024
@fabriziopandini (Member)

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
@fabriziopandini (Member)

based on my (limited) understanding, this is related to #9157, so I will dedup; please feel free to re-open in case I'm wrong

/close

@k8s-ci-robot (Contributor)

@fabriziopandini: Closing this issue.

In response to this:

based on my (limited) understanding, this is related to #9157, so I will dedup; please feel free to re-open in case I'm wrong

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
