Document Ignition partitioning via MachineConfig and ensure it works #384

Closed
cgwalters opened this issue Sep 26, 2019 · 15 comments
Labels: jira, lifecycle/stale

@cgwalters (Member)

Playing with this: https://github.com/cgwalters/playground/blob/master/machineconfigs/var-log/20-var-log.yaml

I'm passing it to the installer via additional manifests.

Inline:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 50-var-log
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      disks:
      - device: /dev/vda
        partitions:
        - label: var-log
          number: 0
          size: 2097152
      filesystems:
      - mount:
          device: /dev/disk/by-partlabel/var-log
          format: xfs
          label: var-log
          wipeFilesystem: true
        name: var-log
    systemd:
      units:
      - contents: |
          [Unit]
          Before=local-fs.target
          [Mount]
          Where=/var/log
          What=/dev/disk/by-partlabel/var-log
          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-log.mount

Currently this ends up in emergency mode:
Partitions: op(3): op(4): [failed] reading partition table of "/run/ignition/dev_aliases/dev/vda": failed to lookup attribute on "/run/ignition/dev_aliases/dev/vda"
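
As a side note, from the emergency shell the Ignition disks stage journal is usually the quickest place to see which storage step failed. A minimal sketch, assuming journald is available in the initramfs emergency shell:

# inspect the Ignition disks stage for the failing partition operation (sketch)
journalctl -b -u ignition-disks.service --no-pager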

@cgwalters (Member, Author)

Eventually, I think we should add some sugar for this to both FCCT and MachineConfig.

Also, we want to align with FCOS on rootfs repartitioning once that code/support lands.

@ashcrow added the jira label Sep 26, 2019
@bh7cw commented Jul 16, 2020

Tested this on RHCOS, and it works:

variant: fcos
version: 1.1.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAAB3N...
storage:
  disks:
    - device: /dev/sda
      partitions:
        - number: 0
          label: var-log
          size_mib: 0
          start_mib: 0
  filesystems:
    - device: /dev/disk/by-partlabel/var-log
      path: /var/log
      format: xfs
      wipe_filesystem: true
      with_mount_unit: true

Test Result:
[core@localhost ~]$ lsblk /dev/sda
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   16G  0 disk
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part
├─sda4                         8:4    0    3G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0    3G  0 dm   /sysroot
└─sda5                         8:5    0 12.5G  0 part /var/log
[core@localhost ~]$ ls /dev/disk/by-partlabel/
BIOS-BOOT  boot  EFI-SYSTEM  luks_root  var-log
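
For anyone reproducing this outside the installer flow: an FCC like the one above still needs to be transpiled to Ignition before it can be served to the machine. A minimal sketch, assuming FCCT is installed and the config is saved as var-log.fcc (placeholder filename):

# transpile the Fedora CoreOS Config to an Ignition config (sketch)
fcct --pretty --strict < var-log.fcc > var-log.ign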

@bh7cw commented Jul 16, 2020

Testing on RHCOS works: it boots successfully and shows the expected behavior, but it failed to create a cluster. Shall we close this issue and open a new one against the MCO, so we can test again after OCP supports Ignition spec v3? This may need some discussion. @cgwalters @ashcrow

@ashcrow (Member) commented Jul 17, 2020

@bh7cw 👍 Moving to the MCO makes sense to me. If there is a bug, we'll want to write up a BZ for the MCO folks. /cc @runcom

@bh7cw commented Aug 11, 2020

Tested on AWS; it works:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-ssh
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      disks:
      - device: /dev/nvme0n1
        wipeTable: false
        partitions:
        - sizeMiB: 10240
          startMiB: 10240
          label: var
      filesystems:
        - path: /var
          device: /dev/disk/by-partlabel/var
          format: xfs
    systemd:
      units:
        - name: var.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            Where=/var
            What=/dev/disk/by-partlabel/var
            [Install]
            WantedBy=local-fs.target
  fips: false
  kernelArguments: null
  kernelType: ""
  osImageURL: ""

Test result:

sh-4.4# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                      259:0    0  120G  0 disk
|-nvme0n1p1                  259:1    0  384M  0 part /boot
|-nvme0n1p2                  259:2    0  127M  0 part /boot/efi
|-nvme0n1p3                  259:3    0    1M  0 part
|-nvme0n1p4                  259:4    0  9.5G  0 part
| '-coreos-luks-root-nocrypt 253:0    0  9.5G  0 dm   /sysroot
'-nvme0n1p5                  259:5    0   10G  0 part /var
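
For reference, a rough sketch of the day-1 flow for feeding a MachineConfig like this to the installer (directory and file names are placeholders):

# render installer manifests, add the MachineConfig, then create the cluster (sketch)
openshift-install create manifests --dir=<install_dir>
cp 98-var-partition.yaml <install_dir>/openshift/
openshift-install create cluster --dir=<install_dir>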

@miabbott (Member)

@chrisnegus This snippet would be useful to include in the OCP 4.6 docs ^^

@chrisnegus

Thanks @miabbott. @bh7cw contacted me about this yesterday and provided additional information as well.

@bh7cw commented Aug 17, 2020

@chrisnegus Thanks for documenting the feature. Please let me know if I can help. 🥰

@chrisnegus

@bh7cw Your example uses /var. The question came up about what a customer should and should not make into a separate mount point. My understanding is that we should recommend that, of the existing RHCOS file systems, they should not create a separate partition for any directory other than /var. (So no separate mount point for /var/lib/containers, for example.) They could, however, create any partition they like outside of the existing RHCOS file system structure. For example, a separate /mydata partition would be fine. Thoughts?

@cgwalters (Member, Author)

> For example, a separate /mydata partition would be fine. Thoughts?

No, that will always fail: /var is "system data" and is the only place code/admins should be creating data (more generally, only /var and /etc should be considered writable), so a separate /mydata will not work; see coreos/rpm-ostree#337.

So it's really easy to explain: Use /var or a subdirectory of it (and possibly reconfigure the rootfs) - that's it.

> (So no separate mount point for /var/lib/containers, for example.)

This one actually seems quite reasonable to split off, and I believe we are explicitly testing and supporting that today.
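
For illustration only, a hypothetical FCC fragment for splitting off /var/lib/containers could mirror the var-log example above (untested sketch; the device name is a placeholder):

# untested sketch: carve out a dedicated partition and filesystem for /var/lib/containers
storage:
  disks:
    - device: /dev/nvme0n1
      wipe_table: false
      partitions:
        - number: 0
          label: containers
          size_mib: 0
          start_mib: 0
  filesystems:
    - device: /dev/disk/by-partlabel/containers
      path: /var/lib/containers
      format: xfs
      with_mount_unit: true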

@chrisnegus

@cgwalters Thanks for clarifying!

@bh7cw commented Aug 21, 2020

Tested on /var/log; this works:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-var-log-partition
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      disks:
      - device: /dev/nvme0n1
        wipeTable: false
        partitions:
        - sizeMiB: 47000
          startMiB: 47000
          label: var-log
      filesystems:
        - path: /var/log
          device: /dev/disk/by-partlabel/var-log
          format: xfs
    systemd:
      units:
        - name: var-log.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            Where=/var/log
            What=/dev/disk/by-partlabel/var-log
            [Install]
            WantedBy=local-fs.target

Test result on worker node:

[core@ip ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                      259:0    0  120G  0 disk 
├─nvme0n1p1                  259:1    0  384M  0 part /boot
├─nvme0n1p2                  259:2    0  127M  0 part /boot/efi
├─nvme0n1p3                  259:3    0    1M  0 part 
├─nvme0n1p4                  259:4    0 45.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 45.4G  0 dm   /sysroot
└─nvme0n1p5                  259:5    0 45.9G  0 part /var/log
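
A quick way to spot-check the resulting mount unit on the node, as a sketch (the node name is a placeholder):

# confirm the generated var-log.mount unit is active on the worker (sketch)
oc debug node/<node_name> -- chroot /host systemctl status var-log.mount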

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label Nov 19, 2020
@bh7cw commented Nov 19, 2020

The work for this issue has been completed. /cc @cgwalters Do you agree to close it?

@cgwalters (Member, Author)

Yep, thanks!
