More openvswitch woes #1274

mike-nguyen opened this issue May 8, 2023 · 18 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@mike-nguyen
Member

mike-nguyen commented May 8, 2023

The openvswitch %pre scriptlet adds the openvswitch user to the hugetlbfs group. Since %pre runs without set -e by default, failures in the scriptlet are ignored, and the resulting worker nodes do not come online during a cluster install.

These errors are showing up during the rpm-ostree compose:

14:30:05  openvswitch3.1.prein: usermod.rpmostreesave: /etc/passwd.6: lock file already used
14:30:05  openvswitch3.1.prein: usermod.rpmostreesave: cannot lock /etc/passwd; try again later.
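To illustrate why the failure goes unnoticed, here is a minimal sketch of how a POSIX shell treats a failing command with and without `-e`. The script body is a stand-in, not the actual openvswitch scriptlet; `false` plays the role of the failing `usermod` call.

```shell
# Stand-in for a scriptlet body: the first command fails (like usermod above),
# the second succeeds. This is NOT the real openvswitch %pre scriptlet.
scriptlet_body='false; true'

# Without -e, the failure is ignored and the exit status is that of the
# last command in the script.
lenient_status=0
sh -c "$scriptlet_body" || lenient_status=$?

# With -e, the shell stops at the first failing command and exits non-zero.
strict_status=0
sh -e -c "$scriptlet_body" || strict_status=$?

echo "without -e: $lenient_status, with -e: $strict_status"
```

With `-e` the build would have stopped at the failed `usermod` instead of producing an image with broken group membership.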
@mike-nguyen
Member Author

This was addressed by #1275, which only prevents us from shipping with broken groups for openvswitch.

@cgwalters changed the title from "RHCOS 4.13-9.2 ppc64le will occassionally fail running the %pre scriptlets for openvswitch" to "More openvswitch woes" on Jun 15, 2023
@cgwalters
Member

We have a new problem now: it sounds like the rhel9 (?) builds picked up https://src.fedoraproject.org/rpms/openvswitch/c/a17c9d439da4f7e3bfec0ce4c3b178232d28d3fb?branch=rawhide, and I believe the problem today is that the use of sysusers.d clashes with our previously hardcoded bits here https://github.com/openshift/os/blob/master/passwd#L27 and, most importantly, here:

os/group

Line 45 in 1f2c0eb

openvswitch:x:800:
(note openvswitch has no groups).

Basically we can do one of two things:

  • Have the user and group data injected via sysusers
  • Hardcode the user data in /usr

But it doesn't make sense to do both. At this point, we could try dropping the hardcoded user/group files from this repo and rely on sysusers (i.e. per machine state). Or we could hardcode the hugetlbfs group.

Now honestly, I think the real fix here is to move openvswitch to use DynamicUser=yes and open the hugetlbfs bits as an earlier privileged operation instead of relying on group access.

@dcbw
Contributor

dcbw commented Jun 16, 2023

> Or we could hardcode the hugetlbfs group.

@cgwalters note that openvswitch being in hugetlbfs only happens on x86-64 and ARM where we support DPDK. Not on POWER or s390.

@dcbw
Contributor

dcbw commented Jun 16, 2023

@cgwalters I'm curious what's actually clashing here though. sysusers.d(5) says it'll do the things it's asked if the group/user doesn't exist yet:

       g
           Create a system group of the specified name should it not exist yet. Note that u implicitly creates a matching group. The group will be created with no password set.

       m
           Add a user to a group. If the user or group do not exist yet, they will be implicitly created.

But we don't get any error logs out of systemd-sysusers about why it's not doing what it's asked... If openvswitch already exists, shouldn't it just skip creating the user from /usr/lib/sysusers.d/openvswitch.conf? And since hugetlbfs doesn't exist and OVS isn't in it, shouldn't it still create the group and add the membership?

@dustymabe
Member

dustymabe commented Jun 17, 2023

I did a little investigation on this today.

I tried dropping the hardcoded bits:

diff --git a/group b/group
index e86d91b..1fb1db8 100644
--- a/group
+++ b/group
@@ -42,5 +42,3 @@ nfsnobody:x:65534:
 kube:x:994:
 sshd:x:74:
 chrony:x:992:
-openvswitch:x:800:
-hugetlbfs:x:801:
diff --git a/passwd b/passwd
index 673a3d5..893fd8a 100644
--- a/passwd
+++ b/passwd
@@ -24,4 +24,3 @@ nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
 kube:x:996:994:Kubernetes user:/:/sbin/nologin
 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
 chrony:x:994:992::/var/lib/chrony:/sbin/nologin
-openvswitch:x:800:800::/:/sbin/nologin

but that didn't work either (the openvswitch user still doesn't get in the hugetlbfs group). I cracked open the resulting qemu qcow (i.e. without booting it) and I see the openvswitch user and group and the hugetlbfs group in the resulting built image (although with different UID/GID so I know the removal of the hardcoded bits had effect).
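A minimal sketch of how the membership could be checked in an unbooted image, assuming the image's group file has been extracted or mounted somewhere readable. The helper name `in_group` and the sample data are illustrative, not part of any existing tool.

```shell
# in_group USER GROUP FILE
# Succeeds if USER is listed as a supplementary member of GROUP in FILE,
# where FILE uses the group(5) format: name:password:GID:member1,member2
in_group() {
    awk -F: -v user="$1" -v group="$2" '
        $1 == group {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$3"
}

# Demo on sample data. The second line shows what a correctly built image
# would be expected to contain; it is illustrative only.
sample=$(mktemp)
printf '%s\n' 'openvswitch:x:800:' 'hugetlbfs:x:801:openvswitch' > "$sample"
if in_group openvswitch hugetlbfs "$sample"; then
    echo "openvswitch is in hugetlbfs"
fi
rm -f "$sample"
```

On an rpm-ostree based image the entry may live in either /usr/lib/group or /etc/group, so it is probably worth checking both.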

I think after we remove the hardcoded bits now what's kicking in is this:

%sysusers_create_compat %{SOURCE2}
%ifarch %{dpdkarches}
%sysusers_create_compat %{SOURCE3}
%endif

where

Source2: openvswitch.sysusers
Source3: openvswitch-hugetlbfs.sysusers

and are defined as:

cat openvswitch.sysusers
#Type Name         ID         GECOS                   Home directory  Shell
u     openvswitch  -          "Open vSwitch Daemons"  /               /sbin/nologin

cat openvswitch-hugetlbfs.sysusers
#Type Name         ID         GECOS                   Home directory  Shell
m     openvswitch  hugetlbfs

So that looks normal... but what is sysusers_create_compat? It's just a macro that calls a bash script. So I imagine some of the logic inside that bash script (i.e. maybe it works during rpm install with dnf, but not with rpm-ostree) is why we end up with openvswitch not in the hugetlbfs group.

I will note that if on a running instance I remove openvswitch and hugetlbfs entries and rerun SYSTEMD_LOG_LEVEL=debug systemd-sysusers it does create things appropriately.
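As a rough sketch, the memberships that the fragments request can be listed with a small helper, which makes it easier to compare what was asked for against what ended up in the image. The function name `list_memberships` is made up for this example; the fragment contents are the two files quoted above, recreated in a scratch directory.

```shell
# list_memberships FILE...
# Prints "user group" for every 'm' (membership) line in the given
# sysusers.d-style files; comment lines and other line types are skipped.
list_memberships() {
    awk '$1 == "m" { print $2, $3 }' "$@"
}

# Demo using the two fragments quoted in this comment.
demo_dir=$(mktemp -d)
cat > "$demo_dir/openvswitch.sysusers" <<'EOF'
#Type Name         ID         GECOS                   Home directory  Shell
u     openvswitch  -          "Open vSwitch Daemons"  /               /sbin/nologin
EOF
cat > "$demo_dir/openvswitch-hugetlbfs.sysusers" <<'EOF'
#Type Name         ID         GECOS                   Home directory  Shell
m     openvswitch  hugetlbfs
EOF
list_memberships "$demo_dir"/*.sysusers
rm -rf "$demo_dir"
```

Each printed pair is a membership that should be present in the built image's group data if the scriptlet logic worked.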

dustymabe added a commit to dustymabe/os that referenced this issue Jun 17, 2023
The RPM is now using systemd-sysusers fragments [1] so we can drop the
hardcoded definitions. One problem here, though, is that the hugetlbfs
group never gets added to the openvswitch user [2] so let's add a
workaround for that for now.

One side effect of this is that it does change the previously defined UID/GIDs
from 800/801 to different values. I assume this is OK because of some of
the discussion in [1].

[1] openshift#1274 (comment)
[2] openshift#1274 (comment)
dustymabe added a commit to dustymabe/os that referenced this issue Jun 17, 2023
The RPM is now using systemd-sysusers fragments [1] so we can drop the
hardcoded definitions. One problem here, though, is that the hugetlbfs
group never gets added to the openvswitch user [2] so let's add a
workaround for that for now.

One side effect of this is that it does change the previously defined
UID/GIDs from 800/801 to different values (dynamically generated at build
time). I assume this is OK because of some of the discussion in [1].

[1] openshift#1274 (comment)
[2] openshift#1274 (comment)
@dustymabe
Member

WDYT of #1317 at least to get us unblocked for now?

dustymabe added a commit to dustymabe/os that referenced this issue Jun 18, 2023
The RPM is now using systemd-sysusers fragments [1] and the RPM
scriptlets no longer successfully add the `hugetlbfs` group to the
`openvswitch` user [2]. Let's add a workaround for now while we investigate.

[1] openshift#1274 (comment)
[2] openshift#1274 (comment)
@dustymabe
Member

At this point we can either merge #1317, which drops the hardcoded user/group assignments, or #1318, which just works around the RPM scriptlet not working.

IIUC merging #1317 will cause systems that upgrade to have a different (new) UID/GID for the openvswitch user/group and hugetlbfs group on the next reboot. I'm not sure if this is OK or not.
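One way to see whether an upgrade would change the IDs is to compare the passwd data of the old and new images. The sketch below assumes both files have been extracted to local paths; the helper name `id_of` and the new UID value are hypothetical.

```shell
# id_of NAME FILE
# Prints the numeric ID (third field) for NAME from a passwd(5)- or
# group(5)-style file; prints nothing if NAME is absent.
id_of() {
    awk -F: -v name="$1" '$1 == name { print $3 }' "$2"
}

# Demo: the old passwd line is the one hardcoded in this repo;
# the new UID is a made-up value standing in for a dynamically assigned one.
old=$(mktemp)
new=$(mktemp)
echo 'openvswitch:x:800:800::/:/sbin/nologin' > "$old"
echo 'openvswitch:x:977:977::/:/sbin/nologin' > "$new"   # hypothetical value
if [ "$(id_of openvswitch "$old")" != "$(id_of openvswitch "$new")" ]; then
    echo "openvswitch UID differs between images"
fi
rm -f "$old" "$new"
```

If the IDs differ, any files on disk still owned by the old numeric UID/GID would no longer map to the openvswitch user, which is the part worth checking before merging.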

@LorbusChris
Member

It looks like https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/sysusers.generate-pre.sh and https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 have diverged, and the latter is missing multiple updates to the script.

Specifically, this change that is missing in RHEL looks like the culprit to me: https://src.fedoraproject.org/rpms/systemd/c/f27d461663bec17ad64422682f260f0020ccc7f7?branch=rawhide

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/os that referenced this issue Jun 20, 2023
The RPM is now using systemd-sysusers fragments [1] and the RPM
scriptlets no longer successfully add the `hugetlbfs` group to the
`openvswitch` user [2]. Let's add a workaround for now while we investigate.

[1] openshift#1274 (comment)
[2] openshift#1274 (comment)
@LorbusChris
Member

LorbusChris commented Jun 24, 2023

https://bugzilla.redhat.com/show_bug.cgi?id=2217149
https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79

@knthm

knthm commented Aug 30, 2023

This still seems to be a problem on RHCOS 4.14-9.2.

The sysusers.d configurations created by rpm-ostree now clash with the ones provided by the OS packages for openvswitch and unbound:

systemd-sysusers logs
[core@clust-6hwr5-master-0 sysusers.d]$ systemctl status systemd-sysusers
× systemd-sysusers.service - Create System Users
     Loaded: loaded (/usr/lib/systemd/system/systemd-sysusers.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Wed 2023-08-30 13:25:16 UTC; 12min ago
   Duration: 1.774s
       Docs: man:sysusers.d(5)
             man:systemd-sysusers.service(8)
    Process: 736 ExecStart=systemd-sysusers (code=exited, status=1/FAILURE)
   Main PID: 736 (code=exited, status=1/FAILURE)
        CPU: 19ms

Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: /usr/lib/sysusers.d/systemd-timesync.conf:8: Conflict with earlier configuration for user 'systemd-ti>
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'hugetlbfs' with GID 978.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'openvswitch' with GID 977.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating group 'unbound' with GID 976.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating user 'openvswitch' (Open vSwitch Daemons) with UID 977 and GID 977.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: Creating user 'unbound' (Unbound DNS resolver) with UID 976 and GID 976.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd-sysusers[736]: /etc/gshadow: Group "unbound" already exists.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: systemd-sysusers.service: Main process exited, code=exited, status=1/FAILURE
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: systemd-sysusers.service: Failed with result 'exit-code'.
Aug 30 13:25:16 clust-6hwr5-master-0 systemd[1]: Failed to start systemd-sysusers.service - Create System Users.
/usr/lib/sysusers.d contents
$ ll /usr/lib/sysusers.d/
total 108
-rw-r--r--. 2 root root  291 Jan  1  1970 00-coreos-nobody.conf
-rw-r--r--. 2 root root 1317 Jan  1  1970 00-coreos-static.conf
-rw-r--r--. 2 root root  984 Jan  1  1970 10-static-extra.conf
-rw-r--r--. 2 root root  240 Jan  1  1970 20-setup-groups.conf
-rw-r--r--. 2 root root  457 Jan  1  1970 20-setup-users.conf
-rw-r--r--. 2 root root   40 Jan  1  1970 30-rpmostree-pkg-group-hugetlbfs.conf
-rw-r--r--. 2 root root   42 Jan  1  1970 30-rpmostree-pkg-group-openvswitch.conf
-rw-r--r--. 2 root root   38 Jan  1  1970 30-rpmostree-pkg-group-unbound.conf
-rw-r--r--. 2 root root   81 Jan  1  1970 35-rpmostree-pkg-user-openvswitch.conf
-rw-r--r--. 2 root root   88 Jan  1  1970 35-rpmostree-pkg-user-unbound.conf
-rw-r--r--. 2 root root   50 Jan  1  1970 40-rpmostree-pkg-usermod-openvswitch-hugetlbfs.conf
-rw-r--r--. 2 root root  359 Jan  1  1970 README
-rw-r--r--. 2 root root 1299 Jan  1  1970 basic.conf
-rw-r--r--. 3 root root  132 Jan  1  1970 chrony.conf
-rw-r--r--. 2 root root   79 Jan  1  1970 clevis.conf
-rw-r--r--. 3 root root  118 Jan  1  1970 dbus.conf
-rw-r--r--. 3 root root   59 Jan  1  1970 dnsmasq.conf
-rw-r--r--. 2 root root  134 Jan  1  1970 openssh-server.conf
-rw-r--r--. 2 root root  189 Jan  1  1970 openvswitch.conf
-rw-r--r--. 3 root root   39 Jan  1  1970 samba.conf
-rw-r--r--. 3 root root  335 Jan  1  1970 systemd-coredump.conf
-rw-r--r--. 2 root root  316 Jan  1  1970 systemd-journal.conf
-rw-r--r--. 3 root root  339 Jan  1  1970 systemd-oom.conf
-rw-r--r--. 2 root root  333 Jan  1  1970 systemd-resolve.conf
-rw-r--r--. 2 root root  344 Jan  1  1970 systemd-timesync.conf
-rw-r--r--. 2 root root  128 Jan  1  1970 tpm2-tss.conf
-rw-r--r--. 2 root root   66 Jan  1  1970 unbound.sysusers
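A rough way to spot names that are declared by more than one fragment is sketched below. The helper name `overlapping_names` is made up, and the demo file contents are invented since the real files' contents are not shown here. Overlap by itself is not necessarily an error; the output is only a starting point for inspection.

```shell
# overlapping_names DIR
# Prints each user/group name declared by 'u' or 'g' lines in more than one
# *.conf file under DIR. Only *.conf is scanned, matching the paths
# documented in sysusers.d(5).
overlapping_names() {
    awk '
        $1 == "u" || $1 == "g" {
            if (!(($2, FILENAME) in seen)) {
                seen[$2, FILENAME] = 1
                files[$2]++
            }
        }
        END { for (name in files) if (files[name] > 1) print name }
    ' "$1"/*.conf | sort
}

# Demo with made-up fragment contents (assumptions, not the real files).
demo=$(mktemp -d)
echo 'g unbound -' > "$demo/30-rpmostree-pkg-group-unbound.conf"
echo 'u unbound - "Unbound DNS resolver" /' > "$demo/unbound.conf"
echo 'g hugetlbfs -' > "$demo/30-rpmostree-pkg-group-hugetlbfs.conf"
overlapping_names "$demo"
rm -rf "$demo"
```

Since a `u` line implicitly creates a matching group, the sketch treats `u` and `g` lines for the same name as overlapping.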

This likely also causes openshift/installer#7265.

@cgwalters
Member

Messy. Yes, ultimately we need one "source of truth" for users - what these packages are doing in invoking both useradd at %post time and installing a sysusers file is creating two.

Backporting this logic to RHEL would help indeed.

But, we probably also need to change rpm-ostree to detect this case. Looks like we already have coreos/rpm-ostree#2728

@knthm

knthm commented Aug 30, 2023

> It looks like https://gitlab.com/redhat/centos-stream/rpms/systemd/-/merge_requests/79 hasn't been backported to https://pkgs.devel.redhat.com/cgit/rpms/systemd/tree/sysusers.generate-pre.sh?h=rhel-9.2.0 yet.

Ah that makes sense. As a lowly customer I don't have access to devel.redhat.com. I'll keep an eye on the Bugzilla bug, thanks!

> Yes, ultimately we need one "source of truth" for users.

Agreed, there doesn't seem to be a common pattern for dealing with these system-specific package configurations. I don't have the deepest insight into how rpm-ostree manages systemd configuration; I'm just confused as to why it layers identical configuration on top of what the packages already provide.

@cgwalters
Member

Ahhhh OK something was really confusing me - we're not actually seeing this in OCP. It's because
https://github.com/coreos/rpm-ostree/blob/7153ab558ac813b963a55abb5f4892fcd2f9ceca/src/libpriv/rpmostree-container.cxx#L50
and OKD/SCOS is using container layering to build the node image (which is great).

So a workaround today is for the OKD/SCOS build to basically remove the duplicate rpm-ostree sysusers.d entries as part of the container build.
(But we should still fix this in rpm-ostree for sure)
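As a sketch of the first step of such a workaround, the rpm-ostree-generated fragments could be listed by file name for review before anything is removed. The helper name `rpmostree_fragments` is made up, and the name pattern is taken from the directory listing earlier in this thread; whether all matching files are safe to remove is an assumption that would need checking against the packaged fragments.

```shell
# rpmostree_fragments DIR
# Lists regular files directly under DIR whose names follow the
# NN-rpmostree-pkg-*.conf pattern seen in the listing earlier in this thread.
rpmostree_fragments() {
    find "$1" -maxdepth 1 -type f -name '*-rpmostree-pkg-*.conf' | sort
}

# Demo in a scratch directory with empty placeholder files.
demo=$(mktemp -d)
touch "$demo/30-rpmostree-pkg-group-unbound.conf" \
      "$demo/35-rpmostree-pkg-user-unbound.conf" \
      "$demo/openvswitch.conf" \
      "$demo/README"
rpmostree_fragments "$demo"
rm -rf "$demo"
```

In a container build, the reviewed list could then be passed to `rm` for the entries that duplicate what the packages already ship.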

However, coreos/rpm-ostree#4505 would also address this and have other benefits.

@knthm

knthm commented Sep 10, 2023

@cgwalters I was able to find some time again to look at this more closely:

It turns out that in my case openshift/installer pulls the following RHCOS image on bin/openshift-install create cluster:

Initial bootstrap/master os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="414.92.202308032115-0"
VERSION_ID="4.14"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 414.92.202308032115-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.14/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.14"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.14"
OPENSHIFT_VERSION="4.14"
RHEL_VERSION="9.2"
OSTREE_VERSION="414.92.202308032115-0"

On the master VM, rpm-ostree - as instructed by bootstrap-pivot.sh - then pulls an FCOS image and layers this on top of RHCOS, which causes the systemd-sysusers issue I've mentioned:

Pivoted master os-release
NAME="Fedora Linux"
VERSION="38.20230907.20.0 (CoreOS)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora CoreOS 38.20230907.20.0"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='38.20230907.20.0'

So I'm actually trying to run OCP, but the installer/bootstrap has other things in mind and pivots to an OKD image.
I'm frankly surprised more things didn't break since this really shouldn't happen. :D
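A small sketch of how such a cross-distribution pivot could be detected by comparing the `ID` field of the two os-release files. The helper name `os_release_field` is made up, and the demo recreates only the `ID` and `VARIANT_ID` lines from the two files above.

```shell
# os_release_field FILE KEY
# Prints the value of KEY from an os-release style file with quote
# characters removed. The file is parsed as text, not sourced.
os_release_field() {
    sed -n "s/^$2=//p" "$1" | tr -d "\"'"
}

# Demo using the ID lines from the two os-release files shown above.
before=$(mktemp)
after=$(mktemp)
printf '%s\n' 'ID="rhcos"' 'VARIANT_ID=coreos' > "$before"
printf '%s\n' 'ID=fedora' 'VARIANT_ID=coreos' > "$after"
old_id=$(os_release_field "$before" ID)
new_id=$(os_release_field "$after" ID)
if [ "$old_id" != "$new_id" ]; then
    echo "pivot changed distribution: $old_id -> $new_id"
fi
rm -f "$before" "$after"
```

Comparing `VARIANT_ID` alone would not catch this case, since both images report `coreos`.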

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 9, 2024
@travier
Member

travier commented Jan 23, 2024

/remove-lifecycle rotten
/lifecycle frozen

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 23, 2024
@openshift-ci openshift-ci bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 23, 2024