FCOS pipeline failing with error during systemctl --user --global preset-all #290

Closed
jlebon opened this issue Oct 7, 2019 · 20 comments

@jlebon
Member

jlebon commented Oct 7, 2019

The bodhi-updates stream is failing with:

+ systemctl --user --global preset-all
Created symlink /etc/systemd/user/timers.target.wants/systemd-tmpfiles-clean.timer -> /usr/lib/systemd/user/systemd-tmpfiles-clean.timer.
Created symlink /etc/systemd/user/basic.target.wants/systemd-tmpfiles-setup.service -> /usr/lib/systemd/user/systemd-tmpfiles-setup.service.
Created symlink /etc/systemd/user/sockets.target.wants/dbus.socket -> /usr/lib/systemd/user/dbus.socket.
Created symlink /etc/systemd/user/dbus.service -> /usr/lib/systemd/user/dbus-broker.service.
Created symlink /etc/systemd/user/multi-user.target.wants/io.podman.service -> /usr/lib/systemd/user/io.podman.service.
Created symlink /etc/systemd/user/sockets.target.wants/io.podman.socket -> /usr/lib/systemd/user/io.podman.socket.
Assertion 'changes[i].type < 0' failed at ../src/shared/install.c:398, function unit_file_dump_changes(). Aborting.
/usr/bin/rpmostree-postprocess-inline-0: line 6:     6 Aborted                 systemctl --user --global preset-all
error: Postprocessing: While executing inline postprocessing script '0': Executing bwrap(/usr/bin/rpmostree-postprocess-inline-0): Child process killed by signal 6
+ rc=1
@ajeddeloh
Contributor

Looks like this assertion: https://github.com/systemd/systemd/blob/master/src/shared/install.c#L380. Sounds like a systemd bug.

@zhengxiaomei123

Hi, I hit this problem too. With it, I can't build an FCOS image. How to work around it? Thanks.

@cgwalters
Member

One interesting thing with this issue is that since we have lockfiles on x86_64, anyone doing local builds on x86_64 won't hit this. The main FCOS pipeline is hitting it as it's attempting to update the lockfiles.

Since we don't have lockfiles on altarches right now, you'll get the latest systemd.

> How to work around it? Thanks.

So it should work to pin to an earlier version of systemd in the manifest, something like:

walters@toolbox ~/s/g/c/fedora-coreos-config> git diff
diff --git a/bootable-rpm-ostree.yaml b/bootable-rpm-ostree.yaml
index e3d4759..d885173 100644
--- a/bootable-rpm-ostree.yaml
+++ b/bootable-rpm-ostree.yaml
@@ -7,7 +7,7 @@
 packages:
  # Kernel + systemd.  Note we explicitly specify kernel-{core,modules}
  # because otherwise depsolving could bring in kernel-debug.
- - kernel kernel-core kernel-modules systemd
+ - kernel kernel-core kernel-modules systemd-241-12.git1e19bcd.fc30
  # rpm-ostree
  - rpm-ostree nss-altfiles
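
A minimal sketch of the pinning workaround above, assuming a manifest entry is a single list item with space-separated package names, as in the diff. `pin_package` is a hypothetical helper, not part of rpm-ostree or cosa; it only rewrites the text of one entry.

```python
def pin_package(entry: str, name: str, pinned: str) -> str:
    """Replace the bare package `name` in one manifest list entry with `pinned`.

    `entry` is expected to look like " - pkg-a pkg-b pkg-c" (assumption
    based on the diff above). Indentation and the other packages are kept.
    """
    prefix, dash, rest = entry.partition("- ")
    if not dash or prefix.strip():
        raise ValueError(f"not a list entry: {entry!r}")
    tokens = rest.split()
    if name not in tokens:
        raise ValueError(f"{name!r} not found in entry: {entry!r}")
    # Only an exact token match is replaced, so "systemd" does not
    # touch a token such as "systemd-container".
    replaced = [pinned if tok == name else tok for tok in tokens]
    return prefix + dash + " ".join(replaced)


if __name__ == "__main__":
    line = " - kernel kernel-core kernel-modules systemd"
    print(pin_package(line, "systemd", "systemd-241-12.git1e19bcd.fc30"))
```

Matching whole tokens rather than substrings is deliberate here, since several package names share a prefix.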
 

@jlebon
Member Author

jlebon commented Oct 8, 2019

Right, to clarify, this is the bodhi-updates stream (added this to the initial description). bodhi-updates lockfiles are automatically promoted to testing-devel after a successful compose and test. So indeed testing-devel and testing are naturally still frozen on the last successful pkgset.

For multiarch, one can do as @cgwalters suggested. Another approach to get closer to what testing-devel is currently is to only keep the fedora-coreos-pool repo active and dnf download the few missing basearch-specific packages you need into overrides/rpm.
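
As a rough illustration of the second approach, the download step might be assembled as below. This is a sketch under assumptions: `download_command` is a hypothetical helper, the package name is a placeholder, and it relies on the `dnf download` plugin command with its `--destdir` option. It only builds and prints the command; it does not run dnf.

```python
from __future__ import annotations

import shlex


def download_command(packages: list[str], dest: str = "overrides/rpm") -> list[str]:
    """Build the argv for fetching RPMs into the overrides directory."""
    if not packages:
        raise ValueError("at least one package name is required")
    return ["dnf", "download", "--destdir", dest, *packages]


if __name__ == "__main__":
    # "example-arch-specific-pkg" is a placeholder, not a real package name.
    print(shlex.join(download_command(["example-arch-specific-pkg"])))
```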

@jlebon
Member Author

jlebon commented Oct 8, 2019

containers/podman#4218

jlebon added a commit to jlebon/libpod that referenced this issue Oct 8, 2019
Using `Also=` means that the target unit will also be
installed/uninstalled together with our unit. Doing
`Also=multi-user.target` essentially says: disable `multi-user.target`
if `io.podman.socket` is disabled, which sounds... not at all like
what we want.

In practice, systemd thankfully ignores this (likely because it's the
default target). I think having `Also=io.podman.socket` in the
`io.podman.service` already does what we want here: it gets installed
under `sockets.target` whenever the service is. (And the fact that
systemd ignored this means that it wasn't actually playing a role in
resolving containers#3998.)

This was causing `systemctl preset-all` to dump core in Fedora CoreOS:
coreos/fedora-coreos-tracker#290

(Likely there's a systemd bug around here too.)

Signed-off-by: Jonathan Lebon <jonathan@jlebon.com>
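
To make the `Also=` point concrete, here is a small sketch that scans unit file text and reports any `Also=` entries in `[Install]` that name a `.target` unit. Both `also_targets` and the sample unit text are hypothetical illustrations, not the actual podman unit or a systemd tool, and the parser handles only simple `Key=Value` lines.

```python
from __future__ import annotations

# Hypothetical unit text, written only to exercise the check below.
EXAMPLE_SOCKET_UNIT = """\
[Unit]
Description=Hypothetical example socket

[Socket]
ListenStream=%t/example.sock

[Install]
WantedBy=sockets.target
Also=multi-user.target
"""


def also_targets(unit_text: str) -> list[str]:
    """Return the .target units listed under Also= in the [Install] section."""
    section = None
    flagged: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Install" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "Also":
            # Also= takes a space-separated list of unit names.
            flagged.extend(u for u in value.split() if u.endswith(".target"))
    return flagged


if __name__ == "__main__":
    print(also_targets(EXAMPLE_SOCKET_UNIT))
```

A check like this would only flag the suspicious pattern; whether a given `Also=` entry is actually wrong still needs a human to decide.
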
@dustymabe
Member

dustymabe commented Oct 8, 2019

Any idea why we are only seeing this in FCOS? F30 Silverblue had that version of podman. Is it because we're only calling systemctl preset-all in FCOS?

@cgwalters
Member

> Is it because we're only calling systemctl preset-all in FCOS?

Yes.

@dustymabe
Member

Also, should we pin podman until the problem is fixed?

@menantea

menantea commented Oct 8, 2019

I'm not sure I really understand what the problem is, but I worked around it by forcing podman-1.5.1-3.fc30 in fedora-coreos-base.yaml.

@jcajka
Contributor

jcajka commented Oct 8, 2019

For the record, downgrading/pinning podman (to an older version available on mirrors) on ppc64le works around the issue:

diff --git a/fedora-coreos-base.yaml b/fedora-coreos-base.yaml
index 45b5432..9499c2d 100644
--- a/fedora-coreos-base.yaml
+++ b/fedora-coreos-base.yaml
@@ -103,7 +103,7 @@ packages:
   # SSH
   - openssh-server openssh-clients
   # Containers
-  - podman skopeo runc systemd-container
+  - podman-1.2.0-2.git3bd528e.fc30 skopeo runc systemd-container
   - fuse-overlayfs slirp4netns
   # Remote IPC for podman
   - libvarlink-util

This aligns with the libpod issue and with the fact that a podman update went out this weekend (thanks to @menantea for pointing me in that direction).

@dustymabe
Member

@jcajka the version you chose is super old. Maybe you should use the one @menantea suggested: podman-1.5.1-3.fc30.

@tuan-hoang1

@jcajka: podman 1.2 is from around March 2019, and you would not get rootless sudo working with that version.

@jlebon
Member Author

jlebon commented Oct 8, 2019

> Also, should we pin podman until the problem is fixed?

This goes back to the discussion we had about the role bodhi-updates plays (there's a comment on coreos/fedora-coreos-config#104 about this, but we also discussed it a bunch of times elsewhere :) ).

@jlebon
Member Author

jlebon commented Oct 8, 2019

To be more explicit on this, bodhi-updates is how we get... well Bodhi updates into testing-devel. If we don't use overrides there to work around regressions, that means blocking all other updates as well. Using overrides in bodhi-updates implies having it pull from coreos-pool as well.

Hmm, I think what we actually want here isn't to promote lockfiles from bodhi-updates to testing-devel, but instead it's to have the "lockfile updater" run on testing-devel with cosa fetch --update-lockfile and PR'ing back the result.

And bodhi-updates then can just be "pure Fedora stable + updates repos", and it can remain broken for days without affecting us.

So in this setup, here's what would have happened:

  • the lockfile updater would've opened a PR against testing-devel to update the lockfile
  • CI would fail
  • we would debug, then add an override for podman on testing-devel on that same PR
  • CI would now pass
  • we would merge the PR
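
The flow in the list above can be sketched roughly as follows. All names here (`LockfilePR`, `process_update`, the `ci_passes` and `find_override` callbacks, and the placeholder version strings) are hypothetical stand-ins; the real steps would involve cosa, CI, and a human doing the debugging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class LockfilePR:
    """A proposed lockfile update plus any overrides added while debugging."""
    lockfile: Dict[str, str]                      # package -> proposed version
    overrides: Dict[str, str] = field(default_factory=dict)

    def effective(self) -> Dict[str, str]:
        # Overrides win over whatever the updated lockfile proposes.
        return {**self.lockfile, **self.overrides}


def process_update(
    pr: LockfilePR,
    ci_passes: Callable[[Dict[str, str]], bool],
    find_override: Callable[[LockfilePR], Tuple[str, str]],
) -> bool:
    """Return True if the PR can be merged, adding one override if CI fails."""
    if ci_passes(pr.effective()):
        return True
    # CI failed: "debug" (delegated to the callback) and pin the culprit
    # on the same PR, then re-run CI.
    name, version = find_override(pr)
    pr.overrides[name] = version
    return ci_passes(pr.effective())


if __name__ == "__main__":
    pr = LockfilePR({"podman": "regressed", "systemd": "updated"})
    merged = process_update(
        pr,
        ci_passes=lambda pkgs: pkgs["podman"] != "regressed",
        find_override=lambda _pr: ("podman", "known-good"),
    )
    print(merged, pr.effective())
```

The point of the design is that the override lives on the same PR as the lockfile bump, so the other package updates in that PR are not blocked by one regression.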

@jcajka
Contributor

jcajka commented Oct 8, 2019

> @jcajka the version you chose is super old. Maybe you should use the one @menantea suggested: podman-1.5.1-3.fc30.

Yeah, it is super old. I had just started to bisect, and this was the first older version available on mirrors (without needing to set up a special repository).

@dustymabe
Member

> Hmm, I think what we actually want here isn't to promote lockfiles from bodhi-updates to testing-devel, but instead it's to have the "lockfile updater" run on testing-devel with cosa fetch --update-lockfile and PR'ing back the result.

I think you are right. Basically the "bodhi updates can stay broken" part works just fine as long as the rpm-ostree compose still works. Once the compose is broken then we stop receiving updates into testing-devel until the compose is no longer broken, which is not exactly what we want.

I was thinking a way around this would be to just make it so sometimes we override things in the bodhi stream too, but maybe the better approach is to just do what you say and have another process that updates lockfiles.

@jcajka
Contributor

jcajka commented Oct 8, 2019

> Right, to clarify, this is the bodhi-updates stream (added this to the initial description). bodhi-updates lockfiles are automatically promoted to testing-devel after a successful compose and test. So indeed testing-devel and testing are naturally still frozen on the last successful pkgset.
>
> For multiarch, one can do as @cgwalters suggested. Another approach to get closer to what testing-devel is currently is to only keep the fedora-coreos-pool repo active and dnf download the few missing basearch-specific packages you need into overrides/rpm.

IMHO it would be good to have the same "package"/pin set as on x86_64 until there are pipelines for each individual arch. AFAIK there shouldn't be any packages needed on non-x86_64 arches that are not part of Fedora. This assumes it will not block arch-specific packages (packages that only exist on one arch, like s390-utils or powerpc-utils). Could you point me in the right direction on how to enable that in cosa for all currently known arches?

@jlebon
Member Author

jlebon commented Oct 8, 2019

> IMHO it would be good to have the same "package"/pin set as on x86_64 until there are pipelines for each individual arch.

Those lockfiles are created by cosa itself, so it requires having multi-arch hardware (see #262).

jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 9, 2019
coreos/fedora-coreos-tracker#290

The fix for this was merged, but it'll be some time before it makes it
to the stable repos. This is a temporary hack to get updates going again
into FCOS. Working on a cleaner approach to this in:

coreos/fedora-coreos-tracker#293
@jlebon
Member Author

jlebon commented Oct 9, 2019

Short-term hack in coreos/fedora-coreos-config#195 to get updates flowing again for now.

jlebon added a commit to coreos/fedora-coreos-config that referenced this issue Oct 9, 2019
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 15, 2019
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 18, 2019
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 21, 2019
jlebon added a commit to coreos/fedora-coreos-config that referenced this issue Oct 21, 2019
@jlebon
Member Author

jlebon commented Oct 31, 2019

Final patch dropping the podman pin in coreos/fedora-coreos-config#216. I think we can close this afterwards.

@jlebon jlebon closed this as completed Oct 31, 2019