flake: ubuntu 2010: systemd: Failed to create ... control group, EPERM #10386

Closed

edsantiago opened this issue May 18, 2021 · 24 comments

Labels: flakes (Flakes from Continuous Integration), rootless, locked - please file new issue/PR

Comments

@edsantiago (Member) commented May 18, 2021

Starting to see this flake often on Ubuntu:

$ podman run ... -t -i -d registry.access.redhat.com/ubi8-init /sbin/init
systemd 239 (239-41.el8_3.2) running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Red Hat Enterprise Linux 8.3 (Ootpa)

Set hostname to <7ee143227d95>.
Initializing machine ID from random generator.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to install release agent, ignoring: Permission denied
Failed to create /user.slice/user-27341.slice/session-3.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
!!!!!!
Failed to allocate manager object, freezing.
Freezing execution.

Podman systemd [It] podman run container with systemd PID1

edsantiago added the flakes (Flakes from Continuous Integration) and rootless labels on May 18, 2021
@rhatdan (Member) commented May 18, 2021

Well, it is not SELinux. Is there any way we could check the audit.log to see if there are any seccomp failures?
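
For reference, a hedged sketch of how seccomp denials could be looked for in a preserved audit.log; the path here is the stock location, not necessarily where CI stores the captured copy:

$ ausearch -m SECCOMP -ts recent                  # on a live system with auditd running
$ grep 'type=SECCOMP' /var/log/audit/audit.log    # or against a saved copy of the log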

@edsantiago (Member, Author) commented:

OMG yes there is! Thanks to the foresight of @cevich, audit.log is preserved! For any of the above logs, go to the top, click the Task link, scroll to the bottom, and expand the audit accordion. Example. (I have no idea what to look for, though)

@rhatdan (Member) commented May 18, 2021

No seccomp failures that I see.

@rhatdan (Member) commented May 18, 2021

It looks like Ubuntu occasionally has a cgroup file system that cannot be written from within the container.
@giuseppe Any thoughts?
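
One hedged way to check whether the cgroup mount is actually read-only from inside a container; the image and shell invocation are illustrative, not the exact CI setup (mountinfo lists ro/rw among each mount's options):

$ podman run --rm --systemd=always registry.access.redhat.com/ubi8-init sh -c 'grep cgroup /proc/self/mountinfo'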

@giuseppe (Member) commented:

On cgroup v1, systemd doesn't need the whole cgroup tree to be writable, just /sys/fs/cgroup/systemd.

Are the rootless tests running from a systemd-run --scope --user environment?
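
For context, a hedged sketch of what launching rootless podman inside a user scope looks like; the podman arguments echo the failing command above, with its elided options dropped:

$ systemd-run --scope --user podman run -t -i -d registry.access.redhat.com/ubi8-init /sbin/init

Running under systemd-run --scope --user places the command in a transient scope managed by the user's systemd instance, which is a common workaround when rootless containers hit EPERM creating control groups.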

@edsantiago (Member, Author) commented:

I don't know what that means. They are running as a user with the magic loginctl enabled, via ansible become_user.
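
(A guess at what "the magic loginctl" refers to, based on common rootless-podman setups rather than anything confirmed in this thread: lingering enabled for the test user, so its systemd user instance and session cgroups persist outside an interactive login.)

$ loginctl enable-linger <user>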

@edsantiago (Member, Author) commented:

Oops, correction: these are failing in CI, not gating tests. I don't know what the setup is for rootless (wrt loginctl), but AFAICT those are run via ssh user@localhost. The interesting thing is that these are e2e tests, not system tests. e2e tests do automatic triple-retries, and in all the cases above, all three fail. That is: once a host gets into a mode where this test fails, it will always fail. A CI retry (which spins up a new VM) then succeeds.

@cevich (Member) commented May 19, 2021

Weird.

AFAICT those are run via ssh user@localhost

Correct.

e2e tests do automatic triple-retries

Additionally: the tests run in a randomized order. The "seed" for the order is displayed at the beginning, and it's possible to run the tests with a specified seed to guarantee the order. It might be useful to nail down whether this failure is influenced by another test munging the system, or whether it is ever reproducible when running just this one test in isolation on a fresh VM (see the sketch below).
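
A hedged sketch of pinning the order and isolating the test, assuming the e2e suite is driven by ginkgo as in the podman repo; the seed value here is illustrative:

$ cd test/e2e
$ ginkgo --seed=1621234567 --focus="podman run container with systemd PID1" .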

@edsantiago (Member, Author) commented:

Podman systemd [It] podman run container with systemd PID1

@rhatdan (Member) commented Jun 2, 2021

Could AppArmor be causing this? Has anyone looked at the audit.log to see if AppArmor is complaining about something?

@edsantiago (Member, Author) commented:

For any log link above: click the link, press the Home key to go to the top of the log, then click the Task link at the top. That takes you to the Cirrus page for the failed run. I don't see an AppArmor log anywhere, but there's a Run journal tab that includes things like "PID", "scope", "control group", and EPERM.

@cevich (Member) commented Jun 3, 2021

IIRC there is no audit.log on Ubuntu; AppArmor messages are logged to kern.log, which is captured under the 'audit' tab. The journal tab may also have clues, as Ed explained.
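
For anyone digging through those captured logs, a hedged sketch of how AppArmor denials typically surface on Ubuntu; these are the stock log locations, not the CI capture paths:

$ grep -i 'apparmor="DENIED"' /var/log/kern.log
$ journalctl -k | grep -i apparmor        # on a live system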

@edsantiago (Member, Author) commented:

@cevich (Member) commented Jun 16, 2021

In case it matters: new VM images for Ubuntu were just merged in for use by PR #10451. This includes (among other things) an updated runc v1.0-rc95.

@edsantiago (Member, Author) commented:

@edsantiago (Member, Author) commented:

Another one

@cevich (Member) commented Jul 15, 2021

FWIW, I'm working my way through numerous issues blocking adoption of the refreshed VM images in #10829. Not sure whether they will have any impact on this issue, but I'll keep an eye out.

github-actions bot commented:

A friendly reminder that this issue had no activity for 30 days.

github-actions bot commented Oct 1, 2021

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Oct 1, 2021

Did the conmon attach fix resolve this issue?

@edsantiago (Member, Author) commented:

Last seen September 10. Last change to IMAGE_SUFFIX in .cirrus.yml was committed on September 14. I don't know if it includes the new conmon or not... but I'm comfortable closing this with fingers crossed.

Podman systemd [It] podman run container with systemd PID1

github-actions bot added the locked - please file new issue/PR label on Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 21, 2023