All Docker containers fail to start after update to 2983.2.0 #544
Comments
Hi @meltonbw, this looks SELinux-related. What's the output of:
? |
Doing |
Thanks, then could you try to deactivate the In the CI, we run every test with
|
Disabling userns didn't help unfortunately.
No custom SELinux policies and I don't use an orchestrator. Here is the full docker info:
|
Note, I am able to start a |
I think this is connected to the no-new-privileges setting, there are multiple hits when searching for problems with that and selinux. |
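A minimal sketch for checking the two settings under suspicion here, assuming the daemon is configured through a `daemon.json` file (the thread does not confirm this; the settings may equally be passed as `dockerd` flags). The function name `docker_security_settings` and the default path are illustrative assumptions; `selinux-enabled` and `no-new-privileges` are the corresponding `daemon.json` keys.

```python
import json
from pathlib import Path


def docker_security_settings(config_text: str) -> dict:
    """Report the SELinux and no-new-privileges settings from daemon.json text.

    Missing keys are reported as False, which matches the daemon defaults.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    return {
        "selinux-enabled": bool(config.get("selinux-enabled", False)),
        "no-new-privileges": bool(config.get("no-new-privileges", False)),
    }


if __name__ == "__main__":
    # Assumed location; adjust if the daemon is started with --config-file.
    path = Path("/etc/docker/daemon.json")
    try:
        text = path.read_text()
    except OSError:
        text = ""  # no readable config file: fall back to defaults
    print(docker_security_settings(text))
```

If both values come back `True`, the host matches the combination discussed above (SELinux enabled together with no-new-privileges).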
I'm still trying to reproduce it - even with |
We have the same issue on some hosts. It looks like the SELinux policies differ between the hosts. What is the recommended way to install the default policies? |
@serbaut thanks for raising this too - we're currently investigating. Do you have the same runtime spec: SELinux in enforcing mode, no-new-privileges and so on ? |
The following seems to fix it:
|
We don't have any Docker config (the node is running k8s). |
@serbaut thanks. Having |
This is how it looks on a bad node
vs
So depending on which image you started with you get different selinux configs. |
@serbaut both nodes are on the same Flatcar version ? |
Yes both are on VERSION_ID=2983.2.0 |
@serbaut that's weird. I started a fresh
and it looks correct according to the tmpfiles configuration:
Could it be related to some manual intervention in the past on these nodes? |
Strange. Both nodes were created from flatcar-stable-2765.2.3, so I don't know how they can differ like that. Is some part of k8s fiddling with the SELinux configs? We have not touched them afaik. |
I'd be curious to hear from @meltonbw to have a look at this command output:
He's not using Kubernetes though. |
Anything else I can check? |
@serbaut thanks for helping; maybe you can try to run the following:
On the "bad nodes" to identify any errors preventing |
I don't see anything, but we only have 90 days of logs. I'm starting to suspect AquaSec, Prisma Cloud or k8s, but it would be interesting to see if @meltonbw has the same issue. |
I'll check back Monday. We don't upgrade our prod nodes until next weekend, so as long as we have a fix by then I'm fine :) |
@serbaut thanks for your help and the data you provided - it's really useful. You might be interested to run some |
We actually have that in another cluster, but they didn't experience any issue :P
Very similar to @serbaut's bad setup:
I will try the fix. |
Ok, after trying @serbaut's fix, now Docker does not start, and the daemon is missing from
|
Was getting this in the logs:
I set selinux to Any ideas what beef selinux has with Docker? |
@meltonbw glad to hear you worked around the issue - SELinux is always full of surprises. Regarding this comment, we can try to make Regarding this:
We can see an AVC message denial:
I'm currently patching our test suite to set enforcing mode at boot time to catch this kind of early boot error. Of course, an SELinux patch will follow to authorize this |
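For anyone collecting similar denials from their own audit logs, here is a rough sketch that extracts the permission, source context, target context and object class from an AVC denial line. The sample line is invented for illustration and is not the denial reported in this issue; `parse_avc` is a hypothetical helper name.

```python
import re

# Matches the usual kernel AVC denial layout:
#   avc:  denied  { perm ... } for ... scontext=... tcontext=... tclass=...
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def parse_avc(line: str):
    """Return the key fields of an AVC denial line, or None if it is not one."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    return {
        "perms": match.group("perms").split(),
        "scontext": match.group("scontext"),
        "tcontext": match.group("tcontext"),
        "tclass": match.group("tclass"),
    }


if __name__ == "__main__":
    # Illustrative sample only, not taken from this thread.
    sample = (
        'type=AVC msg=audit(1630000000.123:456): avc:  denied  { entrypoint } '
        'for  pid=1234 comm="runc:[2:INIT]" path="/bin/sh" dev="overlay" ino=42 '
        "scontext=system_u:system_r:container_t:s0 "
        "tcontext=system_u:object_r:unlabeled_t:s0 tclass=file permissive=0"
    )
    print(parse_avc(sample))
```

The source and target contexts are usually the quickest hint as to which policy rule is missing.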
I have looked some more and it seems the "correct" policies are permissive (the ones with symlinks to /usr).
I still do not understand how some clusters have a different config. |
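To compare a "good" and a "bad" node, something like the sketch below could list which entries under the SELinux config directory are still symlinks into `/usr` and which have been replaced by local files. It assumes the stock entries are symlinks whose targets start with `/usr/`, as described above; relative link targets would be reported as local. `classify_entries` and the `/etc/selinux` root used in the usage part are illustrative assumptions.

```python
import os


def classify_entries(root: str, expected_prefix: str = "/usr/") -> dict:
    """Split entries under root into symlinks to expected_prefix and everything else.

    Symlinks are not followed, so a link to a directory counts as one entry.
    """
    linked, local = [], []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                target = os.readlink(path)
                (linked if target.startswith(expected_prefix) else local).append(rel)
            elif os.path.isfile(path):
                local.append(rel)  # regular file: locally installed or modified
    return {"linked": sorted(linked), "local": sorted(local)}


if __name__ == "__main__":
    report = classify_entries("/etc/selinux")  # assumed config root
    for kind, entries in report.items():
        print(kind, entries)
```

A node whose `local` list is non-empty has had its policy files replaced or customized at some point, which would match the "bad node" listings in this thread.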
I'm 99% sure it's the https://www.aquasec.com/ enforcer that has created custom policies, and that's why some clusters have issues and others don't. @meltonbw do you know why your SELinux config was customized and not directly upstream via links? |
An SELinux patch has been submitted in flatcar-archive/coreos-overlay#1426 to fix the torcx error:
@serbaut I was not aware of AquaSec. Is that something open source that we can easily test or add to our tests? |
I don't know if there is an enforcer available for testing, but it's not open source at least. What it seems to have done is remove the links and install its own policies. Maybe it is enough to document how to handle customized SELinux policies and their possible effects. |
As far as I know this was not enough and torcx will still fail |
Description
After an update to 2983.2.0, all Docker containers fail to start with the message:
standard_init_linux.go:228: exec user process caused: operation not permitted
Impact
Cannot start any containers. All containers fail to start at boot.
Environment and steps to reproduce
standard_init_linux.go:228: exec user process caused: operation not permitted
Expected behavior
Containers should start without error. I cannot start a basic container with bash from the CL either:
Additional information