(psa) make workloads compatible with psa:restricted profile #2820
Conversation
```yaml
runAsNonRoot: true
seccompProfile:
  type: RuntimeDefault
{{- if eq .Values.installType "upstream" }}
```
nice!
In general it lgtm. It seems to hit all the right changes. Nicely done. I still suggest not giving the userId as a degree of freedom to the admin, at least not if we don't have a specific ask for it. The reason being that it could lead to backwards-compatibility pain later on in case we need to fix the userId for some reason or another. If there's no clear customer need, we're also increasing the complexity of the code for no good reason. Even if we did need it, I wonder if there's a nice library or pattern in Go for storing this kind of configuration so it doesn't have to get passed all the way down the stack.
Another couple of things that occurred to me while I slept: by default, we need to have the runAsUser in the securityContext of the jobs and catalog source pods in the majority of cases. The reason being that we cannot guarantee that the catalog source image or the bundle image have users defined in their containers. But we kept the default userId value at -1 (i.e. no runAsUser). This means that the new flag is always necessary (except in OCP, where SCC will inject the value). Even if we still want to keep the user id flag (rather than use a toggle), we should probably have a different default so that it doesn't have to be used/defined in the majority case.
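For illustration, a minimal sketch of what an unpack job could look like with the UID pinned at the pod level (the job name, image, and the 1001 value are assumptions for this example, not the PR's literal output):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bundle-unpack-example        # hypothetical name, for illustration
spec:
  template:
    spec:
      securityContext:
        # An explicit UID is needed because the bundle/catalog image may
        # not define a USER; with runAsNonRoot: true and no UID set
        # anywhere, the kubelet rejects the pod ("container has
        # runAsNonRoot and image will run as root").
        runAsUser: 1001              # assumed default, per the discussion
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: extract
        image: example.com/bundle:latest   # placeholder image
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      restartPolicy: Never
```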
I see your point; I've changed the default value to 1001.
With that ☝🏽 change in the context, the validation
We do have other args that we pass to the containers, and the same backwards-compatibility argument can be extended to them too, no? In fact, if we hardcode the value and have to change it, we'll have to actually backport code changes. Whereas if we don't hardcode the values and we need to change the UID for previous releases, all we need is release manifest changes for previous releases 🎉
On the contrary, if we don't leave this degree of freedom, besides the problem mentioned above, we'll possibly have to keep changing the hardcoded value along with the org-wide PSA enablement effort that's underway at present too. Hardcoding "run workload as UID" sounds like it brings the exact problems you're trying to avoid in the first place @perdasilva, and the solution is to not hardcode it.
It's more a question of control. If we need to change it, we can, and there's no impact for the user. If we decide we need to pin to one user or another because of the container or other possible future changes, we can easily control that, up- and downstream. I'd just check that we aren't pulling from olm:latest for the controller. If that's not the case then the point is moot. But, just as a general guideline (and that's all it is), don't add more functionality than you need, because you'll end up having to support it. If you are confident it's needed, I'm good with it. Just making sure we're thinking it through.
Oh, it just came to me. Something we'd do a lot at Amazon for our services was to use env vars instead. Those can be easily picked up by the code without having to traverse the whole stack. It also avoids all the unit-testing issues of using a singleton.
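As a sketch of that idea (the WORKLOAD_USER_ID name and the template path are hypothetical, not something this PR defines), the chart could inject the value as an environment variable instead of a CLI flag:

```yaml
# deploy/chart/templates/..._deployment.yaml (illustrative path)
containers:
- name: catalog-operator
  env:
  # Hypothetical variable name; read once in Go with
  # os.Getenv("WORKLOAD_USER_ID") wherever the unpack job's pod spec is
  # built, instead of threading a flag value through every constructor.
  - name: WORKLOAD_USER_ID
    value: {{ .Values.package.securityContext.runAsUser | quote }}
```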
Hi @anik120, that is great, but we cannot use RunAsUser.
We need to remove ALL RunAsUser fields added.
Option A)
Define the USER ID via the Dockerfile image. Therefore, we do NOT need to use RunAsUser in the SecurityContext spec at all, and we do not face the issue "container has runAsNonRoot and image will run as root".
Example:

Dockerfile:

```dockerfile
USER 1001
```

Then, SecurityContext:

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    # Please ensure that you can use SeccompProfile and do not use it
    # if your project must work on old Kubernetes versions < 1.19
    # or on vendor versions which do NOT support this field by default
    # (i.e. OpenShift < 4.11)
    seccompProfile:
      type: RuntimeDefault
  ...
  containers:
  - name: controller-manager
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
```
Result: the containers will qualify as restricted-v2 in OCP and will run as restricted on k8s.
Option B:
- We will need to ensure that the ServiceAccount used has the permissions for the required OCP SCC
- RunAsNonRoot should not be set (must be empty)
```yaml
labels:
  pod-security.kubernetes.io/enforce: baseline
  pod-security.kubernetes.io/enforce-version: latest
  pod-security.kubernetes.io/warn: baseline
```
Suggested change:

```diff
- pod-security.kubernetes.io/warn: baseline
+ pod-security.kubernetes.io/warn: restricted
```
If the enforce level is baseline, then we can either warn when any container in this namespace is not restricted, OR not set up the warning at all. The warning only makes sense if it is less permissive than the enforcement.
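For example, a namespace that enforces baseline but warns on anything that would fail restricted might be labeled like this (a minimal sketch; the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-namespace          # placeholder
  labels:
    # Pods that violate baseline are rejected outright.
    pod-security.kubernetes.io/enforce: baseline
    # Pods that pass baseline but would violate restricted trigger a
    # warning only, nudging workloads toward restricted over time.
    pod-security.kubernetes.io/warn: restricted
```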
Oh, we should enforce restricted on the olm namespace! Can remove the warn (and audit) label(s)
For the operators namespace, I've updated it to be the following:

```yaml
pod-security.kubernetes.io/enforce: baseline
pod-security.kubernetes.io/enforce-version: latest
pod-security.kubernetes.io/warn: privileged
pod-security.kubernetes.io/warn-version: latest
```
From what I've read, I think this means "allow baseline pods, but also allow privileged pods with a warning". The reason for keeping this configuration is that baseline should allow most of the CSVs out there to be installed in that namespace. If there are any operators that require privileged access, we need to allow those, but throw a warning. PLMK if the above configuration doesn't achieve what I think it does.
```yaml
{{- if eq .Values.installType "upstream" }}
runAsUser: {{ .Values.package.securityContext.runAsUser }}
{{- end }}
```
Nah, we should keep security as nerfed as possible. I don't think the controllers need anything above restricted.
```diff
@@ -76,6 +87,8 @@ spec:
         - --client-ca
         - /profile-collector-cert/tls.crt
         {{- end }}
+        - --workload-user-id
```
The idea here for the downstream is to omit the flag and let SCC inject the user id in the range it wants.
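A sketch of how that conditional could render (the exact flag/value wiring is assumed here, not copied from the PR's diff):

```yaml
# Sketch: how the flag might be wired in the Helm template.
{{- if eq .Values.installType "upstream" }}
- --workload-user-id={{ .Values.package.securityContext.runAsUser }}
{{- end }}
# For downstream (installType != "upstream") the block renders nothing,
# so no UID is pinned and OpenShift's SCC admission injects a user id
# from the namespace's allowed range.
```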
```yaml
{{- if eq .Values.installType "upstream" }}
runAsUser: {{ .Values.package.securityContext.runAsUser }}
{{- end }}
```
That's a fair point actually. Since we are baking the USER into the olm container, we don't need runAsUser for the controller deployments. Only for the catalog source and unpack jobs, since we have less control there.
Regarding validation, it's better to fail fast rather than let PSA catch it; that might cause confusion. Idk that I'd change the runAsNonRoot behavior. More stuff to test.
@perdasilva @camilamacedo86 updated the PR with the following changes:
That makes sense to me. It'll behoove us to take this conversation to the WG meeting and let the community know that we are exposing this flag; based on the feedback we receive, if we see that there's appetite for setting a custom user id for those workloads, we can introduce a new flag and get rid of the boolean one.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: anik120, perdasilva
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.operator_namespace }}
  labels:
    pod-security.kubernetes.io/enforce: baseline
```
```diff
@@ -2,9 +2,17 @@ apiVersion: v1
 kind: Namespace
 metadata:
   name: {{ .Values.namespace }}
+  labels:
+    pod-security.kubernetes.io/enforce: restricted
```
It addresses all the needs imho. It has my
/lgtm
as well.
With the introduction of [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces), the recommended best practice is to enforce the Restricted policy of admission (see [1] for more details). This PR
*) Labels the olm namespace as `enforce:restricted`
*) Labels the operators namespace as `enforce:baseline` (to allow existing CSV deployments without securityContext set to deploy in the namespace, which won't be possible with `enforce:restricted`)
*) updates the securityContext of olm workload pods (olm-operator, catalog-operator, and CatalogSource registry pods) to adhere to the `Restricted` policy
*) updates the bundle unpacking job to create a pod that adheres to the `Restricted` policy, so that bundles can be unpacked in the `Restricted` namespace
Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
The test was modifying the `olm.operatornamespace` to an incorrect value, and checking to make sure that the CSV was garbage collected as a result. However, the olm-controller was copying a fresh copy back into the namespace, so whenever the test got a "yes" reply to the question "is the CSV gone" in the brief window before it was copied back again, the test passed. This commit fixes that by making sure that if we find a CSV that we expected to be garbage collected, the test only passes if it determines that the CSV is a fresh copy, and not the one modified before.
Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
/hold cancel
/lgtm
/hold
Because I found a nit to be sorted out: see https://github.com/operator-framework/operator-lifecycle-manager/pull/2820/files#r935770339
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.operator_namespace }}
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
```
@anik120 it's just a detail: if we add `latest` here, it means the namespace will get the latest criteria available from the policy. Therefore, in the future, it might no longer work with the changes in this code/release.
We created an issue to address this scenario: #2827. It is not blocking the release and will get done as a follow-up, so all fine.
/hold cancel
Motivation for the change:
With the introduction of Pod Security Admission, the recommended best practice is to enforce the `Restricted` policy of admission (see [1] for more details).

[1] https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

Description of the change:
This PR
*) Labels the olm namespace as `enforce:restricted`
*) Labels the operators namespace as `enforce:baseline` (to allow existing CSV deployments without securityContext set to deploy in the namespace, which won't be possible with `enforce:restricted`)
*) updates the securityContext of olm workload pods (olm-operator, catalog-operator, and CatalogSource registry pods) to adhere to the `Restricted` policy
*) updates the bundle unpacking job to create a pod that adheres to the `Restricted` policy, so that bundles can be unpacked in the `Restricted` namespace

Will also close #2644 as a byproduct of the need to fix test failures