(psa) make workloads compatible with psa:restricted profile #2820

anik120 · 2022-07-22T23:29:11Z

Motivation for the change:

With the introduction of Pod Security Admission, the recommended best practice is to enforce the Restricted policy of admission (see [1] for more details).

[1] https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

Description of the change:

This PR
*) Lables the olm namespace as enforce:restricted
*) Labels the operators namespace as enforce:baseline (to allow existing CSV deployments
without securityContext set to deploy in the namespace, which won't be possible with
enforce:resticted)
*) updates the securityContext of olm workload pods(olm-operator, catalog-operator,
and CatalogSource registry pods) to adhere to the Restricted policy.
*) updates the bundle unpacking job to create a pod that adheres to the Restricted policy,
so that bundles can be unpacked in the Restricted namespace

Will also close #2644 as a byproduct of the need to fix test failures

Architectural changes:

Testing remarks:

Reviewer Checklist

perdasilva · 2022-07-25T14:59:15Z

deploy/chart/templates/_packageserver.deployment-spec.yaml

+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
+        {{- if eq .Values.installType "upstream" }}


deploy/chart/templates/0000_50_olm_00-namespace.yaml

pkg/controller/registry/reconciler/reconciler.go

perdasilva · 2022-07-28T01:57:28Z

In general it lgtm. Seems to hit all the right changes. Nicely done.
In case I missed it, it's probably worthwhile adding input validation to the runAsUser. I think at the moment 0 is allowed (as input), but won't end up stamping out a runAsUser (which is fine). But, without the validation it could be a bit confusing for the user.

I still suggest not giving the userId as a degree of freedom to the admin, at least not if we don't have a specific ask for it. The reasons being is that it could lead to backwards compatibility pain later on in case we need to fix the userId for some reason or another. If there's no clear customer need, we're also increasing the complexity of the code for no good reason. Even if we do needed it, I wonder if there's a nice library or pattern in go for storing this kind of configuration so it doesn't have to get passed all the way down the stack.

perdasilva · 2022-07-28T13:22:03Z

Another couple of things that occured to me while I slept is that by default we need to have the runAsUser in the securityContext of the jobs and catalog source pods in the majority of cases. The reason being that we cannot guarantee that the catalog source image or the bundle image have users defined in their containers. But, we kept the default userId value to -1 (i.e. no runAsUser). This means that the new flag is always necessary (except in OCP where SCC will inject the value). Even if we still want to keep the user id flag (rather than use a toggle), we should probably have a different default so that it doesn't have to be used/defined in the majority case.

anik120 · 2022-07-28T20:33:59Z

Another couple of things that occured to me while I slept is that by default we need to have the runAsUser in the securityContext of the jobs and catalog source pods in the majority of cases. The reason being that we cannot guarantee that the catalog source image or the bundle image have users defined in their containers. But, we kept the default userId value to -1 (i.e. no runAsUser). This means that the new flag is always necessary (except in OCP where SCC will inject the value). Even if we still want to keep the user id flag (rather than use a toggle), we should probably have a different default so that it doesn't have to be used/defined in the majority case.

I see your point, I've changed the default value to 1001.

In case I missed it, it's probably worthwhile adding input validation to the runAsUser. I think at the moment 0 is allowed (as input), but won't end up stamping out a runAsUser (which is fine). But, without the validation it could be a bit confusing for the user.

With that ☝🏽 change in the context, the validation if runAsUser >= 0{ set UserID} allows for any positive numerical value, including 0 (root), which is fine since if someone sets runAsNonRoot but sets runAsUser to 0, PSA controller will help us out by denying admission. But also, if for whatever reason, we do need to change back to runAsRoot, we can easily set UID to 0 without any code change.

I still suggest not giving the userId as a degree of freedom to the admin, at least not if we don't have a specific ask for it. The reasons being is that it could lead to backwards compatibility pain later on in case we need to fix the userId for some reason or another.

We do have other args that we pass to the containers, and the same backward compatibility argument can be extended to them too no? In fact, if we hardcode the value, and have to change the value, we'll have to actually backport code changes. Whereas if we don't hardcode the values, and we need to change the UID for previous releases, all we need is release manifest changes for previous releases 🎉

If there's no clear customer need, we're also increasing the complexity of the code for no good reason

On the contrary, if we don't leave this degree of freedom, besides the problem mentioned above, we'll possibly have to keep changing the hardcoded value with evolving PSA enablement effort org wide at present too.

Hardcoding "run workload as UID" sounds like brings the exact problems you're trying to avoid in the first place @perdasilva , and the solution is to not harcode it.

perdasilva · 2022-07-28T21:50:04Z

It's more a question of control. If we need to change it, we can, and it's not impact for the user. If we decide we need to pin to a user or another because of the container or other possible future changes, we can easily control that. Up and downstream. I'd just check that we aren't pulling from olm:latest for the controller. If that's not the case then the point is moot. But, just as a general guideline (and that's all it is), don't add more functionality than you need, because you'll end up having to support it. If you are confident it's needed, I'm good with it. Just making sure we're thinking it through.

perdasilva · 2022-07-28T21:56:43Z

Oh, just came to me. Something we'd do a lot at Amazon for our services was to use env vars instead. Those can be easily picked up the code without having to traverse the whole stack. It also avoids all the unit testing issues of using a singleton.

camilamacedo86

Hi @anik120 that is great but we can not use RunAsUser
We need to remove ALL RunAsUser fields added.

Option A)

Define the USER ID via the Dockerfile image. Therefore, we do NOT need to use the spec RunAsUser in the SecurityContext spec at all and we do not face the issue "container has runAsNonRoot and image will run as root "

Example:
Docker Image
USER 1001

Then, SecurityContext:

    spec:
      securityContext:
        runAsNonRoot: true
        # Please ensure that you can use SeccompProfile and do not use
        # if your project must work on old Kubernetes
        # versions < 1.19 or on vendors versions which
        # do NOT support this field by default (i.e. Openshift < 4.11 )
        seccompProfile:
          type: RuntimeDefault
      ...
      containers:
      - name: controller-manager
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL

Result: the containers will be qualified to restricted-v2 in OCP and will run as restricted on k8s

Option B:

We will need to ensure that where the ServiceAccount used will have the permissions for the required OCP SCC
RunAsNonRoot should not be set ( must be empty )

deploy/chart/templates/0000_50_olm_00-namespace.yaml

camilamacedo86 · 2022-07-28T22:09:52Z

deploy/chart/templates/0000_50_olm_00-namespace.yaml

+  labels:
+    pod-security.kubernetes.io/enforce: baseline
+    pod-security.kubernetes.io/enforce-version: latest
+    pod-security.kubernetes.io/warn: baseline


Suggested change

pod-security.kubernetes.io/warn: baseline

pod-security.kubernetes.io/warn: restricted

If the enforce is baseline then we can warn when any container on this namespace is not restricted OR not set up the warning at all. But the warning makes only sense if be less permissive than the enforcement.

Oh, we should enforce restricted on the olm namespace! Can remove the warn (and audit) label(s)

For the operators namespace, I've updated it to be the following:

pod-security.kubernetes.io/enforce: baseline pod-security.kubernetes.io/enforce-version: latest pod-security.kubernetes.io/warn: privileged pod-security.kubernetes.io/warn-version: latest

From what I've read, I think this means "allow baseline pods, but also allow privileged pod with a warning". Reason for keeping this configuration is that baseline should allow most of the CSVs out there to be installed in that namespace. If there are any operators that require privileged access, we need to allow those, but throw a warning. PLMK if the above configuration doesn't achieve what I think it does.

perdasilva · 2022-07-28T22:14:45Z

deploy/chart/templates/0000_50_olm_08-catalog-operator.deployment.yaml

+        {{- if eq .Values.installType "upstream" }}
+        runAsUser: {{ .Values.package.securityContext.runAsUser }}
+        {{- end }}


Nah we should keep security as nerfed as possible. I don't think the controllers need anything above restricted.

perdasilva · 2022-07-28T22:15:41Z

deploy/chart/templates/0000_50_olm_08-catalog-operator.deployment.yaml

@@ -76,6 +87,8 @@ spec:
          - --client-ca
          - /profile-collector-cert/tls.crt
          {{- end }}
+          - --workload-user-id


The idea here for the downstream is to omit the flag and let SCC inject the user id in the range it wants.

perdasilva · 2022-07-28T22:17:43Z

deploy/chart/templates/0000_50_olm_07-olm-operator.deployment.yaml

+        {{- if eq .Values.installType "upstream" }}
+        runAsUser: {{ .Values.package.securityContext.runAsUser }}
+        {{- end }}


That's a fair point actually. Since we are baking the USER into the olm container we don't need runAsUser for the controller deployments. Only for the catalog source and unpack jobs, since we have less control there.

perdasilva · 2022-07-28T22:23:44Z

Regarding validation it's better to fail fast, rather than let PSA catch it. Might cause confusion. Idk that I'd change the runAsNonRoot behavior. More stuff to test.

cmd/catalog/start.go

anik120 · 2022-08-01T20:39:25Z

@perdasilva @camilamacedo86 updated the PR with the following updates:

The olm and catalog operator deployment does not explicitly set runAsUser, instead the images are built with user 1001.
The registry pods and the bundle unpacking job still absolutely needs their runAsUser set explicitly (whether manually or by another controller like the label syncer that'll go out and inject that value in OCP), since they're not being built from any image (check code of how they're created for more details)
The flag for the catalog-operator container arg is now a boolean (that sets the runAsUser as mentioned in 2 to 1001 if set to true)

I'm more worried about upstream users here. Changing a hard-coded value is trivial. Deprecating behavior is hard. Removing the degree of freedom here gives us control to do what we want without affecting users or having to take support promises into account

That makes sense to me. It'll behoove us to take this conversation to the wg meeting and let the community know that we are exposing this flag, and based on the feedback we receive, if we see that there's appetite for needing to set a custom user id for those workloads, we can introduce a new flag and get rid of the boolean one.

perdasilva · 2022-08-01T21:32:43Z

/approve

openshift-ci · 2022-08-01T21:32:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [perdasilva]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

deploy/chart/templates/0000_50_olm_00-namespace.yaml


 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: {{ .Values.operator_namespace }}
+  labels:
+    pod-security.kubernetes.io/enforce: baseline


deploy/chart/templates/0000_50_olm_00-namespace.yaml

@@ -2,9 +2,17 @@ apiVersion: v1
 kind: Namespace
 metadata:
  name: {{ .Values.namespace }}
+  labels: 
+    pod-security.kubernetes.io/enforce: restricted


camilamacedo86

It shows addressing all needs ihmo. It has my
/lgtm

as well.

With the introduction of [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces), the reccomeneded best practice is to enforce the Restricted policy of admission (see [1] for more details). This PR *) Lables the olm namespace as `enforce:restricted` *) Labels the operators namespace as `enforce:baseline` (to allow existing CSV deployments without securityContext set to deploy in the namespace, which won't be possible with `enforce:resticted`) *) updates the securityContext of olm workload pods(olm-operator, catalog-operator, and CatalogSource registry pods) to adhere to the `Restricted` policy. *) updates the bundle unpacking job to create a pod that adheres to the `Restricted` policy, so that bundles can be unpacked in the `Restricted` namespace. Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>

The test was modifying the `olm.operatornamespace` to an incorrect value, and checking to make sure that the CSV was garbage collected as a result. However, the olm-controller was copying a fresh copy back into the namespace, so whenever the test was able to get a yes reply to the question "is the CSV gone", in the brief window before it was copied back again, the test was passing. This commit fixes that by making sure that if find a CSV that we expected to be garbage collected, it passes if it determines that the CSV is a fresh copy, and not the one modified before. Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>

camilamacedo86 · 2022-08-02T15:48:05Z

/hold cancel

camilamacedo86

/lgtm

camilamacedo86

/hold

Because I found an nit to be sorted out it: see: https://github.com/operator-framework/operator-lifecycle-manager/pull/2820/files#r935770339

camilamacedo86 · 2022-08-02T15:59:32Z

deploy/chart/templates/0000_50_olm_00-namespace.yaml


 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: {{ .Values.operator_namespace }}
+  labels:
+    pod-security.kubernetes.io/enforce: baseline
+    pod-security.kubernetes.io/enforce-version: latest


@anik120 is just a detail, if we add here the latest that means it will get the latest criteria available from the policy. therefore, in the future, it might no longer work with the changes in this code/release.

we create an issue to address this scenario: #2827 and it is blocking the release and will get done as follow up so all fine.

/hold cancel

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 22, 2022

openshift-ci bot requested review from awgreene and benluddy July 22, 2022 23:29

perdasilva reviewed Jul 25, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml Show resolved Hide resolved

perdasilva reviewed Jul 25, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml Outdated Show resolved Hide resolved

perdasilva reviewed Jul 25, 2022

View reviewed changes

pkg/controller/registry/reconciler/reconciler.go Outdated Show resolved Hide resolved

anik120 force-pushed the introduce-psa-restricted branch from 1c4ab43 to e4a9a66 Compare July 26, 2022 18:37

anik120 changed the title ~~WIP: placeholder~~ WIP: (psa) make workloads compatible with psa:restricted profile Jul 26, 2022

anik120 force-pushed the introduce-psa-restricted branch 4 times, most recently from 19f2901 to 231aa71 Compare July 27, 2022 19:16

anik120 changed the title ~~WIP: (psa) make workloads compatible with psa:restricted profile~~ (psa) make workloads compatible with psa:restricted profile Jul 27, 2022

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 27, 2022

anik120 force-pushed the introduce-psa-restricted branch 2 times, most recently from 8eb6517 to d420925 Compare July 28, 2022 20:33

camilamacedo86 suggested changes Jul 28, 2022

View reviewed changes

camilamacedo86 reviewed Jul 28, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml Outdated Show resolved Hide resolved

camilamacedo86 reviewed Jul 28, 2022

View reviewed changes

perdasilva reviewed Jul 28, 2022

View reviewed changes

camilamacedo86 reviewed Jul 28, 2022

View reviewed changes

cmd/catalog/start.go Outdated Show resolved Hide resolved

anik120 force-pushed the introduce-psa-restricted branch 3 times, most recently from 7fed4ac to b21ed5b Compare August 1, 2022 20:11

anik120 requested review from camilamacedo86 and perdasilva August 1, 2022 20:39

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2022

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml

---

apiVersion: v1

kind: Namespace

metadata:

name: {{ .Values.operator_namespace }}

labels:

pod-security.kubernetes.io/enforce: baseline

This comment was marked as resolved.

Sign in to view

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2022

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml Outdated Show resolved Hide resolved

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

deploy/chart/templates/0000_50_olm_00-namespace.yaml

@@ -2,9 +2,17 @@ apiVersion: v1

kind: Namespace

metadata:

name: {{ .Values.namespace }}

labels:

pod-security.kubernetes.io/enforce: restricted

This comment was marked as resolved.

Sign in to view

anik120 force-pushed the introduce-psa-restricted branch from b21ed5b to 30eb053 Compare August 2, 2022 14:04

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

openshift-ci bot assigned camilamacedo86 Aug 2, 2022

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 2, 2022

anik120 added 2 commits August 2, 2022 11:45

anik120 force-pushed the introduce-psa-restricted branch from 30eb053 to 9cde3f0 Compare August 2, 2022 15:46

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 2, 2022

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2022

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 2, 2022

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2022

camilamacedo86 reviewed Aug 2, 2022

View reviewed changes

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2022

openshift-merge-robot merged commit 67177c0 into operator-framework:master Aug 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(psa) make workloads compatible with psa:restricted profile #2820

(psa) make workloads compatible with psa:restricted profile #2820

anik120 commented Jul 22, 2022 •

edited

Loading

perdasilva Jul 25, 2022

perdasilva commented Jul 28, 2022

perdasilva commented Jul 28, 2022

anik120 commented Jul 28, 2022

perdasilva commented Jul 28, 2022 •

edited

Loading

perdasilva commented Jul 28, 2022

camilamacedo86 left a comment •

edited

Loading

camilamacedo86 Jul 28, 2022

perdasilva Jul 28, 2022

camilamacedo86 Aug 1, 2022

anik120 Aug 1, 2022

perdasilva Jul 28, 2022

perdasilva Jul 28, 2022

perdasilva Jul 28, 2022

perdasilva commented Jul 28, 2022

anik120 commented Aug 1, 2022 •

edited

Loading

perdasilva commented Aug 1, 2022

openshift-ci bot commented Aug 1, 2022

This comment was marked as resolved.

This comment was marked as resolved.

camilamacedo86 left a comment

camilamacedo86 commented Aug 2, 2022

camilamacedo86 left a comment

camilamacedo86 left a comment •

edited

Loading

camilamacedo86 Aug 2, 2022

camilamacedo86 Aug 2, 2022

	pod-security.kubernetes.io/warn: baseline
	pod-security.kubernetes.io/warn: restricted

(psa) make workloads compatible with psa:restricted profile #2820

(psa) make workloads compatible with psa:restricted profile #2820

Conversation

anik120 commented Jul 22, 2022 • edited Loading

Choose a reason for hiding this comment

perdasilva commented Jul 28, 2022

perdasilva commented Jul 28, 2022

anik120 commented Jul 28, 2022

perdasilva commented Jul 28, 2022 • edited Loading

perdasilva commented Jul 28, 2022

camilamacedo86 left a comment • edited Loading

Choose a reason for hiding this comment

Option A)

Option B:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

perdasilva commented Jul 28, 2022

anik120 commented Aug 1, 2022 • edited Loading

perdasilva commented Aug 1, 2022

openshift-ci bot commented Aug 1, 2022

This comment was marked as resolved.

This comment was marked as resolved.

camilamacedo86 left a comment

Choose a reason for hiding this comment

camilamacedo86 commented Aug 2, 2022

camilamacedo86 left a comment

Choose a reason for hiding this comment

camilamacedo86 left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anik120 commented Jul 22, 2022 •

edited

Loading

perdasilva commented Jul 28, 2022 •

edited

Loading

camilamacedo86 left a comment •

edited

Loading

anik120 commented Aug 1, 2022 •

edited

Loading

camilamacedo86 left a comment •

edited

Loading