
(psa) allow legacy CatalogSources to run in non-restricted namespaces #2845

Merged

Conversation

anik120
Contributor

@anik120 anik120 commented Aug 24, 2022

This PR configures the CatalogSource reconciler to use the spec.GrpcPodConfig.SecurityContextConfig
field to determine whether the pod.spec.securityContext and container[*].spec.SecurityContext for the registry
pod should be configured to be runnable in a PSA-restricted namespace, so that cluster admins can
indicate that they want to run legacy CatalogSources in a non-restricted (baseline/privileged) namespace.

This allows cluster admins to run CatalogSources that are built with a version of opm older than
v1.23.2 (i.e. a version of opm that does not contain this commit).

Signed-off-by: Anik Bhattacharjee anikbhattacharya93@gmail.com
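The gist of the new behavior can be sketched as below. This is a minimal illustration, not the actual reconciler code: the type and function names here are assumed for the example, and the real enum lives in operator-framework/api.

```go
package main

import "fmt"

// SecurityContextConfig stands in for the CatalogSource enum discussed in
// this PR; the value strings are assumptions for illustration.
type SecurityContextConfig string

const (
	Restricted SecurityContextConfig = "restricted"
	Legacy     SecurityContextConfig = "legacy"
)

// GrpcPodConfig is a pared-down stand-in for spec.GrpcPodConfig.
type GrpcPodConfig struct {
	SecurityContextConfig SecurityContextConfig
}

// applyRestrictedContext reports whether the registry pod should get the
// PSA-restricted securityContext. A nil GrpcPodConfig (the common case,
// since the field is optional) is treated as restricted.
func applyRestrictedContext(cfg *GrpcPodConfig) bool {
	return cfg == nil || cfg.SecurityContextConfig != Legacy
}

func main() {
	fmt.Println(applyRestrictedContext(nil))                                          // true
	fmt.Println(applyRestrictedContext(&GrpcPodConfig{SecurityContextConfig: Legacy})) // false
}
```

Only when the admin explicitly opts in to `legacy` does the reconciler skip the restricted securityContext fields.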


Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@anik120 anik120 changed the title (catsrc) allow catalogsource to be run as root WIP: (catsrc) allow catalogsource to be run as root Aug 24, 2022
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 24, 2022
@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 29, 2022
@anik120 anik120 force-pushed the run-catsrc-root branch 2 times, most recently from f0f8749 to 1120776 on August 29, 2022 18:55
@anik120 anik120 changed the title WIP: (catsrc) allow catalogsource to be run as root (catsrc) allow catalogsource to be run as root Aug 29, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 29, 2022
@anik120 anik120 force-pushed the run-catsrc-root branch 2 times, most recently from 0140322 to 52566d1 on August 29, 2022 20:46
@benluddy
Contributor

What does this field mean for CatalogSources that don't have any managed pods?

@perdasilva
Collaborator

@operator-framework/team-fbc test

@anik120 anik120 force-pushed the run-catsrc-root branch 2 times, most recently from 16730df to 0713044 on September 6, 2022 19:32
deploy/chart/crds/0000_50_olm_00-catalogsources.crd.yaml (resolved review thread)
pkg/controller/registry/reconciler/reconciler.go (resolved review thread)
pod.Spec.SecurityContext.RunAsUser = &runAsUser
pod.Spec.SecurityContext.RunAsNonRoot = pointer.Bool(true)
if source.Spec.GrpcPodConfig != nil && source.Spec.GrpcPodConfig.RequirePrivilegedAccess {
pod.Spec.Containers[0].SecurityContext.AllowPrivilegeEscalation = pointer.Bool(true)

@perdasilva i think you had done the deepest investigation on what permissions our legacy catalogs/opm binary required to be able to run on a 4.12 cluster (in the process of you fixing the issues so we didn't require those permissions anymore), can you take a look at whether this seems right? It's not what i would have expected.

Collaborator

@perdasilva perdasilva Sep 8, 2022

It should be sufficient to just set runAsNonRoot=false and runAsUser=0. I'm running a test on nightly 4.12 to confirm... just waiting on cluster bot....

Collaborator

Here's what it came down to:

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - MKNOD
  readOnlyRootFilesystem: false

and

securityContext:
  runAsNonRoot: false
  runAsUser: 0
  seccompProfile:
    type: RuntimeDefault

This ran on a baseline-enforced namespace. Here's something curious though, @bparees: I only managed to change the PSA enforcement profile on a (new) openshift- namespace. I.e.

$ oc create ns test
$ oc describe ns test
...
Labels:  kubernetes.io/metadata.name=test
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=v1.24
...
$ oc label ns test pod-security.kubernetes.io/enforce=baseline --overwrite
$ oc describe ns test
...
Labels:  kubernetes.io/metadata.name=test
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=v1.24
...

So, I think we may still have problems even with the api override =S

Contributor Author

@anik120 anik120 Sep 8, 2022

Fyi @perdasilva, not sure if you saw this conversation, but it might help fill in the context for why I switched from setting runAsUser: 0 to not setting it at all. Since we're telling admins that, to run legacy catalogs, they need to make their namespaces look like they did before the PSA-restricted world, it makes sense to make the registry pods look like they did back then too, and in that world we did not set runAsUser: 0. Unless there are specific reasons why we should be setting runAsUser: 0 now.


So, I think we may still have problems even with the api override =S

@perdasilva i believe you need to explicitly label the NS as having the labelsyncer disabled, before you can set the PSA labels manually. It should be the same label you are setting on openshift-* NSes to enable the labelsyncer there (except you'd set it to false, of course, here)

non-openshift-* NSes default to labelsyncer on (so you need to label the NS appropriately to disable it)
openshift-* NSes default to labelsyncer off (so you need to label the NS appropriately to enable it)
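The opt-out @bparees describes can be sketched as below. This is a hedged example: I'm assuming `security.openshift.io/scc.podSecurityLabelSync` is the per-namespace label-syncer toggle; verify the label name against the OpenShift docs for your version before relying on it.

```shell
# Assumption: this label disables the PSA label syncer for the namespace,
# after which the PSA enforcement labels can be overridden manually.
oc label ns test security.openshift.io/scc.podSecurityLabelSync=false
oc label ns test pod-security.kubernetes.io/enforce=baseline --overwrite
# Confirm the override stuck:
oc get ns test -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}'
```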

Collaborator

doh! should have inferred that 🤦

Collaborator

Fyi @perdasilva, not sure if you saw this conversation, but it might help fill in the context for why I switched from setting runAsUser: 0 to not setting it at all. Since we're telling admins that, to run legacy catalogs, they need to make their namespaces look like they did before the PSA-restricted world, it makes sense to make the registry pods look like they did back then too, and in that world we did not set runAsUser: 0. Unless there are specific reasons why we should be setting runAsUser: 0 now.

So, not setting runAsUser seemed to push the restricted-v2 SCC profile, causing the perm issue. If we don't want to set it to 0, we'll probably need to set fsGroup to 0 in lieu of that. I've also tested removing runAsUser and runAsNonRoot and setting AllowPrivilegeEscalation to true, and that worked as well. Maybe that's the best way forward, then.


So, not setting runAsUser seemed to push the restricted-v2 SCC profile, causing the perm issue. If we don't want to set it to 0, we'll probably need to set fsGroup to 0 in lieu of that.

If you do not set the seccompProfile, and you have access to the anyuid SCC, I believe you'll get the anyuid SCC, which allows you to leave runAsUser unset, and the container will run w/ the image's default UID, which I believe is what you want.

pkg/controller/registry/reconciler/reconciler.go (resolved review thread)
Capabilities: &corev1.Capabilities{
Drop: []corev1.Capability{"ALL"},
},
ReadOnlyRootFilesystem: pointer.Bool(false),
Contributor

We need to read the SQL index DB, which lives on the root filesystem, so that needs to be true (imho).

@perdasilva


readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.

this needs to be false if we need to write to it (which i think we do, because i think the legacy catalogs write to the / path?).

It does not mean "set this to true if you need to read from the filesystem", you set it to true when you explicitly want to ensure nothing writes to the root filesystem. Which is (aiui) not the case for legacy catalogs (we need to write to the root filesystem)

@@ -182,11 +182,7 @@ func Pod(source *operatorsv1alpha1.CatalogSource, name string, image string, saN
},
},
SecurityContext: &corev1.SecurityContext{
ReadOnlyRootFilesystem: pointer.Bool(false),
AllowPrivilegeEscalation: pointer.Bool(false),
Contributor

Also, because we need to read the root filesystem, the way to grant permission for that is setting AllowPrivilegeEscalation to true.

And RunAsUser=0 is not required (we need a user that has permission to do what needs to be done, as before, and by setting that to true the user will be part of the root group and have the same permissions).

Btw, RunAsUser=0 might not work on OCP (it will depend on the namespace, and it will probably raise an "invalid value: must be in the ranges" error).

c/c @perdasilva


what does "AllowPrivilegeEscalation" have to do w/ reading the root filesystem?

allow privilege escalation means you can run a setuid binary or otherwise allow a process to "become root", does the opm registry binary do so?

I would simply expect the index.db to be readable (by any uid), or in general we should be ensuring that the container is run w/ the appropriate uid and fsgroup to allow us to read (and write, if necessary) to the appropriate directories. None of that should require privilege escalation.

allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process.

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

Contributor Author

That'd mean we don't need to specify AllowPrivilegeEscalation at all then, since opm doesn't need to run as root to read or write to the SQLite db, and in 4.10 we didn't specify AllowPrivilegeEscalation either (and opm not requiring root hasn't changed since 4.10, and is unlikely to change).

Contributor Author

Fyi, I was updating the tests to use an image hosted in the quay.io/olmtest registry instead of my personal registry, so I took that opportunity to remove AllowPrivilegeEscalation: true too. cc: @perdasilva, now I'm happy with the PR going in :)

@anik120 anik120 force-pushed the run-catsrc-root branch 3 times, most recently from 3e03339 to 59d230d on September 9, 2022 17:57
@anik120
Contributor Author

anik120 commented Sep 12, 2022

Looks like we've finally converged to a spot where we're all on the same page. Let's merge this so that we can start the downstreaming process, and if we discover we need to make changes to the configs in this PR for downstream, we can come back upstream and make those changes.

To that end, we'll need the labels for operator-framework/api#261. cc: @perdasilva

Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
@anik120 anik120 changed the title (catsrc) allow catalogsource to be run as root (psa) allow legacy CatalogSources to run in non-restricted namespaces Sep 13, 2022
@anik120 anik120 force-pushed the run-catsrc-root branch 4 times, most recently from 61acd0a to dad760a on September 13, 2022 20:52
@perdasilva
Collaborator

/approve

@openshift-ci

openshift-ci bot commented Sep 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 14, 2022
if runAsUser > 0 {
pod.Spec.SecurityContext.RunAsUser = &runAsUser
pod.Spec.SecurityContext.RunAsNonRoot = pointer.Bool(true)
if source.Spec.GrpcPodConfig == nil || source.Spec.GrpcPodConfig.SecurityContextConfig != operatorsv1alpha1.Legacy {

this should be "if config==nil || config==restricted"

you don't want to treat unknown values as if they were "restricted", you want to treat unknown values as an error.


(so you also need an else here to handle unknown values. Yes validation should prevent it, but if someone adds a new valid enum and doesn't fix up this logic, you want to catch it.)
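A defensive shape for this suggestion could look like the sketch below. It is illustrative only, not the PR's actual code; the type, constants, and function name are assumed for the example.

```go
package main

import "fmt"

// SecurityContextConfig stands in for the CatalogSource enum; the value
// strings are assumptions for illustration.
type SecurityContextConfig string

const (
	Restricted SecurityContextConfig = "restricted"
	Legacy     SecurityContextConfig = "legacy"
)

// profileFor maps the enum to a securityContext profile, treating any
// unrecognized value as an error instead of silently defaulting, so a
// newly added enum member can't fall through unnoticed.
func profileFor(cfg SecurityContextConfig) (string, error) {
	switch cfg {
	case "", Restricted: // unset behaves as restricted
		return "restricted", nil
	case Legacy:
		return "legacy", nil
	default:
		return "", fmt.Errorf("unknown SecurityContextConfig %q", cfg)
	}
}

func main() {
	_, err := profileFor("bogus")
	fmt.Println(err) // the unknown value surfaces as an error
}
```

The default branch is the point: CRD validation should prevent unknown values, but if it ever doesn't, the reconciler fails loudly instead of quietly applying the restricted contexts.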

Contributor Author

@anik120 anik120 Sep 14, 2022

So spec.GrpcPodConfig.SecurityContextConfig is marked as optional, which means most CatalogSources created will not have the GrpcPodConfig struct populated at all, so config==restricted will actually not be true for them.

For example, this is the yaml of the operatorhub catalog applied to the cluster on local installation in Kind:

$ k get catsrc -n olm -o yaml 
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: "2022-09-14T17:11:29Z"
    generation: 1
    name: operatorhubio-catalog
    namespace: olm
    resourceVersion: "757"
    uid: 54db9c12-be5e-4b0b-9e16-3d1d80acaf54
  spec:
    displayName: Community Operators
    image: quay.io/operatorhubio/catalog:latest
    publisher: OperatorHub.io
    sourceType: grpc
    updateStrategy:
      registryPoll:
        interval: 60m
  status:
    connectionState:
      address: operatorhubio-catalog.olm.svc:50051
      lastConnect: "2022-09-14T17:11:59Z"
      lastObservedState: READY
    registryService:
      createdAt: "2022-09-14T17:11:29Z"
      port: "50051"
      protocol: grpc
      serviceName: operatorhubio-catalog
      serviceNamespace: olm
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I'm guessing that means when a third value is introduced, grpcpodConfig.SecurityContextConfig should be marked as required and then this can be changed to explicitly check config==restricted.

Because the other option is config==restricted || config==nil, which doesn't sound that explicit anyway. Looks like even with a default value defined for it, if the top-level grpcPodConfig isn't defined at all, the entire struct remains nil.

I've added a comment to prevent someone accidentally introducing faulty logic when a third enum is added, hopefully that's a good middle ground

@anik120 anik120 force-pushed the run-catsrc-root branch 2 times, most recently from ab0a931 to 2711e76 on September 14, 2022 19:06
This PR configures the CatalogSource reconciler to use the spec.GrpcPodConfig.SecurityContextConfig
field to determine whether the pod.spec.securityContext and container[*].spec.SecurityContext for the registry
pod should be configured to be runnable in a PSA-restricted namespace, so that cluster admins can
indicate that they want to run legacy CatalogSources in a non-restricted (baseline/privileged) namespace.

This allows cluster admins to run CatalogSources that are built with a version of opm older than
v1.23.2 (i.e. a version of opm that does not contain [this commit](operator-framework/operator-registry#974)).

Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
@bparees

bparees commented Sep 14, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 14, 2022
@openshift-merge-robot openshift-merge-robot merged commit edffd9c into operator-framework:master Sep 14, 2022