(psa) allow legacy CatalogSources to run in non-restricted namespaces #2845
Conversation
force-pushed from 6aea204 to 2a1f182
force-pushed from 2a1f182 to 8bab077
force-pushed from f0f8749 to 1120776
force-pushed from 0140322 to 52566d1
What does this field mean for CatalogSources that don't have any managed pods?
@operator-framework/team-fbc test
force-pushed from 16730df to 0713044
pod.Spec.SecurityContext.RunAsUser = &runAsUser
pod.Spec.SecurityContext.RunAsNonRoot = pointer.Bool(true)
if source.Spec.GrpcPodConfig != nil && source.Spec.GrpcPodConfig.RequirePrivilegedAccess {
	pod.Spec.Containers[0].SecurityContext.AllowPrivilegeEscalation = pointer.Bool(true)
@perdasilva I think you had done the deepest investigation on what permissions our legacy catalogs/opm binary required to be able to run on a 4.12 cluster (in the process of fixing the issues so we didn't require those permissions anymore). Can you take a look at whether this seems right? It's not what I would have expected.
It should be sufficient to just set runAsNonRoot=false and runAsUser=0. I'm running a test on nightly 4.12 to confirm... just waiting on cluster bot...
Here's what it came down to:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - MKNOD
  readOnlyRootFilesystem: false
and
securityContext:
  runAsNonRoot: false
  runAsUser: 0
  seccompProfile:
    type: RuntimeDefault
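For reference, those two fragments correspond to the corev1 types the reconciler sets roughly as follows (a minimal sketch, assuming the usual pod-level vs. container-level placement of these fields; this is not the PR's actual code):

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/pointer"
)

// legacySecurityContexts mirrors the settings that worked on a
// baseline-enforced namespace in the test above: the pod runs as root
// with the runtime's default seccomp profile, and the registry
// container keeps a writable root filesystem without privilege
// escalation.
func legacySecurityContexts() (*corev1.PodSecurityContext, *corev1.SecurityContext) {
	podSC := &corev1.PodSecurityContext{
		RunAsNonRoot: pointer.Bool(false),
		RunAsUser:    pointer.Int64(0),
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
	containerSC := &corev1.SecurityContext{
		AllowPrivilegeEscalation: pointer.Bool(false),
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"MKNOD"},
		},
		ReadOnlyRootFilesystem: pointer.Bool(false),
	}
	return podSC, containerSC
}
```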
That ran on a baseline-enforced namespace. Here's something curious though, @bparees: I only managed to change the PSA enforcement profile on a (new) openshift-* namespace. I.e.
$ oc create ns test
$ oc describe ns test
...
Labels: kubernetes.io/metadata.name=test
        pod-security.kubernetes.io/enforce=restricted
        pod-security.kubernetes.io/enforce-version=v1.24
...
$ oc label ns test pod-security.kubernetes.io/enforce=baseline --overwrite
$ oc describe ns test
...
Labels: kubernetes.io/metadata.name=test
        pod-security.kubernetes.io/enforce=restricted
        pod-security.kubernetes.io/enforce-version=v1.24
...
So, I think we may still have problems even with the api override =S
Fyi @perdasilva, not sure if you saw this conversation, but it might help fill in the context for why I switched from setting runAsUser: 0 to not setting it at all. Given that we're telling admins that, to run legacy catalogs, they need to make their namespaces look like they did in the pre-PSA-restricted world, it makes sense to make the registry pods look like they did in that world too, and back then we did not set runAsUser: 0, unless there are specific reasons why we should be setting runAsUser: 0 now.
> So, I think we may still have problems even with the api override =S
@perdasilva I believe you need to explicitly label the NS as having the labelsyncer disabled before you can set the PSA labels manually. It should be the same label you are setting on openshift-* NSes to enable the labelsyncer there (except that here you'd set it to false, of course).
non-payload openshift-* NSes default to labelsyncer on (so you need to label the NS appropriately to disable it)
openshift-* NSes default to labelsyncer off (so you need to label the NS appropriately to enable it)
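If that's the label in question, the sequence for a user namespace would presumably be (a sketch; it assumes the opt-out label is security.openshift.io/scc.podSecurityLabelSync, set to false to disable syncing):

$ oc create ns test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false
$ oc label ns test pod-security.kubernetes.io/enforce=baseline --overwrite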
doh! should have inferred that 🤦
> Fyi @perdasilva, not sure if you saw this conversation, but it might help fill in the context for why I switched from setting runAsUser: 0 to not setting it at all. Given that we're telling admins that, to run legacy catalogs, they need to make their namespaces look like they did in the pre-PSA-restricted world, it makes sense to make the registry pods look like they did in that world too, and back then we did not set runAsUser: 0, unless there are specific reasons why we should be setting runAsUser: 0 now.
So, not setting runAsUser seemed to push the restricted-v2 SCC profile, causing the permission issue. If we don't want to set it to 0, we'll probably need to set the fsGroup to 0 in lieu of that. I've also tested removing runAsUser and runAsNonRoot and setting allowPrivilegeEscalation to true, and that worked as well. Maybe that's the best way forward, then.
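For concreteness, a sketch of that last tested variant (a container-level fragment in the style of the snippets above):

```yaml
securityContext:
  allowPrivilegeEscalation: true
  # runAsUser and runAsNonRoot deliberately left unset (pod level)
```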
> So, not setting runAsUser seemed to push the restricted-v2 SCC profile, causing the permission issue. If we don't want to set it to 0, we'll probably need to set the fsGroup to 0 in lieu of that.
If you do not set the seccompProfile, and you have access to the anyuid SCC, I believe you'll get the anyuid SCC, which allows you to leave runAsUser unset, and the container will run w/ the image's default UID, which I believe is what you want.
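In pod terms, that presumably means shipping no securityContext at all and letting SCC admission fill in the defaults (a sketch; the image name is made up):

```yaml
spec:
  # no pod- or container-level securityContext: on OpenShift, SCC
  # admission (e.g. anyuid, if the service account can use it) assigns
  # defaults and the container runs with the image's own UID
  containers:
  - name: registry-server
    image: example.com/legacy-catalog:latest
```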
force-pushed from 0713044 to a69d557
Capabilities: &corev1.Capabilities{
	Drop: []corev1.Capability{"ALL"},
},
ReadOnlyRootFilesystem: pointer.Bool(false),
We need to read the SQL index DB, which is on the root filesystem, so that needs to be true (imho)
readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.
This needs to be false if we need to write to it (which I think we do, because I think the legacy catalogs write to the / path?).
It does not mean "set this to true if you need to read from the filesystem"; you set it to true when you explicitly want to ensure nothing writes to the root filesystem. Which is (aiui) not the case for legacy catalogs (we need to write to the root filesystem).
@@ -182,11 +182,7 @@ func Pod(source *operatorsv1alpha1.CatalogSource, name string, image string, saN
		},
	},
	SecurityContext: &corev1.SecurityContext{
		ReadOnlyRootFilesystem:   pointer.Bool(false),
		AllowPrivilegeEscalation: pointer.Bool(false),
Also, because we need to read the root filesystem, the way to grant permission for that is setting AllowPrivilegeEscalation to true.
And RunAsUser=0 is not required (we need a user that has the permissions to do what needs to be done, as before, and by setting that to true the user will be part of the root group and have the same permissions).
Btw, RunAsUser=0 might not work on OCP (it will depend on the namespace, and it will probably raise an error that the value is not in the allowed range).
c/c @perdasilva
What does AllowPrivilegeEscalation have to do w/ reading the root filesystem?
Allowing privilege escalation means you can run a setuid binary or otherwise allow a process to "become root"; does the opm registry binary do so?
I would simply expect the index.db to be readable (by any uid), or in general we should be ensuring that the container is run w/ the appropriate uid and fsGroup to allow us to read (and write, if necessary) the appropriate directories. None of that should require privilege escalation.
allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process.
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
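A sketch of that approach (field placement assumed): grant file access through the pod-level fsGroup rather than through privilege escalation:

```yaml
securityContext:       # pod level
  fsGroup: 0           # volumes become group-owned by the root group, so group rw works
containers:
- name: registry-server
  securityContext:
    allowPrivilegeEscalation: false   # no escalation needed just to read/write files
```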
That'd mean we don't need to specify AllowPrivilegeEscalation at all then, since opm doesn't need to be root to read or write to the SQLite db, and in 4.10 we didn't specify AllowPrivilegeEscalation either (and opm not needing to be root hasn't changed since 4.10, and is unlikely to change either).
Fyi, I was updating the tests to use an image hosted in the quay.io/olmtest registry instead of my personal registry, so I took that opportunity to remove AllowPrivilegeEscalation: true too. cc: @perdasilva, now I'm happy with the PR going in :)
force-pushed from 3e03339 to 59d230d
force-pushed from 59d230d to 12a3d54
Looks like we've finally converged to a spot where we're all on the same page. Let's merge this so that we can start the downstreaming process, and if we discover we need to make changes to the configs in this PR for downstream, we can come back upstream and make those changes. In the meantime, we'll need the labels for operator-framework/api#261. cc: @perdasilva
Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
force-pushed from 12a3d54 to 866d445
force-pushed from 61acd0a to dad760a
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: anik120, perdasilva
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Approvers can indicate their approval by writing /approve in a comment.
if runAsUser > 0 {
	pod.Spec.SecurityContext.RunAsUser = &runAsUser
	pod.Spec.SecurityContext.RunAsNonRoot = pointer.Bool(true)
if source.Spec.GrpcPodConfig == nil || source.Spec.GrpcPodConfig.SecurityContextConfig != operatorsv1alpha1.Legacy {
this should be "if config==nil || config==restricted"
you don't want to treat unknown values as if they were "restricted", you want to treat unknown values as an error.
(so you also need an else here to handle unknown values. Yes validation should prevent it, but if someone adds a new valid enum and doesn't fix up this logic, you want to catch it.)
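Concretely, that defensive shape might look something like this (a sketch, not the PR's code; it assumes a Restricted counterpart to the Legacy constant, and the applyRestricted/applyLegacy helpers are hypothetical stubs):

```go
package sketch

import (
	"fmt"

	operatorsv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
	corev1 "k8s.io/api/core/v1"
)

// Hypothetical helpers: one sets the restricted securityContexts,
// the other leaves them unset for legacy catalogs.
func applyRestricted(pod *corev1.Pod) { /* set restricted defaults */ }
func applyLegacy(pod *corev1.Pod)     { /* leave securityContexts unset */ }

// applySecurityContextConfig treats an absent config as restricted and
// any unknown enum value as an error instead of silently defaulting.
func applySecurityContextConfig(pod *corev1.Pod, cfg *operatorsv1alpha1.GrpcPodConfig) error {
	if cfg == nil {
		applyRestricted(pod)
		return nil
	}
	switch cfg.SecurityContextConfig {
	case operatorsv1alpha1.Restricted:
		applyRestricted(pod)
	case operatorsv1alpha1.Legacy:
		applyLegacy(pod)
	default:
		// a new enum value was added without updating this logic
		return fmt.Errorf("unsupported securityContextConfig: %q", cfg.SecurityContextConfig)
	}
	return nil
}
```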
So spec.GrpcPodConfig.SecurityContextConfig is marked as optional, which means most of the CatalogSources created will not have the GrpcPodConfig struct populated, so config==restricted will actually not be true.
For example, this is the yaml of the operatorhub catalog applied to the cluster on a local installation in Kind:
$ k get catsrc -n olm -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: "2022-09-14T17:11:29Z"
    generation: 1
    name: operatorhubio-catalog
    namespace: olm
    resourceVersion: "757"
    uid: 54db9c12-be5e-4b0b-9e16-3d1d80acaf54
  spec:
    displayName: Community Operators
    image: quay.io/operatorhubio/catalog:latest
    publisher: OperatorHub.io
    sourceType: grpc
    updateStrategy:
      registryPoll:
        interval: 60m
  status:
    connectionState:
      address: operatorhubio-catalog.olm.svc:50051
      lastConnect: "2022-09-14T17:11:59Z"
      lastObservedState: READY
    registryService:
      createdAt: "2022-09-14T17:11:29Z"
      port: "50051"
      protocol: grpc
      serviceName: operatorhubio-catalog
      serviceNamespace: olm
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
I'm guessing that means when a third value is introduced, grpcPodConfig.SecurityContextConfig should be marked as required, and then this can be changed to explicitly check config==restricted.
Coz the other option is "config==restricted || config==nil", which doesn't sound that explicit anyway. Looks like even with a default value defined for it, if the top-level grpcPodConfig isn't defined at all, the entire struct remains nil.
I've added a comment to prevent someone accidentally introducing faulty logic when a third enum value is added; hopefully that's a good middle ground.
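The guard plus the warning comment presumably read along these lines (a sketch mirroring the diff above, not the exact committed code):

```go
if runAsUser > 0 {
	// NOTE: an absent GrpcPodConfig (nil) is deliberately treated the same
	// as an explicit "restricted". If a third SecurityContextConfig value
	// is ever introduced, this condition must be revisited.
	if source.Spec.GrpcPodConfig == nil || source.Spec.GrpcPodConfig.SecurityContextConfig != operatorsv1alpha1.Legacy {
		pod.Spec.SecurityContext.RunAsUser = &runAsUser
		pod.Spec.SecurityContext.RunAsNonRoot = pointer.Bool(true)
	}
}
```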
force-pushed from ab0a931 to 2711e76
This PR configures the CatalogSource reconciler to use the spec.GrpcPodConfig.SecurityContextConfig field to determine whether the pod.spec.securityContext and container[*].securityContext for the registry pod should be configured to be runnable in a PSA-restricted namespace, so that cluster admins can indicate that they want to run legacy CatalogSources in a non-restricted (baseline/privileged) namespace. This allows cluster admins to run CatalogSources built with a version of opm older than v1.23.2 (i.e. a version of opm that does not contain operator-framework/operator-registry#974).
Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
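For illustration, a cluster admin opting a catalog into this legacy behavior would then write something like the following (a sketch; the name, namespace, and image are made up, and the field casing is assumed from the commit message):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-legacy-catalog
  namespace: my-team            # labeled baseline/privileged, not restricted
spec:
  sourceType: grpc
  image: example.com/catalogs/my-legacy-catalog:latest   # built with opm < v1.23.2
  grpcPodConfig:
    securityContextConfig: legacy
```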
force-pushed from 2711e76 to acaf375
/lgtm