Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Narrow and explicit RBAC #1639

Closed
maanur opened this issue Aug 25, 2021 · 15 comments · Fixed by #2091
Closed

Narrow and explicit RBAC #1639

maanur opened this issue Aug 25, 2021 · 15 comments · Fixed by #2091

Comments

@maanur
Copy link
Contributor

maanur commented Aug 25, 2021

/kind feature

I am trying to deploy Katib in an enterprise environment and have some hard time explaining Katib's requested RBAC
rules. It would be much appreciated if ClusterRole explicitly declared only necessary verbs for each individual resource.

Look at this:

rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - serviceaccounts
      - services
      - events
      - namespaces
      - persistentvolumes
      - persistentvolumeclaims
      - pods
      - pods/log
      - pods/status
    verbs:
      - "*"

Full access for Namespaces, Roles and RoleBindings effectively gives Katib unconstrained privileges to do anything it wants. This is plainly unacceptable in my case.

I suggest that ClusterRoles would be narrow and explicit like that:

rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - get
      - list
      - watch
 - apiGroups:
      - ""
    resources:
      - roles
    verbs:
      - get
      - list
    ... e.t.c

E.g. let's get rid of those stars in rules.verbs and olso remove unnecessary verbs from there

@johnugeorge
Copy link
Member

Can you add more info?

@maanur
Copy link
Contributor Author

maanur commented Aug 26, 2021

Can you add more info?

Updated the OP

@andreyvelich
Copy link
Member

Thank you for creating this @maanur.
I agree, that is great improvement!
We should investigate which permissions do we need.

/priority p1

@maanur
Copy link
Contributor Author

maanur commented Nov 9, 2021

I'll look into the issue, but I may need help with testing because of lack of free time.
Firstly, I'm going to look at kubebuilder markers and try making them work, thus getting rid of "stars" in ClusterRoles.

After that I'd be willing to switch the operator's scope to multiple namespaces instead of a whole cluster, as the current level of privileges is not necessary in fact (remember katib-metricscollector-injection label on tracked namespaces). That would be a breaking change, so I'll make a separate proposal for thorough consideration.

Is this plan acceptable? @andreyvelich

@andreyvelich
Copy link
Member

That would be great, thank you @maanur!

After that I'd be willing to switch the operator's scope to multiple namespaces instead of a whole cluster

Are you referring to this issue: #1295 ?
Which runs Katib controller without cluster role access ?

@maanur
Copy link
Contributor Author

maanur commented Nov 9, 2021

Are you referring to this issue: #1295 ?\nWhich runs Katib controller without cluster role access ?

Yes, partially. I suppose the particular problem of deploying multiple controllers can be solved by OLM and its OperatorGroups, but that can be considered a separate feature.

@andreyvelich
Copy link
Member

@maanur I think that also requires additional proposal on how we want to implement this.
Currently, we don't have solution on how to enable our Webhooks on namespace mode.

Let's start with removing redundant access from the ClusterRole, after that we can discuss about role-based access.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 12, 2022

I have fixed the cluster role rules in #1768 and documented the remaining problems.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 12, 2022

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: katib-controller
# We only need the verb create and not delete because all things are owned transitively by the
# experiment, so everything is cascade-deleted on experiment deletion.
# Early stopping currently creates a serviceaccount, role and rolebinding to manage trials and trails/status.
# The roles and rolebinding creation is the most important potential security issue now.
# Via new roles, rolebindings and pods in kubernetes system namespaces the cluster 
# could become compromised if katib-controller is compromised.
rules:
  - apiGroups:
      - ""
      - apps
      - rbac.authorization.k8s.io
    resources:
      - services # suggestion-api and earlystopping-api
      - events # katib cries is you remove it for unknown reasons
      - persistentvolumeclaims # maybe katib creates volumes, i am not sure
      - deployments # the main deployment that serves suggestion-api and earlystopping-api
      - pods # maybe to tfjobs etc. I  am not sure
      - pods/log # maybe to tfjobs etc. I  am not sure
      - pods/status # maybe to tfjobs etc. I  am not sure
      - jobs # trials are executed via jobs
      - cronjobs # i do not know why it is there. maybe you can delay trials
      - serviceaccounts # earlystopping
      - roles # earlystopping
      - rolebindings # earlystopping
    verbs:
      - get
      - list
      - watch
      - patch
      - create
  - apiGroups:
      - batch
    resources:
      - jobs # trials are executed via jobs
      - cronjobs # i do not know why it is there. maybe you can delay trials
    verbs:
      - get
      - list
      - watch
      - patch
      - create
      - delete # this is essential
  - apiGroups:
      - ""
    resources:
      - configmaps # i do not have a clue
      - persistentvolumes # i do not have a clue
      - namespaces 
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - kubeflow.org
    resources: # everything here is obvious
      - experiments
      - experiments/status
      - experiments/finalizers
      - trials
      - trials/status
      - trials/finalizers
      - suggestions
      - suggestions/status
      - suggestions/finalizers
      - tfjobs
      - pytorchjobs
      - mpijobs
      - xgboostjobs
      - mxjobs
    verbs:
      - "*"

@maanur
Copy link
Contributor Author

maanur commented Mar 1, 2022

We use Katib with customized RBAC in a desperate attempt to fit in corporate information security requirements.
Our ClusterRoles are separated: the first declares cluster-level common resources access, the second declares declares namespace-level rules, and the third declares kubeflow.org-only cluster-level access.

I've never posted it here, because I believe that RBAC manifests should be autogenerated from in-code markings, for example, by kubebuilder.
rbac.yaml.zip

Take a look, please. @juliusvonkohout @andreyvelich

@johnugeorge
Copy link
Member

@maanur Can you create a PR which would be easy to review?

@maanur
Copy link
Contributor Author

maanur commented Mar 1, 2022

Can you create a PR which would be easy to review?

Ok, I'll create PR after adding some comments to the yaml, but I really see it as a temporary solution.

@johnugeorge
Copy link
Member

I meant, Do you have a narrowed set of RBAC rules?

@maanur
Copy link
Contributor Author

maanur commented Mar 14, 2022

Do you have a narrowed set of RBAC rules?

Yes, I have, but I'd better suggest a more sustainable solution.

@juliusvonkohout
Copy link
Member

Do you have a narrowed set of RBAC rules?

Yes, I have, but I'd better suggest a more sustainable solution.

What do you have in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants