This repository has been archived by the owner on Jun 24, 2020. It is now read-only.

Restrict RBAC for Operator #291

Merged
merged 1 commit into knative:master from restrict-rbac on Mar 30, 2020

Conversation

@Cynocracy (Contributor) commented Feb 11, 2020

Fixes #282

#374 is a base of this; you can review solely the YAML here
(and not the reconciler change), though you're welcome to review
that as well.

Proposed Changes

By using the escalate verb on clusterroles, we allow the Operator
to create any clusterrole. This prevents us from
needing to grant it cluster-admin.
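
As a sketch of what this looks like (names are illustrative, not the actual config/role.yaml), the escalate rule might be:

```yaml
# Illustrative sketch only. The RBAC "escalate" verb lets a subject create
# or update (cluster)roles that grant permissions the subject does not
# itself hold, avoiding a cluster-admin grant.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: knative-serving-operator  # hypothetical name
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["get", "list", "watch", "create", "update", "delete", "escalate"]
```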

Additionally, by having the Operator create an aggregated clusterrole
for itself (aggregating all clusterroles created by Knative Serving),
we pick up any permissions that upstream dictates as necessary
for any component.
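
An aggregated clusterrole of this shape might look like the following sketch (the selector label is hypothetical, not the one actually used):

```yaml
# Illustrative sketch only. The control plane automatically copies the
# rules of every clusterrole matching the selector into this one, so the
# Operator inherits whatever permissions upstream declares.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: knative-serving-operator-aggregated  # hypothetical name
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      serving.knative.dev/controller: "true"  # hypothetical label
rules: []  # populated automatically by the control plane
```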

Any roles that are not created by upstream, but which are bound as
part of install are added with the bind verb to allow the Operator
to bind them without necessarily having the permissions they grant.
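
A hedged sketch of such a bind rule (the role name is hypothetical):

```yaml
# Illustrative sketch only. The "bind" verb allows creating (Cluster)RoleBindings
# that reference these roles without the Operator holding the permissions
# those roles grant.
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["bind"]
  resourceNames: ["system:auth-delegator"]  # hypothetical pre-existing role
```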

Some other permissions are needed to support controller infra
and are explicitly included, as well as some specific to the Operator
itself. An attempt at grouping these into sections is included, though
it may be slightly out of sync (in particular: Controller infra / Specific
to this Operator may overlap).

Overall, this should keep the Operator roughly operating only on
Knative Serving resources (or things which can be operated
on by upstream).

Release Note

Change RBAC to minimize permissions of Operator service account.

@Cynocracy (Contributor Author)

/retest

(seems to be running OK locally)

@Cynocracy (Contributor Author)

Oh, I see the issue.

@Cynocracy (Contributor Author)

/retest

:(

@Cynocracy (Contributor Author)

Found out how to read the logs:

{"ts":"2020-02-12T01:09:32.279Z","knative.dev/controller":"knativeserving-controller","msg":"Reconcile succeeded. Time taken: 11.99198359s.","commit":"0b4d6f8","level":"info","logger":"serving_operator.knativeserving-controller","knative.dev/key":"operator-tests/knative-serving","caller":"controller…

Hmmmm

/retest

so that I can poke around more :)

@Cynocracy (Contributor Author)

raceid":"fd80c601-0e62-4773-bdd6-1ecb60dbbf78","knative.dev/key":"operator-tests/knative-serving"}
{"level":"info","ts":"2020-02-13T01:16:24.454Z","logger":"serving_operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:"KnativeServing", Namespace:"operator-tests", Name:"knative-serving", UID:"5e9ab652-b2da-431a-86cb-211efa5cbd70", APIVersion:"operator.knative.dev/v1alpha1", ResourceVersion:"4761", FieldPath:""}): type: 'Warning' reason: 'InternalError' apiservices.apiregistration.k8s.io "v1beta1.custom.metrics.k8s.io" is forbidden: User "system:serviceaccount:default:knative-serving-operator" cannot update resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope","commit":"b29f72a","knative.dev/controller":"knativeserving-controller"}

OK, so that was required.

@Cynocracy (Contributor Author)

This is ready for a preliminary review.

Let me know if more detail is necessary.

@trshafer (Contributor) left a comment


role.yaml went from 60 lines to 200 lines, worth splitting into multiple files?

config/role.yaml (3 resolved review threads)
@Cynocracy (Contributor Author)

I think splitting it into multiple files makes sense, especially since some of the sections will be drop-in equivalents between the serving and eventing operator.

I'm working on another WIP for that repo, let me take a stab at splitting once I have that working (it will be a good proof of concept that the general idea of 'escalate' + 'bind' maps well).

@Cynocracy (Contributor Author)

/retest

@Cynocracy (Contributor Author)

{"level":"info","ts":"2020-03-06T23:30:27.273Z","logger":"serving-operator.knativeserving-controller.event-broadcaster","caller":"record/event.go:274","msg":"Event(v1.ObjectReference{Kind:"KnativeServing", Namespace:"knative-serving", Name:"knative-serving", UID:"cbacee1e-5ed9-4bac-a8ba-d65b0ba9225a", APIVersion:"operator.knative.dev/v1alpha1", ResourceVersion:"2288", FieldPath:""}): type: 'Warning' reason: 'InternalError' serviceaccounts "controller" is forbidden: User "system:serviceaccount:default:knative-serving-operator" cannot get resource "serviceaccounts" in API group "" in the namespace "knative-serving"","commit":"1419ff1","knative.dev/controller":"knativeserving-controller"}

@Cynocracy (Contributor Author)

Wait.. it should have * * * in knative-serving.

@trshafer (Contributor) commented Mar 9, 2020

Wait.. it should have * * * in knative-serving.

Are there more error messages or does it need GET serviceaccounts in knative-serving?

@Cynocracy (Contributor Author)

I believe the issue has been fixed: the Role and the RoleBinding need to be in the relevant namespace, and the subjects need to point to the default namespace.

This opens up a question I need to resolve, though: do we need to support multiple KS installations in the same cluster, or can we assume the default namespace is used?
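
For reference, the shape of the fix might look like this sketch (names are illustrative):

```yaml
# Illustrative sketch only. The Role and RoleBinding live in the operand
# namespace (knative-serving), while the subject points back at the
# Operator's ServiceAccount in the default namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: knative-serving-operator  # hypothetical name
  namespace: knative-serving
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: knative-serving-operator  # hypothetical Role in the same namespace
subjects:
- kind: ServiceAccount
  name: knative-serving-operator
  namespace: default
```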

@Cynocracy (Contributor Author)

OK, the error this time: deleting fails, because we delete the role that gives us the ability to delete 'admissionregistration.k8s.io' resources before we delete those resources.

Going to take a stab at saving the deletion of the roles/rolebindings until last as we need to 'unbootstrap' ourselves from having these permissions.

Cynocracy pushed a commit to Cynocracy/serving-operator that referenced this pull request Mar 12, 2020
This allows us to take advantage of the permissions granted to cluster admins
when performing cleanup. If the approach in knative#291
is approved, this will become necessary. Additionally, it makes sense to remove roles as the final
step to allow human Operators to modify any resources they may have permissions on as a result of the
Knative installation (that is, we should not remove any access until we are 'almost done' cleaning up).
knative-prow-robot pushed a commit that referenced this pull request Mar 13, 2020
* Delete roles and rolebindings as the final step of cleanup.

This allows us to take advantage of the permissions granted to cluster admins
when performing cleanup. If the approach in #291
is approved, this will become necessary. Additionally, it makes sense to remove roles as the final
step to allow human Operators to modify any resources they may have permissions on as a result of the
Knative installation (that is, we should not remove any access until we are 'almost done' cleaning up).

* Gofmt

* More golang idiomatic var names.

* Remove redundant All()
@Cynocracy (Contributor Author)

/retest

@Cynocracy Cynocracy changed the title [WIP] Restrict RBAC for Operator Restrict RBAC for Operator Mar 17, 2020
@Cynocracy (Contributor Author)

/assign @jcrossley3

@Cynocracy Cynocracy changed the title Restrict RBAC for Operator [WIP] Restrict RBAC for Operator Mar 17, 2020
@Cynocracy (Contributor Author)

WIP, as the eventing change still is; I'd like these to go in concurrently, if at all.

@Cynocracy (Contributor Author)

/unassign @jcrossley3

Cynocracy pushed a commit to Cynocracy/serving-operator that referenced this pull request Mar 25, 2020
This order, Roles -> RoleBindings -> The Rest, prevents the operand from
needing to structure their manifest such that this ordering exists, while
allowing the Operator to use the bootstrapping mechanism described in
knative#291 to 'escalate' itself
into management of the Knative Serving installation without a *** clusterrole.
@Cynocracy Cynocracy changed the title [WIP] Restrict RBAC for Operator Restrict RBAC for Operator Mar 25, 2020
@Cynocracy (Contributor Author)

/assign @jcrossley3

Assuming my last minute diff-tweaking didn't break this, this should be ready to review :)

The only remaining diff between the role here and in eventing are the replacement of some labels,
a few explicit 'bind's (which now need to be added to roles the Operator binds to but
does not create), and the resource deletion section at the end.

@Cynocracy (Contributor Author)

One more comment re: the last deltas.

In the future, if we move to a job-based cleanup, deletion roles can be defined one-off.
If the Operator repos and/or controllers are to merge, this dramatically simplifies the collapse of
the roles needed.

knative-prow-robot pushed a commit that referenced this pull request Mar 25, 2020
* Install from the manifest in a strict order.

This order, Roles -> RoleBindings -> The Rest, prevents the operand from
needing to structure their manifest such that this ordering exists, while
allowing the Operator to use the bootstrapping mechanism described in
#291 to 'escalate' itself
into management of the Knative Serving installation without a *** clusterrole.

* Fixing golint

* Just update the manifest as a final step.
By using the escalate verb on clusterroles, we allow the Operator
to create any clusterrole. This prevents us from
needing to grant it cluster-admin.

Additionally, by having the Operator create an aggregated clusterrole
for itself (aggregating all clusterroles created by Knative Serving),
we pick up any permissions that upstream dictates as necessary
for any component.

Any roles that are not created by upstream, but which are bound as
part of install are added with the bind verb to allow the Operator
to bind them without necessarily having the permissions they grant.

Some other permissions are needed to support controller infra
and are explicitly included, as well as some specific to the Operator
itself. An attempt at grouping these into sections is included, though
it may be slightly out of sync (in particular: Controller infra / Specific
to this Operator may overlap).

Overall, this should keep the Operator roughly operating only on
Knative Serving resources (or things which can be operated
on by upstream).

This also minimizes the diff between eventing and serving RBAC.
@Cynocracy (Contributor Author)

Commits now squashed (squished?) into a single one :)

Thanks for the tip jcrossley3

@jcrossley3 (Contributor)

Runs fine on OpenShift, so...
/lgtm

@Cynocracy (Contributor Author)

To double check the diff, I did:

1. ko apply config
2. kubectl api-resources -oname | xargs -I% kubectl get -A % -oname | sort | uniq > /tmp/before
3. Create the KS resource
4. Wait for it to finish
5. Delete the KS resource
6. kubectl api-resources -oname | xargs -I% kubectl get -A % -oname | sort | uniq > /tmp/after
7. diff /tmp/before /tmp/after

Which spat out

https://pastebin.com/tugjzdeU

The only oddness is the activator pod, which seems to be hanging around.

@Cynocracy (Contributor Author)

https://pastebin.com/bjuPParB

Before: no activator leak. Curious.

@Cynocracy (Contributor Author)

A second attempt at a diff produced https://pastebin.com/N0w4R9QZ

So it's entirely possible I generated the last one while the pod was going down (I only diff the names, so who knows :)

Either way, this is ready for an approval, IMO.

@trshafer (Contributor)

/approve

@knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cynocracy, trshafer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit 2e3db9d into knative:master Mar 30, 2020
@Cynocracy Cynocracy deleted the restrict-rbac branch March 30, 2020 17:59
Development

Successfully merging this pull request may close these issues.

Principle of least privilege should be applied in operator (Cluster)Roles