
Bug 2086519: Introduce Pod Security Admission Label Synchronization controller #75

Merged (5 commits, Jun 3, 2022)

Conversation

@stlaz (Contributor) commented Mar 23, 2022

This PR introduces a controller that implements the design from openshift/enhancements#1010.

For now this is just a draft: we still need to add optimizations so that only the roles/rolebindings that matter for SCC access are considered, to avoid an unnecessary number of computations when there are many Roles, many SCCs, or both.

/assign @s-urbaniak
/cc @openshift/openshift-team-auth-maintainers @deads2k

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 23, 2022
@openshift-ci openshift-ci bot requested a review from deads2k March 23, 2022 14:41
openshift-ci bot (Contributor) commented Mar 23, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@stlaz (Contributor, Author) commented Mar 23, 2022

/test all
just for the fun of it

```go
@@ -103,6 +105,10 @@ func NewControllerContext(
	if err != nil {
		return nil, err
	}
	securityClient, err := securityclient.NewForConfig(clientConfig)
	if err != nil {
		return nil, err
	}
```
Contributor

general nit: `fmt.Errorf("error creating security client: %w", err)`

so we have useful error chains if things go wonky.
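Applied to the hunk above, the suggestion would look something like this (a sketch only; it assumes "fmt" is present in the file's imports):

```go
// wrap the construction error so callers can tell which client failed
securityClient, err := securityclient.NewForConfig(clientConfig)
if err != nil {
	return nil, fmt.Errorf("error creating security client: %w", err)
}
```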

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 31, 2022
@stlaz (Contributor, Author) commented Apr 14, 2022

/test all

@stlaz stlaz marked this pull request as ready for review April 21, 2022 13:52
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2022
@stlaz (Contributor, Author) commented Apr 21, 2022

/hold
just in case a random person comes along and lgtms it; I want at least two pairs of eyes on this

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2022
@stlaz (Contributor, Author) commented Apr 25, 2022

/retest
let's see whether the Go 1.18 builders were finally added

@stlaz (Contributor, Author) commented Apr 25, 2022

Note to reviewers: the controller won't be able to do anything until openshift/cluster-kube-controller-manager-operator#616 merges, so you are only reviewing the code.

Once we are fine with the concept, we can merge the code and run CI on it again.


```go
serviceAccounts := []string{}
for _, subject := range roleBinding.Subjects() {
	if subject.APIGroup == "" && subject.Kind == "ServiceAccount" {
```
Contributor

minor: I prefer switch/case and summarizing something like

```go
isGroupBinding := subject.APIGroup == rbacv1.GroupName && subject.Kind == "Group"
isAllServiceAccount := subject.Name == serviceaccount.AllServiceAccountsGroup
isNamespacedServiceAccountGroup := strings.HasPrefix(subject.Name, serviceaccount.ServiceAccountGroupPrefix)
```

Contributor Author

I tried it; it turns out to involve quite a lot of string comparisons that may be unnecessary if I want to keep summaries like that. I can keep a switch/case for a) direct SAs and b) groups, but the rest I'd keep the same. A sketch of that shape follows below.
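For illustration, a minimal sketch of that switch/case shape, covering only the direct-SA and group cases (the helper `classifySubject` is hypothetical and not part of this PR; the constants come from k8s.io/api/rbac/v1 and k8s.io/apiserver/pkg/authentication/serviceaccount):

```go
package sasccache

import (
	"strings"

	rbacv1 "k8s.io/api/rbac/v1"
	"k8s.io/apiserver/pkg/authentication/serviceaccount"
)

// classifySubject is a hypothetical helper sketching the shape discussed
// above: one case for directly-bound ServiceAccounts, one for SA groups,
// everything else handled as before.
func classifySubject(subject rbacv1.Subject) (directSA, saGroup bool) {
	switch {
	case subject.APIGroup == "" && subject.Kind == rbacv1.ServiceAccountKind:
		// a ServiceAccount bound directly by namespace/name
		return true, false
	case subject.APIGroup == rbacv1.GroupName && subject.Kind == rbacv1.GroupKind:
		// "system:serviceaccounts" covers all SAs in the cluster;
		// the "system:serviceaccounts:<ns>" prefix covers one namespace
		if subject.Name == serviceaccount.AllServiceAccountsGroup ||
			strings.HasPrefix(subject.Name, serviceaccount.ServiceAccountGroupPrefix) {
			return false, true
		}
	}
	return false, false
}
```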


```go
return factory.New().
	WithSync(c.sync).
	WithFilteredEventsInformersQueueKeysFunc(
```
Contributor

do you guys prefer the way these factor out while using this factory thing or do you prefer "normal" construction?

Contributor Author

I think the factoring makes it easier to see what the controller's inputs and expectations are. Although I haven't written many "normal" kube-controllers, perhaps there would be a way to keep the inputs just as readable there, too. See the sketch below for how the factory style reads end to end.
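For comparison, a minimal sketch of factory-style construction with openshift/library-go (the controller name, `nameToKey`, and the namespace informer are illustrative assumptions, not this PR's actual wiring):

```go
package sasccache

import (
	"context"

	"github.com/openshift/library-go/pkg/controller/factory"
	"github.com/openshift/library-go/pkg/operator/events"
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime"
	corev1informers "k8s.io/client-go/informers/core/v1"
)

type labelSyncer struct{}

// sync is invoked by the factory with the queue key that triggered it.
func (c *labelSyncer) sync(ctx context.Context, syncCtx factory.SyncContext) error {
	_ = syncCtx.QueueKey() // reconcile the object behind this key
	return nil
}

// nameToKey maps an informer event object to queue keys.
func nameToKey(obj runtime.Object) []string {
	accessor, err := meta.Accessor(obj)
	if err != nil {
		return nil
	}
	return []string{accessor.GetName()}
}

// newController shows why the factory reads well: the sync handler, the
// informers, and the keying function are all visible in one expression.
func newController(nsInformer corev1informers.NamespaceInformer, recorder events.Recorder) factory.Controller {
	c := &labelSyncer{}
	return factory.New().
		WithSync(c.sync).
		WithInformersQueueKeysFunc(nameToKey, nsInformer.Informer()).
		ToController("ExampleLabelSyncController", recorder)
}
```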

```go
c.usefulRolesLock.Lock()
defer c.usefulRolesLock.Unlock()

c.usefulRoles = map[string][]string{}
```
Contributor

I'm not seeing where this is updated over time as roles are updated.

Contributor Author

Hmm, previously I had it tied to the events that come to the controller, but it's true that if we think of it as a separate logical object, it should add its own event handlers to the informers.

Contributor Author

I implemented it so that the informer events for roles and SCCs are handled by the cache, and the cache is capable of adding keys to an external queue.
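Roughly, registering the cache's own handlers on an informer and forwarding keys to an external queue could look like this (a minimal sketch using plain client-go; the wiring and names are assumptions, not the PR's actual code):

```go
package sasccache

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// registerCacheHandlers attaches handlers to a role or SCC informer so that
// every event lets the cache update its view and requeue the affected key
// on the controller's external queue.
func registerCacheHandlers(informer cache.SharedIndexInformer, queue workqueue.RateLimitingInterface) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// handles tombstones from missed deletes as well
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})
}
```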

```go
	return serviceAccounts, nil
}

func NewSAToSCCCache(rbacInformers rbacv1informers.Interface, sccInformer securityv1informers.SecurityContextConstraintsInformer) *SAToSCCCache {
```
@deads2k (Contributor) commented Apr 29, 2022

This looks like a potential scale problem to me; the for loops in here are pretty deeply nested. If this becomes a bottleneck, how would you make it an order of magnitude faster? The information is present to fully invert this logic.

I'm not saying this must be done immediately (unless it fails early in scale testing); I'm asking how we'd handle it when it does.

Contributor Author

The logic expects SCC creation to be rare; it is currently the most expensive operation, as the role cache gets reinitialized. That also means that initializing the cache on startup is going to be quite expensive: each SCC addition costs O(n_SCCs * n_roles * n_role_rules), since the whole cache is rebuilt. This could perhaps be improved by having the cache deal only with the SCC that changed instead of reinitializing everything.

As for roles and clusterroles, these are handled one-by-one and cached. The computational complexity there is not great either, O(n_SCCs * n_role_rules), but in that case it's either us or the kube-apiserver doing the computation.

As for memory, the role cache is just a string -> []string map, where the string slice holds the names of the SCCs. The cache will of course grow with the number of roles and SCCs, but the number of SCCs should hopefully never be too great, so memory growth should be roughly linear.
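For intuition, the shape of that cache and the per-role computation could be sketched like this (`sccsForRole`, `ruleAllowsSCCUse`, and `contains` are simplified, hypothetical stand-ins for the real RBAC rule evaluation; the map layout follows the description above):

```go
package sasccache

import (
	securityv1 "github.com/openshift/api/security/v1"
	rbacv1 "k8s.io/api/rbac/v1"
)

// usefulRoles maps a role key ("namespace/name") to the names of the SCCs
// that role grants "use" on; memory grows linearly with roles and SCCs.
type usefulRoles map[string][]string

// sccsForRole recomputes a single role's entry:
// O(n_SCCs * n_role_rules), as discussed above.
func sccsForRole(rules []rbacv1.PolicyRule, sccs []*securityv1.SecurityContextConstraints) []string {
	matched := []string{}
	for _, scc := range sccs {
		for _, rule := range rules {
			if ruleAllowsSCCUse(rule, scc.Name) {
				matched = append(matched, scc.Name)
				break
			}
		}
	}
	return matched
}

// ruleAllowsSCCUse is a simplified stand-in for real rule evaluation: the
// rule must allow the "use" verb on securitycontextconstraints in the
// security.openshift.io group, for this SCC's name (or any name).
func ruleAllowsSCCUse(rule rbacv1.PolicyRule, sccName string) bool {
	return contains(rule.Verbs, "use", rbacv1.VerbAll) &&
		contains(rule.APIGroups, "security.openshift.io", rbacv1.APIGroupAll) &&
		contains(rule.Resources, "securitycontextconstraints", rbacv1.ResourceAll) &&
		(len(rule.ResourceNames) == 0 || contains(rule.ResourceNames, sccName))
}

func contains(list []string, wanted ...string) bool {
	for _, item := range list {
		for _, w := range wanted {
			if item == w {
				return true
			}
		}
	}
	return false
}
```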

Contributor Author

The cost of adding an SCC has been improved to O(n_roles * n_role_rules).

Contributor Author

@s-urbaniak proposed to write a benchmark test as a follow-up

@stlaz (Contributor, Author) commented May 5, 2022

The last commit brings the SCC optimization discussed in #75 (comment). It seems that, for some reason, this is blocking removal of the openshift finalizer from namespaces; I'll have to check why that could be.

@stlaz (Contributor, Author) commented May 10, 2022

/test all
the RBAC PR merged, let's see how this actually works

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2022
@stlaz (Contributor, Author) commented Jun 1, 2022

rebased on current master w/ kube 1.24

@wallylewis

/retest-required

1 similar comment
@stlaz (Contributor, Author) commented Jun 2, 2022

/retest-required

@s-urbaniak (Contributor)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2022
@stlaz (Contributor, Author) commented Jun 2, 2022

/retest

@stlaz (Contributor, Author) commented Jun 2, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2022
@stlaz (Contributor, Author) commented Jun 2, 2022

A CI payload with the required o/origin fixes has not been promoted yet 🙁

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 2 against base HEAD b0376b1 and 8 for PR HEAD 5a36194 in total

@stlaz (Contributor, Author) commented Jun 2, 2022

/retest-required
the fixed payload should have finally been promoted

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 1 against base HEAD b0376b1 and 7 for PR HEAD 5a36194 in total

@wallylewis

/retest-required

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD b0376b1 and 6 for PR HEAD 5a36194 in total

@stlaz (Contributor, Author) commented Jun 3, 2022

/retest-required
infra

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2022
@s-urbaniak (Contributor)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2022
openshift-ci bot (Contributor) commented Jun 3, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: s-urbaniak, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot (Contributor) commented Jun 3, 2022

@stlaz: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit f022305 into openshift:master Jun 3, 2022
openshift-ci bot (Contributor) commented Jun 3, 2022

@stlaz: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 2086519 has not been moved to the MODIFIED state.

In response to this:

Bug 2086519: Introduce Pod Security Admission Label Synchronization controller

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.