-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2086519: Introduce Pod Security Admission Label Synchronization controller #75
Conversation
Skipping CI for Draft Pull Request. |
/test all |
@@ -103,6 +105,10 @@ func NewControllerContext( | |||
if err != nil { | |||
return nil, err | |||
} | |||
securityClient, err := securityclient.NewForConfig(clientConfig) | |||
if err != nil { | |||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general nit: fmt.Errorf("error creating security client: %w", err)
so we have good stacktraces if things goe wonky.
/test all |
/hold |
/retest |
Note to reviewers: the controller won't be able to do anything until openshift/cluster-kube-controller-manager-operator#616 gets merged so you are only reviewing the code. Once we are fine with the concept, we can merge the code and run CI on it again. |
pkg/psalabelsyncer/sccrolecache.go
Outdated
|
||
serviceAccounts := []string{} | ||
for _, subject := range roleBinding.Subjects() { | ||
if subject.APIGroup == "" && subject.Kind == "ServiceAccount" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I prefer switch/case and summarizing someting like
IsGroupBinding := subject.APIGroup == rbacv1.GroupName && subject.Kind == "Group"
isAllServiceAccount := subject.Name == serviceaccount.AllServiceAccountsGroup
isNamespacedServiceAccountGroup := strings.HasPrefix(subject.Name, serviceaccount.ServiceAccountGroupPrefix))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, it turns out to be quite a lot of string comparisons that may be unnecessary if I wanted to keep summaries like that. I can keep a switch-case for a) direct SAs; b) groups but the rest I'd keep the same.
|
||
return factory.New(). | ||
WithSync(c.sync). | ||
WithFilteredEventsInformersQueueKeysFunc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you guys prefer the way these factor out while using this factory thing or do you prefer "normal" construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the factoring is making it easier to observe what the expectations of the controller might be. Although I haven't written many "normal" kube-controllers, perhaps there would be a way to keep the inputs well readable, too.
pkg/psalabelsyncer/sccrolecache.go
Outdated
c.usefulRolesLock.Lock() | ||
defer c.usefulRolesLock.Unlock() | ||
|
||
c.usefulRoles = map[string][]string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing where this is updated over time as roles are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, previously I had it tied to events that come for the controller but it's true that while thinking of it as a separate logical object, it should add its own event handlers to the informers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented it so that the informer events for roles and SCCs are handled by the cache, and the cache is capable of adding keys to an external queue.
pkg/psalabelsyncer/sccrolecache.go
Outdated
return serviceAccounts, nil | ||
} | ||
|
||
func NewSAToSCCCache(rbacInformers rbacv1informers.Interface, sccInfomer securityv1informers.SecurityContextConstraintsInformer) *SAToSCCCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a potential scale problem to me. The for loops in here are pretty deeply nested. If this becomes a bottleneck, how would you make it an order of magnitude faster. The information is present to fully invert this logic.
I'm not saying this must be done immediately (unless it fails early in scale testing). I'm asking about how we'd handle it when it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic expects minimal changes in terms of SCC creation, which is currently the most expensive operation as the role cache gets reinitialized. That also means that the initialization of the cache on the start is going to be quite expensive - each SCC addition means (n_SCCs * (n_roles * n_role_rules))
complexity. This could perhaps be improved by the cache dealing with only the SCC that was changed, and therefore not reinitializing the whole cache.
As for the roles and clusterroles - these are handled one-by-one and cached. The computation complexity there is not great - (n_SCCs * n_role_rules)
- but I think that in this case it's either us or the kube-apiserver doing the computation.
As for the memory complexity, the role cache is just a string->[]strings
where the string slice are the names of the SCCs. This means that the cache will of course grow with role and SCC numbers, but the number of SCCs should hopefully never be too great and so memory growth here should be rather linear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost for adding an SCC has been improved to (n_roles * n_role_rules)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-urbaniak proposed to write a benchmark test as a follow-up
The last commit brings optimization of the SCCs as discussed in #75 (comment). It seems that, for some reason, this is blocking removal of the openshift finalizer from namespaces, I'll have to check why that could be. |
/test all |
rebased on current master w/ kube 1.24 |
/retest-required |
1 similar comment
/retest-required |
/lgtm |
/retest |
/hold cancel |
A CI payload with the required o/origin fixes has not been promoted yet 🙁 |
/retest-required |
/retest-required |
/retest-required |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: s-urbaniak, stlaz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@stlaz: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@stlaz: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with Bugzilla bug 2086519 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR introduced a controller that implements the design from openshift/enhancements#1010
So far this is just a draft as we need to introduce optimizations to only ever consider rolebindings/roles that matter in terms of SCC access to prevent unnecessary number of calculations in case of a big number of Roles/SCCs/both.
/assign @s-urbaniak
/cc @openshift/openshift-team-auth-maintainers @deads2k