Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] refactor operator group cluster role name #2991

Closed

Conversation

perdasilva
Copy link
Collaborator

@perdasilva perdasilva commented Jul 13, 2023

Description of the change:
In its current implementation, OLM creates three cluster roles for and operator-group: -admin, -view, and -edit.

Motivation for the change:
OCPBUGS-14698

Architectural changes:

The cluster role name format was changed to: olm.operatorgroup.{admin|edit | view}

Testing remarks:

When this hits a running cluster, it will abandon the currently existing cluster roles in favor of new ones that respect the new format. This means that additional migration information will need to be provided in documentation.

The unit tests check that the appropriate cluster roles are created whether or not there is a currently existing. E2e tests responsible for the cluster role creation were also updated for the new format.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2023
@openshift-ci
Copy link

openshift-ci bot commented Jul 13, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 13, 2023
@perdasilva perdasilva marked this pull request as draft July 13, 2023 14:32
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 5 times, most recently from 560c977 to f895696 Compare July 18, 2023 12:41
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 5 times, most recently from 21b16d0 to 399be1d Compare August 7, 2023 13:23
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 2 times, most recently from d45829f to 1134900 Compare August 11, 2023 13:42
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2023
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 9 times, most recently from 771c01e to 1785049 Compare August 14, 2023 16:28
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 2 times, most recently from 736a801 to c06c834 Compare August 23, 2023 10:17
@awgreene
Copy link
Member

awgreene commented Aug 23, 2023

So the main takeaways are that this change:

  • Prefix’s the k8s resources with olm.operatorgroup.
  • Adds a random suffix to the k8s resources based on the UID of the OG
    -- Names look like olm.operatorgroup.(view|edit|admin)-######
  • Existing OG’s view, admin, edit clusterRoles/clusterRoleBindings will be abandoned. It would be difficult to ID these resources with Kubectl and without OLM specific knowledge

My initial thoughts:

  • Changing the name is fine, but basing it on the UID makes the OG’s name unpredictable. Could we sha the spec of the namespace/name which should always be unique for OGs due to the k8s api?
    -- Changing the name places existing clusterRole(Binding)s into three categories:
    --- Cat 1: resources that OLM owned alone.
    ---- OUTCOME: Abandoned resource that needs cleanup
    --- Cat 2: resources that OLM was fighting another controller over.
    ---- OUTCOME: Other controller wins and we’re good.
    ---Cat 3: resources that were created through a third party but isn’t watched by a controller and OLM updated.
    ---- OUTCOME: Basically the same as 2, left in state that OLM update it to BUT something is probably broken.
  • I would like to provide users with guidance to perform the manual migration:
    -- We could have solved this earlier by keeping track of resources that OLM creates and not updating resources created by third parties. Steve recently introduced a PR that adds some labels to non-OLM api resources created by OLM. This doesn’t solve our migration issue, but we should consider using labels and not updating label-less resources as a best practice going forward.
    -- The question now becomes how will users identify what resources to look at. Some options:
    --- We provide a script that a users points at a cluster, it finds existing OGs on cluster, and it prints out the clusterRole(Binding)s that need to be inspected and possibly deleted.
    -- Alternatively, we could perform a slow rollout in which we introduce labels and then change the name. I don’t see much value here since we can map existing OGs -> resources

@perdasilva
Copy link
Collaborator Author

@awgreene good call out on Steve's PR. I'll rebase and add that label to the role, if he hasn't already. We've created our own "owner refs" as labels on the ClusterRole. So, it should be straightforward to identify the ClusterRoles. Using ns/name was my first thought for a hash - @kevinrizza called out that it could leak information (i.e. people could use guessing attacks). The resource ID is stable and should lead to predictable values (at least for the lifetime of the object). wdyt?

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2023
@tmshort tmshort self-requested a review September 11, 2023 20:58
@tmshort tmshort self-assigned this Sep 11, 2023
@tmshort
Copy link
Contributor

tmshort commented Sep 11, 2023

In a lot of the test code, there's odd white-space changes, which can probably be undone.

Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits. and a bunch of unrelated changes that ought to be resolved.

I guess my biggest concern is with the new naming of the ClusterRoles

pkg/controller/operators/olm/operatorgroup.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operator_test.go Show resolved Hide resolved
pkg/controller/operators/olm/operatorgroup.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operatorgroup.go Outdated Show resolved Hide resolved
pkg/controller/operators/olm/operatorgroup.go Show resolved Hide resolved
pkg/lib/ownerutil/util.go Outdated Show resolved Hide resolved
@perdasilva perdasilva force-pushed the OCPBUGS-14698 branch 2 times, most recently from 99a490a to a8c1772 Compare September 14, 2023 11:25
if err == nil {

if existingRole != nil {
// if the existing role conforms to the naming convention, check for skew
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer semantic.Equality for checks on k8s objects. Sometimes thing like order in arrays matters.

Copy link
Member

@stevekuznetsov stevekuznetsov Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, alternatively, simply use server-side apply to assert that the state you want for the object is right. k8s server will do any diffs you need and even create if not existing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this block this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating using equality.Semantic - I haven't used SSA. I'd need a bit more time. If you want I can create an issue to move to SSA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine for now. SSA issue for the future would be good. More important that we do it in every case for v1 than we go back to improve every possible case in v0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locked in - I'll keep that in mind for reviews! Ty

@tmshort
Copy link
Contributor

tmshort commented Sep 15, 2023

Still looking to get those unnecessary formatting changes in operator_test.go reverted.

Per Goncalves da Silva added 6 commits September 18, 2023 16:28
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
…the cluster role

Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2023
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tmshort
Copy link
Contributor

tmshort commented Oct 18, 2023

Closing due to #3035

@tmshort tmshort closed this Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants