Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce protectedCopiedCSVNamespaces flag #2811

Conversation

awgreene
Copy link
Member

@awgreene awgreene commented Jul 5, 2022

Description of the change:

Introduce a runtime flag that prevents OLM from deleting Copied CSVs in a given set of namespaces.

Motivation for the change:

Problem: Users rely on Copied CSVs in order to understand which
operators are available in a given namespace. When installing All
Namespace operators, a Copied CSV is created in every namespace which
can place a huge performance strain on clusters with many namespaces.
OLM introduced the ability to disable Copied CSVs for All Namespace mode
operators in an effort to resolve the performance issues on large clusters,
unfortunately removing the ability for users to identify which operators are
available in a given namespace.

Solution: The protectedCopiedCSVNamespaces runtime flag can be used to
prevent Copied CSVs from being deleted even when Copied CSVs are
disabled. An admin can then provide users with the proper RBAC to view
which operators are running in All Namespace mode.
Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2022
@openshift-ci
Copy link

openshift-ci bot commented Jul 5, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot requested review from anik120 and ecordell July 5, 2022 19:27
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2022
@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch 3 times, most recently from 4468c64 to e1dcc94 Compare July 6, 2022 14:49
@awgreene awgreene marked this pull request as ready for review July 6, 2022 14:50
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 6, 2022
@awgreene awgreene changed the title Disabled copied csvs console support Introduce protectedCopiedCSVNamespaces flag Jul 6, 2022
@awgreene
Copy link
Member Author

awgreene commented Jul 6, 2022

/hold

Should probably add a test, but I'm not sure how to do that yet given the runtime flag.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2022
@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch 3 times, most recently from dd29f31 to 67ab31a Compare July 11, 2022 18:50
Copy link
Contributor

@timflannagan timflannagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm generally onboard with these changes, although I would prefer an API-based approach (e.g. OLMConfig?) but it's unclear whether that's overkill right now.

Do we need to accompany these changes with a test in some capacity? Do we need any upstream documentation changes here too?

pkg/controller/operators/olm/config.go Outdated Show resolved Hide resolved
@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch from 67ab31a to 2da2b81 Compare July 15, 2022 19:36
@awgreene
Copy link
Member Author

I think I'm generally onboard with these changes, although I would prefer an API-based approach (e.g. OLMConfig?) but it's unclear whether that's overkill right now.

I agree that it would have been nice to introduce this via the OLMConfig resource, but:

  • This is a workaround to support a very specific use case.
  • Ideally, most users will never touch this this feature.
  • Introducing it through the olmConfig API would encourage customers to use this feature.
  • There is no reason it can't be added to the olmConfig api in the future if requested.

Do we need to accompany these changes with a test in some capacity? Do we need any upstream documentation changes here too?

I've updated existing tests to account for this feature, but it's only tested if the operator is ran with the protectedCopiedCSVNamespaces flag.

@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch 2 times, most recently from 3020bd9 to 7dc8dca Compare July 20, 2022 11:47
@awgreene
Copy link
Member Author

/retest

@timflannagan
Copy link
Contributor

timflannagan commented Jul 20, 2022

Timed out after 60.001s.
  Expected success, but got an error:
      <*errors.errorString | 0xc0003f9a20>: {
          s: "condition does not have expected reason, message, and status. Expected { True 0 0001-01-01 00:00:00 +0000 UTC CopiedCSVsDisabled Copied CSVs are disabled and no unexpected copied CSVs were found for operators installed in AllNamespace mode}, got &Condition{Type:DisabledCopiedCSVs,Status:False,ObservedGeneration:0,LastTransitionTime:2022-07-20 14:14:10 +0000 UTC,Reason:CopiedCSVsDisabled,Message:Copied CSVs are disabled and at least one unexpected copied CSV was found for an operator installed in AllNamespace mode,}",
      }
      condition does not have expected reason, message, and status. Expected { True 0 0001-01-01 00:00:00 +0000 UTC CopiedCSVsDisabled Copied CSVs are disabled and no unexpected copied CSVs were found for operators installed in AllNamespace mode}, got &Condition{Type:DisabledCopiedCSVs,Status:False,ObservedGeneration:0,LastTransitionTime:2022-07-20 14:14:10 +0000 UTC,Reason:CopiedCSVsDisabled,Message:Copied CSVs are disabled and at least one unexpected copied CSV was found for an operator installed in AllNamespace mode,}
In [It] at: /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/disabling_copied_csv_e2e_test.go:193

@awgreene That seems like a legitimate e2e failure?

@@ -1361,15 +1372,14 @@ func getCopiedCSVsCondition(isDisabled, csvIsRequeued bool) metav1.Condition {
return condition
}

condition.Reason = "CopiedCSVsDisabled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to live as a constant? Where are the reasons typically defined, o-f/api?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably a good idea, any objections to doing this in a followup PR if I create an issue to track it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this: that's fine with me, but I'd like to see us tackle this sooner than later.

Copy link
Contributor

@tylerslaton tylerslaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks good to me so I'm sending over a preemptive lgtm with the assumption that the E2E tests will get fixed up + Tim's comments.

/lgtm

test/e2e/disabling_copied_csv_e2e_test.go Show resolved Hide resolved
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2022
@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch from 7dc8dca to af91f27 Compare July 20, 2022 18:37
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2022
@awgreene
Copy link
Member Author

/hold

@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch 2 times, most recently from a38ff54 to 9566f98 Compare July 20, 2022 19:44
@awgreene
Copy link
Member Author

@awgreene That seems like a legitimate e2e failure?

@timflannagan this was a legitimate error where namespaces in a terminated state were included in the expected copied CSV count, fixed.

@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch 3 times, most recently from cd37d67 to cf0e432 Compare July 20, 2022 21:07
Copy link
Contributor

@tylerslaton tylerslaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of non-blocking comments that could be addressed later if it becomes essential.

/lgtm

pkg/controller/operators/olm/config.go Show resolved Hide resolved
pkg/controller/operators/olm/operator.go Show resolved Hide resolved
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2022
pkg/controller/operators/olm/operator.go Show resolved Hide resolved
pkg/controller/operators/olm/operator.go Outdated Show resolved Hide resolved
Comment on lines 148 to 160
if numCSVs := len(copiedCSVs.Items); numCSVs != len(protectedCopiedCSVNamespaces) {
return fmt.Errorf("Found %d copied CSVs, should be %d", numCSVs, len(protectedCopiedCSVNamespaces))
}

for k := range protectedCopiedCSVNamespaces {
found := false
for _, csv := range copiedCSVs.Items {
if csv.GetNamespace() == k {
found = true
break
}
}
if !found {
return fmt.Errorf("could not find copied CSV in protected namespace %s", k)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking: I think we can refactor this to lean into gomega pattern matchers more, but these changes are just extending the pre-existing logic.

@@ -159,8 +178,8 @@ var _ = Describe("Disabling copied CSVs", func() {
}

expectedCondition := metav1.Condition{
Reason: "NoCopiedCSVsFound",
Message: "Copied CSVs are disabled and none were found for operators installed in AllNamespace mode",
Reason: "CopiedCSVsDisabled",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is why I'd like to see a follow-up that moves this reason to a constant variable that can be shared around.


func getProtectedCopiedCSVNamespaces(protectedCopiedCSVNamespaces map[string]struct{}) error {
var olmDeployment appsv1.Deployment
if err := ctx.Ctx().Client().Get(context.TODO(), apitypes.NamespacedName{Name: "olm-operator", Namespace: operatorNamespace}, &olmDeployment); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/not blocking: avoid hardcoding the context and namespaced name fields and pass those as parameters to this function instead?

@openshift-ci
Copy link

openshift-ci bot commented Jul 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, timflannagan, tylerslaton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [awgreene,timflannagan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@timflannagan
Copy link
Contributor

I had a couple of comments, but nothing that was truly blocking. Feel free to remove the hold.

Problem: Users rely on Copied CSVs in order to understand which
operators are available in a given namespace. When installing All
Namespace operators, a Copied CSV is created in every namespace which
can place a huge performance strain on clusters with many namespaces.
OLM introduced the ability to disable Copied CSVs for All Namespace mode
operators  in an effort to resolve the performance issues on large clusters,
unfortunately removing the ability for users to identify which operators are
available in a given namespace.

Solution: The protectedCopiedCSVNamespaces runtime flag can be used to
prevent Copied CSVs from being deleted even when Copied CSVs are
disabled. An admin can then provide users with the proper RBAC to view
which operators are running in All Namespace mode.

Signed-off-by: Alexander Greene <greene.al1991@gmail.com>
@awgreene awgreene force-pushed the disabled-copied-csvs-console-support branch from cf0e432 to 03f8118 Compare July 21, 2022 14:42
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2022
@timflannagan
Copy link
Contributor

Nicely done. Feel free to remove the hold.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2022
@awgreene
Copy link
Member Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2022
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@timflannagan timflannagan merged commit 6c4e779 into operator-framework:master Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants