Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/e2e: Reduce subscription e2e test pollution #2438

Conversation

timflannagan
Copy link
Contributor

Description of the change:
Reduce the overall testing pollution in the subscription e2e suite by generating a testing namespace for individual tests.

Motivation for the change:
Various flakes within this e2e package are popping up more frequently. Recent changes made to reduce the flake tolerance budget from three attempts to a single attempt were hiding inadequate isolation and cleanup between individual tests. A simple example is a test that created test fixtures in a shared namespace but neglected to cleanup those resources. Another, unrelated test will build up it's own set of text fixtures, and fail that test's individual assertions as the previously created and improperly cleaned up resources have unattended consequences, ex: a subscription fails resolution as another test had created a subscription in the same namespace, waited for an installplan to be created and report a completed state, that installplan is removed after that test had finished but the subscription still exists and the new test fails assertions that expect the newer subscription to report a ready state.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

All test cases in subscription suite shares the same test namespace.
In some cases, the resource cleanup which doesn't work properly from
previous test causes failure on subsequent test.

This commit will create new namespace for each test so it can be run
in insolation.

Signed-off-by: Vu Dinh <vudinh@outlook.com>
@timflannagan timflannagan force-pushed the test/reduce-subscription-e2e-test-pollution branch from ca56d21 to aff1154 Compare November 9, 2021 02:22
- Update the helper function SetUpGeneratedTestNamespace -> SetupGeneratedTestNamespace.
- Update SetupGeneratedTestNamespace to only return a corev1.Namespace
  object. Call sites can access the generated namespace by indexing into
  the object's GetName() method or the metdata.Name field.
- Update SetupGeneratedTestNamespace and remove the logic that generates
  the namespace's metadata.name in the function. Call sites are
  responsible for generating a unique name now.
- Update any subscription e2e tests to use the returned corev1.Namespace
  resource's GetName() method.
- Remove the usage of the TearDown helper function in the subscription
  e2e suite. Generated namespaces (and their underlying testing
  resources) are cleaned up by deleting the namespace itself.

Signed-off-by: timflannagan <timflannagan@gmail.com>
Remove duplicate package import for the operator v1alpha1 API library.

Signed-off-by: timflannagan <timflannagan@gmail.com>
@timflannagan timflannagan force-pushed the test/reduce-subscription-e2e-test-pollution branch from aff1154 to de4e4b3 Compare November 9, 2021 02:47
Update the subscription and util e2e packages and replace any instances
of context.TODO() with context.Background().

Signed-off-by: timflannagan <timflannagan@gmail.com>
@timflannagan
Copy link
Contributor Author

Hmm interesting:

2021-11-09T03:05:17.728613219Z stderr F E1109 03:05:17.728509       1 queueinformer_operator.go:290] sync {"update" "subscription-e2e-9bxh2/install-22vn4"} failed: attenuated service account query failed - more than one operator group(s) are managing this namespace count=2
2021-11-09T03:05:17.72864542Z stderr F time="2021-11-09T03:05:17Z" level=info msg=syncing id=5d+CC ip=install-22vn4 namespace=subscription-e2e-9bxh2 phase=Installing
2021-11-09T03:05:17.731261273Z stderr F time="2021-11-09T03:05:17Z" level=info msg="attenuated service account query failed - more than one operator group(s) are managing this namespace count=2" id=5d+CC ip=install-22vn4 namespace=subscription-e2e-9bxh2 phase=Installing
2021-11-09T03:05:18.130980249Z stderr F E1109 03:05:18.130887       1 queueinformer_operator.go:290] sync {"update" "subscription-e2e-9bxh2/install-mqx5p"} failed: attenuated service account query failed - more than one operator group(s) are managing this namespace count=2
2021-11-09T03:05:18.130992349Z stderr F time="2021-11-09T03:05:18Z" level=info msg=syncing id=XXo2z ip=install-mqx5p namespace=subscription-e2e-9bxh2 phase=Installing
2021-11-09T03:05:18.133394998Z stderr F time="2021-11-09T03:05:18Z" level=info msg="attenuated service account query failed - more than one operator group(s) are managing this namespace count=2" id=XXo2z ip=install-mqx5p namespace=subscription-e2e-9bxh2 phase=Installing
2021-11-09T03:05:18.723910477Z stderr F I1109 03:05:18.723671       1 request.go:665] Waited for 1.183574108s due to client-side throttling, not priority and fairness, request: PUT:https://10.96.0.1:443/apis/operators.coreos.com/v1alpha1/namespaces/subscription-e2e-9bxh2/subscriptions/sub-ljktx/status
2021-11-09T03:05:18.729620894Z stderr F time="2021-11-09T03:05:18Z" level=warning msg="an error was encountered during reconciliation" error="Operation cannot be fulfilled on subscriptions.operators.coreos.com \"sub-ljktx\": the object has been modified; please apply your changes to the latest version and try again" event=update reconciling="*v1alpha1.Subscription" selflink=
2021-11-09T03:05:18.729733696Z stderr F E1109 03:05:18.729561       1 queueinformer_operato

Likely also creating a custom OG for that test case.

…concile InstallPlan status' test

The 'can reconcile InstallPlan status' test case creates a custom OG
that targets the namespace that was created. This leads to the following
error message being logged in CI:

``log
2021-11-09T03:05:18.130980249Z stderr F E1109 03:05:18.130887       1 queueinformer_operator.go:290] sync {"update" "subscription-e2e-9bxh2/install-mqx5p"} failed: attenuated service account query failed - more than one operator group(s) are managing this namespace count=2
2021-11-09T03:05:18.130992349Z stderr F time="2021-11-09T03:05:18Z" level=info msg=syncing id=XXo2z ip=install-mqx5p namespace=subscription-e2e-9bxh2 phase=Installing
```

Remove this custom OG and use the OG that was created in the BeforeEach
clause instead.

Signed-off-by: timflannagan <timflannagan@gmail.com>
@kevinrizza
Copy link
Member

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 9, 2021
Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 9, 2021
@openshift-ci
Copy link

openshift-ci bot commented Nov 9, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dinhxuanvu, kevinrizza, timflannagan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dinhxuanvu,kevinrizza]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dinhxuanvu
Copy link
Member

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 9, 2021
@timflannagan
Copy link
Contributor Author

Poked around that podconfig test failure further and found the following in the olm operator logs:

2021-11-09T20:09:48.588988052Z stderr F 2021-11-09T20:09:48.588Z	ERROR	controllers.operatorcondition	Error ensuring OperatorCondition Deployment EnvVars	{"request": "subscription-e2e-nhbpn/nginx-b", "error": "Deployment.apps \"podconfig-dep-hd5ll\" not found"}
2021-11-09T20:09:48.588997052Z stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
2021-11-09T20:09:48.589000252Z stderr F 	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
2021-11-09T20:09:48.589003352Z stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2021-11-09T20:09:48.589006252Z stderr F 	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
2021-11-09T20:09:48.589009252Z stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
2021-11-09T20:09:48.589012152Z stderr F 	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-11-09T20:09:48.589155653Z stderr F 2021-11-09T20:09:48.589Z	ERROR	controller-runtime.manager.controller.operatorcondition	Reconciler error	{"reconciler group": "operators.coreos.com", "reconciler kind": "OperatorCondition", "name": "nginx-b", "namespace": "subscription-e2e-nhbpn", "error": "Deployment.apps \"podconfig-dep-hd5ll\" not found"}
2021-11-09T20:09:48.589163153Z stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2021-11-09T20:09:48.589166653Z stderr F 	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
2021-11-09T20:09:48.589169753Z stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
2021-11-09T20:09:48.589172953Z stderr F 	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2021-11-09T20:09:48.589920458Z stderr F time="2021-11-09T20:09:48Z" level=debug msg="permissions/requirements not met" minKubeMet=true permMet=false reqMet=true
2021-11-09T20:09:48.589961058Z stderr F time="2021-11-09T20:09:48Z" level=debug msg="checking if csv is replacing an older version"
2021-11-09T20:09:48.602826837Z stderr F time="2021-11-09T20:09:48Z" level=debug msg="unable to get previous csv" error="clusterserviceversions.operators.coreos.com \"nginx-a\" not found" replacing=nginx-a
2021-11-09T20:09:48.602894138Z stderr F time="2021-11-09T20:09:48Z" level=info msg="requirements were not met" csv=nginx-b id=RyIg4 namespace=subscription-e2e-nhbpn phase=Pending
2021-11-09T20:09:48.603593442Z stderr F I1109 20:09:48.603522       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"subscription-e2e-nhbpn", Name:"nginx-b", UID:"40515abd-7816-453d-9300-085f57e8b64f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"18716", FieldPath:""}): type: 'Normal' reason: 'RequirementsNotMet' one or more requirements couldn't be found

It looks like the InstallPlan generated is reported a completed state, but the underlying nginx deploymentspec wasn't able to be created. We use that helper function for generating a CSV based on the nginx deployment quite a bit so I'm not too sure how to diagnose further.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 9, 2021
@timflannagan timflannagan force-pushed the test/reduce-subscription-e2e-test-pollution branch 2 times, most recently from cd0e512 to 2563e69 Compare November 10, 2021 01:34
@dinhxuanvu
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2021
@timflannagan timflannagan removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2021
@openshift-merge-robot openshift-merge-robot merged commit 6be7d55 into operator-framework:master Nov 10, 2021
@timflannagan timflannagan deleted the test/reduce-subscription-e2e-test-pollution branch November 10, 2021 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants