Bump controller-runtime, k8s libraries, openshift/library-go, adjust CO to the new versions #337

Merged (9 commits) on Jul 25, 2023

Conversation

@jhrozek jhrozek commented May 30, 2023

  • fix(deps): update module sigs.k8s.io/controller-runtime to v0.15.0
  • Bump openshift/library-go
  • bump k8s.io/pod-security-admission to v0.27.2
  • bump sigs.k8s.io/controller-tools to v0.12.0
  • Adjust CO reconcile loop setup and unit tests for the new controller-manager version

Adjust CO reconcile loop setup and unit tests for the new controller-manager version

In order to use the latest release of controller-manager we need to use
the new style of setting up controller loops that uses
ctrl.NewControllerManagedBy() and then the builder pattern instead of
adding c.Watch for each resource.

The tests had to be adjusted, too, to explicitly name the resources that
can update their status.

@jhrozek
Author

jhrozek commented May 31, 2023

/test e2e-aws-parallel

1 similar comment
@jhrozek
Author

jhrozek commented Jun 1, 2023

/test e2e-aws-parallel

@rhmdnd

rhmdnd commented Jun 12, 2023

The parallel test failure could be related to the dependency bump:

{"level":"error","ts":"2023-06-01T11:19:05.841Z","logger":"remediationctrl","msg":"Retriable error","Request.Namespace":"osdk-e2e-80bb02e5-58c4-4085-a66c-0f85259b2217","Request.Name":"test-generic-remediation-fails-unkown","error":"failed to get API group resources: unable to retrieve the complete list of server APIs: foo.bar/v1: the server could not find the requested resource","stacktrace":"github.com/ComplianceAsCode/compliance-operator/pkg/controller/common.ReturnWithRetriableError\n\tgithub.com/ComplianceAsCode/compliance-operator/pkg/controller/common/errors.go:117\ngithub.com/ComplianceAsCode/compliance-operator/pkg/controller/complianceremediation.(*ReconcileComplianceRemediation).Reconcile\n\tgithub.com/ComplianceAsCode/compliance-operator/pkg/controller/complianceremediation/complianceremediation_controller.go:173\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"error","ts":"2023-06-01T11:19:05.841Z","msg":"Reconciler error","controller":"complianceremediation-controller","controllerGroup":"compliance.openshift.io","controllerKind":"ComplianceRemediation","ComplianceRemediation":{"name":"test-generic-remediation-fails-unkown","namespace":"osdk-e2e-80bb02e5-58c4-4085-a66c-0f85259b2217"},"namespace":"osdk-e2e-80bb02e5-58c4-4085-a66c-0f85259b2217","name":"test-generic-remediation-fails-unkown","reconcileID":"dc56fce9-df55-490f-843c-020571cf8557","error":"failed to get API group resources: unable to retrieve the complete list of server APIs: foo.bar/v1: the server could not find the requested resource","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}

The test times out after 30 minutes waiting for the remediation to error out. The remediation does fail, to some extent, but the failure never registers in the remediation state.

@jhrozek
Author

jhrozek commented Jun 28, 2023

Hmm, that test passes locally when I run it on its own, but the last three CI runs of this PR all failed on that single test. Running more tests locally.

@jhrozek
Author

jhrozek commented Jun 28, 2023

Hmm, all the parallel tests are passing locally. I'm going to re-run the tests and watch them interactively to try to catch the error.

@jhrozek
Author

jhrozek commented Jun 28, 2023

/test e2e-aws-parallel

@jhrozek
Author

jhrozek commented Jun 28, 2023

I /think/ I managed to catch the issue:

{"level":"error","ts":"2023-06-28T14:38:36.312Z","logger":"remediationctrl","msg":"Retriable error","Request.Namespace":"osdk-e2e-ac518a1e-a339-4401-9a13-72e5b2162618","Request.Name":"test-generic-remediation-fails-unkown","error":"failed to get API group resources: unable to retrieve the complete list of server APIs: foo.bar/v1: the server could not find the requested resource","stacktrace":"github.com/ComplianceAsCode/compliance-operator/pkg/controller/common.ReturnWithRetriableError\n\tgithub.com/ComplianceAsCode/compliance-operator/pkg/controller/common/errors.go:117\ngithub.com/ComplianceAsCode/compliance-operator/pkg/controller/complianceremediation.(*ReconcileComplianceRemediation).Reconcile\n\tgithub.com/ComplianceAsCode/compliance-operator/pkg/controller/complianceremediation/complianceremediation_controller.go:173\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"error","ts":"2023-06-28T14:38:36.312Z","msg":"Reconciler error","controller":"complianceremediation-controller","controllerGroup":"compliance.openshift.io","controllerKind":"ComplianceRemediation","ComplianceRemediation":{"name":"test-generic-remediation-fails-unkown","namespace":"osdk-e2e-ac518a1e-a339-4401-9a13-72e5b2162618"},"namespace":"osdk-e2e-ac518a1e-a339-4401-9a13-72e5b2162618","name":"test-generic-remediation-fails-unkown","reconcileID":"7c2cba67-3cd4-4d64-9fd2-fa02cc484099","error":"failed to get API group resources: unable to retrieve the complete list of server APIs: foo.bar/v1: the server could not find the requested resource","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}

I guess the problem is that this should not be a retriable error.

@jhrozek
Author

jhrozek commented Jul 7, 2023

/hold
because the commit I added is a dirty hack and we should do better. But I want to see how CI fares now (manual tests passed with the workaround).

@@ -219,7 +219,7 @@ func (r *ReconcileComplianceRemediation) reconcileRemediation(instance *compv1al
 		return common.NewNonRetriableCtrlError(
 			"Unable to get fix object from ComplianceRemediation. "+
 				"Please update the compliance-operator's permissions: %w", err)
-	} else if runtime.IsNotRegisteredError(err) || meta.IsNoMatchError(err) {
+	} else if runtime.IsNotRegisteredError(err) || meta.IsNoMatchError(err) || (err != nil && strings.Contains(err.Error(), "the server could not find the requested resource")) {
Author

OK, so this is the ugly workaround. I cannot for the life of me figure out how to catch the proper error. In the error string chain I see "unable to retrieve the complete list of server APIs", which suggests the error is ErrGroupDiscoveryFailed. That should be testable with IsGroupDiscoveryFailedError from k8s.io/client-go/discovery/discovery_client.go, but that doesn't seem to work for me. This is especially odd with Go 1.20, where errors.Is should work even across wrapped errors. I'll keep trying to get to the root of the issue, but as a worst-case solution, maybe this workaround will at least pass CI.

Looks like it worked for CI at least.

@rhmdnd rhmdnd Jul 19, 2023

Strange - after looking at the implementation in client-go I'm not sure why we wouldn't be able to use:

IsGroupDiscoveryFailedError(err)

Did you happen to check the type when you were poking at this?

renovate bot and others added 9 commits July 18, 2023 13:41
…manager version

In order to use the latest release of controller-manager we need to use
the new style of setting up controller loops that uses
ctrl.NewControllerManagedBy() and then the builder pattern instead of
adding c.Watch for each resource.

The tests had to be adjusted, too, to explicitly name the resources that
can update their status.
This is pretty much fluxcd/flux2@aa65589 reused.

The new controller-runtime version requires that the logger is set, or
else a stack trace is printed. Since even the original code doesn't
use any logger for tests, let's just use a null logger with the new
version as well.
@jhrozek
Author

jhrozek commented Jul 18, 2023

I included the patch from PR #370 as a separate patch because go mod tidy was also needed and squashing would be too messy.

@rhmdnd

rhmdnd commented Jul 19, 2023

/test e2e-aws-parallel

@rhmdnd

rhmdnd commented Jul 20, 2023

I'm fine merging this for now and cleaning up the error after so we can at least get updated dependencies. But, yeah, that is odd...

@rhmdnd rhmdnd left a comment

/lgtm

@openshift-ci

openshift-ci bot commented Jul 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, rhmdnd


@rhmdnd

rhmdnd commented Jul 20, 2023

Adding other appropriate labels since this is a dep bump.

@rhmdnd

rhmdnd commented Jul 20, 2023

Making sure @Vincent056 and @yuumasato have an opportunity to take a look at this before we remove the hold.

@yuumasato
Member

> Making sure @Vincent056 and @yuumasato have an opportunity to take a look at this before we remove the hold.

@rhmdnd Thanks. LGTM, based on my little knowledge of Go and the operator.

@rhmdnd

rhmdnd commented Jul 25, 2023

Getting this merged so we can clean up the dep bump patches.

@openshift-merge-robot openshift-merge-robot merged commit 13ec44b into ComplianceAsCode:master Jul 25, 2023