Make generic controller for cluster-based resources #259

Merged: 2 commits into main on Oct 4, 2024

Conversation

andrewstucki (Contributor)

This tries to extract the core of the Users controller into a generic controller implementation that wraps a ResourceReconciler interface. The interface covers the guts of the three operations we pretty much need for every cluster-based resource:

  1. Patching finalizers (this could probably be removed if we want to do it in an even more generic way)
  2. Syncing (upserting) a resource
  3. Deleting a resource

Hence the interface looks like this:

	FinalizerPatch(request ResourceRequest[T]) client.Patch
	SyncResource(ctx context.Context, request ResourceRequest[T]) (client.Patch, error)
	DeleteResource(ctx context.Context, request ResourceRequest[T]) error

Most of the sync/delete logic can then either be inlined into the controllers or, ideally, wrapped in a "high-level" client/synchronizer (see the ACL synchronization code) in the clients package.
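
To make that concrete, here is a rough sketch (not the actual code in this PR) of how a generic controller could drive the interface. The ResourceController wrapper, the reconcile flow, and the field layout of ResourceRequest are assumptions for illustration; only the three interface methods above come from the PR.

	// Hypothetical sketch: only the three ResourceReconciler methods come from
	// this PR; everything else here is assumed for illustration.
	package example

	import (
		"context"

		ctrl "sigs.k8s.io/controller-runtime"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// ResourceRequest carries the object being reconciled plus whatever cluster
	// connection details a reconciler needs (field layout assumed).
	type ResourceRequest[T client.Object] struct {
		Object T
	}

	// ResourceReconciler is the interface described above.
	type ResourceReconciler[T client.Object] interface {
		FinalizerPatch(request ResourceRequest[T]) client.Patch
		SyncResource(ctx context.Context, request ResourceRequest[T]) (client.Patch, error)
		DeleteResource(ctx context.Context, request ResourceRequest[T]) error
	}

	// ResourceController is a hypothetical generic wrapper that owns the
	// boilerplate shared by every cluster-based resource.
	type ResourceController[T client.Object] struct {
		client     client.Client
		reconciler ResourceReconciler[T]
	}

	func (c *ResourceController[T]) reconcile(ctx context.Context, obj T) (ctrl.Result, error) {
		request := ResourceRequest[T]{Object: obj}

		// Deletion: clean up the cluster-side resource, then let the
		// finalizer be removed so the object can be garbage-collected.
		if !obj.GetDeletionTimestamp().IsZero() {
			return ctrl.Result{}, c.reconciler.DeleteResource(ctx, request)
		}

		// Ensure our finalizer is present via the patch the reconciler builds.
		if err := c.client.Patch(ctx, obj, c.reconciler.FinalizerPatch(request)); err != nil {
			return ctrl.Result{}, err
		}

		// Sync (upsert) the resource and apply whatever patch comes back,
		// e.g. a server-side apply of status conditions.
		patch, err := c.reconciler.SyncResource(ctx, request)
		if err != nil {
			return ctrl.Result{}, err
		}
		if patch != nil {
			if err := c.client.Status().Patch(ctx, obj, patch); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}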

I'm intending to use this for CRD-defined schemas, and if y'all like the structure, I'd want to take a pass at refactoring TopicReconciler to use this as well for consistency.

This allows us to follow a really simple pattern for any additional CRDs. Basically anything that we're controlling via CRD and putting in a cluster should:

  1. Add a ClusterSource field (a spec sketch follows this list).
  2. Generate some applyconfiguration structures.
  3. If it's complex, implement a high-level synchronizer that handles the heavy lifting of the Redpanda operations.
  4. Add a reconciler that implements the above interface.
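
As a sketch of step 1, the spec of a hypothetical new CRD (a Schema, purely as an example) might embed the cluster reference like this; the Schema type, the field names, and the JSON tags are assumptions, not code from this PR:

	// Hypothetical example of step 1: the spec embeds a ClusterSource so the
	// shared machinery knows which Redpanda cluster to talk to. The real
	// ClusterSource type lives in the operator's API package.
	type SchemaSpec struct {
		// ClusterSource identifies the Redpanda cluster this resource targets.
		ClusterSource *ClusterSource `json:"cluster,omitempty"`

		// ...resource-specific fields, e.g. the schema definition itself...
	}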

Doing so would give you:

  1. Consistent status conditions
  2. Consistent ways of connecting to a referenced cluster
  3. Server-side apply pretty much everywhere
  4. Similarly structured code patterns

@RafalKorepta (Contributor) left a comment

I didn't finish the review.

ResourceConditionReasonSynced = "Synced"
ResourceConditionReasonClusterRefInvalid = "ClusterRefInvalid"
ResourceConditionReasonConfigurationInvalid = "ConfigurationInvalid"
ResourceConditionReasonTerminalClientError = "TerminalClientError"
Contributor

Would a terminal client error prevent further reconciliation? What action should the user of the controller/custom resource take? Could the error constants have a brief description or explanation of when they should be used?

Maybe I'm overreacting to the word Terminal.

Contributor Author

Maybe. I currently use the notion of a "terminal" error in two places, which is how the UserReconciler works:

  1. ignoreAllConnectionErrors:

	func ignoreAllConnectionErrors(logger logr.Logger, err error) error {
		// If we have known errors where we're unable to actually establish
		// a connection to the cluster due to say, invalid connection parameters
		// we're going to just skip the cleanup phase since we likely won't be
		// able to clean ourselves up anyway.
		if internalclient.IsTerminalClientError(err) ||
			internalclient.IsConfigurationError(err) ||
			internalclient.IsInvalidClusterError(err) {
			// We use Info rather than Error here because we don't want
			// to ignore the verbosity settings. This is really only for
			// debugging purposes.
			logger.V(2).Info("Ignoring non-retryable client error", "error", err)
			return nil
		}
		return err
	}

  2. handleResourceSyncErrors:

	func handleResourceSyncErrors(err error) (metav1.Condition, error) {
		// If we have a known terminal error, just set the sync condition and don't re-run reconciliation.
		if internalclient.IsInvalidClusterError(err) {
			return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonClusterRefInvalid, err), nil
		}
		if internalclient.IsConfigurationError(err) {
			return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonConfigurationInvalid, err), nil
		}
		if internalclient.IsTerminalClientError(err) {
			return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonTerminalClientError, err), nil
		}
		// otherwise, set a generic unexpected error and return an error so we can re-reconcile.
		return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonUnexpectedError, err), err
	}

The first is basically for ignoring errors that shouldn't ever be retried (e.g. the cluster has been blown away and you're trying to delete yourself from it, or the cluster configuration now has SASL auth disabled and you're trying to delete a user). It's used during the cleanup routine to do "best-effort" cleanup: if we hit one of these "terminal" errors we basically say "oh well" and let the resource be GC'd.

The second is for when you're trying to sync a resource. In that case we set the status as having encountered a terminal error, with the error message in the status condition message, and then stop reconciling, because this isn't an "ephemeral"-type error.
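
A hedged sketch (not this PR's code) of how handleResourceSyncErrors above is consumed: the condition is always recorded on the object's in-memory status (persisted elsewhere), and only the returned error decides whether controller-runtime requeues the request. The helper name, the use of apimeta (k8s.io/apimachinery/pkg/api/meta), and the shape of User.Status.Conditions are assumptions.

	// Hypothetical helper illustrating the retry contract described above.
	func updateSyncStatus(user *redpandav1alpha2.User, syncErr error) (ctrl.Result, error) {
		condition, retryErr := handleResourceSyncErrors(syncErr)
		apimeta.SetStatusCondition(&user.Status.Conditions, condition)
		// retryErr is nil for the "terminal" reasons (ClusterRefInvalid,
		// ConfigurationInvalid, TerminalClientError), so the request is not
		// requeued; an UnexpectedError propagates and is retried with backoff.
		return ctrl.Result{}, retryErr
	}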

Contributor Author

Also 👍 on documenting this stuff more.

}

func registerUserClusterIndex(ctx context.Context, mgr ctrl.Manager) error {
return mgr.GetFieldIndexer().IndexField(ctx, &redpandav1alpha2.User{}, userClusterIndex, indexUserCluster)
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, o T, l U) (handler.EventHandler, error) {
Contributor

NIT: For me, the o and l are hard to read. Maybe the following suggestion could make it more readable 👇

Suggested change
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, o T, l U) (handler.EventHandler, error) {
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, obj T, list U) (handler.EventHandler, error) {

return nil
}
return requests
})
}

func clusterForHelmManagedObject(o client.Object) (types.NamespacedName, bool) {
Contributor

Not related to this PR

	role := labels["app.kubernetes.io/name"]
	if !slices.Contains([]string{"redpanda", "console"}, role) {
		return types.NamespacedName{}, false
	}

Should include connectors too.
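
Sketched, that suggestion might look like the following; the exact label value for connectors is an assumption:

	role := labels["app.kubernetes.io/name"]
	if !slices.Contains([]string{"redpanda", "console", "connectors"}, role) {
		return types.NamespacedName{}, false
	}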

// Every 5 minutes try and check to make sure no manual modifications
// happened on the resource synced to the cluster and attempt to correct
// any drift.
Complete(controller.PeriodicallyReconcile(5 * time.Minute))
Contributor

NIT: Maybe make the period configurable.
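
For example, the period could come from a flag on the operator binary and be passed to controller.PeriodicallyReconcile in place of the constant. The flag name and default below are purely hypothetical and not part of this PR (uses the standard "flag" and "time" packages):

	// Hypothetical flag; name and default are assumptions.
	var driftCheckPeriod = flag.Duration("resource-drift-check-period", 5*time.Minute,
		"how often cluster-based resources are re-reconciled to correct manual drift")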

@RafalKorepta (Contributor) left a comment

LGTM

_ = testEnv.Stop()
})

container, err := redpanda.Run(ctx, "docker.redpanda.com/redpandadata/redpanda:v23.2.8",
Contributor

NIT: As a follow-up, we should bump this version everywhere in our test suite.

@andrewstucki (Contributor Author)

Going to go ahead and merge and then I'll address some of the documentation in a follow-up.

andrewstucki merged commit e897bcf into main on Oct 4, 2024
5 checks passed
andrewstucki deleted the generalized-controller branch on October 4, 2024 at 14:01