-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make generic controller for cluster-based resources #259
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't finish review.
ResourceConditionReasonSynced = "Synced" | ||
ResourceConditionReasonClusterRefInvalid = "ClusterRefInvalid" | ||
ResourceConditionReasonConfigurationInvalid = "ConfigurationInvalid" | ||
ResourceConditionReasonTerminalClientError = "TerminalClientError" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would terminal client error prevent from further reconciliation? What action should be taken by the user of the controller/custom resource? Could the error constant have brief description or explanation when they should be used?
Maybe I'm overexaggerate over the word Terminal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, I use the notion of a "terminal" error in two places currently, which is how the UserReconciler
works:
redpanda-operator/operator/internal/controller/redpanda/resource_controller.go
Lines 122 to 137 in f61b187
func ignoreAllConnectionErrors(logger logr.Logger, err error) error { // If we have known errors where we're unable to actually establish // a connection to the cluster due to say, invalid connection parameters // we're going to just skip the cleanup phase since we likely won't be // able to clean ourselves up anyway. if internalclient.IsTerminalClientError(err) || internalclient.IsConfigurationError(err) || internalclient.IsInvalidClusterError(err) { // We use Info rather than Error here because we don't want // to ignore the verbosity settings. This is really only for // debugging purposes. logger.V(2).Info("Ignoring non-retryable client error", "error", err) return nil } return err } redpanda-operator/operator/internal/controller/redpanda/resource_controller.go
Lines 139 to 153 in f61b187
func handleResourceSyncErrors(err error) (metav1.Condition, error) { // If we have a known terminal error, just set the sync condition and don't re-run reconciliation. if internalclient.IsInvalidClusterError(err) { return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonClusterRefInvalid, err), nil } if internalclient.IsConfigurationError(err) { return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonConfigurationInvalid, err), nil } if internalclient.IsTerminalClientError(err) { return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonTerminalClientError, err), nil } // otherwise, set a generic unexpected error and return an error so we can re-reconcile. return redpandav1alpha2.ResourceNotSyncedCondition(redpandav1alpha2.ResourceConditionReasonUnexpectedError, err), err }
The first is basically if you want to ignore errors that shouldn't ever be retried (i.e. you have a non-existent cluster that's been blown away and you're trying to delete yourself from it, cluster configuration now has SASL auth disabled and you're trying to delete a user, etc.). This is used during the cleanup routine to do "best-effort" style cleanup, if something is "terminal" we basically just say, "oh well" and let the resource be GC'd if we hit one of these.
The second is in the case of you trying to sync a resource. In that case we just set the status as having encountered a terminal error with the error message in the status condition message and then stop reconciling because this isn't an "ephemeral"-type error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also 👍 on documenting this stuff more.
} | ||
|
||
func registerUserClusterIndex(ctx context.Context, mgr ctrl.Manager) error { | ||
return mgr.GetFieldIndexer().IndexField(ctx, &redpandav1alpha2.User{}, userClusterIndex, indexUserCluster) | ||
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, o T, l U) (handler.EventHandler, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: For me the o
and l
is hard to read. Maybe the following suggestion could make it more readable 👇
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, o T, l U) (handler.EventHandler, error) { | |
func registerClusterSourceIndex[T client.Object, U clientList[T]](ctx context.Context, mgr ctrl.Manager, name string, obj T, list U) (handler.EventHandler, error) { |
return nil | ||
} | ||
return requests | ||
}) | ||
} | ||
|
||
func clusterForHelmManagedObject(o client.Object) (types.NamespacedName, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR
role := labels["app.kubernetes.io/name"]
if !slices.Contains([]string{"redpanda", "console"}, role) {
return types.NamespacedName{}, false
}
Should include connectors too.
// Every 5 minutes try and check to make sure no manual modifications | ||
// happened on the resource synced to the cluster and attempt to correct | ||
// any drift. | ||
Complete(controller.PeriodicallyReconcile(5 * time.Minute)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Maybe make the period configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
_ = testEnv.Stop() | ||
}) | ||
|
||
container, err := redpanda.Run(ctx, "docker.redpanda.com/redpandadata/redpanda:v23.2.8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: As a follow up we should bump this version in all places in our test suite
Going to go ahead and merge and then I'll address some of the documentation in a follow-up. |
This tries to extract the core of the Users controller into a generic controller implementation that wraps a
ResourceReconciler
interface that controls the guts of the three operations that we pretty much need to do for every cluster-based resource:Hence the interface looks like this:
Most of the sync/delete logic can either then be inlined into the controllers, or, ideally wrapped in a "high-level" client/synchronizer (see the ACL synchronization code) in the
clients
package.I'm intending on using this for CRD-defined schemas, and if y'all like the structure, would want to take a pass at refactoring
TopicReconciler
to use this as well for consistency.This allows us to follow a really simple pattern for any additional CRDs. Basically anything that we're controlling via CRD and putting in a cluster should:
ClusterSource
field.Doing so would give you: