Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the recreateOption to the object template #253

Merged

Conversation

mprahl
Copy link
Member

@mprahl mprahl commented May 23, 2024

When a user needs to update an object's immutable fields, the object must be replaced. The user may opt-in to setting recreateOption on an object template to "IfRequired" and "Always". When set to "IfRequired", normal updates proceed when possible.

Relates:
https://issues.redhat.com/browse/ACM-11846

} else {
removeFieldsForComparison(dryRunUpdatedObj)

if reflect.DeepEqual(dryRunUpdatedObj.Object, existingObjectCopy.Object) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this log mismatch message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if reflect.DeepEqual(dryRunUpdatedObj.Object, existingObjectCopy.Object) then the log message is "A mismatch was detected but a dry run update didn't make any changes. Assuming the object " + "is compliant.", is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is existing code but it was moved in this PR.

This code is detecting the case where the config-policy-controller thought there was a difference but the dry run update request showed that it was not different after all. This can happen when empty values are not shown in the API output but are set in the policy.

@mprahl
Copy link
Member Author

mprahl commented May 28, 2024

/hold for reviews

}

if time.Since(start) > time.Second*10 {
message = fmt.Sprintf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this for loop, it doesn't delete the obj. so the message should be changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is correct because this only happens if there is an error and the error is because the object still exists.

Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly I'm wondering if the loop to retry the Create call is necessary.

Comment on lines 345 to 352
// RecreateOption describes the condition when to delete and recreate an object when an update is required. IfRequired
// will recreate the object when updating an immutable field. Always will always recreate the object if a mismatch is
// detected. RecreateOption has no effect when the remediationAction is inform. IfRequired has no effect on clusters
// without dry run update support. Default is None.
// +kubebuilder:validation:Enum=None;IfRequired;Always
type RecreateOption string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of moving this docstring to inside the ObjectTemplate struct? I don't think it makes a difference for the CRD, but I think it will appear more often via CodeLens when it's defined on a struct field, as opposed to the type.

Also, are you opposed to //+kubebuilder:default=None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The docstring works in the suggested location.

I'll go ahead and do //+kubebuilder:default=None but we did this in pruneObjectBehavior and it led to unintended consequences such as tests failing that check the whole ConfigurationPolicy and the template-sync needing to change how it determines if the ConfigurationPolicy on the cluster matches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does template-sync handle it nicely now, or do you think there will be more changes needed? I thought it had been working nicely for some OperatorPolicy fields like this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code that needs updating but it's not a big deal. I have the local changes queued up. Just need to run the tests once this merges:
https://github.com/open-cluster-management-io/governance-policy-framework-addon/blob/f299b7020823b3a09ab872ea4731080b42ff09d8/controllers/templatesync/template_sync.go#L959-L967

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading that makes me think that template-sync might be thinking it needs to do more updates to OperatorPolicy than it does. I thought I had checked and it wasn't stuck in a loop at least, but I'll need to investigate more

controllers/configurationpolicy_controller.go Show resolved Hide resolved

start := time.Now()

for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about policy latency (if that's the right term) with this loop.

Say I have x number of these recreate policies, but the objects they work with have finalizers. Then I think each ends up waiting 10 seconds every config-policy-controller evaluation loop. There is some concurrency, c, (by default 2 goroutines I think), but it means that the loop takes a minimum of floor(x/c)*10 seconds. They could degrade the performance of the other policies in the cluster, since they would have to wait that long between evaluations.

What happens if there isn't a loop, can it just try the Create immediately after the Delete call returns, and if it fails just get it on the next evaluation? I think the shouldEvaluatePolicy logic could check for this to ensure it keeps getting evaluated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli I'll have it try three times instead and then give up. I'm worried about the deletion just taking a couple of seconds but then it leading to a long time before the object is recreated if the config-policy-controller is saturated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shouldEvaluatePolicy logic already immediately schedules a policy with the timeout status message.

api/v1/configurationpolicy_types.go Outdated Show resolved Hide resolved
controllers/configurationpolicy_controller.go Outdated Show resolved Hide resolved
controllers/configurationpolicy_controller.go Show resolved Hide resolved
When a user needs to update an object's immutable fields, the object
must be replaced. The user may opt-in to setting recreateOption on an
object template to "IfRequired" and "Always". When set to "IfRequired",
normal updates proceed when possible.

Relates:
https://issues.redhat.com/browse/ACM-11846

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Copy link

openshift-ci bot commented May 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mprahl
Copy link
Member Author

mprahl commented May 28, 2024

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit 0bb6329 into open-cluster-management-io:main May 28, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants