-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the recreateOption to the object template #253
Add the recreateOption to the object template #253
Conversation
} else { | ||
removeFieldsForComparison(dryRunUpdatedObj) | ||
|
||
if reflect.DeepEqual(dryRunUpdatedObj.Object, existingObjectCopy.Object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this log mismatch message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if reflect.DeepEqual(dryRunUpdatedObj.Object, existingObjectCopy.Object)
then the log message is "A mismatch was detected but a dry run update didn't make any changes. Assuming the object " + "is compliant.",
is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is existing code but it was moved in this PR.
This code is detecting the case where the config-policy-controller thought there was a difference but the dry run update request showed that it was not different after all. This can happen when empty values are not shown in the API output but are set in the policy.
/hold for reviews |
} | ||
|
||
if time.Since(start) > time.Second*10 { | ||
message = fmt.Sprintf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this for loop, it doesn't delete the obj. so the message should be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is correct because this only happens if there is an error and the error is because the object still exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly I'm wondering if the loop to retry the Create
call is necessary.
api/v1/configurationpolicy_types.go
Outdated
// RecreateOption describes the condition when to delete and recreate an object when an update is required. IfRequired | ||
// will recreate the object when updating an immutable field. Always will always recreate the object if a mismatch is | ||
// detected. RecreateOption has no effect when the remediationAction is inform. IfRequired has no effect on clusters | ||
// without dry run update support. Default is None. | ||
// +kubebuilder:validation:Enum=None;IfRequired;Always | ||
type RecreateOption string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of moving this docstring to inside the ObjectTemplate
struct? I don't think it makes a difference for the CRD, but I think it will appear more often via CodeLens when it's defined on a struct field, as opposed to the type.
Also, are you opposed to //+kubebuilder:default=None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The docstring works in the suggested location.
I'll go ahead and do //+kubebuilder:default=None
but we did this in pruneObjectBehavior and it led to unintended consequences such as tests failing that check the whole ConfigurationPolicy and the template-sync needing to change how it determines if the ConfigurationPolicy on the cluster matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does template-sync handle it nicely now, or do you think there will be more changes needed? I thought it had been working nicely for some OperatorPolicy fields like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the code that needs updating but it's not a big deal. I have the local changes queued up. Just need to run the tests once this merges:
https://github.com/open-cluster-management-io/governance-policy-framework-addon/blob/f299b7020823b3a09ab872ea4731080b42ff09d8/controllers/templatesync/template_sync.go#L959-L967
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading that makes me think that template-sync might be thinking it needs to do more updates to OperatorPolicy than it does. I thought I had checked and it wasn't stuck in a loop at least, but I'll need to investigate more
|
||
start := time.Now() | ||
|
||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about policy latency (if that's the right term) with this loop.
Say I have x
number of these recreate
policies, but the objects they work with have finalizers. Then I think each ends up waiting 10 seconds every config-policy-controller evaluation loop. There is some concurrency, c
, (by default 2 goroutines I think), but it means that the loop takes a minimum of floor(x/c)*10
seconds. They could degrade the performance of the other policies in the cluster, since they would have to wait that long between evaluations.
What happens if there isn't a loop, can it just try the Create
immediately after the Delete
call returns, and if it fails just get it on the next evaluation? I think the shouldEvaluatePolicy
logic could check for this to ensure it keeps getting evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JustinKuli I'll have it try three times instead and then give up. I'm worried about the deletion just taking a couple of seconds but then it leading to a long time before the object is recreated if the config-policy-controller is saturated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shouldEvaluatePolicy
logic already immediately schedules a policy with the timeout status message.
When a user needs to update an object's immutable fields, the object must be replaced. The user may opt-in to setting recreateOption on an object template to "IfRequired" and "Always". When set to "IfRequired", normal updates proceed when possible. Relates: https://issues.redhat.com/browse/ACM-11846 Signed-off-by: mprahl <mprahl@users.noreply.github.com>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JustinKuli, mprahl The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
0bb6329
into
open-cluster-management-io:main
When a user needs to update an object's immutable fields, the object must be replaced. The user may opt-in to setting recreateOption on an object template to "IfRequired" and "Always". When set to "IfRequired", normal updates proceed when possible.
Relates:
https://issues.redhat.com/browse/ACM-11846