-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle preexisting operator #192
Handle preexisting operator #192
Conversation
095cc77
to
2d800eb
Compare
/cc @mprahl |
@@ -1741,7 +1741,7 @@ func (r *ConfigurationPolicyReconciler) handleSingleObj( | |||
|
|||
if exists && !obj.shouldExist { | |||
// it is a mustnothave but it exist, so it must be deleted | |||
if strings.EqualFold(string(remediation), string(policyv1.Enforce)) { | |||
if remediation.IsEnforce() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JustinKuli is the plan to keep mustnothave to remove what the operator created but nothing beyond that or is the plant to still remove mustnothave until the whole uninstall logic is added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan for OperatorPolicy is to only support musthave in the initial release. I believe I removed all logic for it in the operator-policy-controller code, although there could still be remnants somewhere.
(This comment is attached to config-policy code, which of course will not be changed here)
|
||
return reconcile.Result{}, err | ||
foundOpGroups, err := r.DynamicWatcher.List( | ||
watcher, operatorGroupGVK, desiredOpGroup.Namespace, labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is labels.Everything()
basically no selector? If so, you can optionally provide nil
instead for the label selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I felt like labels.Everything()
was more descriptive. I find I often pause and check what a nil
selector does, because an empty selector would match nothing, but nil != empty
in this case.
exists := subExists && ogExists | ||
shouldExist := strings.EqualFold(string(policy.Spec.ComplianceType), string(policyv1.MustHave)) | ||
if policy.Spec.RemediationAction.IsEnforce() { | ||
err = r.Create(ctx, desiredOpGroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you verify that this creates the object with strict field validation? I couldn't find anything in the controller-runtime code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strict field validation would return an error if there were any unknown or duplicated fields. (Any request will return an error if the provided fields fail CRD validation, eg if a required field is missing.) Since this request is using an "actual" operatorv1.OperatorGroup
typed object for the request, it won't have any unknown or duplicate fields.
Extra context for other reviewers: we have been discussing what to do to handle version skew around the "embedded" types like OperatorGroup and Subscription. The OperatorPolicy CRD now uses a RawExtension to possibly help us handle the situation in the future. The kind of change we will need to worry about is if a field is added to the type we import, and a policy using that field is deployed on a cluster using an older CRD. In that case, I think the field would be dropped by the API server without an error.
Neither the OperatorGroup type not the Subscription type have had any recent changes that would cause version skews we need to worry about right now.
Instead of field validation on the API request, I think we should do the same kind of validation when we unmarshal the RawExtension into the type. (I think I missed that in this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JustinKuli the oldest version we need to support is OCP 4.12 (https://access.redhat.com/articles/7027073). This seems to map to OLM API v0.17.3:
https://github.com/openshift/operator-framework-olm/blob/release-4.12/go.mod#L16
If you go to this link to compare v0.17.3 to the latest version, you can see that a couple of minor fields were added to the subscription:
operator-framework/api@v0.17.3...v0.21.0
It's easiest to check the crds/operators.coreos.com_subscriptions.yaml file.
So this would mean that using the latest github.com/operator-framework/api is fine but you need to error if those new fields were provided on OCP 4.12.
if policy.Spec.RemediationAction.IsEnforce() { | ||
desiredOpGroup.ResourceVersion = opGroup.GetResourceVersion() | ||
|
||
err := r.Update(ctx, desiredOpGroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like "must only have" since I don't see any checking to see if the desired operator group is a subset of the existing operator group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I missed considering that.
if policy.Spec.RemediationAction.IsEnforce() { | ||
desiredOpGroup.ResourceVersion = opGroup.GetResourceVersion() | ||
|
||
err := r.Update(ctx, desiredOpGroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, please ensure field validation is strict on the update.
// Future implementation: Verify the specs of the object matches the one on the cluster | ||
OpLog.Info("The object already exists. Checking fields to verify matching specs") | ||
if idx == -1 || existingCond.Status == metav1.ConditionFalse { | ||
err := r.updateStatus(ctx, policy, matchesCond("OperatorGroup"), matchedObj(&opGroup)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to include in a message that there is a preexisting operator group that was not verified since it was not specified in the policy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can adjust the condition for that easily (and then it will be reflected in the compliance event as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a condition for this case with the message the policy does not specify an OperatorGroup but one already exists in the namespace - assuming that OperatorGroup is correct
r.setCompliance(ctx, policy, policyv1.NonCompliant) | ||
} | ||
subNamespace, ok := sub["namespace"].(string) | ||
if !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also want to check the length too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure that it is a valid kubernetes namespace identifier? Yeah, that seems reasonable.
|
||
return reconcile.Result{}, err | ||
} | ||
opGroup := make(map[string]interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for the subscription. I recommend using unstructured. It's the same amount of code and you don't have to type assert.
// Fallback to the Subscription namespace if the OperatorGroup namespace is not specified in the policy. | ||
ogNamespace := subNamespace | ||
|
||
if specifiedNS, ok := opGroup["namespace"].(string); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if specifiedNS, ok := opGroup["namespace"].(string); ok { | |
if specifiedNS, ok := opGroup["namespace"].(string); ok || specifiedNS == "" { |
|
||
spec := new(operatorv1.OperatorGroupSpec) | ||
|
||
err = json.Unmarshal(policy.Spec.OperatorGroup.Raw, spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you follow my other comments, you can use unstructured for the conversion but it does create an extra json.Marshal that your implementation does not.
err = k8sruntime.DefaultUnstructuredConverter.FromUnstructured(opgroup.UnstructuredContent(), &spec)
if err != nil { | ||
OpLog.Info(fmt.Sprintf("Failed to update policy status: %s", err)) | ||
return fmt.Errorf("error listing OperatorGroups: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("error listing OperatorGroups: %w", err) | |
return fmt.Errorf("error getting the subscription: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye.
} | ||
|
||
if policy.Spec.RemediationAction.IsEnforce() { | ||
err := r.Create(ctx, desiredSub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about field validation.
return fmt.Errorf("error updating the status for a created Subscription: %w", err) | ||
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: You could return nil at the end of the if block to prevent the indentation of else.
return fmt.Errorf("error converting desired Subscription to an Unstructured: %w", err) | ||
} | ||
|
||
if reflect.DeepEqual(desiredUnstruct["spec"], foundSub.Object["spec"]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a "must only have" check. Is the intention to only support "must only have"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't my intention. It seems like I will need to bring in the heavy machinery from config-policy to handle this.
return fmt.Errorf("error converting desired Subscription to an Unstructured: %w", err) | ||
} | ||
|
||
if reflect.DeepEqual(desiredUnstruct["spec"], foundSub.Object["spec"]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are there any omitempty
that the user could specify in desiredSub
but not be returned in foundSub
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are dealing with unstructured here, I think you could move out some of the config-policy-controller code from checkAndUpdateResource
to do the comparison for both must have and must only have.
// For now, (conditionally) mark it as compliant | ||
idx, existingCond := policy.Status.GetCondition(subConditionType) | ||
|
||
if idx == -1 || existingCond.Status == metav1.ConditionFalse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does updateStatus already check this? If so, you can remove this and I believe there's another place where this comment is relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly different than what updateStatus
checks. The code comment in the OperatorGroup section explains it more, and I should copy it down here.
This check prevents a previous "created" message from being overwritten with a "matches" message. My feeling was that the created message is more context and is generally more useful, but now I'm thinking that it might be confusing this way, because that's not how the config-policy status and events do it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the if
preventing the “matches” condition from being added if the previous condition was “created”. This should match how the config-policy works. The tests will need some changes to check this.
if policy.Spec.RemediationAction.IsEnforce() { | ||
desiredSub.ResourceVersion = foundSub.GetResourceVersion() | ||
|
||
err := r.Update(ctx, desiredSub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about field validation.
// | ||
// Note that only changing the related objects will not emit a new compliance event, but will update | ||
// the status. | ||
func (r *OperatorPolicyReconciler) updateStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the controller should send one compliance event per compliance change like the config-policy-controller to prevent so many compliance changes on the policy.
The config-policy-controller does it with "event batches" at the end of handleObjectTemplates
. I imagine it'd be a lot simpler in the operator policy case since there is no namespace selector and not multiple object templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of status updates and compliance events is a valid critique. I wanted to send the updates and event immediately, because I felt it simplified error handling and limited the amount of information that needed to be returned by other functions.
I find what the config policy does with event batches very difficult to reason through... Can you carefully define "one compliance event per compliance change"?
Imagine a fresh enforced policy. With this implementation we get 4 status updates and events that look kinda like this:
- NonCompliant; the operator group is missing (subscription details omitted because they are still unknown)
- NonCompliant; the operator group was created (subscription details omitted because they are still unknown)
- NonCompilant; the operator group was created and the subscription is missing
- Compliant; the operator group was created and the subscription was created
Are you asking for just two events like these?
- NonCompliant; the operator group is missing, and the subscription is missing,
- Compliant; the operator group was created, and the subscription was created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly. If I have time today, I can try to make the event batching logic more generic since I wrote that code.
The gist of the approach is that each handle
method would return a list of events. So say operator group returned three events (two noncompliant) and subscription returned two event (one noncompliant), you would end up with three events in this way:
- noncompliant operator group + existing state
- noncompliant operator group and noncompliant subscription
- compliant operator group and compliant subscription
|
||
OpLog.Info("Creating the subscription", "kind", subscription.Kind, | ||
"namespace", subscription.Namespace) | ||
err := json.Unmarshal(policy.Spec.Subscription.Raw, &sub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Similar comments as in buildOperatorGroup
for using unstructured.
|
||
err := r.updatePolicyStatus(ctx, policy) | ||
err = json.Unmarshal(policy.Spec.Subscription.Raw, spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Like I mentioned in buildOperatorGroup
, are you able to just use this immediately rather than go the map route and then the typed route?
|
||
err := r.updatePolicyStatus(ctx, policy) | ||
err = json.Unmarshal(policy.Spec.Subscription.Raw, spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will catch invalid/unknown fields. It looks like the unstructured package's converter to typed has some unknown field validation.
patches: | ||
- path: ns-selector-validation.json | ||
# The OperatorPolicy currently only supports "musthave" | ||
- path: allowed-compliance-types.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good workaround.
ComplianceType and RemediationActions were previously checked inline with string comparisons. This is more easily readable, and potentially better able to be changed in the future. Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
It was decided to use RawExtension for the "embedded" Subscription and OperatorGroup, this can better allow for possible version skew in the future. The CRD validation would be limited inside the policy framework anyway because in a Policy, the raw content has no validation. This also removes some fields which are not expected to be functional in the initial release. The `build*` functions have been updated to use the RawExtension, and may have been adjusted for some other incoming changes. Some other temporary changes were made to resolve compilation errors and unit tests, but the OperatorPolicy "e2e" tests were not considered. Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
The previous version of this function was only applicable to a Policy owning a ConfigurationPolicy. Now it is more broadly applicable. Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
2d800eb
to
a8fc211
Compare
Work Item 1: As pointed out, some of the logic currently works more like "mustonlyhave" than "musthave", and with some of the type/untyped changes more recently, defaulted fields need to be considered more carefully. I will bring in some machinery from config-policy to help with this, and I expect it will also bring options for strict field validation. The Work Item 2: fewer compliance events / status update API calls. @mprahl in your opinion, is this required for this implementation, or is it something we could look at improving in the future? |
I think work item 2 can be delayed. It shouldn't change QE's tests so it should be safe to handle early next sprint and get it in by the release. I'd be happy to do it if you create a ticket. |
097b7c8
to
3546ee8
Compare
// Type is found. | ||
func (status OperatorPolicyStatus) GetCondition(condType string) (int, metav1.Condition) { | ||
for i, cond := range status.Conditions { | ||
if cond.Type == condType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if multiple conditions match? is it okay to return only the first match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to just return the first match. Ideally we would have some way of enforcing that there are not multiple conditions of the same type (the actual k8s APIs can use listType=map like this, but that didn't seem to be available for CRDs). We'll just need to be careful not to accidentally put the same condition twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to learn Thanks
The general plan is to continue adding `handle*` functions for the other resources that an OperatorPolicy needs to examine. Each handle function will update the policy's status (with conditions and relatedObjects), and possibly emit compliance events. This may cause more compliance events "than usual" compared to other controllers, but I think the separation of concerns will help each function be more maintainable. My hope is that some of the `*Cond` and `*Obj` functions in the status section can be reused in the future handlers. There was already overlap between the Subscription and OperatorGroup, so this seemed reasonable. To check if the Subscription/OperatorGroup on the cluster matches what is desired by the policy, a function `handleKeys` was created that can be used for OperatorPolicies and ConfigurationPolicies. Refs: - https://issues.redhat.com/browse/ACM-9283 Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
3546ee8
to
c358a12
Compare
) | ||
}) | ||
It("Should match when the OperatorGroup is manually corrected", func() { | ||
utils.Kubectl("delete", "operatorgroup", incorrectOpGroupName, "-n", opPolTestNS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperatorGroup is specified and another OperatorGroup also exists, then what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add one more case(it)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JustinKuli, yiraeChristineKim The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
56ace86
into
open-cluster-management-io:main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look at the new PR also, but I wanted to give this one a look first. 🙂
func (ra RemediationAction) IsInform() bool { | ||
return strings.EqualFold(string(ra), string(Inform)) | ||
} | ||
|
||
func (ra RemediationAction) IsEnforce() bool { | ||
return strings.EqualFold(string(ra), string(Enforce)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, these are nice. 😄
mergedAnnotations, _, _ := unstructured.NestedStringMap(mdMap, "annotations") | ||
mergedLabels, _, _ := unstructured.NestedStringMap(mdMap, "labels") | ||
if updateNeeded { | ||
mismatchLog := "Detected value mismatch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the key was removed from the log, which I think is fine since I was debating the value of providing a root key anyway. The ideal in my mind would to be able to accumulate the JSON path through handleKeys()/handleSingleKey()
and log that, but that's just my own musing.
For https://issues.redhat.com/browse/ACM-9283.
In addition to handling more situations around whether the Subscription or OperatorGroup already exist, this adds condition reporting, related objects, and compliance events.
This PR also slims the CRD to what we believe can be functional in an initial release.