Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle preexisting operator #192

Conversation

JustinKuli
Copy link
Member

For https://issues.redhat.com/browse/ACM-9283.

In addition to handling more situations around whether the Subscription or OperatorGroup already exist, this adds condition reporting, related objects, and compliance events.

This PR also slims the CRD to what we believe can be functional in an initial release.

@mprahl
Copy link
Member

mprahl commented Jan 25, 2024

/cc @mprahl

@openshift-ci openshift-ci bot requested a review from mprahl January 25, 2024 22:07
@@ -1741,7 +1741,7 @@ func (r *ConfigurationPolicyReconciler) handleSingleObj(

if exists && !obj.shouldExist {
// it is a mustnothave but it exist, so it must be deleted
if strings.EqualFold(string(remediation), string(policyv1.Enforce)) {
if remediation.IsEnforce() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli is the plan to keep mustnothave to remove what the operator created but nothing beyond that or is the plant to still remove mustnothave until the whole uninstall logic is added?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan for OperatorPolicy is to only support musthave in the initial release. I believe I removed all logic for it in the operator-policy-controller code, although there could still be remnants somewhere.

(This comment is attached to config-policy code, which of course will not be changed here)


return reconcile.Result{}, err
foundOpGroups, err := r.DynamicWatcher.List(
watcher, operatorGroupGVK, desiredOpGroup.Namespace, labels.Everything())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is labels.Everything() basically no selector? If so, you can optionally provide nil instead for the label selector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I felt like labels.Everything() was more descriptive. I find I often pause and check what a nil selector does, because an empty selector would match nothing, but nil != empty in this case.

exists := subExists && ogExists
shouldExist := strings.EqualFold(string(policy.Spec.ComplianceType), string(policyv1.MustHave))
if policy.Spec.RemediationAction.IsEnforce() {
err = r.Create(ctx, desiredOpGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you verify that this creates the object with strict field validation? I couldn't find anything in the controller-runtime code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict field validation would return an error if there were any unknown or duplicated fields. (Any request will return an error if the provided fields fail CRD validation, eg if a required field is missing.) Since this request is using an "actual" operatorv1.OperatorGroup typed object for the request, it won't have any unknown or duplicate fields.

Extra context for other reviewers: we have been discussing what to do to handle version skew around the "embedded" types like OperatorGroup and Subscription. The OperatorPolicy CRD now uses a RawExtension to possibly help us handle the situation in the future. The kind of change we will need to worry about is if a field is added to the type we import, and a policy using that field is deployed on a cluster using an older CRD. In that case, I think the field would be dropped by the API server without an error.

Neither the OperatorGroup type not the Subscription type have had any recent changes that would cause version skews we need to worry about right now.

Instead of field validation on the API request, I think we should do the same kind of validation when we unmarshal the RawExtension into the type. (I think I missed that in this PR.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli the oldest version we need to support is OCP 4.12 (https://access.redhat.com/articles/7027073). This seems to map to OLM API v0.17.3:
https://github.com/openshift/operator-framework-olm/blob/release-4.12/go.mod#L16

If you go to this link to compare v0.17.3 to the latest version, you can see that a couple of minor fields were added to the subscription:
operator-framework/api@v0.17.3...v0.21.0

It's easiest to check the crds/operators.coreos.com_subscriptions.yaml file.

So this would mean that using the latest github.com/operator-framework/api is fine but you need to error if those new fields were provided on OCP 4.12.

if policy.Spec.RemediationAction.IsEnforce() {
desiredOpGroup.ResourceVersion = opGroup.GetResourceVersion()

err := r.Update(ctx, desiredOpGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like "must only have" since I don't see any checking to see if the desired operator group is a subset of the existing operator group.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I missed considering that.

if policy.Spec.RemediationAction.IsEnforce() {
desiredOpGroup.ResourceVersion = opGroup.GetResourceVersion()

err := r.Update(ctx, desiredOpGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, please ensure field validation is strict on the update.

// Future implementation: Verify the specs of the object matches the one on the cluster
OpLog.Info("The object already exists. Checking fields to verify matching specs")
if idx == -1 || existingCond.Status == metav1.ConditionFalse {
err := r.updateStatus(ctx, policy, matchesCond("OperatorGroup"), matchedObj(&opGroup))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to include in a message that there is a preexisting operator group that was not verified since it was not specified in the policy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can adjust the condition for that easily (and then it will be reflected in the compliance event as well).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a condition for this case with the message the policy does not specify an OperatorGroup but one already exists in the namespace - assuming that OperatorGroup is correct

r.setCompliance(ctx, policy, policyv1.NonCompliant)
}
subNamespace, ok := sub["namespace"].(string)
if !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to check the length too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that it is a valid kubernetes namespace identifier? Yeah, that seems reasonable.


return reconcile.Result{}, err
}
opGroup := make(map[string]interface{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the subscription. I recommend using unstructured. It's the same amount of code and you don't have to type assert.

// Fallback to the Subscription namespace if the OperatorGroup namespace is not specified in the policy.
ogNamespace := subNamespace

if specifiedNS, ok := opGroup["namespace"].(string); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if specifiedNS, ok := opGroup["namespace"].(string); ok {
if specifiedNS, ok := opGroup["namespace"].(string); ok || specifiedNS == "" {


spec := new(operatorv1.OperatorGroupSpec)

err = json.Unmarshal(policy.Spec.OperatorGroup.Raw, spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you follow my other comments, you can use unstructured for the conversion but it does create an extra json.Marshal that your implementation does not.

err = k8sruntime.DefaultUnstructuredConverter.FromUnstructured(opgroup.UnstructuredContent(), &spec)

if err != nil {
OpLog.Info(fmt.Sprintf("Failed to update policy status: %s", err))
return fmt.Errorf("error listing OperatorGroups: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("error listing OperatorGroups: %w", err)
return fmt.Errorf("error getting the subscription: %w", err)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye.

}

if policy.Spec.RemediationAction.IsEnforce() {
err := r.Create(ctx, desiredSub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about field validation.

return fmt.Errorf("error updating the status for a created Subscription: %w", err)
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: You could return nil at the end of the if block to prevent the indentation of else.

return fmt.Errorf("error converting desired Subscription to an Unstructured: %w", err)
}

if reflect.DeepEqual(desiredUnstruct["spec"], foundSub.Object["spec"]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a "must only have" check. Is the intention to only support "must only have"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't my intention. It seems like I will need to bring in the heavy machinery from config-policy to handle this.

return fmt.Errorf("error converting desired Subscription to an Unstructured: %w", err)
}

if reflect.DeepEqual(desiredUnstruct["spec"], foundSub.Object["spec"]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are there any omitempty that the user could specify in desiredSub but not be returned in foundSub?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are dealing with unstructured here, I think you could move out some of the config-policy-controller code from checkAndUpdateResource to do the comparison for both must have and must only have.

// For now, (conditionally) mark it as compliant
idx, existingCond := policy.Status.GetCondition(subConditionType)

if idx == -1 || existingCond.Status == metav1.ConditionFalse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does updateStatus already check this? If so, you can remove this and I believe there's another place where this comment is relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly different than what updateStatus checks. The code comment in the OperatorGroup section explains it more, and I should copy it down here.

This check prevents a previous "created" message from being overwritten with a "matches" message. My feeling was that the created message is more context and is generally more useful, but now I'm thinking that it might be confusing this way, because that's not how the config-policy status and events do it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the if preventing the “matches” condition from being added if the previous condition was “created”. This should match how the config-policy works. The tests will need some changes to check this.

if policy.Spec.RemediationAction.IsEnforce() {
desiredSub.ResourceVersion = foundSub.GetResourceVersion()

err := r.Update(ctx, desiredSub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about field validation.

//
// Note that only changing the related objects will not emit a new compliance event, but will update
// the status.
func (r *OperatorPolicyReconciler) updateStatus(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the controller should send one compliance event per compliance change like the config-policy-controller to prevent so many compliance changes on the policy.

The config-policy-controller does it with "event batches" at the end of handleObjectTemplates. I imagine it'd be a lot simpler in the operator policy case since there is no namespace selector and not multiple object templates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of status updates and compliance events is a valid critique. I wanted to send the updates and event immediately, because I felt it simplified error handling and limited the amount of information that needed to be returned by other functions.

I find what the config policy does with event batches very difficult to reason through... Can you carefully define "one compliance event per compliance change"?

Imagine a fresh enforced policy. With this implementation we get 4 status updates and events that look kinda like this:

  • NonCompliant; the operator group is missing (subscription details omitted because they are still unknown)
  • NonCompliant; the operator group was created (subscription details omitted because they are still unknown)
  • NonCompilant; the operator group was created and the subscription is missing
  • Compliant; the operator group was created and the subscription was created

Are you asking for just two events like these?

  • NonCompliant; the operator group is missing, and the subscription is missing,
  • Compliant; the operator group was created, and the subscription was created

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly. If I have time today, I can try to make the event batching logic more generic since I wrote that code.

The gist of the approach is that each handle method would return a list of events. So say operator group returned three events (two noncompliant) and subscription returned two event (one noncompliant), you would end up with three events in this way:

  1. noncompliant operator group + existing state
  2. noncompliant operator group and noncompliant subscription
  3. compliant operator group and compliant subscription


OpLog.Info("Creating the subscription", "kind", subscription.Kind,
"namespace", subscription.Namespace)
err := json.Unmarshal(policy.Spec.Subscription.Raw, &sub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Similar comments as in buildOperatorGroup for using unstructured.


err := r.updatePolicyStatus(ctx, policy)
err = json.Unmarshal(policy.Spec.Subscription.Raw, spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Like I mentioned in buildOperatorGroup, are you able to just use this immediately rather than go the map route and then the typed route?


err := r.updatePolicyStatus(ctx, policy)
err = json.Unmarshal(policy.Spec.Subscription.Raw, spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will catch invalid/unknown fields. It looks like the unstructured package's converter to typed has some unknown field validation.

patches:
- path: ns-selector-validation.json
# The OperatorPolicy currently only supports "musthave"
- path: allowed-compliance-types.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good workaround.

ComplianceType and RemediationActions were previously checked inline
with string comparisons. This is more easily readable, and potentially
better able to be changed in the future.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
It was decided to use RawExtension for the "embedded" Subscription and
OperatorGroup, this can better allow for possible version skew in the
future. The CRD validation would be limited inside the policy framework
anyway because in a Policy, the raw content has no validation.

This also removes some fields which are not expected to be functional
in the initial release.

The `build*` functions have been updated to use the RawExtension, and
may have been adjusted for some other incoming changes. Some other
temporary changes were made to resolve compilation errors and unit
tests, but the OperatorPolicy "e2e" tests were not considered.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
The previous version of this function was only applicable to a Policy
owning a ConfigurationPolicy. Now it is more broadly applicable.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli force-pushed the 9283-handle-preexisting-operator branch from 2d800eb to a8fc211 Compare January 26, 2024 19:36
@JustinKuli
Copy link
Member Author

JustinKuli commented Jan 26, 2024

I'm expecting the e2e tests to be failing. One of my first revisions to the PR was to add debugging to the test, but actually the way I tried to do that made it so the test could never register its failure 😆 (resolved as of 2024-01-30)

Work Item 1: As pointed out, some of the logic currently works more like "mustonlyhave" than "musthave", and with some of the type/untyped changes more recently, defaulted fields need to be considered more carefully. I will bring in some machinery from config-policy to help with this, and I expect it will also bring options for strict field validation. The build functions might return Unstructureds in the future.

Work Item 2: fewer compliance events / status update API calls. @mprahl in your opinion, is this required for this implementation, or is it something we could look at improving in the future?

@mprahl
Copy link
Member

mprahl commented Jan 26, 2024

I'm expecting the e2e tests to be failing. One of my first revisions to the PR was to add debugging to the test, but actually the way I tried to do that made it so the test could never register its failure 😆

Work Item 1: As pointed out, some of the logic currently works more like "mustonlyhave" than "musthave", and with some of the type/untyped changes more recently, defaulted fields need to be considered more carefully. I will bring in some machinery from config-policy to help with this, and I expect it will also bring options for strict field validation. The build functions might return Unstructureds in the future.

Work Item 2: fewer compliance events / status update API calls. @mprahl in your opinion, is this required for this implementation, or is it something we could look at improving in the future?

I think work item 2 can be delayed. It shouldn't change QE's tests so it should be safe to handle early next sprint and get it in by the release. I'd be happy to do it if you create a ticket.

@JustinKuli JustinKuli force-pushed the 9283-handle-preexisting-operator branch 4 times, most recently from 097b7c8 to 3546ee8 Compare January 30, 2024 19:30
// Type is found.
func (status OperatorPolicyStatus) GetCondition(condType string) (int, metav1.Condition) {
for i, cond := range status.Conditions {
if cond.Type == condType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if multiple conditions match? is it okay to return only the first match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to just return the first match. Ideally we would have some way of enforcing that there are not multiple conditions of the same type (the actual k8s APIs can use listType=map like this, but that didn't seem to be available for CRDs). We'll just need to be careful not to accidentally put the same condition twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to learn Thanks

The general plan is to continue adding `handle*` functions for the other
resources that an OperatorPolicy needs to examine. Each handle function
will update the policy's status (with conditions and relatedObjects),
and possibly emit compliance events. This may cause more compliance
events "than usual" compared to other controllers, but I think the
separation of concerns will help each function be more maintainable.

My hope is that some of the `*Cond` and `*Obj` functions in the status
section can be reused in the future handlers. There was already overlap
between the Subscription and OperatorGroup, so this seemed reasonable.

To check if the Subscription/OperatorGroup on the cluster matches what
is desired by the policy, a function `handleKeys` was created that
can be used for OperatorPolicies and ConfigurationPolicies.

Refs:
 - https://issues.redhat.com/browse/ACM-9283

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli force-pushed the 9283-handle-preexisting-operator branch from 3546ee8 to c358a12 Compare January 30, 2024 22:03
)
})
It("Should match when the OperatorGroup is manually corrected", func() {
utils.Kubectl("delete", "operatorgroup", incorrectOpGroupName, "-n", opPolTestNS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperatorGroup is specified and another OperatorGroup also exists, then what happens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add one more case(it)

Copy link

openshift-ci bot commented Jan 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, yiraeChristineKim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JustinKuli,yiraeChristineKim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@dhaiducek dhaiducek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look at the new PR also, but I wanted to give this one a look first. 🙂

Comment on lines +45 to +52
func (ra RemediationAction) IsInform() bool {
return strings.EqualFold(string(ra), string(Inform))
}

func (ra RemediationAction) IsEnforce() bool {
return strings.EqualFold(string(ra), string(Enforce))
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, these are nice. 😄

mergedAnnotations, _, _ := unstructured.NestedStringMap(mdMap, "annotations")
mergedLabels, _, _ := unstructured.NestedStringMap(mdMap, "labels")
if updateNeeded {
mismatchLog := "Detected value mismatch"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the key was removed from the log, which I think is fine since I was debating the value of providing a root key anyway. The ideal in my mind would to be able to accumulate the JSON path through handleKeys()/handleSingleKey() and log that, but that's just my own musing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants