pkg/metrics: Add Deployment ownerReference to metrics Service object #1037
Conversation
Some concerns about corner cases, but overall approach LGTM
Re: the edge cases
I think we probably just need to degrade gracefully.
@shawn-hurley Good points!
Right now we don't error out; we just log the errors coming from exposing the metrics port. Should we instead error out if no owner is found?
If it is just a Pod, could we not create a Service that selects only that Pod, and set the owner ref to that Pod? I would just like the metrics to always be exposed by a Service for all the ways one could run an operator, but if the above is not possible then this makes sense to me.
It could also be a custom resource that owns the Pod (or the Pod's parent, grandparent, etc.), so this gets a bit tricky. I haven't totally thought this through, but maybe we go up the owner chain looking for specific types (Deployments, DaemonSets, StatefulSets). If we find one of those, we stop and return it; if not, we traverse until we find an object that doesn't have an owner reference and use that instead. If the Pod itself doesn't have an owner reference, I agree with Shawn that we could create the Service to select, and be owned by, just the Pod.
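The traversal just described can be sketched with simplified stand-in types (hypothetical `object` struct and `findFinalOwner` function for illustration only; the real implementation would use metav1.OwnerReference and API lookups via the client):

```go
package main

import "fmt"

// object is a hypothetical, simplified stand-in for a Kubernetes object:
// each object may have at most one controller owner.
type object struct {
	kind, name string
	owner      *object
}

// knownKinds are the workload types we stop at during traversal.
var knownKinds = map[string]bool{
	"Deployment": true, "DaemonSet": true, "StatefulSet": true,
}

// findFinalOwner walks up the owner chain. It stops early when it reaches
// a known workload kind; otherwise it returns the topmost object that has
// no owner (which may be the Pod itself).
func findFinalOwner(obj *object) *object {
	for obj.owner != nil {
		obj = obj.owner
		if knownKinds[obj.kind] {
			return obj
		}
	}
	return obj
}

func main() {
	dep := &object{kind: "Deployment", name: "my-operator"}
	rs := &object{kind: "ReplicaSet", name: "my-operator-abc123", owner: dep}
	pod := &object{kind: "Pod", name: "my-operator-abc123-xyz", owner: rs}
	fmt.Println(findFinalOwner(pod).kind) // stops at the Deployment
}
```

A Pod with no owner reference at all falls through the loop and is returned directly, matching the "owned by just the Pod" fallback discussed above.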
Yes, agree as well.
"If we find one of those, we stop and return that" — that part should be easy enough; I already have a POC implemented locally. As for the other part, I guess it will involve lots of use of the client.
Could we just use a ReplicaSet? That seems like the controller that actually manages the Pods that Deployments, DaemonSets, and StatefulSets create, correct? I think that a custom one "should" use that as well. If the Pod is not managed by a ReplicaSet, maybe just degrade to the Pod? This would make the logic simpler and, I think, more understandable for a user who may be seeing some odd behavior, and it would cover 95% of use cases.
@shawn-hurley I'm pretty sure DaemonSets and StatefulSets manage Pods directly with no ReplicaSet involved. Another problem with ReplicaSets (when used with Deployments at least) is that the Deployment scales them down to 0, but does not always delete them when a new rollout occurs; that depends on the Deployment's revision history settings. I'm not sure how the Deployment control loop works, but I can envision a problem where the ReplicaSet that owns the Service is 3 or 4 revisions old. If it gets cleaned up by the Deployment controller after the current Pods for the current ReplicaSet have started, the Service will get garbage-collected and no new Pods would start to re-create it.
This makes sense; I was hoping to avoid the extra complexity. I think we need to figure out how to handle the remaining cases.
I think tracing up until a last known k8s native Object is found, e.g. a Deployment, makes sense. I don't like defaulting to a CR. That would only happen if a user is deploying the operator using a CR, and I am not sure how often that would happen.
@lilic that sounds reasonable to me. @shawn-hurley would that solve the problem you raised?
@shawn-hurley @joelanford this should be ready for another look. Thanks!
// findFinalOwnerRef tries to locate the final controller/owner based on the owner reference provided.
func findFinalOwnerRef(ctx context.Context, client crclient.Client, ns string, ownerRef *metav1.OwnerReference) (*metav1.OwnerReference, error) {
I really think that this function has to handle DeploymentConfigs
I agree, but I think we should try to do it as generically as possible. I wonder if we could consolidate the switch statement cases for known types to be:
case "ReplicaSet", "ReplicationController":
rsrc := &unstructured.Unstructured{}
key := crclient.ObjectKey{Namespace: ns, Name: ownerRef.Name}
if err := client.Get(ctx, key, rsrc); err != nil {
return nil, err
}
rsrcOwner := metav1.GetControllerOf(rsrc)
// If we find an owner for the RS/RC, return that.
// Otherwise, just return the ownerRef directly.
if rsrcOwner != nil {
return rsrcOwner, nil
}
return ownerRef, nil
case "DaemonSet", "StatefulSet", "Job":
// I think we can just return this directly.
// Any reason to fetch the object first?
return ownerRef, nil
I think this would cover Deployments and DeploymentConfigs generically, and it would be a bit more future-proof since we're being less explicit about some of the types we care about.
I agree with @joelanford. A generic approach will reduce code significantly.
// Any reason to fetch the object first?
return ownerRef, nil
Agreed! 🤦♀️ Done!
This would then return the RS/RC as the owner directly; above we said we want to avoid doing that? Did I maybe misunderstand your suggestion above? #1037 (comment)
// If we find an owner for the RS/RC, return that.
// Otherwise, just return the ownerRef directly.
if rsrcOwner != nil {
return rsrcOwner, nil
}
The reason was to avoid returning the RS/RC as the owner, and to avoid returning non-k8s-controller owners as well.
@lilic I think if the ReplicaSet or ReplicationController has an owner, we should return its owner, regardless of the Kind of the owner. That would cover the cases for Deployment and DeploymentConfig implicitly, and it would mean other custom or future native K8s or OpenShift types that own and manage ReplicaSets and ReplicationControllers would be supported without any code changes.
If the ReplicaSet or ReplicationController does not have an owner, I think it makes the most sense to use it as the owner directly, since it would be the highest-level Kind we found from our traversal of the owner references from the Pod.
LGTM; it just depends on whether we all agree on supporting DeploymentConfigs.
pod := &corev1.Pod{
	TypeMeta: metav1.TypeMeta{
		APIVersion: "v1",
This suggestion isn't strictly necessary, but using corev1.SchemeGroupVersion.Version is a better way of setting this value IMO, as it tracks dependency version changes. WDYT?
Is it necessary to set the TypeMeta for a concrete type when using the controller-runtime client? I thought this was only necessary with unstructured.Unstructured.
No idea why this was done this way. I can remove it, but it originally comes from, and is still used in, the myOwnerRef function in leader.go, so maybe they need to set this. I would leave it, as it doesn't hurt. https://github.com/operator-framework/operator-sdk/pull/1037/files#diff-a0eaeead6981b3b2f753c1ecbba711ccL145
so it can be used outside of the leader package.
This allows the Service cleanup to happen when the Deployment is deleted.
Trace up until a last known k8s native controller is found, e.g. Deployment, StatefulSet, or DaemonSet; if none are found, use the Pod, even if the last known owner is a ReplicaSet.
Also added updating the Service. When the Service already exists but the OwnerRef might have changed, we must make sure to propagate those changes to the Service object by updating it.
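The create-or-update flow described here can be sketched with simplified stand-in types (the `service` struct, `store` map, and `createOrUpdateService` function are hypothetical; the real code would use the controller-runtime client's Create and Update calls against the API server):

```go
package main

import "fmt"

// service is a hypothetical, simplified stand-in for v1.Service,
// keeping only the fields relevant to this sketch.
type service struct {
	name     string
	ownerRef string
}

// store stands in for the API server's view of existing Services.
var store = map[string]*service{}

// createOrUpdateService creates the Service if it is missing; if it
// already exists, it updates it so OwnerRef changes propagate.
func createOrUpdateService(desired *service) string {
	if existing, ok := store[desired.name]; ok {
		existing.ownerRef = desired.ownerRef
		return "updated"
	}
	store[desired.name] = desired
	return "created"
}

func main() {
	fmt.Println(createOrUpdateService(&service{name: "metrics", ownerRef: "rs-v1"}))
	// A second call with a new owner updates the existing object.
	fmt.Println(createOrUpdateService(&service{name: "metrics", ownerRef: "deploy-v2"}))
	fmt.Println(store["metrics"].ownerRef)
}
```

Without the update branch, a stale OwnerRef on an existing Service would point garbage collection at an object that may no longer exist.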
@joelanford As the suggestions were already very close to this, I decided to implement a fully generic solution, which covers all the cases we talked about: core Kubernetes objects, OpenShift objects (like DeploymentConfigs), and potentially any custom controllers. We simply walk up the tree of ownerReferences. The question we had earlier about ReplicaSets is covered as well. I believe this solves all the problems/concerns raised before and is even simpler code. Please have another look, thanks!
Looks pretty good to me. I like the completely generic approach for finding the root owner reference 👍
A couple of nits and questions.
// Get Owner that the Pod belongs to
ownerRef := metav1.GetControllerOf(pod)
finalOwnerRef, err := findFinalOwnerRef(ctx, client, ns, ownerRef)
Nit: We could simplify things further if we declare a new gvkObject interface and pass the object itself into findFinalOwnerRef:
func getPodOwnerRef(ctx context.Context, client crclient.Client, ns string) (*metav1.OwnerReference, error) {
// Get current Pod the operator is running in
pod, err := k8sutil.GetPod(ctx, client, ns)
if err != nil {
return nil, err
}
return findFinalOwnerRef(ctx, client, ns, pod)
}
type gvkObject interface {
metav1.Object
GroupVersionKind() schema.GroupVersionKind
}
func findFinalOwnerRef(ctx context.Context, client crclient.Client, ns string, obj gvkObject) (*metav1.OwnerReference, error) {
ownerRef := metav1.GetControllerOf(obj)
	if ownerRef == nil {
		// No controller owns this object, so it is the final owner.
		// (Log the object itself here; ownerRef is nil at this point.)
		log.V(1).Info("Pods final owner found", "Kind", obj.GroupVersionKind().Kind, "Name", obj.GetName(), "Namespace", ns)
		return metav1.NewControllerRef(obj, obj.GroupVersionKind()), nil
	}
ownerObj := &unstructured.Unstructured{}
ownerObj.SetAPIVersion(ownerRef.APIVersion)
ownerObj.SetKind(ownerRef.Kind)
err := client.Get(ctx, types.NamespacedName{Namespace: ns, Name: ownerRef.Name}, ownerObj)
if err != nil {
return nil, err
}
return findFinalOwnerRef(ctx, client, ns, ownerObj)
}
I would prefer to leave the function as is, as it has a clear entry point and you can see from the declaration what it does.
@@ -97,11 +103,14 @@ func initOperatorService(port int32, portName string) (*v1.Service, error) {
	if err != nil {
		return nil, err
	}

	label := map[string]string{"name": operatorName}

	service := &v1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name: operatorName,
Hmm. So I'm wondering if your original instinct to give the operator developer a chance to tweak the service before submitting it to the cluster was a good one.
As is, if an operator developer wants to expose any other ports via a service, they will either have to create a separate service with a different name, or won't be able to enable metrics.
Another option could be to have ExposeMetricsPort take a service as an input parameter, and if it's nil, initialize it with the metrics port. WDYT?
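The optional-service idea suggested here can be sketched with a simplified stand-in type (the `Service` struct and `exposeMetricsPort` function are hypothetical illustrations, not the SDK's actual API; the real signature would take a *v1.Service):

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for v1.Service.
type Service struct {
	Name  string
	Ports []int32
}

// exposeMetricsPort sketches the proposed API: if svc is nil, a default
// Service carrying only the metrics port is created; otherwise the
// metrics port is appended to the caller-supplied Service, so operator
// developers can expose additional ports on the same object.
func exposeMetricsPort(svc *Service, metricsPort int32) *Service {
	if svc == nil {
		svc = &Service{Name: "operator-metrics"}
	}
	svc.Ports = append(svc.Ports, metricsPort)
	return svc
}

func main() {
	// Default behaviour: caller passes nil and gets the metrics-only Service.
	fmt.Println(exposeMetricsPort(nil, 8383).Name)

	// Caller-supplied Service keeps its own name and extra ports.
	custom := &Service{Name: "my-operator", Ports: []int32{9090}}
	fmt.Println(len(exposeMetricsPort(custom, 8383).Ports))
}
```

This nil-means-default pattern keeps the common case a one-liner while leaving room for customization, though, as noted below, changing the signature would break the existing API.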
I would agree; my only fear is that it will break the API. I can open an issue to discuss this, as it's not strictly related to this PR anyway.
Done #1107
Co-Authored-By: LiliC <cosiclili@gmail.com>
I think this is ready for another look @joelanford, thanks!
LGTM 👍
Description of the change:
Add the Deployment ownerReference to the metrics Service object by getting the Pod the operator is currently running in and, via its ReplicaSet, getting the Deployment object.
Motivation for the change:
Currently, when the operator Deployment is deleted, the metrics Service the operator creates is not deleted.
Closes #459