
pkg/metrics: Add Deployment ownerReference to metrics Service object #1037

Merged: 11 commits, Feb 20, 2019
3 changes: 1 addition & 2 deletions commands/operator-sdk/cmd/test/cluster.go
@@ -24,7 +24,6 @@ import (
k8sInternal "github.com/operator-framework/operator-sdk/internal/util/k8sutil"
"github.com/operator-framework/operator-sdk/internal/util/projutil"
"github.com/operator-framework/operator-sdk/pkg/k8sutil"
"github.com/operator-framework/operator-sdk/pkg/leader"
"github.com/operator-framework/operator-sdk/pkg/scaffold"
"github.com/operator-framework/operator-sdk/pkg/scaffold/ansible"
"github.com/operator-framework/operator-sdk/pkg/test"
@@ -113,7 +112,7 @@ func testClusterFunc(cmd *cobra.Command, args []string) error {
Name: k8sutil.OperatorNameEnvVar,
Value: "test-operator",
}, {
Name: leader.PodNameEnv,
Name: k8sutil.PodNameEnvVar,
ValueFrom: &v1.EnvVarSource{FieldRef: &v1.ObjectFieldSelector{FieldPath: "metadata.name"}},
}},
}},
6 changes: 5 additions & 1 deletion pkg/k8sutil/constants.go
@@ -25,6 +25,10 @@ const (
WatchNamespaceEnvVar = "WATCH_NAMESPACE"

// OperatorNameEnvVar is the constant for env variable OPERATOR_NAME
// wich is the name of the current operator
// which is the name of the current operator
OperatorNameEnvVar = "OPERATOR_NAME"

// PodNameEnvVar is the constant for env variable POD_NAME
// which is the name of the current pod.
PodNameEnvVar = "POD_NAME"
)
33 changes: 33 additions & 0 deletions pkg/k8sutil/k8sutil.go
@@ -15,12 +15,16 @@
package k8sutil

import (
"context"
"fmt"
"io/ioutil"
"os"
"strings"

corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
discovery "k8s.io/client-go/discovery"
crclient "sigs.k8s.io/controller-runtime/pkg/client"
logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
)

@@ -83,3 +87,32 @@ func ResourceExists(dc discovery.DiscoveryInterface, apiGroupVersion, kind strin
}
return false, nil
}

// GetPod returns a Pod object that corresponds to the pod in which the code
// is currently running.
// It expects the environment variable POD_NAME to be set by the downward API.
func GetPod(ctx context.Context, client crclient.Client, ns string) (*corev1.Pod, error) {
podName := os.Getenv(PodNameEnvVar)
if podName == "" {
return nil, fmt.Errorf("required env %s not set, please configure downward API", PodNameEnvVar)
}

log.V(1).Info("Found podname", "Pod.Name", podName)

pod := &corev1.Pod{
TypeMeta: metav1.TypeMeta{
APIVersion: "v1",
@estroz (Member, Feb 7, 2019):
This suggestion isn't strictly necessary, but using corev1.SchemeGroupVersion.Version is a better way of setting this value IMO, as it tracks dependency version changes. WDYT?

Member:

Is it necessary to set the TypeMeta for a concrete type when using the controller-runtime client? I thought this was only necessary with unstructured.Unstructured.

Member Author:

No idea why this was done this way. I can remove it, but it was originally copied from, and is still used in, the myOwnerRef function in leader.go, so they may need to set this. I would leave it, as it doesn't hurt. https://github.com/operator-framework/operator-sdk/pull/1037/files#diff-a0eaeead6981b3b2f753c1ecbba711ccL145

Kind: "Pod",
},
}
key := crclient.ObjectKey{Namespace: ns, Name: podName}
err := client.Get(ctx, key, pod)
if err != nil {
log.Error(err, "Failed to get Pod", "Pod.Namespace", ns, "Pod.Name", podName)
return nil, err
}

log.V(1).Info("Found Pod", "Pod.Namespace", ns, "Pod.Name", pod.Name)

return pod, nil
}
22 changes: 1 addition & 21 deletions pkg/leader/leader.go
@@ -16,8 +16,6 @@ package leader

import (
"context"
"fmt"
"os"
"time"

"github.com/operator-framework/operator-sdk/pkg/k8sutil"
@@ -37,8 +35,6 @@ var log = logf.Log.WithName("leader")
// attempts to become the leader.
const maxBackoffInterval = time.Second * 16

const PodNameEnv = "POD_NAME"

// Become ensures that the current pod is the leader within its namespace. If
// run outside a cluster, it will skip leader election and return nil. It
// continuously tries to create a ConfigMap with the provided name and the
@@ -143,24 +139,8 @@ func Become(ctx context.Context, lockName string) error {
// this code is currently running.
// It expects the environment variable POD_NAME to be set by the downward API
func myOwnerRef(ctx context.Context, client crclient.Client, ns string) (*metav1.OwnerReference, error) {
podName := os.Getenv(PodNameEnv)
if podName == "" {
return nil, fmt.Errorf("required env %s not set, please configure downward API", PodNameEnv)
}

log.V(1).Info("Found podname", "Pod.Name", podName)

myPod := &corev1.Pod{
TypeMeta: metav1.TypeMeta{
APIVersion: "v1",
Kind: "Pod",
},
}

key := crclient.ObjectKey{Namespace: ns, Name: podName}
err := client.Get(ctx, key, myPod)
myPod, err := k8sutil.GetPod(ctx, client, ns)
if err != nil {
log.Error(err, "Failed to get pod", "Pod.Namespace", ns, "Pod.Name", podName)
return nil, err
}

117 changes: 95 additions & 22 deletions pkg/metrics/metrics.go
@@ -23,72 +23,78 @@ import (
v1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/client-go/rest"
crclient "sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/config"
logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
)

var log = logf.Log.WithName("metrics")

// PrometheusPortName defines the port name used in kubernetes deployment and service resources
const PrometheusPortName = "metrics"
var trueVar = true

const (
// PrometheusPortName defines the port name used in the metrics Service.
PrometheusPortName = "metrics"
)

// ExposeMetricsPort creates a Kubernetes Service to expose the passed metrics port.
func ExposeMetricsPort(ctx context.Context, port int32) (*v1.Service, error) {
client, err := createClient()
if err != nil {
return nil, fmt.Errorf("failed to create new client: %v", err)
}
// We do not need to check the validity of the port, as controller-runtime
// would error out and we would never get to this stage.
s, err := initOperatorService(port, PrometheusPortName)
s, err := initOperatorService(ctx, client, port, PrometheusPortName)
if err != nil {
if err == k8sutil.ErrNoNamespace {
log.Info("Skipping metrics Service creation; not running in a cluster.")
return nil, nil
}
return nil, fmt.Errorf("failed to initialize service object for metrics: %v", err)
}
service, err := createService(ctx, s)
service, err := createOrUpdateService(ctx, client, s)
if err != nil {
return nil, fmt.Errorf("failed to create or get service for metrics: %v", err)
}

return service, nil
}

func createService(ctx context.Context, s *v1.Service) (*v1.Service, error) {
config, err := rest.InClusterConfig()
if err != nil {
return nil, err
}

client, err := crclient.New(config, crclient.Options{})
if err != nil {
return nil, err
}

func createOrUpdateService(ctx context.Context, client crclient.Client, s *v1.Service) (*v1.Service, error) {
if err := client.Create(ctx, s); err != nil {
if !apierrors.IsAlreadyExists(err) {
return nil, err
}
// Get existing Service and return it
// Service already exists, we want to update it
// as we do not know if any fields might have changed.
existingService := &v1.Service{}
err := client.Get(ctx, types.NamespacedName{
Name: s.Name,
Namespace: s.Namespace,
}, existingService)

s.ResourceVersion = existingService.ResourceVersion
if existingService.Spec.Type == v1.ServiceTypeClusterIP {
s.Spec.ClusterIP = existingService.Spec.ClusterIP
}
err = client.Update(ctx, s)
if err != nil {
return nil, err
}
log.Info("Metrics Service object already exists", "name", existingService.Name)
log.V(1).Info("Metrics Service object updated", "Service.Name", s.Name, "Service.Namespace", s.Namespace)
return existingService, nil
}

log.Info("Metrics Service object created", "name", s.Name)
log.Info("Metrics Service object created", "Service.Name", s.Name, "Service.Namespace", s.Namespace)
return s, nil
}
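The create-then-update fallback above has one subtlety: on update, the server-owned fields ResourceVersion and ClusterIP must be copied from the existing Service, or the API server rejects the write. A toy sketch of that flow, with an in-memory map standing in for the cluster (all type and variable names here are hypothetical, not controller-runtime's API):

```go
package main

import (
	"errors"
	"fmt"
)

// toyService stands in for *v1.Service, keeping only the fields the
// create-or-update flow actually touches.
type toyService struct {
	Name            string
	ResourceVersion string
	ClusterIP       string
	PortName        string
}

var errAlreadyExists = errors.New("already exists")

// apiServer is a toy in-memory stand-in for the cluster.
type apiServer map[string]*toyService

func (a apiServer) Create(s *toyService) error {
	if _, ok := a[s.Name]; ok {
		return errAlreadyExists
	}
	cp := *s
	a[s.Name] = &cp
	return nil
}

// createOrUpdate mirrors createOrUpdateService: attempt Create first; on
// "already exists", carry over the server-owned fields (ResourceVersion,
// ClusterIP) from the existing object, then overwrite it.
func createOrUpdate(a apiServer, s *toyService) *toyService {
	if err := a.Create(s); err != nil {
		existing := a[s.Name]
		s.ResourceVersion = existing.ResourceVersion
		s.ClusterIP = existing.ClusterIP
		cp := *s
		a[s.Name] = &cp
	}
	return s
}

func main() {
	cluster := apiServer{}
	cluster["op-metrics"] = &toyService{Name: "op-metrics", ResourceVersion: "42", ClusterIP: "10.0.0.7", PortName: "http"}
	updated := createOrUpdate(cluster, &toyService{Name: "op-metrics", PortName: "metrics"})
	fmt.Println(updated.ResourceVersion, updated.ClusterIP, updated.PortName)
}
```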

// initOperatorService returns the static service which exposes the specified port.
func initOperatorService(port int32, portName string) (*v1.Service, error) {
func initOperatorService(ctx context.Context, client crclient.Client, port int32, portName string) (*v1.Service, error) {
operatorName, err := k8sutil.GetOperatorName()
if err != nil {
return nil, err
@@ -97,11 +103,14 @@ func initOperatorService(port int32, portName string) (*v1.Service, error) {
if err != nil {
return nil, err
}

label := map[string]string{"name": operatorName}

service := &v1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: operatorName,
Member:

Hmm. So I'm wondering if your original instinct to give the operator developer a chance to tweak the service before submitting it to the cluster was a good one.

As is, if an operator developer wants to expose any other ports via a service, they will either have to create a separate service with a different name, or won't be able to enable metrics.

Another option could be to have ExposeMetricsPort take a service as an input parameter, and if its nil initialize it with the metrics port. WDYT?

Member Author:

I would agree; my only fear is that it will break the API. I can open an issue to discuss this, as it's not strictly related to this PR anyway.

Member Author:

Done #1107

Namespace: namespace,
Labels: map[string]string{"name": operatorName},
Labels: label,
},
TypeMeta: metav1.TypeMeta{
Kind: "Service",
@@ -119,8 +128,72 @@
Name: portName,
},
},
Selector: map[string]string{"name": operatorName},
Selector: label,
},
}

ownRef, err := getPodOwnerRef(ctx, client, namespace)
if err != nil {
return nil, err
}
service.SetOwnerReferences([]metav1.OwnerReference{*ownRef})

return service, nil
}
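The point of the PR's title change is visible in the tail of initOperatorService: the metrics Service carries an owner reference to the final owner (typically the Deployment), so Kubernetes garbage collection deletes the Service when the Deployment goes away. A toy sketch of the shape being built, with simplified stand-ins for the Kubernetes types (the operator name and owner below are illustrative):

```go
package main

import "fmt"

// toyOwnerRef keeps the OwnerReference fields that drive Kubernetes
// garbage collection of the metrics Service.
type toyOwnerRef struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// toyService models just the metadata this PR touches.
type toyService struct {
	Name      string
	Labels    map[string]string
	Selector  map[string]string
	OwnerRefs []toyOwnerRef
}

// newMetricsService mirrors the shape initOperatorService produces: the
// Service is named after the operator, labels and selector share one map,
// and the final owner (e.g. the Deployment) is set as the controller.
func newMetricsService(operatorName string, owner toyOwnerRef) *toyService {
	label := map[string]string{"name": operatorName}
	return &toyService{
		Name:      operatorName,
		Labels:    label,
		Selector:  label,
		OwnerRefs: []toyOwnerRef{owner},
	}
}

func main() {
	owner := toyOwnerRef{APIVersion: "apps/v1", Kind: "Deployment", Name: "memcached-operator", Controller: true}
	s := newMetricsService("memcached-operator", owner)
	fmt.Println(s.Name, s.OwnerRefs[0].Kind)
}
```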

func getPodOwnerRef(ctx context.Context, client crclient.Client, ns string) (*metav1.OwnerReference, error) {
// Get current Pod the operator is running in
pod, err := k8sutil.GetPod(ctx, client, ns)
if err != nil {
return nil, err
}
podOwnerRefs := metav1.NewControllerRef(pod, pod.GroupVersionKind())
// Get Owner that the Pod belongs to
ownerRef := metav1.GetControllerOf(pod)
finalOwnerRef, err := findFinalOwnerRef(ctx, client, ns, ownerRef)
Member:

Nit: We could simplify things further if we declare a new gvkObject interface and pass the object itself into findFinalOwnerRef:

func getPodOwnerRef(ctx context.Context, client crclient.Client, ns string) (*metav1.OwnerReference, error) {
	// Get current Pod the operator is running in
	pod, err := k8sutil.GetPod(ctx, client, ns)
	if err != nil {
		return nil, err
	}

	return findFinalOwnerRef(ctx, client, ns, pod)
}

type gvkObject interface {
	metav1.Object
	GroupVersionKind() schema.GroupVersionKind
}

func findFinalOwnerRef(ctx context.Context, client crclient.Client, ns string, obj gvkObject) (*metav1.OwnerReference, error) {
	ownerRef := metav1.GetControllerOf(obj)
	if ownerRef == nil {
		// obj has no controller, so it is itself the final owner.
		log.V(1).Info("Pod's owner found", "Kind", obj.GroupVersionKind().Kind, "Name", obj.GetName(), "Namespace", ns)
		return metav1.NewControllerRef(obj, obj.GroupVersionKind()), nil
	}
	ownerObj := &unstructured.Unstructured{}
	ownerObj.SetAPIVersion(ownerRef.APIVersion)
	ownerObj.SetKind(ownerRef.Kind)
	err := client.Get(ctx, types.NamespacedName{Namespace: ns, Name: ownerRef.Name}, ownerObj)
	if err != nil {
		return nil, err
	}
	return findFinalOwnerRef(ctx, client, ns, ownerObj)
}

Member Author:

I would prefer to leave the function as is, as it has a clear entry point and you can see from the declaration what it does.

if err != nil {
return nil, err
}
if finalOwnerRef != nil {
return finalOwnerRef, nil
}

// Default to returning Pod as the Owner
return podOwnerRefs, nil
}

// findFinalOwnerRef tries to locate the final controller/owner based on the owner reference provided.
func findFinalOwnerRef(ctx context.Context, client crclient.Client, ns string, ownerRef *metav1.OwnerReference) (*metav1.OwnerReference, error) {
Member:

I really think that this function has to handle DeploymentConfigs

Member:

I agree, but I think we should try to do it as generically as possible. I wonder if we could consolidate the switch statement cases for known types to be:

case "ReplicaSet", "ReplicationController":
	// The unstructured object needs its GVK set before Get can resolve it.
	rsrc := &unstructured.Unstructured{}
	rsrc.SetAPIVersion(ownerRef.APIVersion)
	rsrc.SetKind(ownerRef.Kind)
	key := crclient.ObjectKey{Namespace: ns, Name: ownerRef.Name}
	if err := client.Get(ctx, key, rsrc); err != nil {
		return nil, err
	}
	rsrcOwner := metav1.GetControllerOf(rsrc)

	// If we find an owner for the RS/RC, return that.
	// Otherwise, just return the ownerRef directly.
	if rsrcOwner != nil {
		return rsrcOwner, nil
	}
	return ownerRef, nil
case "DaemonSet", "StatefulSet", "Job":
	// I think we can just return this directly.
	// Any reason to fetch the object first?
	return ownerRef, nil

I think this would cover Deployments and DeploymentConfigs generically, and it would be a bit more future-proof since we're being less explicit about some of the types we care about.

Member:

I agree with @joelanford. A generic approach will reduce code significantly.

@lilic (Member Author, Feb 8, 2019):

	// Any reason to fetch the object first?
	return ownerRef, nil

Agreed! 🤦‍♀️ Done!

Member Author:

This would then return the RS/RC as the owner directly; above we said we want to avoid doing that. Did I maybe misunderstand your suggestion above? #1037 (comment)

// If we find an owner for the RS/RC, return that.
	// Otherwise, just return the ownerRef directly.
	if rsrcOwner != nil {
		return rsrcOwner, nil
	}

The reason was to avoid returning RS/RC as the owner and to avoid returning non k8s controllers owners as well.

Member:

@lilic I think if the ReplicaSet or ReplicationController has an owner we should return its owner, regardless of the Kind of the owner. That would cover the cases for Deployment and DeploymentConfig implicitly and it would mean other custom or future native K8s or Openshift types that own and manage ReplicaSets and ReplicationControllers would be supported without any code changes.

If the ReplicaSet or ReplicationController does not have an owner, I think it makes the most sense to use the ReplicaSet or ReplicationController as the owner directly, since it would be the highest level Kind we found from our traversal of the owner references from the pod.

if ownerRef == nil {
return nil, nil
}

obj := &unstructured.Unstructured{}
obj.SetAPIVersion(ownerRef.APIVersion)
obj.SetKind(ownerRef.Kind)
err := client.Get(ctx, types.NamespacedName{Namespace: ns, Name: ownerRef.Name}, obj)
if err != nil {
return nil, err
}
newOwnerRef := metav1.GetControllerOf(obj)
if newOwnerRef != nil {
return findFinalOwnerRef(ctx, client, ns, newOwnerRef)
}

log.V(1).Info("Pods owner found", "Kind", ownerRef.Kind, "Name", ownerRef.Name, "Namespace", ns)
return ownerRef, nil
}
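Stripped of the Kubernetes client machinery, findFinalOwnerRef is a recursive walk up the chain of controller references until an object with no controller is reached. A self-contained sketch of just that traversal, using toy structs in place of OwnerReference and unstructured objects (all names below are illustrative):

```go
package main

import "fmt"

// toyObject models only what the traversal needs: an identity plus an
// optional controller reference (nil at the top of the ownership chain).
type toyObject struct {
	Kind       string
	Name       string
	Controller *string // map key of the controlling object, if any
}

// finalOwner walks controller references until it reaches an object with
// no controller, mirroring the recursion in findFinalOwnerRef.
func finalOwner(objs map[string]toyObject, key string) toyObject {
	obj := objs[key]
	if obj.Controller == nil {
		return obj
	}
	return finalOwner(objs, *obj.Controller)
}

func main() {
	rs, dep := "rs", "deploy"
	objs := map[string]toyObject{
		"pod":    {Kind: "Pod", Name: "op-7f4b-x2kqp", Controller: &rs},
		"rs":     {Kind: "ReplicaSet", Name: "op-7f4b", Controller: &dep},
		"deploy": {Kind: "Deployment", Name: "op"}, // no controller: final owner
	}
	fmt.Println(finalOwner(objs, "pod").Kind) // Deployment
}
```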

func createClient() (crclient.Client, error) {
config, err := config.GetConfig()
if err != nil {
return nil, err
}

client, err := crclient.New(config, crclient.Options{})
if err != nil {
return nil, err
}

return client, nil
}