Metrics Duplication in Upbound UXP Integration #2605

Open
ewertonhm opened this issue Feb 17, 2025 · 1 comment

@ewertonhm

Description
We are using the Upbound UXP integration:
https://github.com/DataDog/integrations-extras/tree/master/upbound_uxp

After integrating it into our Kubernetes environment, we identified an issue with the collected metrics.

Problem
Our Upbound Crossplane environment is highly available, meaning we have multiple upbound_crossplane pods running. Each pod exposes the /metrics endpoint with its metrics.

Because we only configure the namespace of the pods, Datadog discovers and scrapes the /metrics endpoint of every pod in it. The resulting metrics are broken down only by kube_node, but since upbound_crossplane runs as a Deployment (not a DaemonSet), multiple pods can run on the same node. In those cases the values are duplicated.
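A toy illustration of the collision described above, not Datadog's actual aggregation logic. The pod names, node names, and values are hypothetical, and the sketch assumes that every replica reports the same value and that series sharing a kube_node tag are summed:

```python
from collections import defaultdict

# Hypothetical samples: (pod, kube_node, value).
# Assumption: each replica exposes the same value on its /metrics endpoint.
samples = [
    ("crossplane-a", "node-1", 12),
    ("crossplane-b", "node-1", 12),  # second replica scheduled on the same node
    ("crossplane-c", "node-2", 12),
]


def sum_by_node(samples):
    """Sum values per kube_node, the only distinguishing tag in this sketch."""
    totals = defaultdict(int)
    for _pod, node, value in samples:
        totals[node] += value
    return dict(totals)


per_node = sum_by_node(samples)
print(per_node)
```

Under these assumptions, node-1 ends up with twice the value of node-2 only because it happens to host two replicas.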

Additionally, this approach complicates dashboard usage. Nodes and pods are dynamically created and terminated based on cluster demand. Since kube_node changes throughout the day, we cannot use it as a filter—only as a grouping key. This significantly limits our ability to use these metrics effectively in dashboards and monitors.

Expected Behavior
Ideally, Datadog should collect the /metrics endpoint from only one of the Upbound replicas instead of all of them, avoiding duplicated values.

Suggested Solutions

  • Implement a way to collect metrics from only one replica of Upbound Crossplane.

Would appreciate any suggestions or guidance on resolving this issue!

Thanks!

@ewertonhm
Author

I guess something like this:

import sys

from kubernetes import client

# Assumes the Kubernetes client configuration has already been loaded,
# e.g. via config.load_incluster_config().
v1 = client.CoreV1Api()

FILTER_ONE_POD_PER_REPLICA = True

try:
    pods = v1.list_namespaced_pod(namespace="your-namespace")
except Exception as e:
    print("Unable to list pods. Please check the apiserver cluster role configuration.")
    print(e)
    sys.exit(1)

port_forward_target = 8080

if FILTER_ONE_POD_PER_REPLICA:
    # Keep only the first pod seen for each owning ReplicaSet.
    selected_pods = {}

    for pod in pods.items:
        # owner_references is None for pods that have no owner.
        owner_references = pod.metadata.owner_references or []
        owner_name = None
        for owner in owner_references:
            if owner.kind == "ReplicaSet":
                owner_name = owner.name  # Name of the ReplicaSet
                break

        if owner_name and owner_name not in selected_pods:
            selected_pods[owner_name] = pod

    filtered_pods = list(selected_pods.values())
else:
    filtered_pods = pods.items

for pod in filtered_pods:
    print(f"Selected Pod: {pod.metadata.name}")

in here: https://github.com/DataDog/integrations-extras/blob/master/upbound_uxp/datadog_checks/upbound_uxp/check.py#L497
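
A variation on the snippet above, as a minimal sketch rather than the integration's actual code. It puts the selection in a pure function and sorts by pod name, so the same pod is chosen on every run as long as the pod set is unchanged. The function name pick_one_pod_per_replicaset and the fake_pod helper are hypothetical, and the SimpleNamespace objects only stand in for the pod objects returned by the Kubernetes client:

```python
from types import SimpleNamespace


def pick_one_pod_per_replicaset(pods):
    """Return one pod per owning ReplicaSet, choosing the lowest pod name."""
    selected = {}
    for pod in sorted(pods, key=lambda p: p.metadata.name):
        owners = pod.metadata.owner_references or []
        rs_name = next((o.name for o in owners if o.kind == "ReplicaSet"), None)
        # Pods without a ReplicaSet owner are skipped, as in the snippet above.
        if rs_name is not None and rs_name not in selected:
            selected[rs_name] = pod
    return list(selected.values())


def fake_pod(name, rs_name=None):
    """Hypothetical stand-in exposing only the attributes used above."""
    owners = [SimpleNamespace(kind="ReplicaSet", name=rs_name)] if rs_name else None
    return SimpleNamespace(metadata=SimpleNamespace(name=name, owner_references=owners))


pods = [
    fake_pod("crossplane-7d9f-b", "crossplane-7d9f"),
    fake_pod("crossplane-7d9f-a", "crossplane-7d9f"),
    fake_pod("rbac-manager-55c4-x", "rbac-manager-55c4"),
    fake_pod("standalone"),
]
chosen = [p.metadata.name for p in pick_one_pod_per_replicaset(pods)]
print(chosen)
```

One caveat: during a rolling update a Deployment can briefly own two ReplicaSets, so selecting one pod per ReplicaSet may still return two pods for the same Deployment for a short time.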
