Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container/crio: fix network metrics collection logic #3582

Closed

Conversation

sohankunkerkar
Copy link

@sohankunkerkar sohankunkerkar commented Aug 27, 2024

Fixes #3577
This change modifies the logic for gathering network metrics in the CRI-O handler to ensure that metrics are collected from all containers. Since all containers share the same network namespace, gathering metrics from every container is crucial to account for scenarios where the infra container may have empty network metrics.

Future improvements:

We can use a caching mechanism to maintain a designated container per pod. This cache will allow us to track which container to use for metrics collection efficiently. If the designated container dies, we should dynamically find another running container within the same pod.

This change modifies the logic for gathering network metrics in the
CRI-O handler to ensure that metrics are collected from all containers.
Since all containers share the same network namespace, it is crucial to
gather metrics from every container to account for scenarios where the
infra container may have empty network metrics.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@sohankunkerkar
Copy link
Author

cc @haircommander @mrunalp

@haircommander
Copy link
Contributor

LGTM. thanks! @iwankgb PTAL

@kannon92
Copy link
Contributor

Can you add a Fixes to the description so we correctly link what issue this solves?

@kannon92
Copy link
Contributor

Fixes #3577.

@kolyshkin
Copy link
Contributor

So, if there are many containers in the pod, we collect stats for them all (and they are the same)?

@haircommander
Copy link
Contributor

So, if there are many containers in the pod, we collect stats for them all (and they are the same)?

ah I was slightly wrong about how we're falling back to the pod. I think there's another bug here. The pod cgroup instance should have the PID of one of the containers (reported by cri-o) and should replace that pid if the container exits. We shouldn't need to collect the network metrics on pod level for them to show up...

@haircommander
Copy link
Contributor

/close

I believe ca820b6 fixes this better

@iwankgb
Copy link
Collaborator

iwankgb commented Sep 7, 2024

I don't think that removing network stats filtering completely is an acceptable solution. I also believe that @kolyshkin is right: same metrics might be reported for more than one container.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

POD Networking metrics missing with crio-1.30 and kubelet-1.30
5 participants