container/crio: fix network metrics collection logic #3582

sohankunkerkar · 2024-08-27T05:16:47Z

Fixes #3577
This change modifies the logic for gathering network metrics in the CRI-O handler so that metrics are collected from all containers. Since all containers in a pod share the same network namespace, collecting from every container covers the case where the infra container reports empty network metrics.
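
Because the containers share one network namespace, their readings should be identical. One way to express the fallback without reporting the same counters several times is to use the infra container's reading when it is non-empty and otherwise the first non-empty reading from another container in the pod. Below is a minimal Go sketch under that assumption. `InterfaceStats`, `ContainerNetStats` and `pickPodNetworkStats` are hypothetical names, not cAdvisor or CRI-O APIs.

```go
package main

import "fmt"

// InterfaceStats is a hypothetical, simplified per-interface counter set.
type InterfaceStats struct {
	Name    string
	RxBytes uint64
	TxBytes uint64
}

// ContainerNetStats is a hypothetical record pairing a container ID with the
// interfaces observed from inside its (shared) network namespace.
type ContainerNetStats struct {
	ID         string
	IsInfra    bool
	Interfaces []InterfaceStats
}

// pickPodNetworkStats returns one reading for the whole pod. It prefers the
// infra container and falls back to the first other container with a
// non-empty reading. The shared reading is reported once, not summed,
// because summing identical counters would double-count traffic.
func pickPodNetworkStats(containers []ContainerNetStats) ([]InterfaceStats, string, bool) {
	for _, c := range containers {
		if c.IsInfra && len(c.Interfaces) > 0 {
			return c.Interfaces, c.ID, true
		}
	}
	for _, c := range containers {
		if len(c.Interfaces) > 0 {
			return c.Interfaces, c.ID, true
		}
	}
	return nil, "", false
}

func main() {
	pod := []ContainerNetStats{
		{ID: "infra", IsInfra: true}, // infra container with empty metrics
		{ID: "app", Interfaces: []InterfaceStats{{Name: "eth0", RxBytes: 1024, TxBytes: 2048}}},
		{ID: "sidecar", Interfaces: []InterfaceStats{{Name: "eth0", RxBytes: 1024, TxBytes: 2048}}},
	}
	if stats, from, ok := pickPodNetworkStats(pod); ok {
		fmt.Println(from, stats[0].Name, stats[0].RxBytes)
	}
}
```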

Future improvements:

We could add a cache that maintains a designated container per pod, so that metrics collection can efficiently track which container to read from. If the designated container dies, the handler should dynamically pick another running container in the same pod.
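
A minimal Go sketch of such a cache. It assumes a hypothetical `designatedCache` type and a caller that passes in the pod's container IDs with their running state. It is not part of cAdvisor. The cached container is kept while it is running. Otherwise another running container is chosen and cached. The sketch picks the lowest ID purely so the choice is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// designatedCache remembers, per pod, which container to read network
// metrics from. All names here are hypothetical.
type designatedCache struct {
	mu    sync.Mutex
	byPod map[string]string // pod ID -> designated container ID
}

func newDesignatedCache() *designatedCache {
	return &designatedCache{byPod: make(map[string]string)}
}

// Get returns the designated container for podID. running maps each of the
// pod's container IDs to whether it is currently running. If the cached
// container is gone or stopped, another running container is selected.
func (c *designatedCache) Get(podID string, running map[string]bool) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.byPod[podID]; ok && running[id] {
		return id, true // keep a stable choice while it stays alive
	}
	ids := make([]string, 0, len(running))
	for id, alive := range running {
		if alive {
			ids = append(ids, id)
		}
	}
	if len(ids) == 0 {
		delete(c.byPod, podID) // nothing running: forget the pod
		return "", false
	}
	sort.Strings(ids)
	c.byPod[podID] = ids[0]
	return ids[0], true
}

func main() {
	cache := newDesignatedCache()
	running := map[string]bool{"app": true, "sidecar": true}
	first, _ := cache.Get("pod-a", running)
	running["app"] = false // the designated container dies
	next, _ := cache.Get("pod-a", running)
	fmt.Println(first, next)
}
```

Keeping the cached choice while it stays alive avoids switching the source of the counters on every collection.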

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>

sohankunkerkar · 2024-08-27T05:17:07Z

cc @haircommander @mrunalp

haircommander · 2024-08-27T13:54:35Z

LGTM. thanks! @iwankgb PTAL

kannon92 · 2024-08-27T16:16:51Z

Can you add a Fixes to the description so we correctly link what issue this solves?

kannon92 · 2024-08-27T16:17:26Z

Fixes #3577.

kolyshkin · 2024-08-28T04:57:29Z

So, if there are many containers in the pod, we collect stats for them all (and they are the same)?

haircommander · 2024-08-28T20:08:42Z

> So, if there are many containers in the pod, we collect stats for them all (and they are the same)?

Ah, I was slightly wrong about how we fall back to the pod. I think there's another bug here: the pod cgroup instance should hold the PID of one of the containers (as reported by CRI-O) and should replace that PID if the container exits. We shouldn't need to collect network metrics at the pod level for them to show up.

haircommander · 2024-09-04T20:24:56Z

/close

I believe ca820b6 fixes this better

iwankgb · 2024-09-07T15:52:01Z

I don't think that removing network stats filtering completely is an acceptable solution. I also believe @kolyshkin is right: the same metrics might be reported for more than one container.

sohankunkerkar closed this Sep 15, 2024
