
[System] Make network_summary namespace/container aware #605

Closed
sorantis opened this issue Feb 2, 2021 · 20 comments
Labels
enhancement New feature or request Integration:system System Stalled Team:Integrations Label for the Integrations team

Comments

@sorantis

sorantis commented Feb 2, 2021

Elastic Cloud SREs want to be able to export network summary metrics for all namespaces/docker containers.

Right now network_summary only covers the host (/namespace) metricbeat is running on. For debugging networking issues it would be really useful to have the same kind of network statistics from all namespaces/containers running on a host, without the need to run a separate metricbeat inside each container.

These network metrics are accessible per pid, e.g. in /proc/XXXX/net/netstat or /proc/XXXX/net/snmp.
Since we know which cgroup each pid belongs to from /proc/XXXX/cgroup, it should be feasible to grab these metrics only once per cgroup rather than for every pid on each metricbeat poll.

These metrics should then have an additional field with the container/namespace id.
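The per-pid approach described above can be sketched roughly as follows. This is a minimal illustration, not the metricbeat implementation: it assumes the standard `/proc/<pid>/net/snmp` layout, where each protocol contributes a pair of lines (a header line naming the counters, then a value line), and the function name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNetSNMP parses the /proc/<pid>/net/snmp format, where counters
// come in pairs of lines prefixed by the protocol name: a header line
// listing field names, then a value line with the numbers.
func parseNetSNMP(text string) (map[string]map[string]int64, error) {
	out := make(map[string]map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(text))
	var headers []string
	var proto string
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		name := strings.TrimSuffix(fields[0], ":")
		if headers == nil || name != proto {
			// header line: remember the field names for this protocol
			proto, headers = name, fields[1:]
			continue
		}
		// value line: zip the field names with the parsed counters
		m := make(map[string]int64, len(headers))
		for i, v := range fields[1:] {
			if i >= len(headers) {
				break
			}
			n, err := strconv.ParseInt(v, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing %s.%s: %w", proto, headers[i], err)
			}
			m[headers[i]] = n
		}
		out[proto] = m
		headers = nil
	}
	return out, sc.Err()
}

func main() {
	sample := "Tcp: RtoAlgorithm MaxConn ActiveOpens\nTcp: 1 -1 42\n"
	stats, err := parseNetSNMP(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["Tcp"]["ActiveOpens"]) // 42
}
```

In the real metricset this parse would be done once per cgroup (picking one representative pid from each), since every pid in a network namespace sees the same counters.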

@sorantis sorantis added enhancement New feature or request Team:Integrations Label for the Integrations team Integration:system System labels Feb 2, 2021
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@fearful-symmetry
Contributor

fearful-symmetry commented Feb 2, 2021

This is definitely a cool idea, and it won't be difficult. I assume we'll want to keep this as an addition to the network_summary metricset, since it's basically the same data. Does Cloud have any ideas as far as config? Since we probably don't want to turn this into a firehose, I wonder if we can have it match PIDs from some kind of glob, or map container activity to a given PID. If the latter is possible, I wonder if this would work better as some kind of processor somewhere else?

@sorantis
Author

sorantis commented Feb 2, 2021

@elkargig, @matschaffer would love your input here as it will greatly help us in solving the use case.

@cargious

cargious commented Feb 2, 2021

@ydubreuil may also be interested in this conversation around PID/mapping.

@elkargig

elkargig commented Feb 2, 2021

Is it possible to configure a list of docker container names (which map to docker container ids -> cgroup IDs) instead of process names? The docker container name -> docker container id map doesn't sound too expensive to be happening on every metricbeat run. If it is, could we set a caching TTL (and suffer all the consequences of cache invalidation errors in the future)?

At least for Elastic Cloud that would be preferred I believe, as docker container names are static, but I fully understand why someone would prefer to run this for all processes matching a name, eg 'nginx', 'haproxy', etc.
A process name of java or python though could be too generic for our needs.

@fearful-symmetry
Contributor

Yah, I assume that mapping container names to cgroup and process info is something we're already doing somewhere else. Again, the issue then becomes if this kind of data is better suited to a different metricset, or perhaps a processor of some kind.

@unixsurfer

I think we could potentially follow the same logic as https://github.com/google/cadvisor.
I couldn't find in the code how they build the map of container names to PIDs, but I do remember that it fetched/inspected container names from docker to build a map and then traversed the appropriate tree under /proc. They used to have a cache layer, but I last worked with it 4+ years ago, so many things may have changed since then.

@ydubreuil
Member

At the very least, we would need the cgroup id as it is reported in system.process.cgroup.id to identify the docker container id. But as @elkargig mentioned, having the docker container name would be ideal.
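Extracting that cgroup id from `/proc/<pid>/cgroup` can be sketched like this. The function name is illustrative; it assumes the plain docker cgroup driver, where the container ID is the last path element (systemd-style layouts like `docker-<id>.scope` would need extra trimming, which is skipped here).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cgroupID extracts the final path element from a /proc/<pid>/cgroup
// line (format "hierarchy-ID:controllers:path"). With the plain docker
// cgroup driver this last element is the container ID.
func cgroupID(line string) (string, bool) {
	parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
	if len(parts) != 3 || parts[2] == "/" {
		return "", false
	}
	return path.Base(parts[2]), true
}

func main() {
	id, ok := cgroupID("12:memory:/docker/0123abcd")
	fmt.Println(id, ok) // 0123abcd true
}
```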

@fearful-symmetry
Contributor

So, I'm leaning towards adding this to docker/network, since in theory we should be able to map everything to the docker container by name.

@exekias

exekias commented Apr 6, 2021

add_docker_metadata is able to match and enrich documents based on their cgroup id. See the example from the docs: #match_fields: ["system.process.cgroup.id"].

So we could just report cgroups network_summary, making sure the id is included. Then enable the add_docker_metadata processor to do the rest 🙂

I found this issue from @andrewkroh that may help here: elastic/beats#2483
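Uncommented, the processor configuration exekias quotes would look something like this in the beats YAML config (the `match_fields` value comes straight from the add_docker_metadata docs quoted above):

```yaml
processors:
  - add_docker_metadata:
      match_fields: ["system.process.cgroup.id"]
```

With this in place, any event carrying `system.process.cgroup.id` gets enriched with the matching container's metadata, including its name.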

@fearful-symmetry
Contributor

So we could just report cgroups network_summary, making sure the id is included.

@exekias what exactly are you imagining here? network_summary reports a separate event for each cgroup ID?

@exekias

exekias commented Apr 8, 2021

tbh I don't have a clear picture of how this could look 😇.

Right now network_summary reports totals for the host, and we probably want to keep things backwards compatible. That means that we cannot really report cgroup specific summaries under the same fields, right?

If we follow the pattern that we have for other cgroup metrics this would go under the process metricset, adding a system.process.cgroup.network field.

My biggest concern is that reporting processes with cgroup data is becoming expensive in terms of storage. I also see how @elkargig is more interested in having these metrics per cgroup (/container) than per process. I wonder if you folks would prefer a different organization, something like having a cgroups metricset that reports summaries per cgroup, instead of having cgroup data reported per process. Thoughts?

We can definitely enrich both cases with docker metadata (including container name).
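Under the process-metricset option sketched above, a per-cgroup event might look roughly like the following. This shape is hypothetical: only `system.process.cgroup.id` and the proposed `system.process.cgroup.network` field are named in the thread; the nested counter names are borrowed from /proc/net/snmp for illustration.

```json
{
  "system": {
    "process": {
      "cgroup": {
        "id": "0123abcd",
        "network": {
          "ip": { "InReceives": 1000 },
          "tcp": { "ActiveOpens": 12 }
        }
      }
    }
  },
  "container": { "name": "nginx" }
}
```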

@fearful-symmetry
Contributor

My biggest concern is that reporting processes with cgroup data is becoming expensive in terms of storage. I also see how @elkargig is more interested in having these metrics per cgroup (/container) than per process.

Yah, that's why I was thinking of adding it to docker/network, since system.process seems to be ballooning in complexity, and it already hogs more space than any other metricset. Cgroup info is usually most useful when it's attached to something else, so I'm not sure it makes sense to have a separate cgroup metricset, but I could be wrong.

@elkargig

The end goal for us is to be able to graph "various network metrics per (docker) container name".

I'm not sure I fully understand the proposal here, but capturing the same network metrics per process pid seems a little wasteful in terms of storage, as multiple processes in the same cgroup will report identical network metrics.

Hope it helps!

@exekias

exekias commented Apr 15, 2021

Thanks, I think that helps answer the question! You are more interested in metrics at the container level than in fine-grained process details.

@fearful-symmetry I'm actually wondering if docker offers these metrics already and we are just not retrieving them?

@fearful-symmetry
Contributor

I'm actually wondering if docker offers these metrics already and we are just not retrieving them?

@exekias if they're somewhere in the stats APIs, I don't see them. I started experimenting with this last week, I think the best way to do this is to grab the PIDs associated with the container via the docker API, and then query the network stats for each of those PIDs.

@fearful-symmetry
Contributor

PR has been merged here: elastic/beats#25354

Unfortunately this probably won't make it into the next release unless we want to backport it late, as I had a perfect storm of complexity in the PR itself and of CI issues that resulted in it just sitting in review for a week.

@elkargig

Thanks a lot @fearful-symmetry! Is there a beats build we can use to test this before it gets released?

@fearful-symmetry
Contributor

@elkargig It'll be in the 7.14 snapshot builds, which should be available by Jun 29th-ish.

@botelastic

botelastic bot commented Sep 19, 2022

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Sep 19, 2022
@botelastic botelastic bot closed this as completed Mar 18, 2023

8 participants