reduce cpu consumption of hostmetric receiver (process & processes) #8789

cforce · 2023-11-01T14:37:54Z

Is your feature request related to a problem? Please describe.

Hostmetrics process and processes metrics are retrieved using the Go library gopsutil. There are several known issues with CPU utilization peaks, mainly connected to getting the boot time "each time" per process. See the gopsutil issues listed under the proposed solution below.

A bit off-topic, but relevant for the hostmetrics network connection metrics read from /proc/net: when the system.network.connections metric is disabled and the information is not collected from the host, fewer CPU cycles are wasted. A flame graph shows that gopsutil is used for this metric and that it scans files under the /proc directory. The more connections there are, the more CPU and other resources are consumed, so it would be good to see whether this can be adjusted. However, disabling the metric only helps with otel collector >= v0.85.0, because earlier versions had a bug where the scraping was still executed even when the metric was disabled. See receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper.go (line 160, extra "if !s.config.Metrics.SystemNetworkConnections.Enabled {"). If you experiment with load/performance tests in a Docker container environment, be aware that you need to mount the host filesystem to simulate proper /proc access.

gopsutil Issue #695

Describe the solution you'd like
I would like to get improvements and fixes related to the high CPU utilization issues caused by retrieving process metrics using gopsutil. Specifically, addressing the known issues mentioned in gopsutil Issue #1283, gopsutil Issue #842, and gopsutil Issue #1070.

Additionally, I suggest addressing the matter of the system.network.connections metric and its impact on CPU cycles, as described in gopsutil Issue #695.

Describe alternatives you've considered
Alternative solutions could involve optimizing the way process metrics are collected and exploring ways to reduce CPU utilization by reading the boot time less often, or not at all, instead of "each time per process", and likewise for metrics like system.network.connections. These alternatives may require changes to the gopsutil library or adjustments in the OTel Collector code.

Additional context
I have performed an analysis on CPU spikes for the OTEL collector by modifying the configuration and running a cputop script for approximately 5 minutes. The host metrics scrapers list used in OTEL collector includes CPU, memory, network, disk, load, paging, processes, and process metrics.

Based on the data collected, it is evident that certain metrics significantly impact CPU spikes while others do not. For example:

Process and Processes metrics create an impact on CPU spike.
CPU, memory, and paging metrics do not seem to have a significant effect on CPU spike.

This information should guide our approach to optimizing the collector's performance and reducing CPU utilization in specific areas.


cforce · 2023-11-01T14:43:12Z

Moved to the contrib repo: open-telemetry/opentelemetry-collector-contrib#28849

cforce mentioned this issue Nov 1, 2023

[processor/resourcedetection, receiver/hostmetrics] Report host CPU info as resource attributes open-telemetry/opentelemetry-collector-contrib#26532

Closed

cforce closed this as completed Nov 1, 2023

cforce mentioned this issue Jul 4, 2024

Use netlink API to get net connections for better performance shirou/gopsutil#695

Open
