You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Hostmetrics process(es) metrics are retrieved using the go lib gopstutil. There are several known issues with CPU utilization peaks mainly connected to getting the "boot time each time" per process. See
A bit off-topic but relevant for hostmetrics "proc/net" network connection metrics is this one. The system.network.connections metric being disabled and not collecting the information from the host does waste less CPU cycles, especially through the flame graph query. It is found that gopsutil is used. This will scan files under the proc directory. The more links, the more CPU. The more resources are occupied, see if it can be adjusted. However, this needs otel collector >= v0.85.0 at least, as before there was a bug that disabling did still execute the scraping. See receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper.go (line 160, extra "if !s.config.Metrics.SystemNetworkConnections.Enabled {"). If you try around with load/perf on a Docker container env, be aware that you need to mount the host filesystem to simulate a proper /proc access.
Describe the solution you'd like
I would like to get improvements and fixes related to the high CPU utilization issues caused by retrieving process metrics using gopstutil. Specifically, addressing the known issues mentioned in gopstutil Issue #1283, gopstutil Issue #842, and gopstutil Issue #1070.
Additionally, I suggest addressing the matter of the system.network.connections metric and its impact on CPU cycles, as described in gopstutil Issue #695.
Describe alternatives you've considered
Alternative solutions could involve optimizing the way process metrics are collected and exploring methods to reduce CPU utilization by less /not at all getting "boot time each time per process " or for metrics like system.network.connections. These alternatives may require changes to the gopstutil library or adjustments in the Otell Collector code.
Additional context
I have performed an analysis on CPU spikes for the OTEL collector by modifying the configuration and running a cputop script for approximately 5 minutes. The host metrics scrapers list used in OTEL collector includes CPU, memory, network, disk, load, paging, processes, and process metrics.
Based on the data collected, it is evident that certain metrics significantly impact CPU spikes while others do not. For example:
Process and Processes metrics create an impact on CPU spike.
CPU, memory, and paging metrics do not seem to have a significant effect on CPU spike.
This information should guide our approach to optimizing the collector's performance and reducing CPU utilization in specific areas.
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem? Please describe.
Hostmetrics process(es) metrics are retrieved using the go lib gopstutil. There are several known issues with CPU utilization peaks mainly connected to getting the "boot time each time" per process. See
A bit off-topic but relevant for hostmetrics "proc/net" network connection metrics is this one. The
system.network.connections
metric being disabled and not collecting the information from the host does waste less CPU cycles, especially through the flame graph query. It is found that gopsutil is used. This will scan files under the proc directory. The more links, the more CPU. The more resources are occupied, see if it can be adjusted. However, this needs otel collector >= v0.85.0 at least, as before there was a bug that disabling did still execute the scraping. See receiver/hostmetricsreceiver/internal/scraper/networkscraper/network_scraper.go (line 160, extra "if !s.config.Metrics.SystemNetworkConnections.Enabled {"). If you try around with load/perf on a Docker container env, be aware that you need to mount the host filesystem to simulate a proper /proc access.Describe the solution you'd like
I would like to get improvements and fixes related to the high CPU utilization issues caused by retrieving process metrics using gopstutil. Specifically, addressing the known issues mentioned in gopstutil Issue #1283, gopstutil Issue #842, and gopstutil Issue #1070.
Additionally, I suggest addressing the matter of the
system.network.connections
metric and its impact on CPU cycles, as described in gopstutil Issue #695.Describe alternatives you've considered
Alternative solutions could involve optimizing the way process metrics are collected and exploring methods to reduce CPU utilization by less /not at all getting "boot time each time per process " or for metrics like
system.network.connections
. These alternatives may require changes to the gopstutil library or adjustments in the Otell Collector code.Additional context
I have performed an analysis on CPU spikes for the OTEL collector by modifying the configuration and running a cputop script for approximately 5 minutes. The host metrics scrapers list used in OTEL collector includes CPU, memory, network, disk, load, paging, processes, and process metrics.
Based on the data collected, it is evident that certain metrics significantly impact CPU spikes while others do not. For example:
This information should guide our approach to optimizing the collector's performance and reducing CPU utilization in specific areas.
The text was updated successfully, but these errors were encountered: