[receiver/hostmetrics] permission denied on Linux #20435
Comments
Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
There are two additional options for the process scraper in v0.75.0.
You can use them to mute these errors. This version also allows scraping all processes without dropping the ones whose exe it could not read. So I think this can be closed, @dmitryax.
@jskiba These two options are not working for me. I am using go.opentelemetry.io/collector/receiver@v0.81.0/. Please let me know if you need any inputs from my end. I see lots of errors like below
@OmprakashPaliwal is correct: when you run the collector as non-root and enable one of the optional metrics […]. The solution is to give the collector process read access to files in […]. The only way I was able to fix it (other than running the collector as root, which also fixes this issue) was to add the cap_dac_read_search capability to the collector binary: sudo setcap 'cap_dac_read_search=ep' /path/to/the/collector/binary
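To check that the capability actually reached the running collector, one option is to look at the `CapEff` mask in `/proc/<pid>/status`. The following is a minimal Python sketch, not part of the collector: the helper names `effective_caps` and `has_capability` are made up for illustration, and bit 2 is the number assigned to `CAP_DAC_READ_SEARCH` in `linux/capability.h`.

```python
CAP_DAC_READ_SEARCH = 2  # capability number from linux/capability.h


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. "0000000000000004".
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_capability(status_text: str, cap_bit: int = CAP_DAC_READ_SEARCH) -> bool:
    """True if the given capability bit is set in the effective set."""
    return bool((effective_caps(status_text) >> cap_bit) & 1)
```

Passing the contents of `/proc/<collector pid>/status` to `has_capability` should return `True` when the process was started from a binary that carries the capability, and `False` otherwise.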
Thanks @mx-psi, I closed this accidentally by merging that PR. To close this issue, I believe we need to make it possible to mute the errors that occur when scraping the […]. One way to do this is to add another […]
On Kubernetes, you'd add the […]. I've tried […]. But this is also something the collector should tolerate gracefully. There's no point flooding the log with repeated, predictable errors. If the errors are simply to be muted, the current mute flags are insufficient.
This is a problem.
…rom process scraper of the hostmetricsreceiver (#34981)

**Description:** We are currently encountering an issue with the `process` scraper in the `hostmetricsreceiver`, primarily due to access-rights restrictions on certain processes, such as system processes. This results in a large number of verbose error logs. Most of them come from the `process.open_file_descriptors` metric, but other metrics produce errors as well. To solve this, we added a flag, `mute_process_all_errors`, that mutes errors coming from the process scraper metrics, as these errors are predominantly associated with processes that we should not be monitoring anyway.

**Link to tracking Issue:** #20435

**Testing:** Added unit tests

**Documentation:**

**Errors**:

- Permission denied errors:

```
go.opentelemetry.io/collector/receiver@v0.90.1/scraperhelper/scrapercontroller.go:176
2024-09-02T17:24:10.341+0200 error scraping metrics {"kind": "receiver", "name": "hostmetrics/linux/localhost", "data_type": "metrics", "error": "error reading open file descriptor count for process \"systemd\" (pid 1): open /proc/1/fd: permission denied;
```

- File not found errors:

```
go.opentelemetry.io/collector/receiver@v0.90.1/scraperhelper/scrapercontroller.go:176
2024-09-02T17:25:38.688+0200 error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "hostmetrics/process", "data_type": "metrics", "error": "error reading cpu times for process \"java\" (pid 466650): open /proc/466650/stat: no such file or directory; error reading memory info for process \"java\" (pid 466650): open /proc/466650/statm: no such file or directory; error reading thread info for process \"java\" (pid 466650): open /proc/466650/status: no such file or directory; error reading cpu times for process \"java\" (pid 474774): open /proc/474774/stat: no such file or directory; error reading memory info for process \"java\" (pid 474774): open /proc/474774/statm: no such file or directory; error reading thread info for process \"java\" (pid 474774): open /proc/474774/status: no such file or directory; error reading cpu times for process \"java\" (pid 481780): open /proc/481780/stat: no such file or directory; error reading memory info for process \"java\" (pid 481780): open /proc/481780/statm: no such file or directory; error reading thread info for process \"java\" (pid 481780): open /proc/481780/status: no such file or directory", "scraper": "process"}
```

**Config**:

```
receivers:
  hostmetrics/process:
    collection_interval: ${PROCESSES_COLLECTION_INTERVAL}s
    scrapers:
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        resource_attributes:
          # disable non_used default attributes
          process.command:
            enabled: false
          process.command_line:
            enabled: false
          process.executable.path:
            enabled: false
          process.owner:
            enabled: false
          process.parent_pid:
            enabled: false
        metrics:
          # disable non-used default metrics
          process.cpu.time:
            enabled: false
          process.memory.virtual:
            enabled: false
          # enable used optional metrics
          process.cpu.utilization:
            enabled: true
          process.open_file_descriptors:
            enabled: true
          process.threads:
            enabled: true
```
Describe the bug
I want to collect process metrics from a Linux machine, so I'm running a collector as an agent with the "hostmetrics" receiver. When the service starts, I get errors from the "process" scraper: the message reports a permission denied error for all PIDs.
Steps to reproduce
Being root on the ubuntu system
Download v0.74.0 of the contrib collector deb file (otelcol-contrib_0.74.0_linux_amd64.deb)
Install contrib collector: dpkg --install otelcol-contrib_0.74.0_linux_amd64.deb
Configure it to collect host metrics (specifically, process data) via the hostmetrics receiver and process scraper
What did you expect to see?
No errors
What did you see instead?
Every minute, an error message is generated complaining about error reading process name ... permission denied for seemingly every PID on the machine:
error reading process name for pid 1165232: readlink /proc/1165232/exe: permission denied; error reading process name for pid 1165265: readlink /proc/1165265/exe: permission denied; error reading process name for pid 1166088: readlink /proc/1166088/exe: permission denied; error reading process name for pid 1166634: readlink /proc/1166634/exe: permission denied; error reading process name for pid 1166826: readlink /proc/1166826/exe: permission denied; error reading process name for pid 1166827: readlink /proc/1166827/exe: permission denied; error reading process name for pid 1166874: readlink /proc/1166874/exe: permission denied; error reading process name for pid 1168213: readlink /proc/1168213/exe: permission denied; error reading process name for pid 1168214: readlink /proc/1168214/exe: permission denied; error reading process name for pid 1168221: readlink /proc/1168221/exe: permission denied; error reading process name for pid 1168222: readlink /proc/1168222/exe: permission denied", "scraper": "process"}
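The `error` field is one long string of `;`-separated entries, which makes it hard to see what is actually failing. As a rough triage aid, the entries can be grouped by what was being read and why it failed. This is a Python sketch that assumes the entry shapes visible in the logs in this thread; `ENTRY` and `summarize` are illustrative names, not collector APIs, and the function expects only the value of the `error` key, not the whole log line.

```python
import re
from collections import Counter

# One entry looks like either of:
#   error reading process name for pid 1165232: readlink /proc/1165232/exe: permission denied
#   error reading username for process \"gjs\" (pid 1163938): user: unknown userid 1472934163
ENTRY = re.compile(
    r"error reading (?P<what>.+?) for .*?pid (?P<pid>\d+)\)?: (?P<detail>.+)"
)


def summarize(error_field: str) -> Counter:
    """Count entries by (what was read, failure reason)."""
    counts = Counter()
    for part in error_field.split(";"):
        match = ENTRY.search(part)
        if match is None:
            continue  # skip truncated fragments or entries in another format
        # Keep only the text after the last ": ", e.g. "permission denied".
        reason = match["detail"].rsplit(": ", 1)[-1].strip()
        # Replace numbers so that per-PID or per-UID values group together.
        reason = re.sub(r"\d+", "<n>", reason)
        counts[(match["what"], reason)] += 1
    return counts
```

Calling `summarize(...).most_common()` on the string above would list each kind of failure once with its count, instead of one entry per PID.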
What version did you use?
v0.74.0 of the contrib collector (https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.74.0/otelcol-contrib_0.74.0_linux_amd64.deb)
What config did you use?
config.yaml
service file
If I add "sudo" to ExecStart, or if I change User to "root", the error changes to:
1163933: readlink /proc/1163933/exe: no such file or directory; error reading process name for pid 1163935: readlink /proc/1163935/exe: no such file or directory; error reading username for process \"gjs\" (pid 1163938): user: unknown userid 1472934163; error reading process name for pid 1164151: readlink /proc/1164151/exe: no such file or directory; error reading process name for pid 1164366: readlink /proc/1164366/exe: no such file or directory; error reading username for process \"brave\" (pid 1165232): user: unknown userid 1472934163; error reading process name for pid 1165263: readlink /proc/1165263/exe: no such file or directory; error reading process name for pid 1165265: readlink /proc/1165265/exe: no such file or directory; error reading username for process \"sudo\" (pid 1166027): user: unknown userid 1472934163; error reading username for process \"grep\" (pid 1166028): user: unknown userid 1472934163; error reading username for process \"sudo\" (pid 1166035): user: unknown userid 1472934163; error reading process name for pid 1166088: readlink /proc/1166088/exe: no such file or directory", "scraper": "process"}
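The switch from `permission denied` to `no such file or directory` under root is consistent with `/proc/<pid>/exe` being unreadable for a different reason: kernel threads have no executable, and short-lived processes can exit between being listed and being read. A small Python sketch (the helper name `classify_exe_links` is an assumption, not collector code) that performs the same `readlink` on every PID directory and tallies the outcomes can show which case dominates on a given host.

```python
import errno
import os
from collections import Counter


def classify_exe_links(proc_root: str = "/proc") -> Counter:
    """Tally the outcome of readlink(<proc_root>/<pid>/exe) for every PID directory."""
    outcomes = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are PIDs
        try:
            os.readlink(os.path.join(proc_root, entry, "exe"))
            outcomes["ok"] += 1
        except PermissionError:
            outcomes["permission denied"] += 1
        except FileNotFoundError:
            outcomes["no such file or directory"] += 1
        except OSError as exc:
            # Anything else is recorded under its errno name, e.g. "ESRCH".
            outcomes[errno.errorcode.get(exc.errno, "other")] += 1
    return outcomes
```

Running `classify_exe_links()` once as the collector's user and once as root gives a quick comparison of how many PIDs fall into each bucket.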
Environment
OS: Ubuntu 22.04
Additional context
N/A
I should also mention that I found a similar closed issue, but it didn't help me resolve the problem.