
[receiver/hostmetrics] permission denied on Linux #20435

Open
flenoir opened this issue Mar 28, 2023 · 10 comments · Fixed by SumoLogic/sumologic-otel-collector#1228
Labels
never stale · receiver/hostmetrics

Comments

@flenoir

flenoir commented Mar 28, 2023

Describe the bug
I want to get process metrics from a Linux machine, so I'm running the collector as an agent with the hostmetrics receiver. When the service starts, the process scraper fails: the log reports a permission denied error for every PID.

Steps to reproduce

Being root on the ubuntu system
Download v0.74.0 of the contrib collector deb file (otel-contrib-collector_0.74.0_amd64.deb)
Install contrib collector: dpkg --install otel-contrib-collector_0.74.0_amd64.deb
Configure it to collect host metrics (specifically, process data) via the hostmetrics receiver and process scraper

What did you expect to see?
No errors

What did you see instead?
Every minute, an error message is generated complaining about error reading process name ... permission denied for seemingly every PID on the machine:

error reading process name for pid 1165232: readlink /proc/1165232/exe: permission denied; error reading process name for pid 1165265: readlink /proc/1165265/exe: permission denied; error reading process name for pid 1166088: readlink /proc/1166088/exe: permission denied; error reading process name for pid 1166634: readlink /proc/1166634/exe: permission denied; error reading process name for pid 1166826: readlink /proc/1166826/exe: permission denied; error reading process name for pid 1166827: readlink /proc/1166827/exe: permission denied; error reading process name for pid 1166874: readlink /proc/1166874/exe: permission denied; error reading process name for pid 1168213: readlink /proc/1168213/exe: permission denied; error reading process name for pid 1168214: readlink /proc/1168214/exe: permission denied; error reading process name for pid 1168221: readlink /proc/1168221/exe: permission denied; error reading process name for pid 1168222: readlink /proc/1168222/exe: permission denied", "scraper": "process"}

What version did you use?
v0.74.0 of the contrib collector (https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.74.0/otelcol-contrib_0.74.0_linux_amd64.deb)

What config did you use?
config.yaml

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

  opencensus:
  hostmetrics:
    collection_interval: 30s
    root_path: /
    scrapers:
      cpu:
      memory:
      load:
      filesystem:
      network:
      paging:
      process:
      processes:

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector-toto'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:

  zipkin:

processors:
  batch:
  resource: 
    attributes:
    - key: service.name
      value: machine_toto
      action: upsert
    - key: service.namespace
      value: fl001
      action: upsert
    - key: namespace
      value: test_pc_toto0001
      action: upsert
    - key: cluster
      value: linux_ubuntu
      action: upsert
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

exporters:
  logging:
    verbosity: detailed
  otlp/tempo:
    endpoint: [myendpoint-hidden]:80
    tls:
      insecure: true

service:

  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [logging, otlp/tempo]

    metrics:
      receivers: [otlp, opencensus, prometheus, hostmetrics]
      processors: [resource, resourcedetection, batch]
      exporters: [logging, otlp/tempo]

  extensions: [health_check, pprof, zpages]

service file

[Unit]
Description=OpenTelemetry Collector Contrib
After=network.target

[Service]
EnvironmentFile=/etc/otelcol-contrib/otelcol-contrib.conf
ExecStart=/usr/bin/otelcol-contrib $OTELCOL_OPTIONS
KillMode=mixed
Restart=on-failure
Type=simple
User=otelcol-contrib
Group=otelcol-contrib

[Install]
WantedBy=multi-user.target

If I add sudo to ExecStart, or change User to root, the error changes to:

1163933: readlink /proc/1163933/exe: no such file or directory; error reading process name for pid 1163935: readlink /proc/1163935/exe: no such file or directory; error reading username for process \"gjs\" (pid 1163938): user: unknown userid 1472934163; error reading process name for pid 1164151: readlink /proc/1164151/exe: no such file or directory; error reading process name for pid 1164366: readlink /proc/1164366/exe: no such file or directory; error reading username for process \"brave\" (pid 1165232): user: unknown userid 1472934163; error reading process name for pid 1165263: readlink /proc/1165263/exe: no such file or directory; error reading process name for pid 1165265: readlink /proc/1165265/exe: no such file or directory; error reading username for process \"sudo\" (pid 1166027): user: unknown userid 1472934163; error reading username for process \"grep\" (pid 1166028): user: unknown userid 1472934163; error reading username for process \"sudo\" (pid 1166035): user: unknown userid 1472934163; error reading process name for pid 1166088: readlink /proc/1166088/exe: no such file or directory", "scraper": "process"}

Environment
OS: Ubuntu 22.04

Additional context
N/A

I should also mention that I found a similar closed issue, which didn't help me resolve the problem.

@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector Mar 28, 2023
@mx-psi mx-psi changed the title Hostmetrics on Linux - permission denied [receiver/hostmetrics] permission denied on Linux Mar 28, 2023
@github-actions
Contributor

Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mx-psi
Member

mx-psi commented Mar 28, 2023

Relates to/duplicates #18923 #18232

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 29, 2023
@mx-psi mx-psi removed the Stale label May 29, 2023
@jskiba
Contributor

jskiba commented Jul 13, 2023

There are two additional options for the process scraper in v0.75.0:

mute_process_exe_error: <true|false>
mute_process_io_error: <true|false>

You can use them to mute these errors. This version also scrapes all processes without dropping those it could not read the exe for.
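As a sketch, these flags go under the process scraper in the hostmetrics receiver config (the interval here is illustrative, not from a verified setup):

```
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      process:
        # Suppress the per-PID "permission denied" errors described above
        mute_process_exe_error: true
        mute_process_io_error: true
```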

So I think it can be closed @dmitryax

@OmprakashPaliwal
Copy link

@jskiba These two options are not working for me. I am using go.opentelemetry.io/collector/receiver@v0.81.0/.

Please let me know if you need any input from my end. I see lots of errors like the one below:

Error scraping metrics        {"kind": "receiver", "name": "hostmetrics/linux/localhost", "data_type": "metrics", "error": "error reading open file descriptor count for process \"systemd\" (pid 1): open /proc/1/fd: permission denied; error reading pending signals for process \"systemd\" (pid 1): open /proc/1/fd: permission denied; error reading open file descriptor count for process \"kthreadd\" (pid 2): open /proc/2/fd: permission denied; error reading pending signals for process \"kthreadd\" (pid 2): open /proc/2/fd: permission denied; error reading open file descriptor count for process \"kworker/0:0H\" (pid 4): open /proc/4/fd: permission denied; error reading pending signals for process \"kworker/0:0H\" (pid 4): open /proc/4/fd: permission denied; error reading open file descriptor count for process \"ksoftirqd/0\" (pid 6): open /proc/6/fd: permission denied; error reading pending signals for process \"ksoftirqd/0\"

@andrzej-stencel
Member

@OmprakashPaliwal is correct: when you run the collector as non-root and enable one of the optional metrics process.open_file_descriptors or process.signals_pending, the collector gets a permission error trying to read /proc/[pid]/fd for processes not owned by the user running the collector. As a result, those two metrics are only generated for processes owned by that user.

The solution is to give the collector process read access to files in the /proc/[pid]/fd directories. Unfortunately, regular Linux file permission settings don't seem to work on files under /proc.

The only way I was able to fix it (other than running the collector as root, which also works) is to add the CAP_DAC_READ_SEARCH Linux capability to the collector binary with:

sudo setcap 'cap_dac_read_search=ep' /path/to/the/collector/binary

⚠️ Warning: This capability gives the collector binary the ability to read any file on the filesystem. See here for examples of how it can be exploited: https://book.hacktricks.xyz/linux-hardening/privilege-escalation/linux-capabilities#cap_dac_read_search
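If the collector runs under systemd, as in the unit file above, a hypothetical alternative to setcap is to grant the capability through the service unit itself (assumes systemd ≥ 229, which supports AmbientCapabilities; the same security warning applies):

```
[Service]
# Grants CAP_DAC_READ_SEARCH to the service's process
# even though it runs as the non-root otelcol-contrib user
AmbientCapabilities=CAP_DAC_READ_SEARCH
```

This scopes the capability to the service rather than to the binary on disk.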

@andrzej-stencel
Member

Thanks @mx-psi, I closed this accidentally by merging that PR.

To close this issue, I believe we need to make it possible to mute the errors that occur when scraping the process.open_file_descriptors or process.signals_pending metric.

One way to do this is to add another mute_... configuration property to the scraper. There are already three available, and I'm not sure if adding a fourth is a good idea. Also, I'm not sure how it should be named. Should we have separate options for each metric name - mute_open_file_descriptors_error and mute_signals_pending_error?

@github-actions github-actions bot added the Stale label Nov 13, 2023
@mx-psi mx-psi added the never stale label and removed the Stale label Nov 13, 2023
@ringerc

ringerc commented Jan 31, 2024

On Kubernetes, you'd add the DAC_READ_SEARCH capability in the container's securityContext.capabilities. I haven't verified that it resolves the issue, though, as the workload permissions in my environment don't permit that capability (for good reasons).
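A sketch of what that might look like in a pod spec (container name and image are illustrative, and as noted this is unverified):

```
containers:
  - name: otel-collector  # hypothetical container name
    image: otel/opentelemetry-collector-contrib
    securityContext:
      capabilities:
        add: ["DAC_READ_SEARCH"]  # Kubernetes omits the CAP_ prefix
```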

I've tried CAP_DAC_OVERRIDE and it doesn't seem sufficient.

But this is also something the collector should tolerate gracefully. There's no point flooding the log with repeated, predictable errors, and the current mute-error flags are insufficient if the goal is simply to mute them.

@mustafa0x

There's no point flooding the log with repeated, predictable errors.

This is a problem. journalctl -f -u otelcol-contrib is flooded with these errors: every 5 seconds, 182 lines are written, all of the form error reading disk usage for process "<process-name>" (pid <id>): open /proc/<id>/io: permission denied;

dmitryax pushed a commit that referenced this issue Sep 30, 2024

…rom process scraper of the hostmetricsreceiver (#34981)

**Description:** 
We are currently encountering an issue with the `process` scraper in the
`hostmetricsreceiver`, primarily due to access rights restrictions for
certain processes like system processes for example. This is resulting
in a large number of verbose error logs. Most of them are coming from
the `process.open_file_descriptors` metric but we have errors coming
from other metrics as well.

In order to solve this issue, we added a flag `mute_process_all_errors` that mutes errors coming from the process scraper metrics, as these errors are predominantly associated with processes that we should not be monitoring anyway.



**Link to tracking Issue:**
#20435

**Testing:** Added unit tests

**Documentation:** 

**Errors**:

- Permission denied errors:

```
go.opentelemetry.io/collector/receiver@v0.90.1/scraperhelper/scrapercontroller.go:176
2024-09-02T17:24:10.341+0200    error	scraping metrics        {"kind": "receiver", "name": "hostmetrics/linux/localhost", "data_type": "metrics", "error": "error reading open file descriptor count for process \"systemd\" (pid 1): open /proc/1/fd: permission denied;

```
- File not found errors:

```
go.opentelemetry.io/collector/receiver@v0.90.1/scraperhelper/scrapercontroller.go:176
2024-09-02T17:25:38.688+0200    error   scraperhelper/scrapercontroller.go:200  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics/process", "data_type": "metrics", "error": "error reading cpu times for process \"java\" (pid 466650): open /proc/466650/stat: no such file or directory; error reading memory info for process \"java\" (pid 466650): open /proc/466650/statm: no such file or directory; error reading thread info for process \"java\" (pid 466650): open /proc/466650/status: no such file or directory; error reading cpu times for process \"java\" (pid 474774): open /proc/474774/stat: no such file or directory; error reading memory info for process \"java\" (pid 474774): open /proc/474774/statm: no such file or directory; error reading thread info for process \"java\" (pid 474774): open /proc/474774/status: no such file or directory; error reading cpu times for process \"java\" (pid 481780): open /proc/481780/stat: no such file or directory; error reading memory info for process \"java\" (pid 481780): open /proc/481780/statm: no such file or directory; error reading thread info for process \"java\" (pid 481780): open /proc/481780/status: no such file or directory", "scraper": "process"}

```



**Config**:

```
receivers:
  hostmetrics/process:
    collection_interval: ${PROCESSES_COLLECTION_INTERVAL}s
    scrapers:
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        resource_attributes:
          # disable non_used default attributes
          process.command:
            enabled: false
          process.command_line:
            enabled: false
          process.executable.path:
            enabled: false
          process.owner:
            enabled: false
          process.parent_pid:
            enabled: false
        metrics:
          # disable non-used default metrics
          process.cpu.time:
            enabled: false
          process.memory.virtual:
            enabled: false
          # enable used optional metrics
          process.cpu.utilization:
            enabled: true
          process.open_file_descriptors:
            enabled: true
          process.threads:
            enabled: true

```
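With the flag described in the PR above, muting everything from the process scraper reduces to a single option; a minimal sketch, assuming a collector release that includes this change:

```
receivers:
  hostmetrics:
    scrapers:
      process:
        # Mutes all errors emitted by the process scraper
        mute_process_all_errors: true
```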
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024