dd.collector[1724]: WARNING (disk.py:109): Unable to get disk metrics for ... #2932

Closed
ne0ark opened this issue Oct 19, 2016 · 60 comments

Comments

@ne0ark

ne0ark commented Oct 19, 2016

When I mount an object storage using https://github.com/ovh/svfs

I get the following error: dd.collector[1724]: WARNING (disk.py:109): Unable to get disk metrics for /root/gra1: [Errno 13] Permission denied: '/root/gra1'

@wesseljt

wesseljt commented Nov 7, 2016

same issue here

@randallsquared

I'm having this issue, too, though it's just a lot of extra log messages; as far as I can see nothing is being reported incorrectly.

I'm using docker on all my servers, and sometime between docker 1.11.2 and 1.12.1 it started attaching every aufs container as a mount, so that there are many repeated "none" and "shm" filesystems in the output of $ mount. Since these are not actually contributing to the total size of any disk or disk usage, no metrics are actually going unreported, as far as I can tell.

On Ubuntu 16.04 I see this with docker 1.12.1, but not with docker 1.11.2. It also appears that downgrading datadog-agent to 1:5.8.5-1 silences these errors, but I think that's because this was previously a debug log, and now is a warning.

Linux testhost 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

@degemer degemer self-assigned this Nov 8, 2016
@degemer
Member

degemer commented Nov 8, 2016

Hey there!
If I understand correctly, your problem @randallsquared is that your logs are spammed by these errors making them difficult to read. Is it exactly the same error? [Errno 13] Permission denied ?

@ne0ark and @wesseljt, I guess you actually want the metrics from the svfs storage. We use https://docs.python.org/2/library/os.html#os.statvfs to get those metrics, and it is failing with the [Errno 13] Permission denied. The agent runs as dd-agent; could you maybe give that user access to this storage (not sure exactly what's needed, sorry)?
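
For anyone who wants to reproduce the failure outside the agent, here is a minimal sketch of the os.statvfs call mentioned above. The helper name probe_mountpoint is made up for illustration and is not agent code; running it as the dd-agent user should surface the same errno that the check logs.

```python
import errno
import os


def probe_mountpoint(path):
    """Return (usage_dict, None) on success or (None, errno_code) on failure.

    Hypothetical helper, not agent code: it only wraps os.statvfs, the call
    the disk check relies on according to the comment above.
    """
    try:
        st = os.statvfs(path)
    except OSError as exc:
        # errno.EACCES is the "[Errno 13] Permission denied" from the warning.
        return None, exc.errno
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    return {"total_bytes": total, "free_bytes": free}, None


if __name__ == "__main__":
    for path in ("/", "/root/gra1"):
        usage, code = probe_mountpoint(path)
        if code is None:
            print(path, usage)
        else:
            print(path, "failed:", errno.errorcode.get(code, code))
```

If the second path prints EACCES only when run as dd-agent and not as root, the problem is file permissions on the mount rather than anything in the check itself.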

@randallsquared

@degemer Hi!

Yes, it's exactly the error you quote, and log inflation is indeed the only problem I have with it, now that I understand what's going on.

@degemer
Member

degemer commented Nov 9, 2016

Hum, the easy solution would be to switch back to debug level (since most of the time it's only noise), but as this issue shows, this warning is sometimes useful.
Could you try adding none and shm to the excluded_filesystems in disk.yaml ? https://github.com/DataDog/dd-agent/blob/master/conf.d/disk.yaml.default#L13-L14
Especially none, if it's indeed the filesystem type we could exclude it by default.

@randallsquared

That worked perfectly, @degemer ! Thanks. :)

@rindek

rindek commented Jan 26, 2017

Hello,

I am using version 5.9.1 and I did add none, shm and nsfs to the /etc/dd-agent/conf.d/disk.yaml.default yet I still receive a lot of these:

...
2017-01-26 16:46:06 CET | WARNING | dd.collector | checks.disk(disk.py:109) | Unable to get disk metrics for /var/lib/docker/containers/4049477990b5dd0e6f464fc63561d76d767bca172749f01a745eb54716094c1e/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/4049477990b5dd0e6f464fc63561d76d767bca172749f01a745eb54716094c1e/shm'
2017-01-26 16:46:06 CET | WARNING | dd.collector | checks.disk(disk.py:109) | Unable to get disk metrics for /var/lib/docker/containers/2ffda5ac8e45280fd3d27d482137490249ca1e15725274449740facd1dc89563/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/2ffda5ac8e45280fd3d27d482137490249ca1e15725274449740facd1dc89563/shm'
2017-01-26 16:46:06 CET | WARNING | dd.collector | checks.disk(disk.py:109) | Unable to get disk metrics for /var/lib/docker/containers/b055760e065518661c665f30763b01392d8e983bbeaa34426a1f84a002319d23/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/b055760e065518661c665f30763b01392d8e983bbeaa34426a1f84a002319d23/shm'
...

Previously I also received similar errors for the nsfs filesystem. Adding it to excluded_filesystems removed those warnings, but the shm ones still appear.

What can I do to remove these warnings?

@geowa4

geowa4 commented Feb 17, 2017

On Ubuntu 16.04, I'm getting this log

WARNING | dd.collector | checks.disk(disk.py:106) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'

with this configuration

/usr/bin/docker run --name datadog \
  -v /etc/datadog/conf.d:/conf.d \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -p 8125:8125 \
  -p 8126:8126 \
  -p 9001:9001 \
  -p 7777:7777 \
  --env-file /etc/environment.d/datadog \
  datadog/docker-dd-agent

@paulrm

paulrm commented Feb 21, 2017

I am getting this:

dd.collector[3596]: WARNING (disk.py:106): Unable to get disk metrics for /var/named/chroot/etc/named.root.key: [Errno 13] Permission denied: '/var/named/chroot/etc/named.root.key'

I am using version 5.11.2, and this happens even with /var/named/chroot/ excluded using:

excluded_disk_re: /var/named/chroot/*

@andreysaksonov

andreysaksonov commented Feb 28, 2017

My Loggly logs (you pay for the amount of logs there!) are heavily spammed by dd-agent:

WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/c061983b11e3: [Errno 13] Permission denied: '/run/docker/netns/c061983b11e3'
2017-02-28 10:55:51.043
WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/containers/1671f06638eb2923534243a4378ad548c93fac8f583ccd2cda368ec4a19b4e62/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/1671f06638eb2923534243a4378ad548c93fac8f583ccd2cda368ec4a19b4e62/shm'
2017-02-28 10:55:51.041
WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/containers/cfa29a5c834ecf112d9c3dd6f50ca52e50f453c40876484a383e09a2e3b50f79/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/cfa29a5c834ecf112d9c3dd6f50ca52e50f453c40876484a383e09a2e3b50f79/shm'
2017-02-28 10:55:51.038
WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/default: [Errno 13] Permission denied: '/run/docker/netns/default'
2017-02-28 10:55:51.036
WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/overlay/81b7f13d32732320fe158ea991a7e7a9f3cb4f8c50d9408c013e300c9e7259e7/merged: [Errno 13] Permission denied: '/var/lib/docker/overlay/81b7f13d32732320fe158ea991a7e7a9f3cb4f8c50d9408c013e300c9e7259e7/merged'
2017-02-28 10:55:51.033
WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/overlay/bea2782f91b9fd4d6aed9d5c226ccd6c3df1d43394e14556e8858f02fa08a5e9/merged: [Errno 13] Permission denied: '/var/lib/docker/overlay/bea2782f91b9fd4d6aed9d5c226ccd6c3df1d43394e14556e8858f02fa08a5e9/merged'

Is there any way to disable these warnings?

version of datadog:

root@scw-033b1b:/etc/dd-agent/conf.d# apt show datadog-agent 
Package: datadog-agent
Version: 1:5.11.2-1

@degemer
Member

degemer commented Mar 1, 2017

Sorry for the trouble, we're considering silencing this warning on Docker (since it seems to be the main issue there). Could you confirm you're running the agent using our Docker image ?

In the meantime, you can try the excluded_filesystems option. To get the filesystem corresponding to a mountpoint, you can run this one-liner (replacing /my/mountpoint by yours):

DISK='/my/mountpoint' /opt/datadog-agent/embedded/bin/python -c 'import psutil; import os; print [part.fstype for part in psutil.disk_partitions(all=True) if part.mountpoint == os.environ["DISK"]][0]'

It should print the detected filesystem. Could you give it a try @rindek @andreysaksonov ?
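
If psutil is not importable outside the agent's embedded Python, the same lookup can be sketched with the standard library alone. This is an alternative under stated assumptions, not what the agent does: it parses the Linux /proc/mounts format (device, mount point, type, options, ...), and the function name fstype_for and the sample text are invented here for illustration.

```python
def fstype_for(mounts_text, mountpoint):
    """Return the filesystem type of `mountpoint`, or None if not mounted.

    `mounts_text` is the content of /proc/mounts (Linux only). Later entries
    win, because a newer mount can shadow an older one on the same path.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces in paths as the octal sequence \040.
        path = fields[1].replace("\\040", " ")
        if path == mountpoint:
            found = fields[2]
    return found


# Illustrative excerpt; the container path is a shortened placeholder.
SAMPLE = """\
tracefs /sys/kernel/debug/tracing tracefs rw,relatime 0 0
shm /var/lib/docker/containers/abc/shm tmpfs rw,nosuid 0 0
"""

print(fstype_for(SAMPLE, "/sys/kernel/debug/tracing"))  # tracefs
```

On a real host, pass the content of /proc/mounts as the first argument instead of SAMPLE.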

@paulrm excluded_disk_re has to be a Python regex, so excluded_disk_re: /var/named/chroot/.* should work.
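
To make the glob-versus-regex difference concrete, here is a small Python 3 sketch. How the agent applies the pattern (match, search, or something else) is not stated in this thread; re.fullmatch is used below only to make the difference between the two patterns visible.

```python
import fnmatch
import re

path = "/var/named/chroot/etc/named.root.key"

glob_style = "/var/named/chroot/*"    # what a shell glob would look like
regex_style = "/var/named/chroot/.*"  # the Python regex suggested above

# As a shell-style glob, "*" means "any run of characters".
glob_hit = fnmatch.fnmatchcase(path, glob_style)

# As a regex, "/*" means "zero or more slashes", so the pattern cannot
# cover the rest of the path, while ".*" can.
regex_from_glob = re.fullmatch(glob_style, path) is not None
regex_proper = re.fullmatch(regex_style, path) is not None

print(glob_hit, regex_from_glob, regex_proper)  # True False True
```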

@geowa4 you can try either the excluded_disk_re option or the excluded_filesystems one.

Let us know if this helps you!

@degemer degemer added the checks label Mar 2, 2017
@sdsalsero

sdsalsero commented Mar 22, 2017

I have been unable to make any of these suggestions work. I have 3 secure-NFS mounts which are spamming my /var/log/messages with errors, e.g. every server has local mount /install = nas:/exports/install-files.

I have tried all the following:

excluded_filesystems:
  - nfs4 
excluded_disks:
  - /install/.*

excluded_disk_re: /install/.*

excluded_mountpoint_re: /install/.*

Are these settings dependent on the 'use_mount' yes/no setting?
Is there a log-file or diagnostic command-line you can run to confirm that your custom 'disk.yaml' is being used?

@boomshadow

I'm running a bare metal Ubuntu 16.04 (LTS) server. I'm getting lots of these messages as well:

Apr 12 09:30:12 server-name dd.collector[1523]: WARNING (disk.py:106): Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'

@degemer, when I run your python script, it reports the filesystem as tracefs

Doing the filesystem exclusion in disk.yaml cleared the warnings for me:

excluded_filesystems:
   - tracefs

@keymon

keymon commented May 24, 2017

We get the error:

WARNING | dd.collector | checks.disk(disk.py:106) | Unable to get disk metrics for /host/proc/sys/fs/binfmt_misc: [Errno 40] Too many levels of symbolic links: '/host/proc/sys/fs/binfmt_misc'

Running on a centos7 based image

docker run -d --name dd-agent -m 256m --restart=always --oom-kill-disable=true \
         -v /var/run/docker.sock:/var/run/docker.sock \
         -v /proc/:/host/proc:ro \
         -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
         -v /opt/datadog-agent/agent/conf.d/docker_daemon.yaml:/opt/datadog-agent/agent/conf.d/docker_daemon.yaml \
         -e API_KEY="..." \
         -e TAGS="testing" \
         -e EC2_TAGS="yes" \
 -h `hostname` -p 8125:8125/udp datadog/docker-dd-agent:latest-alpine
  • The docker container is started during boot using a user-script.
  • The command ls -l /host/proc/sys/fs/binfmt_misc/ failed in the container, worked in the host.
  • In our case, we found out that restarting the docker container fixed the issue.

@iautom8things

iautom8things commented Jul 24, 2017

Is there a general solution for either fixing or silencing this? I am receiving a similar version of this warning as @andreysaksonov is/was:

Jul 24 11:01:45 ip-10-81-104-135 dd.collector: WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/ingress_sbox: [Errno 13] Permission denied: '/run/docker/netns/ingress_sbox'
Jul 24 11:01:45 ip-10-81-104-135 dd.collector: WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/1-pnbhxtmol6: [Errno 13] Permission denied: '/run/docker/netns/1-pnbhxtmol6'

I am not running the docker container version of dd-agent.

sudo /etc/init.d/datadog-agent info | grep -i -e "v [0-9]"
Collector (v 5.14.1)
Dogstatsd (v 5.14.1)
Forwarder (v 5.14.1)
Trace Agent (v 5.14.1)

@rindek

rindek commented Jul 24, 2017

I have managed to fix those permission denied errors by adding a /etc/dd-agent/conf.d/disk.yml file with the following contents:

init_config: null
instances:
-   excluded_filesystems:
    - tmpfs
    - none
    - shm
    - nsfs
    - netns
    use_mount: false

If any similar error shows up, I simply add its filesystem to the list. I found the list of filesystems by running the "mount" command and checking which type of FS is mounted at each path that DD is unable to fetch.
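
The first half of that workflow, finding out which mount points the check cannot read, can be scripted. The sketch below is an illustration, not agent code: the regex is written against the warning format quoted in this thread, the function name is invented, and the sample lines are copied from earlier comments (one is repeated to show de-duplication).

```python
import re

# Matches the warning format quoted throughout this thread.
WARNING_RE = re.compile(
    r"Unable to get disk metrics for (?P<mount>.+?): \[Errno (?P<errno>\d+)\]"
)


def failing_mountpoints(log_text):
    """Return a sorted list of (mountpoint, errno) pairs seen in `log_text`."""
    seen = set()
    for match in WARNING_RE.finditer(log_text):
        seen.add((match.group("mount"), int(match.group("errno"))))
    return sorted(seen)


LOG = """\
WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/default: [Errno 13] Permission denied: '/run/docker/netns/default'
WARNING (disk.py:106): Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'
WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/default: [Errno 13] Permission denied: '/run/docker/netns/default'
"""

for mount, code in failing_mountpoints(LOG):
    print(code, mount)
```

Each reported mount point can then be looked up in the output of mount to find the filesystem type to exclude.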

@Komorebi-E

Komorebi-E commented Aug 3, 2017

For "/sys/kernel/debug/tracing":
Add "- tracefs" to /etc/dd-agent/conf.d/disk.yml

Example if you have copied disk.yaml.default to disk.yaml.
N.B. the owner and group of the file need to be "dd-agent".

instances:
  # The use_mount parameter will instruct the check to collect disk
  # and fs metrics using mount points instead of volumes
  - use_mount: no
    # The (optional) excluded_filesystems parameter will instruct the check to
    # ignore disks using these filesystems. Note: On some linux distributions,
    # rootfs will be found and tagged as a device, add rootfs here to exclude.
    excluded_filesystems:
      - tracefs

@sdsalsero

FYI
I solved my issue, though it may be different than what others are experiencing. It seems that the parser for the 'disk.yaml' is extremely picky about spacing -- when I adjusted each entry so that it was indented either 2-spaces or 4-spaces, then it worked as expected.

There was no error or logging to indicate that the .yaml was being rejected for formatting.
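
Since a rejected file produced no visible error, a quick pre-check of the indentation can save a restart cycle. The sketch below is a heuristic only and is deliberately stricter than YAML, which accepts any consistent indentation but forbids tabs; a real YAML parser remains the authoritative test. The function name and the misaligned sample fragment are invented for illustration.

```python
def suspicious_indents(yaml_text, step=2):
    """Return (line_number, reason) pairs for indentation worth a second look.

    Heuristic only: YAML itself allows any consistent indentation, so this is
    stricter than a real parser and is no substitute for one.
    """
    problems = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not affect structure
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % step:
            problems.append((number, "indent of %d spaces" % len(indent)))
    return problems


# Deliberately misaligned fragment: the third line sits one column too deep.
CONFIG = """\
instances:
  - use_mount: no
     excluded_filesystems:
       - tracefs
"""

print(suspicious_indents(CONFIG))
```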

@irabinovitch
Contributor

@sdsalsero ya, yaml is super picky on white space.

@Komorebi-E

@sdsalsero If you are on a system with systemd, you can run the command below and the log output will show the parse error.
(Typing systemctl status da and pressing Tab will autocomplete the service name.)

systemctl status datadog-agent

# systemctl status datadog-agent.service
...
 datadog-agent.service - "Datadog Agent"
Aug 03 17:29:58 ccpc-udm-n2 dd.collector[18597]: ERROR (config.py:957): Unable to parse yaml config in /etc/dd-agent/conf.d/disk.yaml
                                                 Traceback (most recent call last):
...

@raahsri

raahsri commented Aug 30, 2017

@degemer ,
Hi,
Please help: I am getting an error after editing the process.yaml file, and the agent is unable to get the metrics.
The errors are below.

process.yaml contains errors:
[Errno 13] Permission denied: '/etc/dd-agent/conf.d/process.yaml'

Checks

process

@raahsri

raahsri commented Aug 31, 2017

@andreysaksonov please help

@lordkyzr

I created a repo with the necessary changes to suppress these warnings since none of these solutions worked for me and I have it working on my Docker boxes. Is there any interest in this? I can open a PR if there is.

@rafaelmagu

Ran into this today with Docker CE on Ubuntu 16.04, and realised overlay isn't in the list. Once added, errors stop.

@Darwiner

Darwiner commented Oct 18, 2017

On a somewhat related topic...

Would anyone happen to have more information on how to get these to be excluded/ignored?

dd.collector[15713]: WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/devicemapper/mnt/886ba23853262fbb67b5ebadfed842c1cf60ba9934bb7ad0a08427a81f2bdba1: [Errno 13] Permission denied: '/var/lib/docker/devicemapper/mnt/886ba23853262fbb67b5ebadfed842c1cf60ba9934bb7ad0a08427a81f2bdba1'
dd.collector[26954]: WARNING (disk.py:106): Unable to get disk metrics for net:[4026532199]: [Errno 2] No such file or directory: 'net:[4026532199]'

Putting excluded_mountpoint_re: /var/lib/docker/devicemapper/mnt/.* in place gets rid of one of the two errors. Although I'd much prefer to ignore this using something that seems a bit less "static" than a specific path.

It would also seem that net:[4026532199] is the running Docker container's network namespace identifier, but how could I go about getting the disk check to stop trying to get info on this?

ps. This is running datadog-agent 5.14.1.

@Darwiner

Otherwise, possibly as a workaround, is there a way for excluded_mountpoint_re to accept multiple options? So that I could get both /var/lib/docker/devicemapper/mnt/.* and also net:.* excluded? I've tried a few different methods but can't seem to make it work.
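
One way to cover several mount points with a single-valued option is regex alternation. The sketch below assumes the agent compiles the value as an ordinary Python regex, as stated for excluded_disk_re earlier in the thread; whether the option also accepts a list is not answered here. The helper combine_patterns is hypothetical and the devicemapper ID is shortened.

```python
import re


def combine_patterns(patterns):
    """Join several regexes into one alternation, each in its own group."""
    return "|".join("(?:%s)" % pattern for pattern in patterns)


combined = combine_patterns([
    r"/var/lib/docker/devicemapper/mnt/.*",
    r"net:.*",
])
compiled = re.compile(combined)

# The shortened devicemapper ID below is a placeholder, not a real mount.
for mount in ("/var/lib/docker/devicemapper/mnt/886ba2", "net:[4026532199]", "/home"):
    print(mount, bool(compiled.match(mount)))
```

The string held in combined is what would go into excluded_mountpoint_re. The non-capturing groups keep each alternative intact when the patterns themselves contain a "|".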

@Darwiner

Well, for the record, seems like adding proc to the list of excluded_filesystems gets rid of this warning message.

Solution found via https://blog.bgbgbg.net/archives/4243.

# DISK='net:[4026532174]' /opt/datadog-agent/embedded/bin/python -c 'import psutil; import os; print [part.fstype for part in psutil.disk_partitions(all=True) if part.mountpoint == os.environ["DISK"]][0]'
proc

@bluemalkin

bluemalkin commented Nov 1, 2017

I've tried every option here and I still get the error message. Has anyone had any luck with overlay2 ?

Nov 01 00:37:32 HOST dd.collector[30483]: WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/overlay2/8c6ae203740b943662099f74f3b31105f5c29a7dad04630453a4a8b894bcfbdb/merged: [Errno 13] Permission denied: '/var/lib/docker/overlay2/8c6ae203740b943662099f74f3b31105f5c29a7dad04630453a4a8b894bcfbdb/merged'
Nov 01 00:37:32 HOST dd.collector[30483]: WARNING (disk.py:106): Unable to get disk metrics for /var/lib/docker/containers/5ab96ee31e289f2c4411ab8bbe100b5dd96bd7020cb3484a0c6352b12b5f6139/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/5ab96ee31e289f2c4411ab8bbe100b5dd96bd7020cb3484a0c6352b12b5f6139/shm'
Nov 01 00:37:32 HOST dd.collector[30483]: WARNING (disk.py:106): Unable to get disk metrics for /run/docker/netns/4e7cd621fd83: [Errno 13] Permission denied: '/run/docker/netns/4e7cd621fd83'

@bplein

bplein commented Aug 8, 2018

I am getting syslog spammed with "Got automount request for /proc/sys/fs/binfmt_misc" on Ubuntu 18.04, running the dd-agent as a container via the "easy one step install" at https://app.datadoghq.com/account/settings#agent/docker

Do I need to install from Github source in order to ignore that filesystem?

alexwitherspoon added a commit to alexwitherspoon/spoon.family that referenced this issue Nov 18, 2018
@mlahaye

mlahaye commented Dec 6, 2018

Here's my disk.yaml to ignore those filesystems:

init_config:

instances:
  -
    excluded_mountpoint_re: (/var/lib/docker/.*|/run/docker/netns/.*)

@martinlevesque

Disabled the log from datadog completely:

In your datadog.yaml:
log_to_syslog: false
log_to_console: false

@emj-io

emj-io commented Feb 14, 2019

Using the Docker image, I ended up turning DD_LOG_LEVEL down to error. I'm with CumpsD, I would like a better explanation of why we're seeing the error.

@flmmartins

Hello, I'm also facing this issue. Is there any ETA for a fix?

@calvinbui

On Datadog 6, I've had success with

$ cat /etc/datadog-agent/conf.d/disk.d/conf.yaml
init_config: null
instances:
  - mount_point_blacklist:
    - /var/lib/docker/(containers|overlay2)/.*/(shm|merged)
    - /run/docker/netns.*

@lynk81

lynk81 commented Jun 21, 2019

On Datadog 6, I've had success with

$ cat /etc/datadog-agent/conf.d/disk.d/conf.yaml
init_config: null
instances:
  - mount_point_blacklist:
    - /var/lib/docker/(containers|overlay2)/.*/(shm|merged)
    - /run/docker/netns.*

Ok so this worked for me to filter out A LOT of noise which is great.

I was still seeing the two messages below;

(pkg/collector/py/datadog_agent.go:148 in LogMessage) | (disk.py:114) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'

(pkg/collector/py/datadog_agent.go:148 in LogMessage) | (disk.py:114) | Unable to get disk metrics for /run/user/1000/gvfs: [Errno 13] Permission denied: '/run/user/1000/gvfs'

So I added 2 extra lines to disk.d/conf.yaml so it looks like this:

init_config: null
instances:
  - mount_point_blacklist:
    - /var/lib/docker/(containers|overlay2)/.*/(shm|merged)
    - /run/docker/netns.*
    - /sys/kernel/debug/
    - /run/user/1000/

and they are all gone.

@pierot

pierot commented Oct 14, 2019

I have followed most recommendations found in this issue but I still get flooded by the WARN message:

(disk.py:92) | Unable to get disk metrics for /var/lib/docker/overlay2/.../merged: [Errno 13] Permission denied: '/var/lib/docker/overlay2/.../merged'
(disk.py:92) | Unable to get disk metrics for /var/lib/docker/containers/.../mounts/shm: [Errno 13] Permission denied: '/var/lib/docker/containers/.../mounts/shm'
(disk.py:92) | Unable to get disk metrics for /run/docker/netns/32c0f8a52d05: [Errno 13] Permission denied: '/run/docker/netns/32c0f8a52d05'

My configuration looks like this:

        instances:
          - url: "unix://var/run/docker.sock"
            use_mount: false
            new_tag_names: true
            mount_point_blacklist:
              - /var/lib/docker/(containers|overlay2)/./(shm|merged)
              - /run/docker/netns/*
              - /sys/kernel/debug/
              - /run/user/1000/
            excluded_filesystems:
              - tmpfs
              - none
              - shm
              - nsfs
              - netns
              - binfmt_misc
              - autofs
            excluded_mountpoint_re: (/var/lib/docker/.*|/run/docker/netns/.*)

I know some settings might be redundant but I hopelessly tried every option ...

@jschaf

jschaf commented Oct 16, 2019

@pierot Your mount_point_blacklist regex is incorrect. You have a single . instead of allowing for multiple chars

/var/lib/docker/(containers|overlay2)/./(shm|merged)
# Should be:
/var/lib/docker/(containers|overlay2)/[0-9a-f]+/(shm|merged)

/run/docker/netns/*
# The `*` applies to the slash char.  I don't think you need a glob since it's a prefix match.
/run/docker/netns/
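
The two points above can be checked with Python's re module. This sketch assumes prefix matching (re.match), as the comment above suggests; the thread does not show how the agent applies the patterns. The paths are taken from warnings quoted earlier.

```python
import re

container = (
    "/var/lib/docker/containers/"
    "5ab96ee31e289f2c4411ab8bbe100b5dd96bd7020cb3484a0c6352b12b5f6139/shm"
)

single_dot = r"/var/lib/docker/(containers|overlay2)/./(shm|merged)"
hex_id = r"/var/lib/docker/(containers|overlay2)/[0-9a-f]+/(shm|merged)"

# "." consumes exactly one character, so a 64-character ID cannot fit.
print(bool(re.match(single_dot, container)))  # False
# "[0-9a-f]+" consumes the whole hexadecimal ID.
print(bool(re.match(hex_id, container)))      # True

netns = "/run/docker/netns/4e7cd621fd83"
# In "/run/docker/netns/*" the "*" repeats the preceding "/" zero or more
# times; with prefix matching the bare prefix behaves the same way here.
print(bool(re.match(r"/run/docker/netns/*", netns)))  # True
print(bool(re.match(r"/run/docker/netns/", netns)))   # True
```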

@pierot

pierot commented Oct 16, 2019

@jschaf Thanks for looking into this. I altered my config but the warnings keep showing up ...

@irabinovitch
Contributor

@pierot Are you running Datadog Agent v5 or v6? Have you had a chance to open a support ticket?

@pierot

pierot commented Oct 24, 2019

@irabinovitch I am running v6 of the Datadog Agent. I'll open a support ticket and link to this issue.

@arizawan

arizawan commented Nov 12, 2019

init_config: null
instances:
  - mount_point_blacklist:
    - /var/lib/docker/(containers|overlay2)/.*/(shm|merged)
    - /run/docker/netns.*
    - /sys/kernel/debug/
    - /run/user/1000/

I found that this config works:

init_config: null
instances:
    -   mount_point_blacklist:
            - /var/lib/docker.
            - /run/docker/netns.
            - /sys/kernel/debug/
            - /run/user/1000/

@pierot

pierot commented Nov 12, 2019

@irabinovitch I had intensive contact with support. They were very helpful and got the ticket resolved.

This is my config now:

    datadog_checks:
      disk:
        init_config:
        instances:
          - use_mount: false
            file_system_blacklist:
              - tmpfs
              - none
              - shm
              - nsfs
              - netns
              - binfmt_misc
              - autofs
            mount_point_blacklist:
              - /var/lib/docker/(containers|overlay2)/
              - /run/docker/
              - /sys/kernel/debug/
              - /run/user/1000/
      docker:
        init_config:
        instances:
          - url: "unix://var/run/docker.sock"

So, the file_system_blacklist and mount_point_blacklist are meant to be placed under the disk checks, not docker. I missed that completely.

@ssbarnea

ssbarnea commented Dec 7, 2019

I can confirm that @pierot's config is valid; the entry I really needed was /sys/kernel/debug/. But here is a bigger question:

This bug is 3 years old and I do not see even one PR referencing it. I do not find it acceptable to have to add a blacklist entry like /sys/kernel/debug/, which should be in the default blacklist, like many others.

One of the key Datadog selling points was that it just works and that it saves you endless hours of configuring probes.

I can understand that some items may require debate about whether they should be in the default blacklist, but /sys/kernel/debug/ is clearly not among them.

@rosario-raulin

rosario-raulin commented Jun 21, 2020

Sadly the blacklist configuration does not seem to work for me here. I have v7 of the Agent (on an Ubuntu host) and use the following configuration in /etc/datadog-agent/conf.d/disk.d/conf.yaml:

init_config:

instances:

    ## @param use_mount - boolean - required
    ## Instruct the check to collect using mount points instead of volumes.
    #
  - use_mount: false
    file_system_blacklist:
      - tmpfs
      - none
      - shm
      - nsfs
      - netns
      - binfmt_misc
      - autofs
    mount_point_blacklist:
      - /var/lib/docker/(containers|overlay2)/
      - /run/docker/
      - /sys/kernel/debug/
      - /run/user/1000/

Using this I still get a lot of useless log messages like this:

Jun 21 09:07:26 hvg-srv01 agent[10800]: 2020-06-21 09:07:26 CEST | CORE | WARN | (pkg/collector/python/datadog_agent.go:118 in LogMessage) | disk:5cac80270f8a1ff5 | (disk.py:77) | Unable to get disk metrics for /var/lib/docker/overlay2/2318664abb85bf435a53334fb671366b3ff798a99771a49738548a05d0f76654/merged: [Errno 13] Permission denied: '/var/lib/docker/overlay2/2318664abb85bf435a53334fb671366b3ff798a99771a49738548a05d0f76654/merged'

@njhallett

@rosario-raulin
add - overlay to the file_system_blacklist

@squillace91

squillace91 commented Aug 12, 2020

I was able to exclude those logs by adding tracefs to file_system_blacklist in /etc/datadog-agent/conf.d/disk.d/conf.yaml.default. At first I tried creating a conf.yaml file from the default and editing that, but it didn't work.

What I did:

sudo nano /etc/datadog-agent/conf.d/disk.d/conf.yaml.default
# add tracefs to file_system_blacklist
sudo systemctl restart datadog-agent

@mx-psi
Member

mx-psi commented Aug 21, 2020

Hi all, the next minor version of the Agent (7.23) will include a new option on the disk check (include_all_devices) to help address this issue.

If include_all_devices is set to false the check will exclude devices from pseudo file systems such as tmpfs, overlay, tracefs or debugfs. You can find more details on DataDog/integrations-core#7378. This option is set to true by default to preserve backwards compatibility.
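
The actual implementation lives in the linked PR and is not reproduced here. As a rough illustration of what "pseudo file system" means on Linux, the sketch below reads the kernel's own list: /proc/filesystems prefixes every type that is not backed by a block device with the word nodev. The function name is invented and SAMPLE is an abbreviated, illustrative excerpt. Note that network filesystems such as NFS are also marked nodev, so this is only an approximation of the set the option excludes.

```python
def pseudo_filesystems(proc_filesystems_text):
    """Return the set of filesystem types the kernel marks as `nodev`.

    On Linux, /proc/filesystems lists one type per line and prefixes types
    that are not backed by a block device with the word "nodev".
    """
    nodev = set()
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "nodev":
            nodev.add(fields[1])
    return nodev


# Abbreviated, illustrative excerpt of /proc/filesystems.
SAMPLE = "nodev\ttmpfs\nnodev\ttracefs\nnodev\toverlay\n\text4\n\txfs\n"

print(sorted(pseudo_filesystems(SAMPLE)))  # ['overlay', 'tmpfs', 'tracefs']
```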

I am closing this issue, please raise any further issues on the integrations-core repo.

@belambert

I think what's so confusing here is what @squillace91 said. Just adding a conf.yaml file doesn't change the settings at all. You have to either edit the conf.yaml.default file, or create a conf.yaml file AND delete the defaults file. Is this strange behavior documented somewhere?

@outloudvi

I guess my configuration should work as expected (from datadog-agent configcheck):

=== disk check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml
Instance ID: disk:[redacted]
file_system_exclude:
- tmpfs$
- rootfs$
- autofs$
- overlay
- nsfs
- tracefs
include_all_devices: false
use_mount: false
~
===

However, I was still getting something like:

CORE | WARN | (pkg/collector/corechecks/system/disk/disk_nix.go:69 in collectPartitionMetrics) | Unable to get disk metrics of /run/docker/netns/[redacted] mount point: permission denied

It seems to be a warning from disk_nix.go, which I assume to be this disk_nix.go, rather than disk.py (has dd-agent been rewritten in Go?). In that file, I didn't see file_system_exclude, but a similar configuration, excludedFilesystems, or excluded_filesystems, and the configuration is straightly ignored by disk_nix.go (I failed to find any useful reference to it).

@tristanpemble

@outloudvi you are in the repository for the v5 dd-agent. the issue you are running into is on v6/v7 datadog-agent package, which is a different repo. I had to dig pretty deep but I found the bug in the Go code is the result of a bad type cast when unmarshalling the config. I submitted a fix here

I didn't see file_system_exclude, but a similar configuration, excludedFilesystems, or excluded_filesystems, and the configuration is straightly ignored by disk_nix.go (I failed to find any useful reference to it).

fyi the check in disk_nix.go is here, which calls out to here, and the excluded_filesystems value is checked here.

@outloudvi

the issue you are running into is on v6/v7 datadog-agent package, which is a different repo.

Got it. Thanks for your info and PR, @tristanpemble!

@tristanpemble

@outloudvi as an interim workaround, I found that the excluded_mountpoint_re option currently works when given a regex. I set it to ^(/var/lib/docker|/run/docker/netns)/
