Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sudo for privileged process checks on filedescriptors (take 2) #1235

Merged
merged 5 commits into from
May 7, 2018
Merged

Use sudo for privileged process checks on filedescriptors (take 2) #1235

merged 5 commits into from
May 7, 2018

Conversation

pdecat
Copy link
Contributor

@pdecat pdecat commented Mar 12, 2018

Note: re-submission of #715 that was merged then reverted.

What does this PR do?

Repackaged the patch I suggested at DataDog/dd-agent#2033 (comment) in a dd-agent-omnibus compatible way (no additional python file, only process/check.py is modified).

Motivation

Datadog does not provide out of box support for monitoring file descriptors used by processes not running under the same user (dd-agent by default).

The suggested work-around of running the datadog agent as root is explicitly not recommended which is probably reasonable: https://help.datadoghq.com/hc/en-us/articles/115000066506-Why-don-t-I-see-the-system-processes-open-file-descriptors-metric-

Additional Notes

This requires the addition of the following sudo policies in the deb and rpm packages:

dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process.py num_fds *
dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process.pyc num_fds *

Usage of sudo generates frequent logging in /var/log/auth.log. That may be silenced with additional PAM rules in /etc/pam.d/sudo if desired.

Resolves DataDog/dd-agent#2033

The same may be done for the io_counters method.

edit: link

Copy link
Contributor

@ChristineTChen ChristineTChen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pdecat! Thanks for your contribution. Apologies about the late response, I didn't see your previous comment on the first PR. But during testing, I noticed that when running the process check with try_sudo, the open_file_descriptor metric isn't being sent. The collector.log has an error message:

2018-02-09 20:14:38 UTC | ERROR | dd.collector | checks.process(process.py:218) | running psutil method num_fds with sudo failed with return code 1
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/process/process.py", line 216, in psutil_wrapper
    result = int(subprocess.check_output(['sudo', sys.executable, __file__, method, str(process.pid)]))
  File "/opt/datadog-agent/embedded/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['sudo', '/opt/datadog-agent/embedded/bin/python', '/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/process/process.pyc', 'num_fds', '558']' returned non-zero exit status 1

@pdecat
Copy link
Contributor Author

pdecat commented Mar 12, 2018

Hi @ChristineTChen, according to the path in the error message, it looks like the sudoers rules need to be adapted as following:

dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/process/process.py num_fds *
dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/process/process.pyc num_fds *

Copy link
Contributor

@ChristineTChen ChristineTChen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 small things to change.

I also tried testing this locally with the updated sudoers rules and wasn't able to successfully collect the open_file_descriptors metric. Could you give a quick rundown of how you've tested this?

# 3p
import psutil


# Main entry point is meant for checks needing privilege escalation with sudo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main entry point needs to be moved after the import statements (see Travis CI check)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed.

As of now, only the `open_fd` metric on Unix platforms is taking advantage of this setting.
Note: the appropriate sudoers rules have to be configured for this to work, e.g. if packaged with the datadog agent:
```
dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process.py num_fds *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two sudoers rules work for agents before v.5.22 and v.6.0. We should include the sudoers rules for wheels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation updated to reflect that.

@pdecat
Copy link
Contributor Author

pdecat commented Mar 12, 2018

These changes are currently applied to version 1:5.17.2-1 of the datadog agent installed on debian 9 from the http://apt.datadoghq.com/ repository.

The sudoers rules are put into a /etc/sudoers.d/datadog-agent-sudoers file owned by user root, group root and with 0440 permissions.

Here's the Ansible task I use to apply these changes after the agent installation:

---
- name: "DataDog: add /etc/sudoers.d/datadog-agent-sudoers file"
  copy:
    src: "datadog-agent-sudoers"
    dest: "/etc/sudoers.d/datadog-agent-sudoers"
    owner: root
    group: root
    mode: 0440

- name: "DataDog: patch /opt/datadog-agent/agent/checks.d/process.py"
  patch:
    src: process.py.patch
    dest: /opt/datadog-agent/agent/checks.d/process.py

The process.py.patch file:

--- /opt/datadog-agent/agent/checks.d/process.py.orig   2017-07-17 15:50:07.000000000 +0000
+++ /opt/datadog-agent/agent/checks.d/process.py        2017-08-30 07:32:08.050402425 +0000
@@ -7,9 +7,18 @@
 from collections import defaultdict
 import time
 import os
+import subprocess
+import sys
 # 3p
 import psutil

+
+# Main entry point is meant for checks needing privilege escalation with sudo
+if __name__ == "__main__":
+    if sys.argv[1] == "num_fds":
+        print psutil.Process(int(sys.argv[2])).num_fds()
+    sys.exit(0)
+
 # project
 from checks import AgentCheck
 from config import _is_affirmative
@@ -200,6 +209,14 @@
             self.log.debug("psutil method %s not implemented", method)
         except psutil.AccessDenied:
             self.log.debug("psutil was denied acccess for method %s", method)
+            if method == 'num_fds' and Platform.is_unix():
+                try:
+                    # It is up the agent's packager to grant corresponding sudo policy on unix platforms
+                    result = int(subprocess.check_output(['sudo', sys.executable, __file__, method, str(process.pid)]))
+                except subprocess.CalledProcessError as e:
+                    self.log.exception("running psutil method %s with sudo failed with return code %d", method, e.returncode)
+                except:
+                    self.log.exception("running psutil method %s with sudo also failed", method)
         except psutil.NoSuchProcess:
             self.warning("Process {0} disappeared while scanning".format(process.pid))

The datadog-agent-sudoers file:

dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process.py num_fds *
dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process.pyc num_fds *

@ChristineTChen do you see any related errors in /var/log/auth.log?

@ChristineTChen
Copy link
Contributor

ChristineTChen commented Mar 14, 2018

I set up a new vagrant VM, using Agent v6.0.3, and overwrote the process.py with the one on this PR.
The following is what's in the /var/log/auth.log file from after updating with the sudoers rules & running datadog-agent check process.

Mar 14 22:16:42 vagrant-ubuntu-trusty-64 sudo:  vagrant : TTY=pts/0 ; PWD=/home/vagrant ; USER=root ; COMMAND=/usr/sbin/visudo
Mar 14 22:16:42 vagrant-ubuntu-trusty-64 sudo: pam_unix(sudo:session): session opened for user root by vagrant(uid=0)
Mar 14 22:16:57 vagrant-ubuntu-trusty-64 sudo: pam_unix(sudo:session): session closed for user root
Mar 14 22:17:01 vagrant-ubuntu-trusty-64 CRON[3136]: pam_unix(cron:session): session opened for user root by (uid=0)
Mar 14 22:17:01 vagrant-ubuntu-trusty-64 CRON[3136]: pam_unix(cron:session): session closed for user root

I will test this with the Agent5 and update you.
Thank you!

@pdecat
Copy link
Contributor Author

pdecat commented Mar 15, 2018

@ChristineTChen I forgot to say that I also add some configuration to /etc/dd-agent/conf.d/process.yaml to actually collect the metrics for some processes.

For example:

init_config:
  # the check will refresh the matching pid list every X seconds
  # except if it detects a change before. You might want to set it
  # low if you want to alert on process service checks.
  # pid_cache_duration: 120
  #
  # used to override the default procfs path, e.g. for docker containers with the outside fs mounted at /host/proc
  # DEPRECATED: please specify `procfs_path` globally in `datadog.conf` instead
  # procfs_path: /proc

instances:
- name: datastax-agent
  search_string: ['agent-pidfile=/var/run/datastax-agent/datastax-agent.pid']
  exact_match: False
- name: datastax-server
  search_string: ['-Ddse.server_process']
  exact_match: False

Note: this example works with the patch from #1235 (comment) that did not include the try_sudo logic.

metric explorer datadog - google chrome_001

@ChristineTChen
Copy link
Contributor

ChristineTChen commented Apr 13, 2018

Hi @pdecat! Apologies for the delayed response, we appreciate your patience on this issue.

I have consulted with my teammates about why this patch hasn't been working for the latest Agent. We have determined that this patch is not compatible with the Agent 6 because the sys.executable variable may be set to the system python executable with Agent6, and we don't think the Agent should guarantee access to an executable of the embedded python from Agent checks.

However, we've come up with two possible solutions to avoid running
the process check. We could either

  1. run lsof -p <PID> as a subprocess and parse the output or,

  2. run ls /proc/<PID>/fd/ as a subprocess and count the
    number of file descriptors

Please let me know what your thoughts are on these alternatives.

Christine

@pdecat
Copy link
Contributor Author

pdecat commented Apr 16, 2018

Hi Christine,

I'm willing to implement all necessary changes but as indicated in my original PR #715 (comment), allowing sudoers with wildcards is a security risk if not restrained properly.
The current implementation does that by only accepting a process id and not anything else.

Would it be possible to package a script that does the same thing the along the omnibus packaged Agent 6?

We have determined that this patch is not compatible with the Agent 6 because the sys.executable variable may be set to the system python executable with Agent6

While I did not check the actual behavior in the context of Agent 6, I did some initial tests of the github.com/sbinet/go-python lib used by Agent 6 and it seemed ok: #715 (comment)

I probably need to test that part further as I may have misunderstood the behavior of go-python.

Regards,
Patrick.

Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for your patience with this contribution!

@ofek ofek merged commit ca0ad36 into DataDog:master May 7, 2018
@pdecat pdecat deleted the f-process-check-with-sudo branch June 13, 2018 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants