Logic for detection of processes in docker containers seems wrong #80

Closed
odenbach opened this issue Sep 26, 2017 · 6 comments
@odenbach
Contributor

Hi,

We have a number of Debian machines running Docker containers. On all of these machines, needrestart keeps complaining about obsolete files being in use, even after I restart the containers or the whole machine.

After debugging the code I found the following logic being used:

  • Walk through the process list one by one.
  • For each process, collect the files it uses and look up each pathname in the file system. If a pathname no longer exists, the file is considered obsolete and the process must be restarted.
  • Check whether the process belongs to a container. If it does, report the container instead of the process.

This logic misses an important fact: If a process belongs to a container, it uses a different root directory which can be found by looking at the process' cgroup (and the docker config of course).

So currently NR reports

[main] #6722 uses obsolete /usr/local/bin/node

Indeed, the file /usr/local/bin/node does not exist on the host. But NR should look for the file

/var/lib/docker/aufs/mnt/8b581d3068682ebf7ed7f45298b78a563192446aa1e4cbd2bd0bfc4c4d42f917/usr/local/bin/node

which DOES exist - and is older than the process start time, so there is nothing to do.

So my guess would be that the logic should be changed in the following way:

for each process:

  • find out whether the process is part of a docker container
  • if it is, find out its root directory
  • check the file age against the process age
  • recommend a process/service/container restart if required.
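A minimal sketch of the first step, written in Python purely for illustration (needrestart itself is Perl, and this is not its code). It assumes the cgroup path has the form `/docker/<64-character container id>`, which matches the cgroup output posted later in this thread; other Docker setups may use a different layout. The name `docker_container_id` is hypothetical.

```python
import re

# Assumption: Docker places the process in a cgroup named /docker/<64 hex chars>.
_DOCKER_CGROUP = re.compile(r"/docker/([0-9a-f]{64})$")


def docker_container_id(cgroup_text):
    """Return the container ID found in /proc/<pid>/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        # Each line is "hierarchy-id:controllers:path"; the path is the third field.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        match = _DOCKER_CGROUP.search(parts[2])
        if match:
            return match.group(1)
    return None
```

The first 12 characters of the returned ID (`container_id[:12]`) correspond to the short form that `docker restart` accepts.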

Am I right?

Thanks,

Christopher

@liske liske added bug moreinfo and removed bug labels Oct 1, 2017
@liske
Owner

liske commented Oct 1, 2017

I don't think there is any need to restart processes within docker containers by design (read: within a container running a single app there should never be outdated binaries... updates are handled by new docker images).

For the binary path needrestart only checks if the symlink target of /proc/$pid/exe has a (deleted) suffix.
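To make the check concrete, here is a rough Python sketch of the same idea (an illustration only, not needrestart's Perl code). The function names and the `proc_root` parameter are hypothetical; the only fact relied on is that the kernel appends ` (deleted)` to the link target when the executable has been unlinked.

```python
import os

DELETED_SUFFIX = " (deleted)"


def exe_target_is_deleted(link_target):
    """True if a /proc/<pid>/exe link target carries the ' (deleted)' suffix."""
    return link_target.endswith(DELETED_SUFFIX)


def process_runs_deleted_binary(pid, proc_root="/proc"):
    """Read /proc/<pid>/exe and report whether its target is marked deleted."""
    try:
        target = os.readlink(os.path.join(proc_root, str(pid), "exe"))
    except OSError:
        # The process has exited, or we lack permission to read the link.
        return False
    return exe_target_is_deleted(target)
```

Reading the `exe` link of another user's process usually requires root, which is why the sketch treats an unreadable link as "not deleted" rather than raising.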

For mapped libraries, needrestart looks at the mapped files in /proc/$pid/map_files/$maddr and at /proc/$pid/root/$path, which is namespace-aware. This can be disabled by setting the skip_mapfiles option to 1.
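As an illustration of why the /proc/$pid/root lookup is namespace-aware, the following Python sketch resolves a path as the process itself sees it. The helper names and the `proc_root` parameter are assumptions made for this example and are not part of needrestart.

```python
import os


def path_in_process_root(pid, path, proc_root="/proc"):
    """Map an absolute path as seen by the process to a path the host can open."""
    # /proc/<pid>/root is a link to the process's own root directory, so the
    # lookup follows the process's mount namespace (and any chroot).
    return os.path.join(proc_root, str(pid), "root", path.lstrip("/"))


def exists_for_process(pid, path, proc_root="/proc"):
    """True if the path exists inside the root directory of the given process."""
    return os.path.exists(path_in_process_root(pid, path, proc_root))
```

For PID 6722 from the report above, `path_in_process_root(6722, "/usr/local/bin/node")` yields `/proc/6722/root/usr/local/bin/node`, which points into the container's file system rather than the host's.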

Both checks should not match on docker containers... and I'm not able to reproduce your behavior. Which version of needrestart did you use?

@odenbach
Contributor Author

odenbach commented Oct 4, 2017

I agree that there is no need to restart processes within docker containers.

This is Debian jessie, needrestart in version 2.11-2~bpo8+1. I have a docker container running mysqld. The process list shows as:

root@delgado[~]# ps -ef | grep mysqld
999 6694 6626 0 Sep26 ? 00:03:09 mysqld

root@delgado[~]# cd /proc/6694
root@delgado[6694]# cat cgroup
8:perf_event:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
7:blkio:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
6:net_cls,net_prio:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
5:freezer:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
4:devices:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
3:cpu,cpuacct:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
2:cpuset:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770
1:name=systemd:/docker/09319adbf8a79a54f3bed5a151fee36ccf627be1abed6b657c04bf27867ef770

root@delgado[6694]# ls -l exe
lrwxrwxrwx 1 999 winfo2-externe 0 Sep 26 17:43 exe -> /usr/sbin/mysqld

There is no (deleted) suffix, so no need to check. But:

root@delgado[~]# needrestart -vv
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v2.11
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[main] detected systemd
[Core] #555 is a NeedRestart::Interp::Perl
[Perl] #555: source=/usr/sbin/nsce
[main] #6694 uses obsolete /usr/sbin/mysqld
[main] #6694 is a child of #6626
[main] #6721 uses obsolete /bin/busybox
[main] #6721 is a child of #6632
[main] #6722 uses obsolete /usr/local/bin/node
[main] #6722 is a child of #6630
[main] #6741 uses obsolete /usr/local/bin/node
[main] #6741 is a child of #6678
[main] #6757 uses obsolete /usr/local/bin/redis-server
[main] #6757 is a child of #6667
[main] #6769 uses obsolete /usr/bin/python2.7
[main] #6769 is a child of #6649
[main] #7057 uses obsolete /bin/busybox
[main] #7057 is a child of #6722
[main] #7058 uses obsolete /usr/local/bin/node
[main] #7058 is a child of #7057
[main] #7064 uses obsolete /bin/busybox
[main] #7064 is a child of #6741
[main] #7065 uses obsolete /usr/local/bin/node
[main] #7065 is a child of #7064
[main] #7071 uses obsolete /usr/local/bin/node
[main] #7071 is a child of #7065
[main] #7098 uses obsolete /bin/bash
[main] #7098 is a child of #6769
[main] #7099 uses obsolete /usr/sbin/nginx
[main] #7099 is a child of #6769
[main] #7100 uses obsolete /usr/local/sbin/php-fpm
[main] #7100 is a child of #6769
[main] #7101 uses obsolete /usr/sbin/nginx
[main] #7101 is a child of #7099
[main] #7102 uses obsolete /usr/local/sbin/php-fpm
[main] #7102 is a child of #7100
[main] #7103 uses obsolete /usr/local/sbin/php-fpm
[main] #7103 is a child of #7100
[main] #6626 exe => /usr/bin/docker-containerd-shim
[main] #6626 is docker.service
[main] #6630 exe => /usr/bin/docker-containerd-shim
[main] #6630 is docker.service
[main] #6632 exe => /usr/bin/docker-containerd-shim
[main] #6632 is docker.service
[main] #6649 exe => /usr/bin/docker-containerd-shim
[main] #6649 is docker.service
[main] #6667 exe => /usr/bin/docker-containerd-shim
[main] #6667 is docker.service
[main] #6678 exe => /usr/bin/docker-containerd-shim
[main] #6678 is docker.service
[main] #6722 exe => /usr/local/bin/node
[main] #6722 unexpected cgroup '/docker/65da8e2e41306327ad5196a78396653b60c2b816b2158c1bec96b43a4fdd801a'
[main] trying systemctl status
Failed to get unit for PID 6722: PID 6722 does not belong to any loaded unit.
[main] #6722 running /etc/needrestart/hook.d/10-dpkg
dpkg-query: no path found matching pattern /usr/local/bin/node
[main] #6722 running /etc/needrestart/hook.d/20-rpm
[main] #6722 running /etc/needrestart/hook.d/90-none
[main] #6741 exe => /usr/local/bin/node
[main] #6741 unexpected cgroup '/docker/99d79f2e99daf971c8c16d1873c4882b6f5ac3e85b45584cc7c23c7d697eaa3d'
[main] trying systemctl status
Failed to get unit for PID 6741: PID 6741 does not belong to any loaded unit.
[main] #6741 running /etc/needrestart/hook.d/10-dpkg
dpkg-query: no path found matching pattern /usr/local/bin/node
[main] #6741 running /etc/needrestart/hook.d/20-rpm
[main] #6741 running /etc/needrestart/hook.d/90-none
[main] #6769 exe => /usr/bin/python2.7
[Core] #6769 is a NeedRestart::Interp::Python
[Python] #6769: source file not found, skipping
[Python] #6769: reduced ARGV: /usr/bin/supervisord -c /etc/supervisord.conf
[Core] #6769 source is UNKNOWN
[main] #6769 unexpected cgroup '/docker/81a7563f17650c4dacb17a9b13884b392f4cb19e722c9a6daf61905a412bc20f'
[main] trying systemctl status
Failed to get unit for PID 6769: PID 6769 does not belong to any loaded unit.
[main] #6769 running /etc/needrestart/hook.d/10-dpkg
[main] #6769 package: python2.7-minimal
[main] #6769 running /etc/needrestart/hook.d/20-rpm
[main] #6769 running /etc/needrestart/hook.d/90-none
[main] #7057 exe => /bin/busybox
[main] #7057 unexpected cgroup '/docker/65da8e2e41306327ad5196a78396653b60c2b816b2158c1bec96b43a4fdd801a'
[main] trying systemctl status
Failed to get unit for PID 7057: PID 7057 does not belong to any loaded unit.
[main] #7057 running /etc/needrestart/hook.d/10-dpkg
[main] #7057 package: busybox
[main] #7057 running /etc/needrestart/hook.d/20-rpm
[main] #7057 running /etc/needrestart/hook.d/90-none
[main] #7064 exe => /bin/busybox
[main] #7064 unexpected cgroup '/docker/99d79f2e99daf971c8c16d1873c4882b6f5ac3e85b45584cc7c23c7d697eaa3d'
[main] trying systemctl status
Failed to get unit for PID 7064: PID 7064 does not belong to any loaded unit.
[main] #7064 running /etc/needrestart/hook.d/10-dpkg
[main] #7064 package: busybox
[main] #7064 running /etc/needrestart/hook.d/20-rpm
[main] #7064 running /etc/needrestart/hook.d/90-none
[main] #7065 exe => /usr/local/bin/node
[main] #7065 unexpected cgroup '/docker/99d79f2e99daf971c8c16d1873c4882b6f5ac3e85b45584cc7c23c7d697eaa3d'
[main] trying systemctl status
Failed to get unit for PID 7065: PID 7065 does not belong to any loaded unit.
[main] #7065 running /etc/needrestart/hook.d/10-dpkg
dpkg-query: no path found matching pattern /usr/local/bin/node
[main] #7065 running /etc/needrestart/hook.d/20-rpm
[main] #7065 running /etc/needrestart/hook.d/90-none
[main] #7099 exe => /usr/sbin/nginx
[main] #7099 unexpected cgroup '/docker/81a7563f17650c4dacb17a9b13884b392f4cb19e722c9a6daf61905a412bc20f'
[main] trying systemctl status
Failed to get unit for PID 7099: PID 7099 does not belong to any loaded unit.
[main] #7099 running /etc/needrestart/hook.d/10-dpkg
dpkg-query: no path found matching pattern /usr/sbin/nginx
[main] #7099 running /etc/needrestart/hook.d/20-rpm
[main] #7099 running /etc/needrestart/hook.d/90-none
[main] #7099 package: nginx
[main] no pidfile reference found at nginx
[main] #7100 exe => /usr/local/sbin/php-fpm
[main] #7100 unexpected cgroup '/docker/81a7563f17650c4dacb17a9b13884b392f4cb19e722c9a6daf61905a412bc20f'
[main] trying systemctl status
Failed to get unit for PID 7100: PID 7100 does not belong to any loaded unit.
[main] #7100 running /etc/needrestart/hook.d/10-dpkg
dpkg-query: no path found matching pattern /usr/local/sbin/php-fpm
[main] #7100 running /etc/needrestart/hook.d/20-rpm
[main] #7100 running /etc/needrestart/hook.d/90-none
[Kernel] Linux: kernel release 3.16.0-4-amd64, kernel version #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19)
[Kernel/Linux] /boot/vmlinuz-3.16.0-4-amd64 => 3.16.0-4-amd64 (debian-kernel@lists.debian.org) #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) [3.16.0-4-amd64]*
[Kernel/Linux] Expected linux version: 3.16.0-4-amd64
Running kernel seems to be up-to-date.
Services to be restarted:
systemctl restart docker.service
systemctl restart nginx.service
No containers need to be restarted.
No user sessions are running outdated binaries.

This looks to me as if NR were not able to get the docker image for a running process, maybe because the kernel is too old?

I tried patching NR and adding some additional debug output, which results in this output:

root@delgado[needrestart]# perl -I perl/lib needrestart -vv
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v2.11
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[main] detected systemd
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
[Core] #555 is a NeedRestart::Interp::Perl
[Perl] #555: source=/usr/sbin/nsce
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
[Core] #4754 is a NeedRestart::Interp::Perl
[Perl] #4754: could not get a source file, skipping
Bla
[Core] #4755 is a NeedRestart::Interp::Perl
[Perl] #4755: could not get a source file, skipping
Bla
Bla
Bla
Bla
[Core] #4780 is a NeedRestart::Interp::Perl
[Perl] #4780: could not get a source file, skipping
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
Bla
[main] #6694 uses obsolete /usr/sbin/mysqld
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6694
docker ok
[docker] #6694 is part of docker container '09319adbf8a7' and should be restarted
[main] #6721 uses obsolete /bin/busybox
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6721
docker ok
[docker] #6721 is part of docker container '50fa807c0a63' and should be restarted
[main] #6722 uses obsolete /usr/local/bin/node
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6722
docker ok
[docker] #6722 is part of docker container '65da8e2e4130' and should be restarted
[main] #6741 uses obsolete /usr/local/bin/node
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6741
docker ok
[docker] #6741 is part of docker container '99d79f2e99da' and should be restarted
[main] #6757 uses obsolete /usr/local/bin/redis-server
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6757
docker ok
[docker] #6757 is part of docker container 'a6758164bfb5' and should be restarted
[main] #6769 uses obsolete /usr/bin/python2.7
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 6769
docker ok
[docker] #6769 is part of docker container '81a7563f1765' and should be restarted
[main] #7057 uses obsolete /bin/busybox
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7057
docker ok
[docker] #7057 is part of docker container '65da8e2e4130' and should be restarted
[main] #7058 uses obsolete /usr/local/bin/node
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7058
docker ok
[docker] #7058 is part of docker container '65da8e2e4130' and should be restarted
[main] #7064 uses obsolete /bin/busybox
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7064
docker ok
[docker] #7064 is part of docker container '99d79f2e99da' and should be restarted
[main] #7065 uses obsolete /usr/local/bin/node
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7065
docker ok
[docker] #7065 is part of docker container '99d79f2e99da' and should be restarted
[main] #7071 uses obsolete /usr/local/bin/node
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7071
docker ok
[docker] #7071 is part of docker container '99d79f2e99da' and should be restarted
[main] #7098 uses obsolete /bin/bash
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7098
docker ok
[docker] #7098 is part of docker container '81a7563f1765' and should be restarted
[main] #7099 uses obsolete /usr/sbin/nginx
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7099
docker ok
[docker] #7099 is part of docker container '81a7563f1765' and should be restarted
[main] #7100 uses obsolete /usr/local/sbin/php-fpm
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7100
docker ok
[docker] #7100 is part of docker container '81a7563f1765' and should be restarted
[main] #7101 uses obsolete /usr/sbin/nginx
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7101
docker ok
[docker] #7101 is part of docker container '81a7563f1765' and should be restarted
[main] #7102 uses obsolete /usr/local/sbin/php-fpm
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7102
docker ok
[docker] #7102 is part of docker container '81a7563f1765' and should be restarted
[main] #7103 uses obsolete /usr/local/sbin/php-fpm
Bla
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 7103
docker ok
[docker] #7103 is part of docker container '81a7563f1765' and should be restarted
checking NeedRestart::CONT::docker=HASH(0x21eb430) for 1
checking NeedRestart::CONT::machined=HASH(0x21695c8) for 1
checking NeedRestart::CONT::LXC=HASH(0x224ac90) for 1
[Kernel] Linux: kernel release 3.16.0-4-amd64, kernel version #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19)
[Kernel/Linux] /boot/vmlinuz-3.16.0-4-amd64 => 3.16.0-4-amd64 (debian-kernel@lists.debian.org) #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) [3.16.0-4-amd64]*
[Kernel/Linux] Expected linux version: 3.16.0-4-amd64
Running kernel seems to be up-to-date.
No services need to be restarted.
Containers to be restarted:
docker restart 09319adbf8a7
docker restart 50fa807c0a63
docker restart 65da8e2e4130
docker restart 81a7563f1765
docker restart 99d79f2e99da
docker restart a6758164bfb5
No user sessions are running outdated binaries.

This time the correct containers are detected, but NR still suggests a restart.

@liske liske added bug and removed moreinfo labels Oct 15, 2017
@liske
Owner

liske commented Oct 15, 2017

Thanks for the details. I think the docker stuff in needrestart is broken by design. It was derived from the LXC module since LXC containers can run outdated binaries. We had just agreed that there is no need to restart docker containers after software updates (one would need to update the docker image and restart the container or services after that - but that is beyond the scope of needrestart).

I'm going to change needrestart to ignore any process running within a docker container namespace.

@liske liske self-assigned this Oct 15, 2017
@liske liske closed this as completed in 678f200 Oct 29, 2017
@odenbach
Contributor Author

odenbach commented Nov 7, 2017

Hi,

I just checked your commit. If I run the new master with the "-v" option, everything is fine:

root@libai[needrestart]# perl -Iperl/lib needrestart -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v2.12
[main] running in root mode
[...]
[main] #19336 uses obsolete /usr/local/bin/node
[docker] #19336 is part of docker container 'a80442a592656c266a21212ee196675cedf9bfd9a352cc2e8ff6f4a40b104f68' and will be ignored
[...]
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.

But if I run it without the "-v" switch, I get:

root@libai[needrestart]# perl -Iperl/lib needrestart
Scanning processes...
Scanning candidates...
Scanning linux images...
Running kernel seems to be up-to-date.
Services to be restarted:
systemctl restart docker.service
No containers need to be restarted.
No user sessions are running outdated binaries.

Something must still be broken.

Cheers,

Christopher

@liske liske reopened this Nov 12, 2017
@liske
Owner

liske commented Dec 19, 2017

I was not able to reproduce this behavior. With needrestart 2.11-3 (Debian stretch) it tries to restart docker.service. With git HEAD the container is listed as ignored and docker.service is not suggested to be restarted.

Might it be just a problem of the perl include path?

@liske liske added this to the v2.12 milestone Dec 22, 2017
@liske
Owner

liske commented Feb 3, 2018

This might be triggered by the debconf Perl package, which forks and executes a new needrestart process; that process seems to load the wrong version of the needrestart packages.

@liske liske closed this as completed Feb 3, 2018
@idl0r idl0r mentioned this issue Sep 16, 2019