Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debian 9.8/10 timing out on audits #22

Closed
mattray opened this issue Feb 28, 2019 · 7 comments · Fixed by #28
Closed

Debian 9.8/10 timing out on audits #22

mattray opened this issue Feb 28, 2019 · 7 comments · Fixed by #28

Comments

@mattray
Copy link
Contributor

mattray commented Feb 28, 2019

Describe the bug
I'm using this profile in a testing environment and it was working the last 2 months but it recently stopped working through the Chef Audit cookbook. Other similar Debian machines don't seem to recreate the issue in Kitchen, so I'm looking for any suggestions for debugging.

Expected behavior
The machine in question worked for weeks, then stopped. Manual verification and running "inspec exec" by hand don't recreate the issue.

Actual behavior

From the Automate 2 UI:

Command timed out after 600s:
Command exceeded allowed execution time, process terminated
---- Begin output of bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' ----
STDOUT: 
STDERR: 
---- End output of bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' ----
Ran bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' returned 

OS / Environment
Debian 9.8, Chef 14.10, InSpec 3.6.6.

Inspec Version

3.6.6

Baseline Version
0.4.0 and git master branch have same results

@mattray
Copy link
Contributor Author

mattray commented Feb 28, 2019

I'm really just looking for suggestions on how to debug this, it just sits there eating 3-4% CPU until it times out.

@chris-rock
Copy link
Member

Try to run the same command manually on this machine. Normally, this is caused by a bad internet connection. Very likely the apt-get update does not run through.

@mattray
Copy link
Contributor Author

mattray commented Mar 4, 2019

It works fine manually. It also works fine if I run "inspec exec" manually or from another machine. It's something about being inside the audit cookbook that is causing it to fail, it's consistently failed for about 10 days now.

The machine isn't heavily loaded and my other Debian boxes are unaffected.

@chris-rock
Copy link
Member

If it is working manually, then it may be an issue with permissions. Maybe it is also related to an updated InSpec version?

@mattray
Copy link
Contributor Author

mattray commented Aug 3, 2019

Did a fresh install to Debian 10 and the problem went away for awhile, now it's started back up. It's just this 1 machine, other Debian 10 boxes are fine. This is the box running Automate, so it's running inspec by chef-client via the audit cookbook and reporting back to itself, but that doesn't look like the issue.

Command timed out after 600s:
Command exceeded allowed execution time, process terminated
---- Begin output of bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' ----
STDOUT: 
STDERR: 
---- End output of bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' ----
Ran bash -c \#\!/bin/sh'
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ update\ \>/dev/null\ 2\>\&1'
'readlock\(\)\ \{\ cat\ /proc/locks\ \|\ awk\ \'\{print\ \$5\}\'\ \|\ grep\ -v\ \^0\ \|\ xargs\ -I\ \{1\}\ find\ /proc/\{1\}/fd\ -maxdepth\ 1\ -exec\ readlink\ \{\}\ \\\;\ \|\ grep\ \'\^/var/lib/dpkg/lock\$\'\;\ \}'
'while\ test\ -n\ \"\$\(readlock\)\"\;\ do\ sleep\ 1\;\ done'
'echo\ \"\ \"'
'echo\ -n\ \'\{\"available\":\[\''
'DEBIAN_FRONTEND\=noninteractive\ apt-get\ upgrade\ --dry-run\ \|\ grep\ Inst\ \|\ tr\ -d\ \'\[\]\(\)\'\ \|\\'
'\ \ awk\ \'\{\ printf\ \"\{\\\"name\\\":\\\"\"\$2\"\\\",\\\"version\\\":\\\"\"\$4\"\\\",\\\"repo\\\":\\\"\"\$5\"\\\",\\\"arch\\\":\\\"\"\$6\"\\\"\},\"\ \}\'\ \|\ rev\ \|\ cut\ -c\ 2-\ \|\ rev\ \|\ tr\ -d\ \'\\n\''
'echo\ -n\ \'\]\}\''
' returned 

@mattray mattray changed the title Debian 9.8 timing out on audits Debian 9.8/10 timing out on audits Aug 3, 2019
@mattray
Copy link
Contributor Author

mattray commented Mar 20, 2020

This bug continues to haunt me, it went away when I stopped running Automate on the box. I migrated Automate to another Debian box, and now that machine exhibits the failure. As far as I can tell, the offending line is

https://github.com/dev-sec/linux-patch-baseline/blob/master/libraries/linux_updates.rb#L187

which sits there chewing on

find /proc/24114/fd -maxdepth 1 -exec readlink {} ;

where process 24114 is java running ElasticSearch

24114 ?        Sl   1315:49 /hab/pkgs/chef/automate-openjdk/11.0.2/20191204000228/bin/java -Xms3984m -Xmx3984m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -Djava.io.tmpdir=/hab/svc/automate-elasticsearch/var/tmp -XX:+HeapDumpOnOutOfMemoryError -Djavax.net.ssl.trustStore=/hab/svc/automate-elasticsearch/var/ca.jks -Djavax.net.ssl.trustStorePassword=changeit -Djavax.net.ssl.trustStoreType=JKS -Des.path.home=/hab/pkgs/chef/automate-elasticsearch/6.8.3/20191217225932/es -Des.path.conf=/hab/svc/automate-elasticsearch/config -Des.distribution.flavor=default -Des.distribution.type=tar -cp /hab/pkgs/chef/automate-elasticsearch/6.8.3/20191217225932/es/lib/* org.elasticsearch.bootstrap.Elasticsearch

and it has 19334 entries. It eventually gets killed by inspec because it takes too long or fails.

I assume what we're trying to do is wait for /var/lib/dpkg/lock to free up? Couldn't we use lsof /var/lib/dpkg/lock?

@kmf
Copy link

kmf commented Apr 30, 2020

We are have it in Prod ... on our Automate Server ...
Ubuntu 18.04

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants