
Use a temporary file for df output to avoid PIPE limits. #1294

Merged 1 commit on Jan 14, 2015

Conversation

conorbranagan
Contributor

subprocess.PIPE is limited in size because the data is buffered in
memory. When running the Agent on a host that has many containers,
the number of mountpoints can be very large. Because of this buffer limit,
the shell call can hang for these huge outputs.

To work around this, we write the output to a temporary file and
read it from there.

This is a "quick" fix for a specific case and there might be a better
general approach. Either way, we still need to handle the case of
large outputs from our subprocess commands; otherwise the whole Agent
collector simply hangs.

refs:

cc @remh @LeoCavaille
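
The approach described above can be sketched as follows. This is a simplified stand-in for the Agent's `_get_subprocess_output`, not the exact patch; the function name and return shape here are illustrative:

```python
import subprocess as sp
import sys
import tempfile

def get_subprocess_output(command):
    """Run command, writing stdout to a temp file instead of sp.PIPE.

    A pipe's kernel buffer is small (64 KB by default on Linux), so a
    child producing a very large listing -- like df on a host with many
    container mountpoints -- can block on write() and never exit. A
    regular file has no such size limit.
    """
    stdout_f = tempfile.TemporaryFile()
    proc = sp.Popen(command, stdout=stdout_f, stderr=sp.PIPE, close_fds=True)
    retcode = proc.wait()
    # The child wrote to the file; rewind and read it back.
    stdout_f.seek(0)
    output = stdout_f.read()
    stdout_f.close()
    return output, retcode

# A child that prints far more than a pipe buffer holds: no hang.
out, rc = get_subprocess_output([sys.executable, "-c", "print('x' * 1000000)"])
```

Note that stderr still goes through a pipe here, mirroring the original code; a command that floods stderr would hit the same buffer limit.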

@remh

remh commented Jan 14, 2015

Thanks @conorbranagan, nice debugging session!

I'm more in favor of rewriting the disk check to use psutil and stop using subprocess. The disk checks are pretty small and it should be pretty straightforward.

But I don't want to guarantee that it would be part of the 5.2 release, and would rather aim at doing this for 5.3.

If it's a critical fix we could merge it as is for 5.2 and rewrite the disk check for 5.3?

I'm open to more suggestions though.
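
A psutil-based disk check along the lines suggested here might look roughly like this. A sketch only, not the Agent's actual check: psutil is a third-party package (`pip install psutil`), and `collect_disk_usage` is an illustrative name:

```python
import psutil

def collect_disk_usage():
    # psutil reads mount and usage info directly from the OS, so there
    # is no `df` subprocess and therefore no pipe buffer to fill, no
    # matter how many mountpoints the host has.
    metrics = []
    for part in psutil.disk_partitions(all=False):
        try:
            usage = psutil.disk_usage(part.mountpoint)
        except OSError:
            continue  # e.g. permission denied on some mountpoints
        metrics.append((part.mountpoint, usage.total, usage.used, usage.free))
    return metrics

for mountpoint, total, used, free in collect_disk_usage():
    print(mountpoint, total, used, free)
```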

@conorbranagan
Contributor Author

OK, if we're going to go the psutil direction, then I'd prefer to get this small change into 5.2, since it's critical for a particular customer: it would get their Agents running again going forward, without them having to keep a patched version around.

What do you think?

@@ -906,12 +907,15 @@ def _get_subprocess_output(command, log):
     Run the given subprocess command and return it's output. Raise an Exception
     if an error occurs.
     """
-    proc = sp.Popen(command, stdout=sp.PIPE, close_fds=True, stderr=sp.PIPE)
+    stdout_f = tempfile.TemporaryFile('rw')
Nitpick but could you embed that in a "with" statement so won't have to wait for the garbage collection for the file to be deleted ?
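
That suggestion, sketched: wrapping the temporary file in a `with` statement closes (and so deletes) it as soon as the output has been read, rather than whenever the file object happens to be garbage-collected. The helper name here is illustrative:

```python
import subprocess as sp
import sys
import tempfile

def get_subprocess_output(command):
    # The with block closes the temp file as soon as the output has
    # been read, so its deletion doesn't wait on garbage collection.
    with tempfile.TemporaryFile() as stdout_f:
        proc = sp.Popen(command, stdout=stdout_f, stderr=sp.PIPE, close_fds=True)
        proc.wait()
        stdout_f.seek(0)
        return stdout_f.read()

out = get_subprocess_output([sys.executable, "-c", "print('ok')"])
```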

ref: http://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
@conorbranagan conorbranagan force-pushed the conor/subprocess-pipe-limit branch from b2dbb87 to f298832 Compare January 14, 2015 20:15
remh pushed a commit that referenced this pull request Jan 14, 2015
Use a temporary file for df output to avoid PIPE limits.
@remh merged commit 012550c into master on Jan 14, 2015
@remh deleted the conor/subprocess-pipe-limit branch on March 23, 2015
@coveralls

Coverage Status

Changes Unknown when pulling f298832 on conor/subprocess-pipe-limit into master.


3 participants