Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low/No Disk space caused the Agent to crash and not recover. #2223

Merged
merged 2 commits into from
Jan 28, 2016

Conversation

truthbk
Copy link
Member

@truthbk truthbk commented Jan 22, 2016

It seems like a low/no disk space can cause the TemporaryFile calls made in subprocess_output to "fail" - and by fail we mean empty output that can lead to IndexErrors, etc, when parsing the output. In general we have to be ready to expect an empty string as output.

@JohnLZeller
Copy link
Contributor

It seems like the default case when calling get_subprocess_output() is that we expect output. I think we should invert the option here.

@truthbk
Copy link
Member Author

truthbk commented Jan 22, 2016

@JohnLZeller I agree that we normally do use it to run commands where we want an output, but from a library point of view - and this being a library - I don't really want to enforce that kind of behavior were empty output is automatically akin to an error.

@truthbk
Copy link
Member Author

truthbk commented Jan 22, 2016

@JohnLZeller actually nevermind.... you're right, the function is get_subprocess_output after all!

@@ -37,6 +37,10 @@ def get_subprocess_output(command, log, shell=False, stdin=None):

stdout_f.seek(0)
output = stdout_f.read()

if output_expected and output == "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could change output == "" to not output. That way it'll catch "" and None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read() always returns something... but I agree None is more pythonic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right :) definitely a nit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know if maybe we should .strip() before testing if the string is empty, some rude programs sometimes output newlines for instance.

@JohnLZeller
Copy link
Contributor

Looking pretty good. I wonder if it'd be worth defining a small Exception class, and raising that. This way we can catch a specific exception rather than all of them.

Maybe something like this:
class SubprocessOutputEmptyError(Exception): pass

Then you can just:
raise SubprocessOutputEmptyError('get_subprocess_output expected output but had none.')

And later can catch SubprocessOutputEmptyError


if Platform.is_freebsd(platf):
output, _, _ = get_subprocess_output(['sysctl', 'hw.ncpu'], log)
systemStats['cpuCores'] = int(output.split(': ')[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not your code, but seems like this could be written as:

if Platform.is_linux(platf):
...
elif Platform.is_darwin(platf) or Platform.is_freebsd(platf):
...

if Platform.is_darwin(platf) or Platform.is_freebsd(platf):
output, _, _ = get_subprocess_output(['sysctl', 'hw.ncpu'], log)
systemStats['cpuCores'] = int(output.split(': ')[1])
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, Could we do a catch SubprocessOutputEmptyError: here instead?

@JohnLZeller
Copy link
Contributor

LGTM, other than the notes 👍

[subprocess] wrapping calls in relevant try-catch blocks.
truthbk added a commit that referenced this pull request Jan 28, 2016
Low/No Disk space caused the Agent to crash and not recover.
@truthbk truthbk merged commit f599428 into master Jan 28, 2016
@truthbk truthbk deleted the jaime/uncaught branch January 28, 2016 23:20
@olivielpeau olivielpeau added this to the 5.7.0 milestone Feb 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants