Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JMX SD] Wraps the jmx_state get call in try catch #3461

Merged
merged 1 commit into from
Aug 1, 2017

Conversation

nmuesch
Copy link
Contributor

@nmuesch nmuesch commented Jul 31, 2017

Note: Please remember to review the Datadog Contribution Guidelines
if you have not yet done so.

What does this PR do?

This PR wraps the jmx_state get line that pulls in the information about the JMXFetch process from supervisor in a try/except block.

Motivation

There have been a few cases where this line throws an exception due to a broken pipe, which causes the Agent to crash since its uncaught. Example stacktrace:

dd.collector | collector(agent.py:604) | Uncaught error running the Agent 
Traceback (most recent call last): 
File "/opt/datadog-agent/agent/agent.py", line 600, in <module> 
sys.exit(main()) 
File "/opt/datadog-agent/agent/agent.py", line 543, in main 
agent.start(foreground=True) 
File "/opt/datadog-agent/agent/daemon.py", line 174, in start 
self.run() 
File "/opt/datadog-agent/agent/agent.py", line 324, in run 
self.reload_configs(checks_to_reload=self.reload_configs_flag) 
File "/opt/datadog-agent/agent/agent.py", line 155, in reload_configs 
self._submit_jmx_service_discovery(jmx_sd_configs) 
File "/opt/datadog-agent/agent/agent.py", line 433, in _submit_jmx_service_discovery 
jmx_state = self.supervisor_proxy.supervisor.getProcessInfo(JMX_SUPERVISOR_ENTRY) 
File "/opt/datadog-agent/embedded/lib/python2.7/xmlrpclib.py", line 1243, in __call__ 
return self.__send(self.__name, args) 
File "/opt/datadog-agent/embedded/lib/python2.7/xmlrpclib.py", line 1602, in __request 
verbose=self.__verbose 
File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/supervisor/xmlrpc.py", line 500, in request 
self.connection.request('POST', handler, request_body, self.headers) 
File "/opt/datadog-agent/embedded/lib/python2.7/httplib.py", line 1042, in request 
self._send_request(method, url, body, headers) 
File "/opt/datadog-agent/embedded/lib/python2.7/httplib.py", line 1082, in _send_request 
self.endheaders(body) 
File "/opt/datadog-agent/embedded/lib/python2.7/httplib.py", line 1038, in endheaders 
self._send_output(message_body) 
File "/opt/datadog-agent/embedded/lib/python2.7/httplib.py", line 882, in _send_output 
self.send(msg) 
File "/opt/datadog-agent/embedded/lib/python2.7/httplib.py", line 858, in send 
self.sock.sendall(data) 
File "/opt/datadog-agent/embedded/lib/python2.7/socket.py", line 228, in meth 
return getattr(self._sock,name)(*args) 
error: [Errno 32] Broken pipe

While this won't resolve the issue of there being a broken pipe, it will allow the Agent to continue running.

Testing Guidelines

An overview on testing
is available in our contribution guidelines.

Additional Notes

Anything else we should know when reviewing?

Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return isn't in the right line :)

agent.py Outdated
try:
jmx_state = self.supervisor_proxy.supervisor.getProcessInfo(JMX_SUPERVISOR_ENTRY)
log.debug("Current JMX check state: %s", jmx_state['statename'])
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the return inside the except block instead of here, otherwise we'll be skipping the rest of the method when everything is fine, and not the opposite.

agent.py Outdated
log.debug("Current JMX check state: %s", jmx_state['statename'])
return
except Exception as e:
log.exception("Unable to get JMXFetch process state from supervisor: %s", e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also fit this: "cannot submit JMX auto-discovery configurations" into the message somehow?

@nmuesch nmuesch force-pushed the nick/JMX_SD_catch_jmxstate_error branch from 18fafc1 to 4d1f7b0 Compare July 31, 2017 18:25
Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

@truthbk truthbk added this to the 5.17 milestone Aug 1, 2017
@truthbk truthbk merged commit 8f46f52 into master Aug 1, 2017
@ipfaffy
Copy link

ipfaffy commented Aug 7, 2017

Any idea when this PR will be released?

@olivielpeau olivielpeau deleted the nick/JMX_SD_catch_jmxstate_error branch August 24, 2017 21:42
@olivielpeau
Copy link
Member

Sorry for the late answer @ipfaffy: this PR will be released with 5.17 (target release date August 28)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants