Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[auto-discovery] re-pipe configurations after JMXFetch restart #3415

Merged
merged 4 commits into from
Jul 21, 2017

Conversation

truthbk
Copy link
Member

@truthbk truthbk commented Jul 5, 2017

What does this PR do?

If JMXFetch happens to go down, use the atime stat date on the JMX launch file to decide if the auto-discovery configurations should be re-piped to JMX. JMXFetch will touch that file when coming up so we should have a relatively good idea of when JMXFetch started - alternatively, we could look at the process table to figure this stuff out (but that would be computationally more expensive).

Motivation

Customer bug.

Testing Guidelines

An overview on testing
is available in our contribution guidelines.

Additional Notes

Sister PR: DataDog/jmxfetch#143

@bai
Copy link

bai commented Jul 7, 2017

It seems that SD-based jmx metric collection is borked at the moment on Kubernetes. Friendly question: Is there any ETA on merging this?

@hush-hush hush-hush added this to the 5.16 milestone Jul 10, 2017
@@ -69,6 +69,8 @@
'list_limited_attributes': "List attributes that do match one of your instances configuration but that are not being collected because it would exceed the number of metrics that can be collected",
JMX_COLLECT_COMMAND: "Start the collection of metrics based on your current configuration and display them in the console"}

JMX_LAUNCH_FILE = 'jmx.launch'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should be a hidden file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's in jmxfetch's temp directory I'd say a regular file is okay

@olivielpeau olivielpeau modified the milestones: 5.15.1, 5.16 Jul 12, 2017
@olivielpeau olivielpeau self-requested a review July 12, 2017 17:17
@olivielpeau olivielpeau self-assigned this Jul 12, 2017
@truthbk
Copy link
Member Author

truthbk commented Jul 17, 2017

@bai this just barely missed the window for the upcoming 5.15.0 release, but will hopefully make it into the next bugfix version.

Copy link
Member

@olivielpeau olivielpeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment, let me know what you think

@@ -69,6 +69,8 @@
'list_limited_attributes': "List attributes that do match one of your instances configuration but that are not being collected because it would exceed the number of metrics that can be collected",
JMX_COLLECT_COMMAND: "Start the collection of metrics based on your current configuration and display them in the console"}

JMX_LAUNCH_FILE = 'jmx.launch'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's in jmxfetch's temp directory I'd say a regular file is okay

agent.py Outdated
try:
jmx_launch = JMXFetch._get_jmx_launchtime()
if self.last_jmx_piped and self.last_jmx_piped < jmx_launch:
self.sd_backend.reload_check_configs = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this jmxfetch-specific code block should be here exactly: JMXFetch restarts aren't really related to the updates in the config template store (which is what the parent block is about).

On top of that, when JMXFetch restarts, it'd be nice to reload only the jmx-related check configs instead of all the check configs.

So:

  1. Could we move this logic to outside this if block (i.e. right after self.reload_configs_flag = False)?
  2. To avoid reloading all the check configs, we could do what's done here: https://github.com/DataDog/dd-agent/blob/5.15.0/agent.py#L281-L285 (which could be out in a separate function) instead of setting self.sd_backend.reload_check_configs = True

Let me know how that sounds to you, I might have missed something.

@truthbk
Copy link
Member Author

truthbk commented Jul 20, 2017

Thanks for the review @olivielpeau, insightful comments. I scrambled to put this together ASAP and wasn't as careful as I should have. Let me know if you prefer the current logical path.

Copy link
Member

@olivielpeau olivielpeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@truthbk truthbk merged commit 5317bd5 into master Jul 21, 2017
@truthbk truthbk deleted the jaime/adjmx branch July 21, 2017 09:53
@olivielpeau olivielpeau modified the milestones: 5.17, 5.15.1 Aug 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants