Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a randomized start before running CheckMonitors. #546

Merged
merged 4 commits into from
Dec 18, 2014
Merged

Add a randomized start before running CheckMonitors. #546

merged 4 commits into from
Dec 18, 2014

Conversation

ryanbreen
Copy link
Contributor

We have situations where systems use many checks (dozens to hundreds) with fairly long intervals (say, every 60s). Starting them all at the same time leads to undesirable spikes in resource utilization. This patch randomly staggers the start time of CheckMonitors to avoid this. It should address our issue without altering behavior much in the common case.

@ryanuber
Copy link
Member

@ryanbreen there is also a function in util, randomStagger(), that should help with this. Check it out here. It can probably plug right into the interval.

Otherwise I think this is a good idea!

@armon
Copy link
Member

armon commented Dec 18, 2014

Agreed with @ryanuber! Also some tests for this would be good :)

@ryanbreen
Copy link
Contributor Author

Thanks for the feedback, guys.

I'm struggling a bit with how best to test a random check start time in a deterministic fashion.

@armon
Copy link
Member

armon commented Dec 18, 2014

@ryanbreen Probably enough to ensure the check is run within a timely manner (e.g. <= Interval).

@ryanbreen
Copy link
Contributor Author

Sounds good. I'll get that added to check_test.go.

@ryanbreen
Copy link
Contributor Author

Let me know if that test doesn't cover it. Thanks!

next := time.After(0)
// Get the randomized initial pause time
initialPauseTime := randomStagger(c.Interval)
c.Logger.Printf("[DEBUG] agent: pausing %ds before first invocation of %s", int(initialPauseTime.Seconds()), c.Script)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just print a time.Duration directly here as a string, it will get formatted in a nice, human-friendly way by its String() method. That way we don't have to explicitly cast to an int, and we get a more accurate value in the log.

@ryanuber
Copy link
Member

@ryanbreen I left a few comments, but looking good!

@ryanbreen
Copy link
Contributor Author

Great feedback, thanks @ryanuber! Let me know if this doesn't address those points adequately.

@ryanuber
Copy link
Member

@ryanbreen LGTM, Thanks for doing this!

ryanuber added a commit that referenced this pull request Dec 18, 2014
Add a randomized start before running CheckMonitors.
@ryanuber ryanuber merged commit b74af61 into hashicorp:master Dec 18, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants