[RFC] Randomization for plugin execution #460
+1 for the jitter idea. It's a standard solution to this problem, straightforward to implement and to understand. It might also be useful for plugins that make network requests. Should it be fully random or deterministically random? That is, should a random offset be determined once per plugin and reused, or should a new random number be drawn each time the plugin executes? I'm not sure whether jitter makes more sense on or off by default, but it seems like it should be configurable.
Today, telegraf has two modes:
round_interval=true: calls all plugins exactly "on the interval" every time (for a 10s interval, at :00, :10, :20, and so on).
round_interval=false: calls plugins exactly once per configured interval, counting from the time Telegraf started.
Randomizing this interval is important for us because it avoids self-synchronization effects that muck up analysis and cause all threads to do their work concurrently. What we would like is explicit randomization of the time at which each plugin runs. This matters because, for example, reading values from sysfs can cause measurable changes to the system, and doing all of this reading at exactly the same moment, exactly every x seconds, is bad.
For example, if a plugin were set to run every 10 seconds and the random jitter value were set to 1 second, the plugin would run at a random nanosecond between 9 and 11 seconds. With 100 plugins, their runs would be spread randomly between 9 and 11 seconds, and this spread would be different for each collection interval.
Implementation-wise, this should be fairly simple to achieve with a random sleep before doing the actual collection. Launching the goroutines in sync isn't a problem; it's the collection work they go on to do that is the problem.
I chatted with @sparrc about this and figured an issue was the best place to get suggestions from others.