Thundering herd when starting many Beats at the same time. #4010

jarpy · 2017-04-13T01:10:30Z

The timing of Beats is so accurate that they create bursty traffic when simultaneously restarted. In a large fleet with good configuration management, it's feasible that many hundreds of Beats could be restarted within one second of each other. They then proceed to stay in perfect sync.

Perhaps an initial startup delay of rand(period) would be nice here?


andrewkroh · 2017-04-13T02:38:55Z

Since the earliest implementation, I have thought that the scheduling of each individual metricset should be staggered at startup to help smooth the CPU load on a host. I hadn't considered the herd effect caused by an entire fleet restarting. The same problem will affect Beats when central monitoring is available and you can reconfigure them all at once.

I think both issues will be addressed if we introduce a random delay into the startup of each metricset. Thanks for providing a visualization of the issue. We can check this again after introducing a fix.

tsg · 2017-04-17T13:54:53Z

+1 for random delay at startup.


andrewkroh added the enhancement label Apr 13, 2017

tsg added the libbeat and low hanging fruit labels Apr 17, 2017

andrewkroh added a commit to andrewkroh/beats that referenced this issue Jun 14, 2017

Add random startup delay to each metricset

5b5c2bd

Add random startup delay to each metricset to avoid the thundering herd problem. Fixes elastic#4010.

andrewkroh mentioned this issue Jun 14, 2017

Add random startup delay to each metricset #4503

Merged

tsg closed this as completed in #4503 Jun 14, 2017

tsg pushed a commit that referenced this issue Jun 14, 2017

Add random startup delay to each metricset (#4503)

5c0766a

Add random startup delay to each metricset to avoid the thundering herd problem. Fixes #4010.
