Added rate_limit support to avoid sending too many requests to Consul Agent #1066

pierresouchay · 2018-02-06T01:07:18Z

This PR adds support for:

waiting at least between 2 downloads when watching a file controlled by rate_limit.min_delay_between_updates
add a random delay between fetching data with param rate_limit.random_backoff

This should avoid allow limiting the bandwidth used by consul-template on large clusters with lots of services when watching many services (and X-Consul-Index being updated very frequently)

cf #1065

… Agent. cf hashicorp#1065

Rate limit support: hashicorp#1066

pierresouchay · 2018-03-07T09:31:54Z

Any news on this ?
Since hashicorp/consul#3890 has now been fixed, it should greatly improve performance for large clusters

pearkes · 2018-03-19T20:28:52Z

Given consul-template is built on watches, we wonder if the suggested model should be to run it with -once on a schedule or based on an external signal. consul-template was designed to be a thin and concise wrapper around watches, and are concerned that adding complexity between the two will cause issues for other use cases.

Are you relying on de-duplication mode, which is disabled under -once? Or is there any other reason you couldn't use -once and accomplish the same thing as rate limiting?

pierresouchay · 2018-04-03T22:39:52Z

@pearkes -once would have a worse behaviour on large clusters for some templates, because each time you would run -once, you would have to re-download everything

If we want to build a large template (for instance, watch all services having tag "http", for each, create a list of endpoint to create a HaProxy / nginx configuration file) => -once is not an option => well it is, but then having the full file rendered would take a lot of time and would not allow to:

have babysitting of processes
allow to send reload signals

I agree this is not optimal, but it simply renders consul-template not usable for large clusters. On our, when using it, the DL speed between consul-template and its Consul Agent was around 800Mb/sec (and around the same between Consul Agent and Consul Server)

Since hashicorp/consul#3890 has now been merged, the situation is probably far better (except for /catalog/services and friends).

With this implementation https://github.com/criteo/consul-templaterb/ , for the same cluster, we succeeded to have around 5Mbits/sec instead of 800Mbits/sec, after optimization hashicorp/consul#3934 => we are around 2Mbits/sec while watching at all services.

I definitely think we could improve a bit the current implementation.

I'll try with consul-template to see what is the current bandwidth after hashicorp/consul#3934 has been applied

guidoiaquinti · 2018-04-19T19:00:43Z

Little bump on this as hashicorp/consul#3934 has been closed

sodabrew · 2018-11-15T20:04:39Z

I am also interested in this work, given that wait has been broken since 0.19.1 per my analysis here: #1043 (comment)

Without the wait block, my site's Consul servers were badly pummeled when I tried updating to consul-template 0.19.5 from our previous 0.16 version (I kept the dedup stanza, but removed the wait values -- bad plan).

pierresouchay · 2018-11-15T20:33:10Z

@sodabrew be my guest ;)

That's the main reason why I developed https://github.com/criteo/consul-templaterb it reduces the load on servers and bandwidth with orders of magnitude on large clusters while allowing far more features: real language, loops, parametrization, json, XML and even a fully customizable US UI with great performance on very large clusters: https://github.com/criteo/consul-templaterb/tree/master/samples/consul-ui

eikenb · 2019-08-20T23:46:13Z

What's the downside to having this enabled?

I ask as given the description and general sentiment makes it sounds like this is a simple win. Why make it configurable? Why not just add a sensible limit all the time? I don't believe in making things configurable just because... there needs to be real, tangible trade-offs that people need to manage for something to justify the complexity of being configurable.

Thanks.

pierresouchay · 2019-08-21T10:26:56Z

I would say none. The benefits have been reduced since we made patches in Consul itself, however, it is still a win for pathological cases

eikenb · 2019-08-21T22:40:46Z

@sodabrew, @guidoiaquinti, @gmr, and anyone else who cares. Would you agree with @pierresouchay that there is no downside to this? If you agree, I propose we just make this the new behavior and skip the config options.

gmr · 2019-08-22T16:56:32Z

I'd suggest that changing the behavior as non-optional should be made in major version changes, not minor or revision. Don't get me wrong, I like the new options, but someone might be relying on current behavior and taken by surprise if it were to change in a point release.

pierresouchay · 2019-08-22T17:46:11Z

@gmr the default behavior is probably not very wise (and has been changed in Consul in version 1.0.7+ for services )

gmr · 2019-08-22T19:09:12Z

I guess if it's documented and clear that it's a change in behavior (for the better), then go for it?

pierresouchay · 2019-08-28T12:06:47Z

Ark... now conflicting... :(

eikenb · 2019-08-28T19:47:27Z

I'm still leaning towards just making this the default behavior and the fact that this has conflicts just reinforces that as it highlights just how much code this has to touch to make it configurable. If not configurable it looks like it'd only mean changing 1 file (view.go).

What I'm thinking basically boils down to taking the logic from RateLimitFunc, put it a function in view.go and call it in the same place as in the patch. I'd guess with the current default values in place (set as vars to allow changing it for tests).

The current default values are:

rate_limit.min_delay_between_updates: 100 milliseconds
rate_limit.random_backoff: 33 milliseconds

Are these good default values?

eikenb · 2019-08-28T19:55:48Z

Assuming we go the non-configurable route. @pierresouchay, is that something you'd want to implement (maybe as a separate PR for comparison) or something I'd just take care of?

pierresouchay · 2019-08-28T22:12:46Z

I would be glad to implement this, but won't have the time to in next 5 weeks, so if you need it sooner, feel free to implement.

Regards

eikenb · 2019-08-30T22:22:42Z

I have a simplified version of this implemented that only has the little bit of code that actually does the rate limiting in watch/view.go. The current implementation has the base rate limiting value (MinDelayBetweenUpdates in the patch) as a module level variable so it can be 0'd out for tests as needed.

I was looking over the use/implementation in consul-templateb and it really looks like there might be some variety of values that are useful for the delay... so I was re-thinking the possibility of a simple command-line only option to set just the delay. This would let you disable it (by setting it to 0) as well as setting it more based on need. I'd keep it on by default, with the default value as is (100ms). Having it cli only would keep the changes down as well.

Any thoughts on this?

pierresouchay · 2019-08-30T22:57:34Z

@eikenb At the time consul-templaterb was written, the index (especially on services, prior to version 1.0.6 IIRC) was not well behaving (so watches were always waken up when any change in raft did occur). In consul-templaterb, there is an automatic penalty for endpoints being notified too often (see https://github.com/criteo/consul-templaterb/blob/master/lib/consul/async/consul_endpoint.rb#L237 ), meaning that endpoint not changing will have a bigger delay than endpoints really being modified. One of our last PR to improve things (look at most or my PR and those from @aestek) is this one: hashicorp/consul#4810 ... IIRC, this one is very helpful (starting with Consul 1.4.1) since if you watch a service being non-existent, it will avoid you to hammer the cluster for changes on a large cluster, hence all of the optimizations pushed in consul-templaterb.

Most of those optimizations are not really necessary anymore with recent versions since we worked a lot on reducing updates on anti-entropy changes (we have at least 5 PR merged regarding this).

Furthermore, all endpoints in of consul in consul-templaterb are handled the same way, including when there is no watch (eg: https://github.com/criteo/consul-templaterb/blob/master/bin/consul-templaterb#L69 ), i used the same mechanism to configure refresh on endpoint changing value all the time (metrics, /agent/self...) and for which I don't want being notified too often (eg: datacenters that sorts according RTT beween DCs)

This allowed us at the time, on large clusters, to reduce bandwidth from 800Mb/s to 8kb/s.

This is probably not that important today and you probably don't need that kind of granularity anymore since many optimization have been bought to Consul itself.

In the real life, most templating systems don't need to refresh data too often anyway (I mean 640ms is probably enough for anybody 😄 ), so having a safeguard to avoid refreshing every ms the cluster is a good thing and will avoid to DDOS the cluster.

In any case, I think this is a very good thing to integrate this finally in consul-template, thank you :-)

eikenb · 2019-08-31T02:26:04Z

@pierresouchay .. Thanks so much for the information dump. Really helped me understand the background of this and why it probably doesn't need to be configurable at all. So that's what I'll start with.

I'll push my version of it up as a new PR soon.

Rate limit all API calls to Consul to ~1/100ms to keep consul-template from flooding consul-agent with requests. This takes the idea from #1066 and simplifies it by making it work this way by default instead of making it a configurable option. Closes #1066

pierresouchay added 4 commits February 6, 2018 02:00

Added rate_limit support to avoid sending too many requests to Consul…

87bb57f

… Agent. cf hashicorp#1065

Fixed parsing of config file for rate_limit param

43958bc

More conservative valid to default DefaultMinDelayBetweenUpdates

ba75e2f

Moved watch logic to the right place

5f87db9

pierresouchay mentioned this pull request Feb 6, 2018

When watching all services, consul-template is DOSing the Consul agent #1065

Open

pierresouchay added a commit to criteo-forks/consul-template that referenced this pull request Feb 6, 2018

Added rate_limit support

f6ff2b8

Rate limit support: hashicorp#1066

pierresouchay mentioned this pull request Feb 15, 2018

X-Consul-Index reflects global service churn when watching a single service causing issues with sleeping between blocking calls hashicorp/consul#3890

Closed

pearkes added the waiting-reply label Mar 19, 2018

eikenb added enhancement and removed waiting-reply labels Jun 18, 2019

eikenb mentioned this pull request Sep 9, 2019

add basic rate limiting to consul api calls #1279

Merged

eikenb closed this in #1279 Sep 10, 2019

This was referenced Feb 24, 2020

Consul 1.6.3 DDos using consul-template (1.6.2 working fine) hashicorp/consul#7259

Closed

Consul 1.6.3 DDos using consul-template (1.6.2 working fine) #1346

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added rate_limit support to avoid sending too many requests to Consul Agent #1066

Added rate_limit support to avoid sending too many requests to Consul Agent #1066

pierresouchay commented Feb 6, 2018

pierresouchay commented Mar 7, 2018

pearkes commented Mar 19, 2018

pierresouchay commented Apr 3, 2018

guidoiaquinti commented Apr 19, 2018

sodabrew commented Nov 15, 2018

pierresouchay commented Nov 15, 2018

eikenb commented Aug 20, 2019

pierresouchay commented Aug 21, 2019

eikenb commented Aug 21, 2019

gmr commented Aug 22, 2019

pierresouchay commented Aug 22, 2019

gmr commented Aug 22, 2019

pierresouchay commented Aug 28, 2019

eikenb commented Aug 28, 2019

eikenb commented Aug 28, 2019

pierresouchay commented Aug 28, 2019

eikenb commented Aug 30, 2019

pierresouchay commented Aug 30, 2019

eikenb commented Aug 31, 2019

Added rate_limit support to avoid sending too many requests to Consul Agent #1066

Added rate_limit support to avoid sending too many requests to Consul Agent #1066

Conversation

pierresouchay commented Feb 6, 2018

pierresouchay commented Mar 7, 2018

pearkes commented Mar 19, 2018

pierresouchay commented Apr 3, 2018

guidoiaquinti commented Apr 19, 2018

sodabrew commented Nov 15, 2018

pierresouchay commented Nov 15, 2018

eikenb commented Aug 20, 2019

pierresouchay commented Aug 21, 2019

eikenb commented Aug 21, 2019

gmr commented Aug 22, 2019

pierresouchay commented Aug 22, 2019

gmr commented Aug 22, 2019

pierresouchay commented Aug 28, 2019

eikenb commented Aug 28, 2019

eikenb commented Aug 28, 2019

pierresouchay commented Aug 28, 2019

eikenb commented Aug 30, 2019

pierresouchay commented Aug 30, 2019

eikenb commented Aug 31, 2019