
Rate limiting in the dispatcher #402

Closed
bradtm opened this issue May 11, 2017 · 14 comments
Labels
type/feature The PR added a new feature or issue requested a new feature

Comments

@bradtm
Contributor

bradtm commented May 11, 2017

Expected behavior

We are considering adding the ability to rate limit the dispatcher, both at the broker level and per topic (or bundle).

Actual behavior

Today there is no such rate limit, so when a consumer gets backlogged, messages are delivered as quickly as possible, which can have adverse effects on other bundles hosted on the same broker.

Steps to reproduce

Publish messages at a high rate on a single topic with one consumer that consumes messages slowly, thus creating a backlog. Then toggle the consumer to consume as quickly as it can. This can easily be reproduced with the PerformanceConsumer:

[main:PerformanceConsumer@219] - Start receiving from 1 consumers on 1 destinations
[main:PerformanceConsumer@236] - Throughput received: 1316416.414  msg/s -- 100.435 Mbit/s
[main:PerformanceConsumer@236] - Throughput received: 1334013.524  msg/s -- 101.777 Mbit/s
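
For a self-contained variant of these steps, here is a minimal sketch using the Pulsar Java client (the builder-style API shown is from a later client release than the one current when this issue was filed); the service URL, topic, subscription name, and timings are illustrative assumptions, not taken from the issue:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class BacklogDrainRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical service URL and topic; adjust to your cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/backlog-test")
                .subscriptionName("slow-sub")
                .subscribe();

        // Phase 1: consume slowly for a minute while a producer publishes at a
        // high rate, letting a backlog build up on the subscription.
        long slowUntil = System.currentTimeMillis() + 60_000;
        while (System.currentTimeMillis() < slowUntil) {
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
            Thread.sleep(100); // artificial slowness
        }

        // Phase 2: drain as fast as possible. Without dispatcher rate limiting,
        // the broker pushes the backlog at full speed, as in the log above.
        while (true) {
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
        }
    }
}
```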
@bradtm
Contributor Author

bradtm commented May 18, 2017

A flexible way to implement the configuration of this feature would be similar to how namespace isolation policies are configured:

  1. A regex can be supplied which maps to a rate-limiting policy (basically, a maximum number of messages per second). If the regex matches a topic, then all subscriptions on that topic would get the rate limit applied.
  2. A default policy can also be supplied
  3. If there is no match and no default, then the current behavior is in effect (no rate limiting)

If a topic matches more than one policy, the broker can choose the most restrictive one.

If using regexes is too heavyweight, we can opt for having just the default and a (hopefully small) list of overrides where the exact topic name is provided. This gives the ability to rate limit only certain topics (or raise the limit above the default for certain topics).

The motivation for this feature is not to reduce GC pressure, but rather to protect against DoS when consumers consume at too high a rate in the steady state.
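
To make the proposal concrete, here is a minimal sketch (plain Java, not Pulsar code; the class and all names are hypothetical) of the policy resolution described above: regexes map to a maximum message rate, the most restrictive matching policy wins, a default applies when nothing matches, and the absence of both means no rate limiting:

```java
import java.util.*;
import java.util.regex.Pattern;

public class DispatchRatePolicies {
    private final Map<Pattern, Integer> policies = new LinkedHashMap<>();
    private final Integer defaultMaxMsgPerSec; // null = no default policy

    public DispatchRatePolicies(Map<String, Integer> regexToRate, Integer defaultMaxMsgPerSec) {
        regexToRate.forEach((regex, rate) -> policies.put(Pattern.compile(regex), rate));
        this.defaultMaxMsgPerSec = defaultMaxMsgPerSec;
    }

    /** Returns the msg/s cap for a topic, or empty for "no rate limiting". */
    public OptionalInt maxMsgRateFor(String topic) {
        OptionalInt best = OptionalInt.empty();
        for (Map.Entry<Pattern, Integer> e : policies.entrySet()) {
            if (e.getKey().matcher(topic).matches()) {
                // Multiple matches: keep the most restrictive (lowest) limit.
                if (!best.isPresent() || e.getValue() < best.getAsInt()) {
                    best = OptionalInt.of(e.getValue());
                }
            }
        }
        if (best.isPresent()) {
            return best;
        }
        return defaultMaxMsgPerSec != null ? OptionalInt.of(defaultMaxMsgPerSec) : OptionalInt.empty();
    }
}
```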

@msb-at-yahoo
Contributor

Are you sure this is the right approach? It seems like it'd make sense to have a global rate limit and then do some sort of fair queuing to the ready sockets on dispatch.

Also, is this issue meant to cover consumers, producers, or both?

@rdhabalia
Contributor

rdhabalia commented Jul 20, 2017

I have created a PIP for message dispatch throttling. Could you please provide feedback or suggest any additional approaches? @merlimat @saandrews

@merlimat
Contributor

@rdhabalia Can you add information on the implementation approach? How is the limit going to be enforced?

In all the cases, global limit vs. namespace limit, the policy applies to each topic. That doesn't include any per-tenant, per-namespace, or per-bundle enforcement.

Another thought: should we differentiate between cached vs. uncached reads? By throttling cached reads, we could be setting ourselves up for a much bigger amount of work later when we have to fetch the data from bookies.

per-subscriber throttling

I agree that different quotas per subscription are overkill, though it's not clear to me whether the configured limit is per-topic (e.g. shared across all subscriptions) or applied to each individual subscription.

Throttling threshold: Message-rate vs. Bytes-rate
The broker reads data from BookKeeper and dispatches it to consumers as message entities.
Therefore, it makes more sense to define the threshold as a message rate rather than a byte rate.

If the objective is to limit CPU usage, I agree. But if we're trying to protect network bandwidth, then the size should be considered.
I would say both should be allowed to apply.
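
A minimal sketch of what "both" could look like (a hypothetical class, not from the PIP): a fixed one-second window with separate message and byte budgets, where a batch is dispatched only if neither budget would be exceeded:

```java
public class DualDispatchRateLimiter {
    private final long maxMsgsPerSec;
    private final long maxBytesPerSec;
    private long windowStartMillis;
    private long msgsInWindow;
    private long bytesInWindow;

    public DualDispatchRateLimiter(long maxMsgsPerSec, long maxBytesPerSec) {
        this.maxMsgsPerSec = maxMsgsPerSec;
        this.maxBytesPerSec = maxBytesPerSec;
        this.windowStartMillis = System.currentTimeMillis();
    }

    /** Returns true if a batch of (msgs, bytes) may be dispatched now. */
    public synchronized boolean tryAcquire(long msgs, long bytes) {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1000) {
            // New one-second window: reset both budgets.
            windowStartMillis = now;
            msgsInWindow = 0;
            bytesInWindow = 0;
        }
        if (msgsInWindow + msgs > maxMsgsPerSec || bytesInWindow + bytes > maxBytesPerSec) {
            return false; // either limit would be exceeded: back off and retry later
        }
        msgsInWindow += msgs;
        bytesInWindow += bytes;
        return true;
    }
}
```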

I think that if the objective is to protect the broker, another approach would be to have a per-broker limit. When the limit is reached, the throttling will start applying to the heaviest users. Also, it would be interesting to integrate with the load-manager so that heavy users can potentially be "kicked out" onto an isolated broker ASAP.
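
And a sketch of the per-broker idea (again hypothetical, not an implementation proposal): topics report their current dispatch rate, and once the broker-wide budget is exceeded the heaviest topic is picked for throttling first:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class BrokerDispatchBudget {
    private final double maxBrokerMsgsPerSec;
    private final Map<String, Double> topicMsgRates = new ConcurrentHashMap<>();

    public BrokerDispatchBudget(double maxBrokerMsgsPerSec) {
        this.maxBrokerMsgsPerSec = maxBrokerMsgsPerSec;
    }

    /** Periodically updated with each topic's observed dispatch rate. */
    public void reportRate(String topic, double msgsPerSec) {
        topicMsgRates.put(topic, msgsPerSec);
    }

    /** The heaviest topic, to be throttled next, if the broker is over budget. */
    public Optional<String> nextTopicToThrottle() {
        double total = topicMsgRates.values().stream().mapToDouble(Double::doubleValue).sum();
        if (total <= maxBrokerMsgsPerSec) {
            return Optional.empty();
        }
        return topicMsgRates.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```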

@rdhabalia
Contributor

Can you add information on the implementation approach?

Sure, I will add details of the broker changes.

Another thought: should we differentiate between cached vs. uncached reads?
By throttling cached reads, we could be setting ourselves up for a much bigger amount of work later when we have to fetch the data from bookies.

Actually, there are two things here: if we are trying to protect the broker's resources against a topic that is draining at a much higher rate, then it makes sense to throttle the overall msg-out rate and consider both cached and uncached reads.
However, that can put a higher load on the bookie later on, because by the time the next read comes, the bookie might have discarded the entries from its cache. But to address that, the broker would need visibility into the bookie's caching. The broker also caches entries at the managed-ledger level, but it doesn't cache for consumers that haven't caught up or are draining a backlog, and one of the needs for throttling is precisely namespaces that are draining a backlog. So I think we can throttle both kinds of entries?

though it's not clear to me whether the configured limit is per-topic (e.g. shared across all subscriptions) or applied to each individual subscription

The configuration is per topic and will be shared across all the subscriptions. The only reason is to put a cap at the topic level, so that a topic with many subscribers can't misuse the throttling; if needed, we can configure a higher message rate for the namespace (a sketch of this shared topic-level cap follows below).
However, there is a very low probability of having many subscribers. So, do you think giving each individual subscriber a separate quota would be a better option?
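
The sketch mentioned above, reusing the hypothetical DualDispatchRateLimiter from the earlier comment (assumed to be in the same package): the topic owns a single limiter instance and every subscription dispatcher consults that same instance, so the configured rate bounds the topic as a whole rather than each subscription individually:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TopicDispatchLimiters {
    private final Map<String, DualDispatchRateLimiter> limiters = new ConcurrentHashMap<>();
    private final long maxMsgsPerSec;
    private final long maxBytesPerSec;

    public TopicDispatchLimiters(long maxMsgsPerSec, long maxBytesPerSec) {
        this.maxMsgsPerSec = maxMsgsPerSec;
        this.maxBytesPerSec = maxBytesPerSec;
    }

    /** All subscription dispatchers of a topic share one limiter keyed by topic name. */
    public DualDispatchRateLimiter limiterForTopic(String topic) {
        return limiters.computeIfAbsent(topic,
                t -> new DualDispatchRateLimiter(maxMsgsPerSec, maxBytesPerSec));
    }
}
```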

I would say both should be allowed to apply.

Actually, both CPU and network bandwidth are concerns while draining the backlog. Don't you think allowing both policies may create unnecessary complexity? It will be hard to decide which one to keep, since resource consumption happens at runtime and can vary from broker to broker.

@merlimat merlimat added the type/feature The PR added a new feature or issue requested a new feature label Jul 21, 2017
@merlimat merlimat added this to the 1.20.0-incubating milestone Jul 21, 2017
@merlimat
Contributor

the broker needs visibility into the bookie's caching

I was talking about the cache at the ManagedLedger level.

@rdhabalia
Contributor

I think that if the objective is to protect the broker, another approach would be to have a per-broker limit. When the limit is reached, the throttling will start applying to the heaviest users.

Yes, actually, earlier in the PIP I had defined broker-level configuration rather than cluster-level configuration, but the team decided to go with cluster-level configuration to make sure that whether a namespace is loaded by broker1 or broker2, both brokers throttle at a similar rate, for transparency of rate limiting.

@merlimat
Contributor

Yes, actually, earlier in the PIP I had defined broker-level configuration rather than cluster-level configuration, but the team decided to go with cluster-level configuration to make sure that whether a namespace is loaded by broker1 or broker2, both brokers throttle at a similar rate, for transparency of rate limiting.

That doesn't change the fact that the limit is per-topic.

I was proposing to have the limit per-broker, irrespective of whether there's a per-topic limit or not, and of whether the limit is configured cluster-wide or at the individual broker level.

That would be to ensure the broker resources are not exceeded. When that happens, delivery will be slowed down.

@rdhabalia
Contributor

That would be to ensure the broker resources are not exceeded. When that happens, delivery will be slowed down.
Also, it would be interesting to integrate with the load-manager so that heavy users can potentially be "kicked out" onto an isolated broker ASAP.

Yes, I agree; the load-manager should then identify topics with a much higher msgRateOut and unload them. The ModularLoadManager already considers the msgRateOut variable, and we can extend its functionality to handle this more accurately.

@rdhabalia
Contributor

I was talking about the cache at the ManagedLedger level.

Yes. I think then we should not throttle already caught-up consumers (activeCursors); that way we will never throttle cached entries.

@saandrews
Contributor

saandrews commented Jul 21, 2017 via email

@merlimat
Contributor

So, ideally if broker becomes busy, load manager should shed load to bring down its load.

Yes, but that might happen later. In the meantime, the broker could immediately throttle the reads, so that it won't even get overloaded.

My point about integrating it with the load-manager is that if the broker is throttling some topics (even though resources are not at 100%), the load-manager should make use of that information in some way.

@rdhabalia
Contributor

Implemented with #634, so closing this.

@yangou

yangou commented May 2, 2020

I know this is probably considered implemented, but as this PIP mentioned, and as suggested by @bradtm, do we support the regex-based per-topic rate limit yet? Because I don't see how to dynamically configure it with the admin tool.
