Rate limiting in the dispatcher #402
Comments
A flexible way to implement the configuration of this feature would be to model it on how namespace isolation policies are configured:
If a topic matches more than one policy, it can choose the more restrictive one. If using regexes is too heavyweight, we can opt for having just the default plus a (hopefully small) list of overrides where the exact topic name is provided. This gives the ability to rate limit only certain topics (or raise the limit above the default for certain topics). The motivation for this feature is not to reduce GC pressure, but rather to protect against DoS when consumers consume at too high a rate in the steady state.
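A minimal sketch of the policy-selection idea above, in Java. The `DispatchRatePolicy` class, its field, and the regex-keyed overrides map are hypothetical names for illustration only, not Pulsar's actual configuration model:

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative only: picks the most restrictive dispatch-rate policy for a topic. */
public class DispatchRatePolicyResolver {

    /** Hypothetical policy: max messages per second the dispatcher may deliver. */
    public static class DispatchRatePolicy {
        final int maxMsgRatePerSecond;
        DispatchRatePolicy(int maxMsgRatePerSecond) {
            this.maxMsgRatePerSecond = maxMsgRatePerSecond;
        }
    }

    private final DispatchRatePolicy defaultPolicy;
    // Overrides keyed by topic-name regex, similar to namespace isolation policies.
    private final Map<Pattern, DispatchRatePolicy> overrides;

    public DispatchRatePolicyResolver(DispatchRatePolicy defaultPolicy,
                                      Map<Pattern, DispatchRatePolicy> overrides) {
        this.defaultPolicy = defaultPolicy;
        this.overrides = overrides;
    }

    /** If a topic matches more than one policy, the most restrictive (lowest rate) wins. */
    public DispatchRatePolicy resolve(String topicName) {
        DispatchRatePolicy selected = defaultPolicy;
        for (Map.Entry<Pattern, DispatchRatePolicy> e : overrides.entrySet()) {
            if (e.getKey().matcher(topicName).matches()
                    && e.getValue().maxMsgRatePerSecond < selected.maxMsgRatePerSecond) {
                selected = e.getValue();
            }
        }
        return selected;
    }
}
```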
Are you sure this is the right approach? It seems like it'd make sense to have a global rate limit and then do some sort of fair queuing across the ready sockets on dispatch. Also, is this issue meant to cover consumers, producers, or both?
I have created a PIP for message dispatch throttling. Could you please provide feedback or suggest any additional approaches? @merlimat @saandrews
@rdhabalia Can you add information on the implementation approach? How is the limit going to be enforced? In both cases, global limit vs. namespace limit, the policy applies to each topic. That doesn't include any per-tenant, per-namespace, or per-bundle enforcement. Another thought: should we differentiate between cached and uncached reads? By throttling cached reads, we could be set up for a much bigger amount of work later when we have to fetch the data from bookies.
I agree that different quotas per subscription are overkill, though it's not clear to me whether the configured limit is per topic (e.g., shared across all subscriptions) or applied to each individual subscription.
If the objective is to limit CPU usage, I agree. But if we're trying to protect network bandwidth, then the message size should be considered. I think that if the objective is to protect the broker, another approach would be to have a per-broker limit. When the limit is reached, the throttling would start applying to the heaviest users. Also, it would be interesting to integrate with the load-manager so that heavy users can potentially be "kicked out" to an isolated broker as soon as possible.
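A rough sketch of the per-broker budget idea from the previous comment, assuming hypothetical per-topic bookkeeping of dispatch counts; when the broker-wide limit is exceeded, throttling starts with the heaviest topic. None of these class or method names come from Pulsar itself:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a broker-wide dispatch budget that throttles the heaviest topic first. */
public class BrokerDispatchBudget {
    private final long maxBrokerMsgRatePerSecond;
    // Per-topic message counts observed in the current rate period (hypothetical bookkeeping).
    private final Map<String, Long> topicMsgCounts = new ConcurrentHashMap<>();

    public BrokerDispatchBudget(long maxBrokerMsgRatePerSecond) {
        this.maxBrokerMsgRatePerSecond = maxBrokerMsgRatePerSecond;
    }

    /** Record messages dispatched for a topic in the current period. */
    public void recordDispatch(String topic, long messages) {
        topicMsgCounts.merge(topic, messages, Long::sum);
    }

    /** True when the broker-wide limit is exceeded and this topic is the heaviest user. */
    public boolean shouldThrottle(String topic) {
        long total = topicMsgCounts.values().stream().mapToLong(Long::longValue).sum();
        if (total <= maxBrokerMsgRatePerSecond) {
            return false;
        }
        String heaviest = topicMsgCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
        return topic.equals(heaviest);
    }

    /** Call once per rate period (e.g. every second) to reset the counters. */
    public void resetPeriod() {
        topicMsgCounts.clear();
    }
}
```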
Sure, I will add details of the broker changes.
Actually, there are two things here: if we are trying to protect the broker's resources against a topic that is draining at a much higher rate, then it makes sense to throttle the overall msg-out-rate and consider both cached and uncached reads.
The configuration is per topic and will be shared across all the subscriptions. The only reason is to put a cap at the topic level so that a topic with many subscribers can't circumvent the throttling; if needed, we can configure a higher message rate for the namespace.
Actually, both CPU and network bandwidth are concerns while draining the backlog. Don't you think allowing both policies may create unnecessary complexity? It would then be hard to decide which one should take precedence, since resource consumption happens at runtime and can vary from broker to broker.
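A minimal sketch of the per-topic cap shared across all subscriptions described above, using Guava's `RateLimiter`. The class and method names are illustrative, not the eventual Pulsar implementation:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Illustrative only: one rate limiter per topic, shared by every subscription's dispatcher,
 * so the cap applies to the topic as a whole rather than to each subscription.
 */
public class PerTopicDispatchRateLimiter {
    private final ConcurrentMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();
    private final double defaultMsgRatePerSecond;

    public PerTopicDispatchRateLimiter(double defaultMsgRatePerSecond) {
        this.defaultMsgRatePerSecond = defaultMsgRatePerSecond;
    }

    /** All subscriptions of the same topic obtain the same limiter instance. */
    private RateLimiter limiterFor(String topic) {
        return limiters.computeIfAbsent(topic, t -> RateLimiter.create(defaultMsgRatePerSecond));
    }

    /** Called from the dispatch path; returns how many of the requested messages may be sent now. */
    public int acquirePermits(String topic, int requestedMessages) {
        RateLimiter limiter = limiterFor(topic);
        int granted = 0;
        while (granted < requestedMessages && limiter.tryAcquire()) {
            granted++;
        }
        return granted;
    }
}
```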
I was talking about the ManagedLedger-level cache.
Yes, actually earlier in the PIP I had defined broker-level configuration rather than cluster-level configuration, but the team decided to go with cluster-level configuration to make sure that whether a namespace is loaded by broker1 or broker2, both brokers throttle at a similar rate, keeping the rate limiting transparent.
That doesn't change the fact that the limit is per topic. I was proposing to have the limit per broker, irrespective of whether there's a per-topic limit and whether the limit is configured cluster-wide or at the individual broker level. That would be to ensure the broker resources are not exceeded; when that happens, delivery will be slowed down.
Yes, I agree; then the load-manager should identify topics with a much higher msgOutRate and unload them.
Yes. I think we then should not throttle already-caught-up consumers (activeCursors); that way we will never throttle cached entries.
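A small sketch of that idea, reusing the hypothetical `PerTopicDispatchRateLimiter` from the earlier sketch: cursors tracked as active (caught up and served from the managed-ledger cache) bypass the limiter, while backlogged cursors are throttled. The names here are assumptions for illustration, not Pulsar internals:

```java
import java.util.Set;

/** Illustrative only: throttle dispatch only for cursors that are reading backlog. */
public class BacklogAwareThrottle {
    // Hypothetical view of the topic's "active" cursors, i.e. subscriptions that are caught up.
    private final Set<String> activeCursors;
    private final PerTopicDispatchRateLimiter rateLimiter;

    public BacklogAwareThrottle(Set<String> activeCursors, PerTopicDispatchRateLimiter rateLimiter) {
        this.activeCursors = activeCursors;
        this.rateLimiter = rateLimiter;
    }

    /**
     * Caught-up cursors are served from cache and are not throttled; backlogged cursors,
     * which trigger reads from bookies, go through the per-topic rate limiter.
     */
    public int permitsFor(String topic, String cursorName, int requestedMessages) {
        if (activeCursors.contains(cursorName)) {
            return requestedMessages;
        }
        return rateLimiter.acquirePermits(topic, requestedMessages);
    }
}
```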
I had the same point, but broker-resource-related load balancing is done by the load manager. So, ideally, if a broker becomes busy, the load manager should shed load to bring its load down.
Yes, but that might happen later. In the meantime, the broker could immediately throttle the reads so that it doesn't even get overloaded. My point about integrating with the load-manager is that if the broker is throttling some topics (even though resources are not at 100%), the load-manager should make use of that information in some way.
Implemented with #634, so closing this.
Expected behavior
We are considering adding the ability to rate limit the dispatcher, both at the broker level and per topic (or bundle).
Actual behavior
Today, there is no such rate limit, so when a consumer gets backlogged, messages are delivered as quickly as possible, which can have adverse effects on other bundles hosted on the same broker.
Steps to reproduce
Publish messages at a high rate on a single topic with one consumer that consumes messages slowly, thus creating a backlog. Then toggle the consumer to consume as quickly as it can. This can easily be reproduced with the PerformanceConsumer.
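For reference, a rough sketch of the same reproduction with the Java client, using the `PulsarClient` builder API; the broker URL, topic name, and subscription name are placeholders, and the message counts are arbitrary:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DispatchBacklogRepro {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        // Subscribe first (but don't consume yet) so the backlog accumulates.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/dispatch-test") // placeholder topic
                .subscriptionName("slow-sub")
                .subscribe();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/dispatch-test")
                .create();

        // 1. Publish at a high rate while nothing is consumed, building a backlog.
        for (int i = 0; i < 100_000; i++) {
            producer.sendAsync(new byte[1024]);
        }
        producer.flush();

        // 2. Now drain as fast as possible; without dispatcher rate limiting, the broker
        //    delivers the whole backlog at full speed.
        while (true) {
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
        }
    }
}
```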