
Rate limiting in the dispatcher #402

Closed
bradtm opened this issue May 11, 2017 · 14 comments
Labels
type/feature The PR added a new feature or issue requested a new feature

Comments

@bradtm
Contributor

bradtm commented May 11, 2017

Expected behavior

We are considering adding the ability to rate limit the dispatcher, both at the broker level and per topic (or bundle).

Actual behavior

Today there is no such rate limit, so when a consumer gets backlogged, messages are delivered as quickly as possible, which can have adverse effects on other bundles hosted on the same broker.

Steps to reproduce

Publish messages at a high rate on a single topic with one consumer that consumes messages slowly, thus creating a backlog. Then toggle the consumer to consume as quickly as it can. This can easily be reproduced with the PerformanceConsumer:

[main:PerformanceConsumer@219] - Start receiving from 1 consumers on 1 destinations
[main:PerformanceConsumer@236] - Throughput received: 1316416.414  msg/s -- 100.435 Mbit/s
[main:PerformanceConsumer@236] - Throughput received: 1334013.524  msg/s -- 101.777 Mbit/s
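
For a self-contained variant of these steps, here is a minimal sketch using the Pulsar Java client (the builder-style API shown is from a later client release than the one current when this issue was filed); the service URL, topic, subscription name, and timings are illustrative assumptions, not taken from the issue:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class BacklogDrainRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical service URL and topic; adjust to your cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/backlog-test")
                .subscriptionName("slow-sub")
                .subscribe();

        // Phase 1: consume slowly for a minute while a producer publishes at a
        // high rate, letting a backlog build up on the subscription.
        long slowUntil = System.currentTimeMillis() + 60_000;
        while (System.currentTimeMillis() < slowUntil) {
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
            Thread.sleep(100); // artificial slowness
        }

        // Phase 2: drain as fast as possible. Without dispatcher rate limiting,
        // the broker pushes the backlog at full speed, as in the log above.
        while (true) {
            Message<byte[]> msg = consumer.receive();
            consumer.acknowledge(msg);
        }
    }
}
```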
@bradtm
Contributor Author

bradtm commented May 18, 2017

A flexible way to implement the configuration of this feature would be similar to how namespace isolation policies are configured:

  1. A regex can be supplied which maps to a rate-limiting policy (basically, a maximum number of messages per second). If the regex matches a topic, then all subscriptions on that topic would get the rate limit applied.
  2. A default policy can also be supplied
  3. If there is no match and no default, then the current behavior is in effect (no rate limiting)

If a topic matches more than one policy, the broker can choose the most restrictive one.

If using regexes is too heavyweight, we can opt for having just the default and a (hopefully small) list of overrides where the exact topic name is provided. This gives the ability to rate limit only certain topics (or raise the limit above the default for certain topics).

The motivation for this feature is not to reduce GC pressure, but rather to protect against DoS when consumers consume at too high a rate in the steady state.
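
To make the proposal concrete, here is a minimal sketch (plain Java, not Pulsar code; the class and all names are hypothetical) of the policy resolution described above: regexes map to a maximum message rate, the most restrictive matching policy wins, a default applies when nothing matches, and the absence of both means no rate limiting:

```java
import java.util.*;
import java.util.regex.Pattern;

public class DispatchRatePolicies {
    private final Map<Pattern, Integer> policies = new LinkedHashMap<>();
    private final Integer defaultMaxMsgPerSec; // null = no default policy

    public DispatchRatePolicies(Map<String, Integer> regexToRate, Integer defaultMaxMsgPerSec) {
        regexToRate.forEach((regex, rate) -> policies.put(Pattern.compile(regex), rate));
        this.defaultMaxMsgPerSec = defaultMaxMsgPerSec;
    }

    /** Returns the msg/s cap for a topic, or empty for "no rate limiting". */
    public OptionalInt maxMsgRateFor(String topic) {
        OptionalInt best = OptionalInt.empty();
        for (Map.Entry<Pattern, Integer> e : policies.entrySet()) {
            if (e.getKey().matcher(topic).matches()) {
                // Multiple matches: keep the most restrictive (lowest) limit.
                if (!best.isPresent() || e.getValue() < best.getAsInt()) {
                    best = OptionalInt.of(e.getValue());
                }
            }
        }
        if (best.isPresent()) {
            return best;
        }
        return defaultMaxMsgPerSec != null ? OptionalInt.of(defaultMaxMsgPerSec) : OptionalInt.empty();
    }
}
```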

@msb-at-yahoo
Contributor

Are you sure this is the right approach? It seems like it'd make sense to have a global rate limit and then do some sort of fair queuing to the ready sockets on dispatch.

Also, is this issue meant to cover consumers, producers, or both?

@rdhabalia
Contributor

rdhabalia commented Jul 20, 2017

I have created a PIP for message dispatch throttling. Could you please provide feedback or suggest any additional approaches? @merlimat @saandrews

@merlimat
Contributor

@rdhabalia Can you add information on the implementation approach? How is the limit going to be enforced?

In all the cases, global limit vs. namespace limit, the policy applies to each topic. That doesn't include any per-tenant, per-namespace, or per-bundle enforcement.

Another thought: should we differentiate between cached vs. uncached reads? By throttling cached reads, we could be setting ourselves up for a much bigger amount of work later when we have to fetch the data from bookies.

per-subscriber throttling

I agree that different quotas per subscription are overkill, though it's not clear to me whether the configured limit is per-topic (e.g. shared across all subscriptions) or applied to each individual subscription.

Throttling threshold: Message-rate vs. Bytes-rate
The broker reads data from BookKeeper and dispatches it to consumers as message entities.
Therefore, it makes more sense to define the threshold as a message rate rather than a byte rate.

If the objective is to limit CPU usage, I agree. But if we're trying to protect network bandwidth, then the size should be considered.
I would say both should be allowed to apply.
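
A minimal sketch of what "both" could look like (a hypothetical class, not from the PIP): a fixed one-second window with separate message and byte budgets, where a batch is dispatched only if neither budget would be exceeded:

```java
public class DualDispatchRateLimiter {
    private final long maxMsgsPerSec;
    private final long maxBytesPerSec;
    private long windowStartMillis;
    private long msgsInWindow;
    private long bytesInWindow;

    public DualDispatchRateLimiter(long maxMsgsPerSec, long maxBytesPerSec) {
        this.maxMsgsPerSec = maxMsgsPerSec;
        this.maxBytesPerSec = maxBytesPerSec;
        this.windowStartMillis = System.currentTimeMillis();
    }

    /** Returns true if a batch of (msgs, bytes) may be dispatched now. */
    public synchronized boolean tryAcquire(long msgs, long bytes) {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1000) {
            // New one-second window: reset both budgets.
            windowStartMillis = now;
            msgsInWindow = 0;
            bytesInWindow = 0;
        }
        if (msgsInWindow + msgs > maxMsgsPerSec || bytesInWindow + bytes > maxBytesPerSec) {
            return false; // either limit would be exceeded: back off and retry later
        }
        msgsInWindow += msgs;
        bytesInWindow += bytes;
        return true;
    }
}
```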

I think that if the objective is to protect the broker, another approach would be to have a per-broker limit. When the limit is reached, the throttling will start applying to the heaviest users. Also, it would be interesting to integrate with the load-manager so that heavy users can potentially be "kicked out" onto an isolated broker ASAP.
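
And a sketch of the per-broker idea (again hypothetical, not an implementation proposal): topics report their current dispatch rate, and once the broker-wide budget is exceeded the heaviest topic is picked for throttling first:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class BrokerDispatchBudget {
    private final double maxBrokerMsgsPerSec;
    private final Map<String, Double> topicMsgRates = new ConcurrentHashMap<>();

    public BrokerDispatchBudget(double maxBrokerMsgsPerSec) {
        this.maxBrokerMsgsPerSec = maxBrokerMsgsPerSec;
    }

    /** Periodically updated with each topic's observed dispatch rate. */
    public void reportRate(String topic, double msgsPerSec) {
        topicMsgRates.put(topic, msgsPerSec);
    }

    /** The heaviest topic, to be throttled next, if the broker is over budget. */
    public Optional<String> nextTopicToThrottle() {
        double total = topicMsgRates.values().stream().mapToDouble(Double::doubleValue).sum();
        if (total <= maxBrokerMsgsPerSec) {
            return Optional.empty();
        }
        return topicMsgRates.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```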

@rdhabalia
Contributor

Can you add information on the implementation approach?

Sure, I will add details of the broker changes.

Another thought: should we differentiate between cached vs. uncached reads?
By throttling cached reads, we could be setting ourselves up for a much bigger amount of work later when we have to fetch the data from bookies.

Actually, there are two things here: if we are trying to protect the broker's resources against a topic that is draining at a much higher rate, then it makes sense to throttle the overall msg-out rate and consider both cached and uncached reads.
However, that can put a higher load on the bookie later on, because by the time the next read comes, the bookie might have discarded the entries from its cache. But to address that, the broker would need visibility into the bookie's caching. The broker also caches entries at the managed-ledger level, but it doesn't cache for consumers that haven't caught up or are draining a backlog, and one of the needs for throttling is precisely namespaces that are draining a backlog. So I think we can throttle both kinds of entries?

though it's not clear to me whether the configured limit is per-topic (e.g. shared across all subscriptions) or applied to each individual subscription

The configuration is per topic and will be shared across all the subscriptions. The only reason is to put a cap at the topic level, so that a topic with many subscribers can't misuse the throttling; if needed, we can configure a higher message rate for the namespace (a sketch of this shared topic-level cap follows below).
However, there is a very low probability of having many subscribers. So, do you think giving each individual subscriber a separate quota would be a better option?
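
The sketch mentioned above, reusing the hypothetical DualDispatchRateLimiter from the earlier comment (assumed to be in the same package): the topic owns a single limiter instance and every subscription dispatcher consults that same instance, so the configured rate bounds the topic as a whole rather than each subscription individually:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TopicDispatchLimiters {
    private final Map<String, DualDispatchRateLimiter> limiters = new ConcurrentHashMap<>();
    private final long maxMsgsPerSec;
    private final long maxBytesPerSec;

    public TopicDispatchLimiters(long maxMsgsPerSec, long maxBytesPerSec) {
        this.maxMsgsPerSec = maxMsgsPerSec;
        this.maxBytesPerSec = maxBytesPerSec;
    }

    /** All subscription dispatchers of a topic share one limiter keyed by topic name. */
    public DualDispatchRateLimiter limiterForTopic(String topic) {
        return limiters.computeIfAbsent(topic,
                t -> new DualDispatchRateLimiter(maxMsgsPerSec, maxBytesPerSec));
    }
}
```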

I would say both should be allowed to apply.

Actually, both CPU and network bandwidth are concerns while draining the backlog. Don't you think allowing both policies may create unnecessary complexity? It will be hard to decide which one to keep, since resource consumption happens at runtime and can vary from broker to broker.

@merlimat merlimat added the type/feature The PR added a new feature or issue requested a new feature label Jul 21, 2017
@merlimat merlimat added this to the 1.20.0-incubating milestone Jul 21, 2017
@merlimat
Contributor

the broker needs visibility into the bookie's caching

I was talking about the cache at the ManagedLedger level.

@rdhabalia
Contributor

I think that if the objective is to protect the broker, another approach would be to have a per-broker limit. When the limit is reached, the throttling will start applying to the heaviest users.

Yes, actually, earlier in the PIP I had defined broker-level configuration rather than cluster-level configuration, but the team decided to go with cluster-level configuration to make sure that whether a namespace is loaded by broker1 or broker2, both brokers throttle at a similar rate, for transparency of rate limiting.

@merlimat
Contributor

Yes, actually, earlier in the PIP I had defined broker-level configuration rather than cluster-level configuration, but the team decided to go with cluster-level configuration to make sure that whether a namespace is loaded by broker1 or broker2, both brokers throttle at a similar rate, for transparency of rate limiting.

That doesn't change the fact that the limit is per-topic.

I was proposing to have the limit per-broker, irrespective of whether there's a per-topic limit or not, and of whether the limit is configured cluster-wide or at the individual broker level.

That would be to ensure the broker resources are not exceeded. When that happens, delivery will be slowed down.

@rdhabalia
Contributor

That would be to ensure the broker resources are not exceeded. When that happens, delivery will be slowed down.
Also, it would be interesting to integrate with the load-manager so that heavy users can potentially be "kicked out" onto an isolated broker ASAP.

Yes, I agree; the load-manager should then identify topics with a much higher msgRateOut and unload them. The ModularLoadManager already considers the msgRateOut variable, and we can extend its functionality to handle this more accurately.

@rdhabalia
Contributor

I was talking about the cache at the ManagedLedger level.

Yes. I think then we should not throttle already caught-up consumers (activeCursors); that way we will never throttle cached entries.

@saandrews
Contributor

saandrews commented Jul 21, 2017 via email

@merlimat
Contributor

So, ideally if broker becomes busy, load manager should shed load to bring down its load.

Yes, but that might happen later. In the meantime, the broker could immediately throttle the reads, so that it won't even get overloaded.

My point about integrating it with the load-manager is that if the broker is throttling some topics (even though resources are not at 100%), the load-manager should make use of that information in some way.

@rdhabalia
Contributor

Implemented with #634, so closing this.

@yangou

yangou commented May 2, 2020

I know this is probably considered implemented, but as this PIP mentioned, and as suggested by @bradtm, do we support the regex-based per-topic rate limit yet? Because I don't see how to dynamically configure it with the admin tool.
