Add event rate quota per Cloud Foundry organization #21020
Pinging @elastic/integrations-platforms (Team:Platforms)
Discussed this with @jsoriano off-issue so wanted to summarize our discussion here:
Starting to think a bit about what the configuration for such a processor might look like. Here's an initial proposal (not married to any of the setting names, of course):
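Something along these lines, where all setting names and the "acme" org value are hypothetical placeholders for the shape being proposed (a global limit plus per-field, per-value limits):

```yaml
processors:
  - rate_limit:
      limit: "10000/s"              # global limit applied to all events
      fields:
        - field: cloudfoundry.org.name
          limits:
            - value: "acme"         # hypothetical org name
              limit: "500/s"        # tighter limit for events from this org
```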
The way the above example configuration would be interpreted is that the global limit applies to all events, while each per-field entry applies its own limit to events matching that field value. This configuration would allow complex rate limiting policies to be expressed while also keeping simple ones simple to configure. For example, to enforce a rate limit of 500 events per second for events from a single Cloud Foundry org, only a per-field entry would be needed.
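For instance, a minimal sketch of that simple case, reusing the same hypothetical setting names (the "acme" org value is a placeholder):

```yaml
processors:
  - rate_limit:
      fields:
        - field: cloudfoundry.org.name
          limits:
            - value: "acme"      # hypothetical org name
              limit: "500/s"
```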
I'm deliberately leaving aside the choice of rate limiting algorithm (fixed window, sliding window, token bucket, leaky bucket) for now. Just trying to focus on the configuration UX first. WDYT @jsoriano?
@ycombinator thanks for your proposal, it looks quite good, but I have some observations; maybe we can make the processor simpler. One is that instead of having global and per-field configurations, we could rely on conditional processors for the more complex configurations. There are already a couple of ways of defining conditional processors, and maybe we can reuse them here, so we can simplify the processor itself.
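For example, a sketch reusing the `when` conditions that Beats processors already support (the `rate_limit` name and its `limit` setting are the ones being discussed here, not an existing API; the org value is a placeholder):

```yaml
processors:
  - rate_limit:
      limit: "500/s"
      when:
        equals:
          cloudfoundry.org.name: "acme"
```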
And the more usual configurations stay simpler:
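For instance, a single overall limit with no per-field settings (a sketch using the same hypothetical names):

```yaml
processors:
  - rate_limit:
      limit: "10000/m"
```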
The other point is partly a question: allowing multiple fields to be configured in the same processor may lead to unexpected results. For example, with the following configuration, if multiple apps with the same name exist in multiple orgs, they can affect one another:
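A sketch of the kind of configuration in question (structure and values hypothetical): if the 100/s limit is tracked only by `cloudfoundry.app.name`, apps named "billing" in different orgs would share the same budget.

```yaml
processors:
  - rate_limit:
      fields:
        - field: cloudfoundry.org.name
          limits:
            - value: "acme"
              limit: "500/s"
        - field: cloudfoundry.app.name
          limits:
            - value: "billing"    # the same app name may exist in several orgs
              limit: "100/s"
```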
I guess that there can be multiple interpretations of this configuration, and the result may depend on the order of the definitions. How would this be interpreted? Would events discarded because of the first field be counted towards the second?
Or this:
One last thing is about the definition of the rate limits. If multiple values are configured in the same processor and they have different units, e.g. one limit per second and another per minute, we could ask the user to define a common unit for the whole processor definition. But this may be more of an implementation detail, and it is less of an issue if we simplify the processor.
Yeah, I had also thought of using conditionals for the complex cases, but what I didn't like about it is that there is a lot of repetition (and therefore chance of making errors) with the field name. So if you take the first example and express it with conditionals:
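Roughly, each per-org limit becomes its own conditional processor, and the field name `cloudfoundry.org.name` has to be repeated every time (org names and limits are placeholders):

```yaml
processors:
  - rate_limit:
      limit: "500/s"
      when:
        equals:
          cloudfoundry.org.name: "acme"
  - rate_limit:
      limit: "200/s"
      when:
        equals:
          cloudfoundry.org.name: "globex"
  - rate_limit:
      limit: "100/s"
      when:
        equals:
          cloudfoundry.org.name: "initech"
```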
The field name has to be repeated in every condition. OTOH, I do like the readability of using conditionals; I think it's much more obvious what rate limits will be applied in which cases. So on the whole, I'm +1 on going with the conditionals approach instead of my originally proposed syntax.

Regarding the question about multiple fields causing ambiguity, I like the proposal of treating the combination of the configured fields' values as a single rate-limiting key, so that multiple fields in one processor stop being ambiguous.

Regarding the units, I think we should try to keep this as easy for users as possible, which would mean allowing them to define different units for multiple values in the same processor. I think some of this will come down to the implementation of the rate limiting algorithm. If it proves to be too difficult, then we can go with your suggestion of asking the user to define a common unit for the entire processor definition. But initially I think it would be nice to provide more flexibility to users and take the resulting complexity on ourselves in the implementation.
Reviewing my suggestion, I think one of the levels of configuration may not be needed. Or do you expect to have more rate limiting methods apart from the one proposed?
Agree, better to build this key internally; it will be less error-prone. I cannot think of any legit use case that would require defining this key explicitly.
Agree, let's keep them as you proposed for now; we can reconsider this during implementation. In any case, I would only see it as a potential problem if we had multiple limits in the same processor, something we don't have in the simplified one.
No, let's simplify as you suggested and drop the extra level of configuration.
BTW, I just had my 1:1 with @exekias and he suggested naming this processor something like
I slightly disagree 🙂 I agree that they do basically the same thing, dropping a part of the total events, but they have different use cases and expectations. Also, we could decide to take different actions in the future, apart from a default one.

In any case, we could decide to implement sampling in this processor, and if we implement both things in the same processor then I am ok with calling it that.

Take into account that the parameters required for sampling and rate limiting are different. E.g. if I want to get 1% of the metrics, but no more than 100/s, I would need a definition like this one:
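A sketch of what that combined definition might look like; the `sample` setting is hypothetical:

```yaml
processors:
  - rate_limit:
      sample: "1%"       # hypothetical sampling setting: keep 1% of events
      limit: "100/s"     # but never more than 100 events per second
```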
Or it could be done like this, with more specific processors:
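For instance, with one processor per concern; both processor names and their settings here are illustrative only:

```yaml
processors:
  - sample:              # hypothetical dedicated sampling processor
      percentage: 1      # keep 1% of events
  - rate_limit:
      limit: "100/s"     # and never more than 100 events per second
```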
Continuing with my previous comment about doing sampling and rate limiting in the same processor: I am thinking now that they cover different use cases, and they probably require two completely different implementations (sampling is simpler than rate limiting), so in my opinion they should be two different processors.
Thanks @jsoriano for your thoughtful comments on rate limiting vs. sampling. @exekias and I discussed some of these points too (off issue). In the end, I think there are enough differences, both in semantics and in options that might only make sense for one or the other, that we should make two separate processors as well. Just to get things moving for now, I'm going to start working on a basic rate limiting processor first.
Sounds good folks, sorry for the noise. My comment was around the fact that rate limiting doesn't necessarily imply dropping data, whereas sampling implies it. I think
The following are more my own notes for writing tests and docs, but posting them here in case anyone sees any issues:

Use case: rate limit all events to 10000/m

Configuration:
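A sketch, assuming the simplified single-limit syntax discussed above:

```yaml
processors:
  - rate_limit:
      limit: "10000/m"
```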
Use case: rate limit events separately per distinct value of a field (e.g. per Cloud Foundry organization)

Configuration:
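A sketch, assuming a `fields` setting that lists the fields whose values form the rate-limiting key (per the earlier agreement to build that key internally); the limit value is a placeholder:

```yaml
processors:
  - rate_limit:
      limit: "400/s"
      fields:
        - cloudfoundry.org.name
```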
The PR for a basic rate limiting processor has been opened. Additionally, we will need a follow-up PR for this requirement: "Add a field or tag to the events to indicate that they are being throttled."
Some quick thoughts about this requirement after discussing it with @jsoriano off-issue:
Describe the enhancement:
Add some kind of event rate quota per Cloud Foundry organization, to allow dropping events once some limit is reached.
There is an issue about adding throttling in general (#17775), but this may require many more changes.
In the case of Cloud Foundry we could add the rate limit per organization in the specific input.
Add a field or tag to the events to indicate that they are being throttled.
Describe a specific use case for the enhancement or feature:
On clusters with many organizations it may be good to limit the event rate so organizations make fair use of resources.