"Flow Control" How to handle full load? #709
Comments
Possible solutions:
It's on the list for 2.0.
Stopping collection is not enough. The parsers' activity is also critical, as it blows up the amount of data by at least a factor of 2 or 3. The experts' impact is only a factor of 1-1.5. (Only estimates, I haven't measured anything.)
The input pipelines (e.g. for parsers and experts) could stop serving new data if the load is too high. That's actually not so hard to implement and could also be done in 1.x.
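A minimal sketch of what such an in-bot check might look like, assuming the destination queue is a plain Redis list; the queue name, threshold, and connection details are illustrative, not IntelMQ's actual configuration:

```python
import time

import redis

# Illustrative values: the operator picks a threshold for the destination queue.
DESTINATION_QUEUE = "deduplicator-expert-queue"  # hypothetical queue name
MAX_QUEUE_LENGTH = 100_000
WAIT_SECONDS = 5

conn = redis.Redis(host="localhost", port=6379, db=2)

def wait_until_destination_has_room() -> None:
    """Block (serve no new data) while the destination queue is too long."""
    while conn.llen(DESTINATION_QUEUE) >= MAX_QUEUE_LENGTH:
        time.sleep(WAIT_SECONDS)

# A parser or expert bot would call wait_until_destination_has_room() before
# processing and forwarding its next message.
```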
Yes, maybe. But not in 1.0, please ;)
Thank you for this very valuable piece of information. So the parsers should also slow down and only forward data when a sufficient amount of space is available. Somehow the parsers need to buffer the data...
Should bots be capable of talking to each other? Or should this be done by a daemon that watches over all bots?
A daemon could control the flow by throttling, either via the receive_message function or directly in Redis: we put a transparent queue between the previous bot's output and the next bot's input queue, and messages are pushed from one to the other by the daemon.
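A rough sketch of such a throttling daemon, assuming both the transparent buffer queue and the next bot's input queue are plain Redis lists; all names and the threshold are made up for illustration:

```python
import time

import redis

conn = redis.Redis(host="localhost", port=6379, db=2)

TRANSPARENT_QUEUE = "parser-output-buffer"  # hypothetical buffer between two bots
DESTINATION_QUEUE = "expert-input-queue"    # hypothetical next input queue
THRESHOLD = 50_000                          # max messages allowed in the destination
IDLE_SECONDS = 1

def run_throttling_daemon() -> None:
    """Forward messages only while the destination queue stays below the threshold."""
    while True:
        if conn.llen(DESTINATION_QUEUE) < THRESHOLD:
            # Atomically move one message from the buffer to the destination;
            # rpoplpush returns None when the buffer is empty.
            if conn.rpoplpush(TRANSPARENT_QUEUE, DESTINATION_QUEUE) is None:
                time.sleep(IDLE_SECONDS)
        else:
            time.sleep(IDLE_SECONDS)

if __name__ == "__main__":
    run_throttling_daemon()
```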
In general, there are existing solutions for this. ActiveMQ seems to have flow control, and I am sure other message queuing systems offer it as well. What I am trying to say: let's take a step back, gather user requirements for phase 2 and then select good frameworks. My 2 cents.
Yes, I agree, @aaronkaplan. But to be more realistic: we can't switch to another message queue in the months right after the release, yet we need such a feature - not necessarily very mature, but working - to put our setups into production.
Sure, that's why I am collecting input for 2.0.
There are a couple of intermediate steps I can think of:
In my experience operating our instance, this is the currently working solution. For a general "let's implement backpressure" approach, I recommend we first look at who has already implemented it and in which framework.
As this issue is a very distinct problem statement, let's discuss short- and intermediate-term solutions here, and discuss long-term solutions (i.e. other message queuing systems) later/elsewhere?
So there you go :)
I don't think we need something complicated to start with. The main goal …

Plan to Limit the Memory Consumption of the Redis DB

Goal: Keep the size of the Redis database under some chosen limit. Now, choose another limit, the …

Based on that …

Collector Bots Changes

The collector bot is modified so that it only fetches a new report if …

If the conditions are not met, the bot pauses for a while before trying again. Note that the bots do not need to coordinate and there's no need for …
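A minimal sketch of the kind of check a collector bot could perform before fetching a new report; the memory limit, the retry interval, and the use of Redis's own memory statistics are assumptions for illustration:

```python
import time

import redis

conn = redis.Redis(host="localhost", port=6379, db=2)

REDIS_MEMORY_LIMIT = 8 * 1024 ** 3  # 8 GiB; an operator-chosen limit (illustrative)
RETRY_SECONDS = 60

def redis_below_limit() -> bool:
    """Compare Redis's own memory statistics against the configured limit."""
    used_memory = conn.info("memory")["used_memory"]
    return used_memory < REDIS_MEMORY_LIMIT

def collector_loop(fetch_and_send_report) -> None:
    """Fetch a new report only while Redis stays below the limit; otherwise pause."""
    while True:
        if redis_below_limit():
            fetch_and_send_report()    # the bot's normal collection step
        else:
            time.sleep(RETRY_SECONDS)  # pause, then re-check; no coordination between bots needed
```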
Estimating the Memory Requirements

To choose the …
In particular, the deduplicator bot stores a hash of all events it …
This is not just the size of the JSON serialization, but also e.g. …

Some rough notes on the relationships between these numbers:

- In the worst case, all collector bots fetch new reports at the same time …
- The product of the last two numbers, event size and number of events, …
- The actual lower bound is much less than this, obviously, because the …
- There are also some correlations between the numbers. E.g. the total …

Measuring memory requirements

In order to get a feel for some of these numbers, particularly the …

Results (allocated bytes after the bot has finished):

The input file was a shadowserver Open-Portmapper report, 39562549 …

¹ The file-input bot uses chunking with a chunk-size of 10000000 bytes, …
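The comment doesn't show how these allocation figures were obtained; as an assumed stand-in, something like Python's tracemalloc could produce comparable per-bot numbers:

```python
import tracemalloc
from typing import Callable

def peak_allocated_bytes(process_report: Callable[[], object]) -> int:
    """Run a callable (e.g. feeding one report through a bot) and return its peak allocation."""
    tracemalloc.start()
    process_report()
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

if __name__ == "__main__":
    # Trivial stand-in for parsing a report: split 100,000 CSV-like lines.
    peak = peak_allocated_bytes(lambda: [line.split(",") for line in ["a,b,c"] * 100_000])
    print(f"peak allocation: {peak} bytes")
```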
Conclusions:

Caveats:

Problems

However, even if not all collector bots can use the solution outlined …
On one of our machines we've tested how the system works with a large amount of data. We've managed to get the system swapping, although it has a rather large amount of RAM (24 GB). That happened when approximately 4 million events were running through the pipe. A lot of events were stuck at the deduplicator.
How to deal with such situations?
One approach is to stop collection of new data when high load / critical load is detected.
Keywords: Performance, Stability