Improve percentiles computation with HDR histogram #92
Closed
If you're sending requests at a fixed rate, you should use `RecordCorrectedValue` to compensate for GC pauses in vegeta itself.
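For concreteness, a minimal sketch of what that could look like with the Go HdrHistogram port (github.com/codahale/hdrhistogram); the rate, units and bounds here are illustrative assumptions, not Vegeta's actual code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/codahale/hdrhistogram"
)

func main() {
	// Track latencies from 1µs to 60s with 3 significant figures.
	hist := hdrhistogram.New(1, int64(60*time.Second/time.Microsecond), 3)

	// At a fixed rate of 100 req/s the expected interval between sends is 10ms.
	expectedInterval := int64(10 * time.Millisecond / time.Microsecond)

	// Suppose one request measured 500ms because the sender itself stalled
	// (e.g. a GC pause). RecordCorrectedValue also back-fills the values that
	// should have been recorded during the stall, compensating for
	// coordinated omission.
	measured := int64(500 * time.Millisecond / time.Microsecond)
	if err := hist.RecordCorrectedValue(measured, expectedInterval); err != nil {
		panic(err)
	}

	fmt.Println("p99 (µs):", hist.ValueAtQuantile(99))
}
```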
In the specific case where the load gen is truly constant throughput (i.e. will speed up to catch up if it falls behind), you can avoid using correction/compensation in the histogram by computing the latency correctly:
Latency = requestEndTime - (systemStartTime + requestNumber / throughput);
This strongly depends on the loader actually working to stay on pace. It wouldn't be valid otherwise.
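A small, self-contained illustration of that formula (the names and numbers are hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// intendedLatency measures request n from the time it was scheduled to be
// sent under the constant-throughput plan, not from when it actually went
// out. rate is the target throughput in requests per second.
func intendedLatency(systemStart, requestEnd time.Time, requestNumber int, rate float64) time.Duration {
	intendedSend := systemStart.Add(time.Duration(float64(requestNumber) / rate * float64(time.Second)))
	return requestEnd.Sub(intendedSend)
}

func main() {
	start := time.Now()
	// Request #5 at 100 req/s should have been sent at start+50ms; if its
	// response arrives at start+120ms, its latency is 70ms regardless of
	// when it was actually put on the wire.
	fmt.Println(intendedLatency(start, start.Add(120*time.Millisecond), 5, 100))
}
```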
Vegeta relies on a periodic timer to maintain a constant request rate. From the documentation, we have:
From this, I take that if a GC pause is in progress while a tick is to be delivered, the tick will be dropped. I could be wrong, though.
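That matches Go's documented `time.Ticker` behavior: the ticker's channel has a buffer of one and ticks are dropped to make up for slow receivers. A small standalone demo (not Vegeta's code) of a stalled receiver losing ticks:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Ticks every 10ms for ~200ms: roughly 20 ticks if none are dropped.
	t := time.NewTicker(10 * time.Millisecond)
	defer t.Stop()

	deadline := time.After(200 * time.Millisecond)
	received := 0
	for {
		select {
		case <-t.C:
			received++
			if received == 5 {
				// Simulate a pause in the receiving goroutine (e.g. a GC
				// pause): only one tick is buffered, the rest are dropped.
				time.Sleep(100 * time.Millisecond)
			}
		case <-deadline:
			fmt.Printf("received %d ticks (about 20 expected without the pause)\n", received)
			return
		}
	}
}
```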
Regarding scheduling, Vegeta uses 1 goroutine per request by default. This puts quite some pressure on the runtime scheduler, but in theory it reduces coordinated omission by making it possible for each request to be promptly executed.
The number of goroutines can be limited by the user with the `workers=n` flag. In this case, the probability of dropping ticks increases.
In conclusion, it seems we have 2 possible sources of execution delays:
1. GC pauses in Vegeta itself
2. Runtime scheduler delays (especially with a limited number of `workers`)
With these in mind, I wouldn't say Vegeta is always constant throughput, but it does a pretty reliable job in most situations, with sane defaults. So I would say that using `RecordCorrectedValue` is the best approach. Let me know what you guys think.
The RecordCorrectedValue math/logic will work if you know that there were no interleaved requests during the expected interval (e.g. it is valid within each serial request-issuing connection), and therefore know that there is a specific number of "missing" requests that should have been sent but weren't. But it's not clear to me that your attacker knows that (can there be another attacker sending on the same connection?).
Given that your goal is to maintain a constant throughput, I would avoid relying on the scheduler to do time math for you, or on compensation inside the histogram recording. I would instead modify the code to make decisions based on actual time accounting, and to make sure that the right number of requests is actually sent, timing the latency of each from the time it was supposed to be sent until it actually responded. The simplest way to do that is to keep issuing requests as long as "not enough requests have been issued up to now". In order to avoid back-to-back sends when "behind" (which can create an unrealistically huge spike after a pause on either the sender or receiver), it is usually useful to do "catchup" at a limited throughput (e.g. 2x the normal throughput).
This usually involves keeping track of how many requests have actually been completed in a "connection", and doing the math based purely on that count, the required throughput (in that connection), and the current time.
E.g. you can have hits daisy-chain within a "connection" by kicking off another hit each time they are done doing what they need to do (scheduled for when the next request in this "connection" needs to be sent according to the connection's required throughput). What a hit "needs to do" is to either send the request (if the time has arrived) or skip the request if it is not yet time to send it. The hit should always re-schedule another hit for the proper time to send the next request (which may be "now").
You can see an example of logic to calculate the time until the next request should be sent here: https://github.com/giltene/wrk2/blob/master/src/wrk.c#L461
It basically uses the current number of completed requests and the current time to decide how long to wait before the next request is sent (or 0 if it is time to send the request). Providing the time until the next send rather than just a "send now" boolean indication is useful because it allows you to sleep/wait/delay checking again until that time arrives.
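A rough Go transliteration of that idea (not the actual wrk2 code; the names are made up):

```go
package main

import (
	"fmt"
	"time"
)

// delayUntilNextSend returns how long a "connection" should wait before
// issuing its next request, given how many requests it has completed, its
// target throughput (req/s) and the time the run started. Zero means the
// connection is at or behind plan and should send now.
func delayUntilNextSend(start time.Time, completed int64, throughput float64, now time.Time) time.Duration {
	nextSend := start.Add(time.Duration(float64(completed) / throughput * float64(time.Second)))
	if wait := nextSend.Sub(now); wait > 0 {
		return wait
	}
	return 0
}

func main() {
	start := time.Now()
	// At 100 req/s, after 10 completed requests the next one is due at start+100ms.
	fmt.Println(delayUntilNextSend(start, 10, 100, start.Add(60*time.Millisecond)))  // ~40ms: on pace, wait
	fmt.Println(delayUntilNextSend(start, 10, 100, start.Add(150*time.Millisecond))) // 0: behind plan, send now
}
```

The limited catch-up rate mentioned above (e.g. 2x the normal throughput) could be added by never returning less than half the nominal inter-request interval when behind, instead of zero.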
You can see logic for computing the latency when a response is complete here:
https://github.com/giltene/wrk2/blob/master/src/wrk.c#L539
(ignore all the nice debug-related output and the u_latency_histogram math, focus on the calculation that feeds the recording into latency_histogram).
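A hedged Go sketch of that recording step (hypothetical names, using the Go HdrHistogram port instead of wrk2's C histogram): the latency of response n is taken from its scheduled send time, so no in-histogram correction is needed.

```go
package main

import (
	"fmt"
	"time"

	"github.com/codahale/hdrhistogram"
)

// recordLatency records the latency of response n measured from the time
// request n was scheduled to be sent under the constant-throughput plan.
func recordLatency(hist *hdrhistogram.Histogram, start, responseTime time.Time, n int64, throughput float64) error {
	scheduledSend := start.Add(time.Duration(float64(n) / throughput * float64(time.Second)))
	return hist.RecordValue(responseTime.Sub(scheduledSend).Microseconds())
}

func main() {
	hist := hdrhistogram.New(1, int64(time.Hour/time.Microsecond), 3)
	start := time.Now()
	// Response to request #3 (scheduled at start+30ms under 100 req/s)
	// arrives at start+45ms: 15ms is recorded.
	if err := recordLatency(hist, start, start.Add(45*time.Millisecond), 3, 100); err != nil {
		panic(err)
	}
	fmt.Println("max recorded latency (µs):", hist.Max())
}
```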
If you use this logic, your throughput (averaged over time) will remain constant, catching up to missed requests at an increased (but not back-to-back) rate. With such a constant-throughput plan actually being executed, your latency reporting will be precise, since you have a known plan (to execute requests at a given constant pace) and can determine exactly what the latency is for each response based purely on its arrival time (ignoring its actual send time, which is susceptible to coordination when you are temporarily off-pace).
Thank you so much for the detailed insight. I really appreciate it. Before I dive into this though, I’d like to first simulate a load test where the potential issues with my current approach become obvious. If you have anything in mind for this, please let me know!
The trivial way to simulate the load test in question is to ^Z the server being tested for 20 seconds, while your load generator is trying to generate 200K reqs/sec. If at the end of the test the number of issued requests is equal to actual_run_time * 200K, then you are not dropping things. If it's not, you have a problem.
You can also use this test to sanity check your reported latencies. You can hand-compute some percentiles from the known behavior. E.g. if the total run time is 40 seconds and the pause is 20 seconds, your 99%'ile should show 19.6 seconds, your 95%'ile should show 18 seconds, etc. If that's not what they show, they are wrong.
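A quick sketch to reproduce those expected numbers (the function name is made up; it assumes a constant send rate and that every request issued during the pause stalls until the pause ends):

```go
package main

import "fmt"

// expectedStallPercentile returns the latency (in seconds) that a given
// percentile should report after the ^Z test above: pause/total of all
// requests are stalled, and their latencies ramp linearly from pause down
// to 0 as the pause progresses.
func expectedStallPercentile(percentile, pause, total float64) float64 {
	tailFraction := 1 - percentile/100          // fraction of all requests above this percentile
	stalledTail := tailFraction * total / pause // the same tail as a fraction of the stalled requests
	if stalledTail > 1 {
		return 0 // this percentile falls among the non-stalled requests
	}
	return pause * (1 - stalledTail)
}

func main() {
	fmt.Printf("p99: %.1fs\n", expectedStallPercentile(99, 20, 40)) // 19.6s
	fmt.Printf("p95: %.1fs\n", expectedStallPercentile(95, 20, 40)) // 18.0s
}
```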
Now everything makes sense. Vegeta wasn't built to generate that kind of throughput and its upper bound is significantly lower. Vegeta makes the distinction between load testing (analysing your server under different kinds of load) and benchmarking (testing your server at peak capacity). It was built to load test individual HTTP services within a Service Oriented Architecture, which sustain several orders of magnitude fewer requests per second.
I am not so sure that Go would be my first choice for building a benchmarking tool in `wrk`'s category, but I will definitely play around with your algorithms. Once again, thanks for your input!
You can replace 200K/sec with 200/sec above and still do the same test... The ^Z sniff test applies to anything that reports latency stats.
Here are the results for that request rate, suspending the server for ~10s.
You may explore the data further with this interactive plot.
That looks perfect! Well calibrated.