Latency percentiles and deeper insight percentiles #8428
Replies: 7 comments 3 replies
-
@nbrady-techempower
-
Hi @schizobeyond-tech,

Techempower Benchmark: I think the TechEmpower team do their best, but there will always be room for improvement, with the help of the community. You have several ways to check latency in this benchmark.

Visualize charts: By default they are ordered by average latency, but you can order by max latency.

Dstat charts: Here you have a lot of options to choose from for viewing and ordering.

Logs: Each run creates logs for each framework per test. Example link:

I think there are plenty of ways to check the benchmark results and order by max, 99%, ... latency, and not only by average.

Latency is relative: We can't look at latency alone; we need to look at it in context.

Relative to concurrent users: With more concurrent users comes more latency.

Relative to the app: One framework will be faster using database queries, and another without.

Relative to req/s: Framework A can serve at most, for example, 1,000 req/s at 0.01 ms max latency; framework B can serve more req/s but at a higher latency. Which framework would you choose, A or B?

Relative to the server: Of course, req/s and latency depend completely on the server specs and configuration.

My conclusion: This benchmark gives us a guide when choosing, but more importantly it is used by framework developers to improve the performance of their code. Latency always needs to be measured on your production servers; depending on the app, framework, servers and concurrent users, you need to provision enough servers to keep latency good. We can have the fastest framework, but if we need to serve 2,000 concurrent users with only one server, we have a problem.

This is just my two cents.
-
I hope that for a fixed-rate benchmark we will choose a coordinated-omission-free load generator, e.g. wrk2, or something more recent.
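For anyone unfamiliar with the term, here is a minimal sketch of what coordinated omission does to the numbers (my own toy simulation with invented constants, not wrk2 or TechEmpower code): a closed-loop generator that waits for each response before sending the next one silently drops the queueing delay a fixed-rate "user" would have experienced, so its tail percentiles come out far too optimistic.

```python
# Toy simulation of coordinated omission (illustrative constants throughout).
import random

RATE = 1000                # intended requests per second
INTERVAL = 1.0 / RATE      # intended gap between sends

def service_time():
    # Fake server: ~0.5 ms normally, with a rare 200 ms stall (e.g. a GC pause).
    return 0.200 if random.random() < 0.0001 else 0.0005

random.seed(42)
naive, corrected = [], []
clock = 0.0                # simulated wall clock
for i in range(100_000):
    intended_send = i * INTERVAL
    # A closed-loop generator cannot send before the previous reply arrived:
    actual_send = max(clock, intended_send)
    clock = actual_send + service_time()
    naive.append(clock - actual_send)        # what a naive tool records
    corrected.append(clock - intended_send)  # what a fixed-rate user suffers

def pct(xs, p):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

for p in (50, 99, 99.9, 99.99):
    print(f"p{p:<5}: naive {pct(naive, p) * 1000:7.2f} ms | "
          f"corrected {pct(corrected, p) * 1000:7.2f} ms")
```

wrk2's fixed-rate mode applies exactly this kind of intended-start-time correction, which is why a coordinated-omission-free generator matters here.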
-
I am aware of all that. As of right now, all the tests are heavily gamed and optimized for special cases, which is anything but a reflection of real-world usage for 99.99% of use cases. No one will hardcode UTF-8 bytes as a pre-generated API/HTML response. It's very misleading, and people trust these results and ranks to pick their frameworks, only to find out months if not years later that what they believed to be true is not actually true at all.

Max and avg measures of latency don't show anything at all; latency does NOT follow a uniform distribution, so it cannot be averaged. We need good statistical insight into the distribution, kurtosis, skew, values, etc. of latency for each framework/test.

I agree, we should strive towards creating latency benchmarks which reflect different tiers of real-world usage, with a big emphasis on real world here (concurrent requests, multiple concurrent requests per connection, different request methods, POST sizes, HTTP/2, QUIC, etc.). All the current benchmarks are too synthetic and don't create any real load on the framework, both because of how simple they are and how little variation they have. As a consequence, all frameworks have found ways to hardcode their responses to appear higher in the rankings. As it stands, you are better off picking a framework below the top 30, since its numbers are a more honest reflection of its performance than those of anything in the top 30, which are all specialized with bare byte-array hardcoding of responses. And yes, obviously different test tools will have different performance and shapes of traffic.

Again, a framework which can do 50k RPS with a consistent, uniform latency of 10 ms is far better than a hand-optimized byte-buffer JSON/plaintext benchmark special that can do 1,000,000 RPS but has a 50 ms avg latency and a 75th percentile of 100 ms or more.
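To illustrate the point about averages (with invented numbers, not actual TFB data): two frameworks can have nearly the same mean latency while one of them hides a brutal tail, which only skew, kurtosis, and deep percentiles expose.

```python
# Two latency distributions with similar means but wildly different tails,
# showing why mean/max alone cannot rank frameworks (toy numbers).
import random
random.seed(1)

# Framework A: steady ~10 ms. Framework B: usually ~2 ms, occasionally 400+ ms.
a = [random.gauss(10.0, 0.5) for _ in range(100_000)]
b = [random.expovariate(1 / 2.0) if random.random() < 0.98
     else 400 + random.expovariate(1 / 50.0) for _ in range(100_000)]

def describe(name, xs):
    xs = sorted(xs)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Standardized 3rd/4th central moments: sample skew and kurtosis.
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5
    kurt = sum((x - mean) ** 4 for x in xs) / n / var ** 2
    q = lambda p: xs[min(n - 1, int(p / 100 * n))]
    print(f"{name}: mean={mean:6.1f}  skew={skew:6.1f}  kurtosis={kurt:8.1f}  "
          f"p50={q(50):6.1f}  p99={q(99):6.1f}  p99.9={q(99.9):6.1f} ms")

describe("A", a)
describe("B", b)
```

Framework B's mean comes out around 11 ms, close to A's 10 ms, yet its p99.9 sits above 400 ms, which is exactly the kind of shape that avg/max reporting hides.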
-
Oh, and to reply to your "latency is relative": so are requests per second, especially when the tests run on 10 Gbit, 28-core Xeons or even more powerful hardware. As the saying goes, performance is not a number, it is a shape: the shape of latency across dozens of variations, not raw req/s plus avg/max latency, which are useless.
-
Are you looking at the plaintext results? If yes, then why?! That is the least realistic test, as acknowledged by the TechEmpower team - it is meant to serve roughly as an upper bound on performance. Your objections do not apply to the fortunes test (at least concerning the top performers I have looked at), which is the most realistic one - it is shown by default in the result visualization for a reason.
If that is your only data, then I have no idea how you have made that conclusion. As @joanhey alluded, you have absolutely no notion of how the second framework behaves at 50k RPS - it may very well have a median latency of 1 ms and a maximum of 9 ms, which is clearly better than the first one. On the other hand, you know that the first framework has an infinite latency at 1,000k RPS (since it is simply unable to reach that level), which is definitely worse, so if you really want to have a preference, it should be for the second framework. But seriously, this comparison does not make any sense at all to begin with, and we do not even need to get into "advanced" topics such as coordinated omission to figure that out.

The simple truth of the matter is that currently the benchmarking toolset (and as a result the continuous benchmarking environment) is not prepared at all to do latency measurements. Yes, it gathers some latency data, but those values are pretty much useless if you want to do any sort of comparison - even if you want to compare a framework with itself after making some optimizations! Unfortunately it is not possible to do proper latency and throughput measurements at the same time, and given that obtaining just the second set of results takes roughly a week, doing both is going to be a really hard sell. On the other hand, I am not sure the TechEmpower team would be willing to drop the throughput measurements entirely - they are certainly not useless.
-
FrameworkBenchmarks should focus far more on latency (and its deeper distribution, beyond min/max/avg and the 75th percentile) in every test, because of how important it is in the real world. The average page today makes ~50 different requests to load all of its resources, which makes it a statistical near-guarantee that most users will hit the 99th percentile (yes, 99%; see the sketch below). Not only that, but given that a real-world server has to complete requests for thousands of people simultaneously, latency at the 99th percentile and beyond, such as the 99.99th, will have a far bigger impact on website/API response times - and hence conversion rates, profits, and server bills will all suffer.
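A quick back-of-the-envelope check of that claim (the per-page request count is the thread's ~50 figure; independence is my simplifying assumption): if each request independently has a 1% chance of exceeding the 99th-percentile latency, a single page load hits it about 40% of the time, and a short browsing session makes it near-certain.

```python
# Chance that at least one request in a page load exceeds the p99 latency,
# assuming ~50 independent requests per page (illustrative assumption).
requests_per_page = 50
print(f"one page load: {1 - 0.99 ** requests_per_page:.1%}")   # ~39.5%

for pages in (5, 10):                      # a short browsing session
    p = 1 - 0.99 ** (requests_per_page * pages)
    print(f"{pages:2d} pages: {p:.1%}")     # ~91.9%, ~99.3%
```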
"Over a 4 week period, we analysed mobile site data from 37 retail, travel, luxury and lead
generation brands across Europe and the US. Results showed that a mere 0.1s change in
load time can influence every step of the user journey, ultimately increasing conversion rates.
Conversions grew by 8% for retail sites and by 10% for Travel sites on average." [1]
"The e-commerce giant, Amazon, found in a 2009 study that every 100ms in added page load time cost Amazon 1% in revenue. In 2009, that 1% revenue loss equated to $107 million. Today, Amazon's same 1% revenue loss would be about $1.2 billion."
"A 100-millisecond delay in website load time can hurt conversion rates by 7 percent"
Also, latency is far more impactful because no server in production will serve the requests/sec that we see in the TechEmpower benchmarks; however, it is an undeniable axiom of reality that every event has a latency reflecting the delay it took to complete.
It is my firm belief that latency and percentiles deeper than 99% - such as 99.9%, 99.99%, 99.999% and 99.9999% - would paint a much better picture of the performance of all frameworks under different tests and load levels, especially since not many people understand latency, or how percentiles such as the 99th work out to be almost a guarantee given the number of requests per page and the number of users. As a real-world example: even with 100 customers doing 50 requests each, we don't want to lose 10% of our conversion rate to 100 ms of extra latency on the API call at checkout (which cannot be cached, so that counter-argument has no ground). Even with 100 customers and an average conversion rate, with a $50 product this results in thousands lost to checkout delays; a rough calculation is sketched below.
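For what it's worth, here is a hedged version of that arithmetic (the baseline conversion rate and the one-year horizon are my own assumptions, not from the thread); the daily loss looks tiny, but it compounds:

```python
# Rough revenue-loss arithmetic (illustrative assumptions throughout).
customers_per_day = 100
baseline_conversion = 0.03   # assumed "average" e-commerce conversion rate
conversion_hit = 0.10        # 10% relative drop from ~100 ms extra latency
product_price = 50.0         # dollars

lost_sales_per_day = customers_per_day * baseline_conversion * conversion_hit
print(f"per day : ${lost_sales_per_day * product_price:,.2f}")        # $15.00
print(f"per year: ${lost_sales_per_day * product_price * 365:,.2f}")  # $5,475.00
```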
It would be great to have insight into latency measures at these levels (99%, 99.9%, 99.99%, 99.999%, 99.9999%) for all tests/frameworks.
Another thing about avg/median latency: it is nearly useless, since latency does not follow a uniform distribution. Not just that, but because of the different types of languages (native, managed, etc.) and the different ways they handle memory (GC, explicit management, the many GC variants, etc.), all of these frameworks have different latency profiles and distributions, especially at the high percentiles; the sketch below illustrates the effect of occasional GC pauses.
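To make that concrete, here is a toy model (all numbers invented): a managed runtime whose requests are fast except when a GC pause lands on one. The median and even the 99th percentile look almost identical to a pause-free runtime; only the deeper percentiles reveal the difference.

```python
# Toy model of how GC pauses shape the deep percentiles (invented numbers).
import random
random.seed(7)

def no_gc():    # e.g. a native runtime: ~1 ms, low jitter
    return random.gauss(1.0, 0.1)

def with_gc():  # same fast path, but 0.2% of requests eat a 30-120 ms pause
    base = random.gauss(1.0, 0.1)
    return base + random.uniform(30, 120) if random.random() < 0.002 else base

def pct(xs, p):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

a = [no_gc() for _ in range(200_000)]
b = [with_gc() for _ in range(200_000)]
for p in (50, 90, 99, 99.9, 99.99):
    print(f"p{p:<6}: no-GC {pct(a, p):6.2f} ms | with-GC {pct(b, p):6.2f} ms")
```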
(Image: Zuckerberg email to the tech department; apologies for the low quality, please zoom in.)
Here are some additional resources which have shown me just how important latency really is. I believe that if people take a few moments of their time to read them, they will undeniably come to the same conclusion. I also believe we should all strive to create better things which reflect real-world measures, not synthetic benchmarks "heavily optimized with hardcoding" to rank higher.
https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
https://www.youtube.com/watch?v=lJ8ydIuPFeU
https://latencytipoftheday.blogspot.com/ (all posts)
[1] https://www2.deloitte.com/content/dam/Deloitte/ie/Documents/Consulting/Milliseconds_Make_Millions_report.pdf