Latency percentiles and deeper insight percentiles #8428
Replies: 7 comments 3 replies
-
@nbrady-techempower
-
Hi @schizobeyond-tech,

Techempower Benchmark: I think the TechEmpower team do their best, but there will always be room for improvement, with the help of the community. You have several ways to check latency in this benchmark.

Visualize charts: By default they are ordered by average latency, but you can order by max latency.

Dstat charts: Here you have a lot of options to choose from for viewing and ordering.

Logs: Each run creates logs for each framework per test. Example link:

I think there are plenty of ways to check the benchmark results and order by max, 99%, ... latency, and not only by average.

Latency is relative: We can't look at latency alone; we need to look at it in context.

Relative to concurrent users: With more concurrent users comes more latency.

Relative to the app: One framework will be faster using database queries, and another without.

Relative to req/s: Framework A can serve at most, for example, 1,000 req/s at 0.01 ms max latency; framework B can serve more req/s but at a higher latency. Which framework would you choose, A or B?

Relative to the server: Of course, req/s and latency depend completely on the server specs and configuration.

My conclusion: This benchmark gives us a guide when choosing, but more importantly it is used by framework developers to improve the performance of their code. Latency always needs to be measured on your production servers; depending on the app, framework, servers and concurrent users, you need to provision enough servers to keep latency good. We can have the fastest framework, but if we need to serve 2,000 concurrent users with only one server, we have a problem.

This is just my two cents.
-
I hope that for a fixed-rate benchmark we will choose a coordinated-omission-free load generator, e.g. wrk2, or something more recent.
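For anyone unfamiliar with the term, here is a minimal sketch of what coordinated omission does to the numbers (my own toy simulation with invented constants, not wrk2 or TechEmpower code): a closed-loop generator that waits for each response before sending the next one silently drops the queueing delay a fixed-rate "user" would have experienced, so its tail percentiles come out far too optimistic.

```python
# Toy simulation of coordinated omission (illustrative constants throughout).
import random

RATE = 1000                # intended requests per second
INTERVAL = 1.0 / RATE      # intended gap between sends

def service_time():
    # Fake server: ~0.5 ms normally, with a rare 200 ms stall (e.g. a GC pause).
    return 0.200 if random.random() < 0.0001 else 0.0005

random.seed(42)
naive, corrected = [], []
clock = 0.0                # simulated wall clock
for i in range(100_000):
    intended_send = i * INTERVAL
    # A closed-loop generator cannot send before the previous reply arrived:
    actual_send = max(clock, intended_send)
    clock = actual_send + service_time()
    naive.append(clock - actual_send)        # what a naive tool records
    corrected.append(clock - intended_send)  # what a fixed-rate user suffers

def pct(xs, p):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

for p in (50, 99, 99.9, 99.99):
    print(f"p{p:<5}: naive {pct(naive, p) * 1000:7.2f} ms | "
          f"corrected {pct(corrected, p) * 1000:7.2f} ms")
```

wrk2's fixed-rate mode applies exactly this kind of intended-start-time correction, which is why a coordinated-omission-free generator matters here.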
-
I am aware of all that. As of right now, all the tests are heavily gamed and optimized for special cases, which is anything but a reflection of real-world usage for 99.99% of use cases. No one will hardcode UTF-8 bytes as a pre-generated API/HTML response. It's very misleading, and people trust these results and ranks to pick their frameworks, only to find out months if not years later that what they believed to be true is not actually true at all.

Max and avg measures of latency don't show anything at all; latency does NOT follow a uniform distribution, so it cannot be averaged. We need good statistical insight into the distribution, kurtosis, skew, values, etc. of latency for each framework/test.

I agree, we should strive towards creating latency benchmarks which reflect different tiers of real-world usage, with a big emphasis on real world here (concurrent requests, multiple concurrent requests per connection, different request methods, POST sizes, HTTP/2, QUIC, etc.). All the current benchmarks are too synthetic and don't create any real load on the framework, both because of how simple they are and how little variation they have. As a consequence, all frameworks have found ways to hardcode their responses to appear higher in the rankings. As it stands, you are better off picking a framework below the top 30, since its numbers are a more honest reflection of its performance than those of anything in the top 30, which are all specialized with bare byte-array hardcoding of responses. And yes, obviously different test tools will have different performance and shapes of traffic.

Again, a framework which can do 50k RPS with a consistent, uniform latency of 10 ms is far better than a hand-optimized byte-buffer JSON/plaintext benchmark special that can do 1,000,000 RPS but has a 50 ms avg latency and a 75th percentile of 100 ms or more.
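To illustrate the point about averages (with invented numbers, not actual TFB data): two frameworks can have nearly the same mean latency while one of them hides a brutal tail, which only skew, kurtosis, and deep percentiles expose.

```python
# Two latency distributions with similar means but wildly different tails,
# showing why mean/max alone cannot rank frameworks (toy numbers).
import random
random.seed(1)

# Framework A: steady ~10 ms. Framework B: usually ~2 ms, occasionally 400+ ms.
a = [random.gauss(10.0, 0.5) for _ in range(100_000)]
b = [random.expovariate(1 / 2.0) if random.random() < 0.98
     else 400 + random.expovariate(1 / 50.0) for _ in range(100_000)]

def describe(name, xs):
    xs = sorted(xs)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Standardized 3rd/4th central moments: sample skew and kurtosis.
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5
    kurt = sum((x - mean) ** 4 for x in xs) / n / var ** 2
    q = lambda p: xs[min(n - 1, int(p / 100 * n))]
    print(f"{name}: mean={mean:6.1f}  skew={skew:6.1f}  kurtosis={kurt:8.1f}  "
          f"p50={q(50):6.1f}  p99={q(99):6.1f}  p99.9={q(99.9):6.1f} ms")

describe("A", a)
describe("B", b)
```

Framework B's mean comes out around 11 ms, close to A's 10 ms, yet its p99.9 sits above 400 ms, which is exactly the kind of shape that avg/max reporting hides.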
-
Oh, and to reply to your "latency is relative": so are requests per second, especially when the tests run on 10 Gbit, 28-core Xeons or even more powerful hardware. As the saying goes, performance is not a number, it is a shape: the shape of latency across dozens of variations, not raw req/s plus avg/max latency, which are useless.
-
Are you looking at the plaintext results? If yes, then why?! That is the least realistic test, as acknowledged by the TechEmpower team - it is meant to serve roughly as an upper bound on performance. Your objections do not apply to the fortunes test (at least concerning the top performers I have looked at), which is the most realistic one - it is shown by default in the result visualization for a reason.
If that is your only data, then I have no idea how you have made that conclusion. As @joanhey alluded, you have absolutely no notion of how the second framework behaves at 50k RPS - it may very well have a median latency of 1 ms and a maximum of 9 ms, which is clearly better than the first one. On the other hand, you know that the first framework has an infinite latency at 1,000k RPS (since it is simply unable to reach that level), which is definitely worse, so if you really want to have a preference, it should be for the second framework. But seriously, this comparison does not make any sense at all to begin with, and we do not even need to get into "advanced" topics such as coordinated omission to figure that out.

The simple truth of the matter is that currently the benchmarking toolset (and as a result the continuous benchmarking environment) is not prepared at all to do latency measurements. Yes, it gathers some latency data, but those values are pretty much useless if you want to do any sort of comparison - even if you want to compare a framework with itself after making some optimizations! Unfortunately it is not possible to do proper latency and throughput measurements at the same time, and given that obtaining just the second set of results takes roughly a week, doing both is going to be a really hard sell. On the other hand, I am not sure the TechEmpower team would be willing to drop the throughput measurements entirely - they are certainly not useless.
-
FrameworkBenchmarks should focus far more on latency (and its deeper distribution, beyond min/max/avg and the 75th percentile) in every test, because of how important it is in the real world. The average page today makes ~50 different requests to load all of its resources, which makes it a statistical near-guarantee that most users will hit the 99th percentile (yes, 99%; see the sketch below). Not only that, but given that a real-world server has to complete requests for thousands of people simultaneously, latency at the 99th percentile and beyond, such as the 99.99th, will have a far bigger impact on website/API response times - and hence conversion rates, profits, and server bills will all suffer.
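A quick back-of-the-envelope check of that claim (the per-page request count is the thread's ~50 figure; independence is my simplifying assumption): if each request independently has a 1% chance of exceeding the 99th-percentile latency, a single page load hits it about 40% of the time, and a short browsing session makes it near-certain.

```python
# Chance that at least one request in a page load exceeds the p99 latency,
# assuming ~50 independent requests per page (illustrative assumption).
requests_per_page = 50
print(f"one page load: {1 - 0.99 ** requests_per_page:.1%}")   # ~39.5%

for pages in (5, 10):                      # a short browsing session
    p = 1 - 0.99 ** (requests_per_page * pages)
    print(f"{pages:2d} pages: {p:.1%}")     # ~91.9%, ~99.3%
```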
"Over a 4 week period, we analysed mobile site data from 37 retail, travel, luxury and lead
generation brands across Europe and the US. Results showed that a mere 0.1s change in
load time can influence every step of the user journey, ultimately increasing conversion rates.
Conversions grew by 8% for retail sites and by 10% for Travel sites on average." [1]
"The e-commerce giant, Amazon, found in a 2009 study that every 100ms in added page load time cost Amazon 1% in revenue. In 2009, that 1% revenue loss equated to $107 million. Today, Amazon's same 1% revenue loss would be about $1.2 billion."
"A 100-millisecond delay in website load time can hurt conversion rates by 7 percent"
Also, latency is far more impactful because no server in production will serve the requests/sec that we see in the TechEmpower benchmarks; however, it is an undeniable axiom of reality that every event has a latency reflecting the delay it took to complete.
It is my firm belief that latency and percentiles deeper than 99% - such as 99.9%, 99.99%, 99.999% and 99.9999% - would paint a much better picture of the performance of all frameworks under different tests and load levels, especially since not many people understand latency, or how percentiles such as the 99th work out to be almost a guarantee given the number of requests per page and the number of users. As a real-world example: even with 100 customers doing 50 requests each, we don't want to lose 10% of our conversion rate to 100 ms of extra latency on the API call at checkout (which cannot be cached, so that counter-argument has no ground). Even with 100 customers and an average conversion rate, with a $50 product this results in thousands lost to checkout delays; a rough calculation is sketched below.
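For what it's worth, here is a hedged version of that arithmetic (the baseline conversion rate and the one-year horizon are my own assumptions, not from the thread); the daily loss looks tiny, but it compounds:

```python
# Rough revenue-loss arithmetic (illustrative assumptions throughout).
customers_per_day = 100
baseline_conversion = 0.03   # assumed "average" e-commerce conversion rate
conversion_hit = 0.10        # 10% relative drop from ~100 ms extra latency
product_price = 50.0         # dollars

lost_sales_per_day = customers_per_day * baseline_conversion * conversion_hit
print(f"per day : ${lost_sales_per_day * product_price:,.2f}")        # $15.00
print(f"per year: ${lost_sales_per_day * product_price * 365:,.2f}")  # $5,475.00
```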
It would be great to have insight into latency measures at these levels (99%, 99.9%, 99.99%, 99.999%, 99.9999%) for all tests/frameworks.
Another thing about avg/median latency: it is nearly useless, since latency does not follow a uniform distribution. Not just that, but because of the different types of languages (native, managed, etc.) and the different ways they handle memory (GC, explicit management, the many GC variants, etc.), all of these frameworks have different latency profiles and distributions, especially at the high percentiles; the sketch below illustrates the effect of occasional GC pauses.
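To make that concrete, here is a toy model (all numbers invented): a managed runtime whose requests are fast except when a GC pause lands on one. The median and even the 99th percentile look almost identical to a pause-free runtime; only the deeper percentiles reveal the difference.

```python
# Toy model of how GC pauses shape the deep percentiles (invented numbers).
import random
random.seed(7)

def no_gc():    # e.g. a native runtime: ~1 ms, low jitter
    return random.gauss(1.0, 0.1)

def with_gc():  # same fast path, but 0.2% of requests eat a 30-120 ms pause
    base = random.gauss(1.0, 0.1)
    return base + random.uniform(30, 120) if random.random() < 0.002 else base

def pct(xs, p):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

a = [no_gc() for _ in range(200_000)]
b = [with_gc() for _ in range(200_000)]
for p in (50, 90, 99, 99.9, 99.99):
    print(f"p{p:<6}: no-GC {pct(a, p):6.2f} ms | with-GC {pct(b, p):6.2f} ms")
```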
(Image: Zuckerberg email to the tech department; apologies for the low quality, please zoom in.)
Here are some additional resources which have shown me just how important latency really is. I believe that if people take a few moments of their time to read them, they will undeniably come to the same conclusion. I also believe we should all strive to create better things which reflect real-world measures, not synthetic benchmarks "heavily optimized with hardcoding" to rank higher.
https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
https://www.youtube.com/watch?v=lJ8ydIuPFeU
https://latencytipoftheday.blogspot.com/ (all posts)
[1] https://www2.deloitte.com/content/dam/Deloitte/ie/Documents/Consulting/Milliseconds_Make_Millions_report.pdf