Changes / Updates / Dates for Round 21 #7289
Could you clarify exactly what the changes are? Thank you!!
Thanks @joanhey - It's really just the db driver clarification (which currently only affects postgres). I updated the issue and will continue to update it as needed.
There's no need to open discussions about changes that haven't been approved; I won't really be monitoring those. It's fine to post here if you need to. People should only care about the pinned post, but this thread is the easiest way to get a notification to my inbox, since I'm extremely busy with other work at the moment.
I see two options, not mutually exclusive:
We could have a table to keep track of each framework and its verification status. If contributors are not sure, then someone could help them with manual verification (I volunteer), which also means documenting the verification steps.
Question:
Some frameworks get similar or even higher req/s in fortunes than in 1-query, which is very odd. If so, I will update some frameworks before the next round.
I'm also happy to help with manual verification. Let me know if/when that's needed. I'd probably use the process I described in this PR comment. I did do that verification on most of the top frameworks already, but I didn't record the results anywhere. The gist was that the frameworks in the tier above vertx in multiquery all failed verification, but those in the same tier were OK.
The database connection is OK.
You can see the source code here. The query statement is cached for the fortunes test, and it's just a single query. For the single-query test you have to type-check the input id and properly encode it, which is more work and could very well be slower than a sort. (Edit: I forgot to add that there is also the extra cost of parsing the query string from the uri path into a number in the single-query test.)
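(A minimal sketch of that extra per-request work; this is not the actual xitca-web code, and the helper name and the 1..10000 id range are assumptions based on the usual benchmark conventions:)

```rust
// Hypothetical helper sketching what the single-query test does that
// fortunes does not: parse a string taken from the request into a number,
// then validate/clamp it before it is encoded as a query parameter.
fn parse_world_id(raw: &str) -> i32 {
    // fall back to 1 on malformed input, then clamp to the world table's range
    raw.parse::<i32>().unwrap_or(1).clamp(1, 10_000)
}

fn main() {
    assert_eq!(parse_world_id("42"), 42);
    assert_eq!(parse_world_id("abc"), 1);
    assert_eq!(parse_world_id("99999"), 10_000);
}
```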
It's not only a sort; all the rows also have to be properly escaped. But the important thing here is not how fast the framework is; the more technical question is that the response from the db, and the response to the request, are larger. But if you explain it, the rest of us will learn.
It's actually an interesting topic. The top scorers in single query are using batch mode, and they have a significant perf drop in the fortunes test. I suspect it has to do with I/O read overhead from batching on large responses.
Yes, all of them do, but not all frameworks.
Is the response in Xitca-web fortunes cached?
No. You can see the source code here.
Fortune takes ownership of its data, and in Rust, when an owner goes out of scope, all memory associated with it is dropped.
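(A minimal sketch of that point; the types and rendering here are simplified stand-ins rather than the actual xitca-web code, and real fortunes output would also need HTML escaping:)

```rust
struct Fortune {
    id: i32,
    message: String,
}

fn fortunes_handler() -> String {
    // stand-in for rows fetched from the database on every request
    let mut fortunes = vec![Fortune { id: 1, message: "example".into() }];
    fortunes.push(Fortune {
        id: 0,
        message: "Additional fortune added at request time.".into(),
    });
    fortunes.sort_by(|a, b| a.message.cmp(&b.message));

    // the rendered response is built fresh each time
    fortunes
        .iter()
        .map(|f| format!("<tr><td>{}</td><td>{}</td></tr>", f.id, f.message))
        .collect()
    // `fortunes` owns its rows; it goes out of scope here, so the memory is
    // dropped and nothing survives to be reused (cached) by the next request
}

fn main() {
    println!("{}", fortunes_handler());
}
```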
Sorry, I don't want to start a war :) In the top 30 frameworks, the issue is not just speed, but the size of the network response.
That's a legit question and I understand your concern. But the result is what it is.
@nbrady-techempower Hi! We (the Quarkus team) plan to send a PR to upgrade to the latest version (and make other changes), but which date exactly is the cutoff? Many thanks!
No date, exactly. As usual, we're pretty backed up with client-facing work, but I'd like to get it in as soon as possible. I'll talk to the team today, but I'd say we'll shoot for closing PRs by the end of next week.
Hi everyone! Very excited to get Round 21 out the door. I'm out on Friday, and Monday is a US holiday, so let's close PRs for Round 21 on Tuesday, May 31. Everything opened before then will be QA'd and merged for the upcoming round. This round might take a few more preview runs than normal, as we have to identify some frameworks that need updating to comply with the rules before we perform the run for the round.
One is just-js: https://just.billywhizz.io/blog/on-javascript-performance-01/ @nbrady-techempower
author of just-js here. thanks for pinging me @joanhey. i would have missed this deadline if you hadn't! 😅 i should be able to get a PR ready by monday to fix this issue in the current just-js release, and might also be able to upgrade to the latest just-js framework and libraries. i did some testing today against the top performing frameworks and, as @michaelhixson mentioned above, it seems there are still a number that will fail to meet the new requirements - i am not sure how to ping the authors, so maybe @nbrady-techempower or @michaelhixson can reach out to them so they have a chance to make changes before the deadline. here is what i found when i sniffed what was being put on the wire for the latest master branch. i should be able to test some more frameworks tomorrow once i have a PR ready with fixes for just-js. i still feel what just-js currently does should be allowed for the multi-query and update tests - it appends a SYNC for each batch of queries in every http request, but it seems this won't be allowed according to the new rule. 😢 just to be clear, in case folks aren't understanding the new requirement:
here is what correct and incorrect behaviour looks like in terms of what we should see on the wire (sketched below); there's a lot more detail in the very long debate we had a few months ago.
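(a schematic sketch of the distinction, using the names of the postgres extended-query protocol messages; this stands in for the original wire captures and is not tied to any particular driver:)

```rust
fn main() {
    // pass: every query carries its own Sync, so each statement is its own
    // implicit transaction and gets an individual ReadyForQuery response
    let pass = ["Bind", "Execute", "Sync", "Bind", "Execute", "Sync"];

    // fail: several queries are sent before a single Sync ("batch"/pipeline
    // mode), which the clarified rule no longer allows
    let fail = ["Bind", "Execute", "Bind", "Execute", "Sync"];

    println!("pass: {}", pass.join(" -> "));
    println!("fail: {}", fail.join(" -> "));
}
```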
on another note - @nbrady-techempower @michaelhixson would it be possible going forward to just cut and publish a new round of the benchmarks monthly, or on some regular schedule? it seems this wouldn't be too much work on your end, as the tests are already running continuously and results are generated automatically - i am happy to help with this effort in any way i can if you don't have time to work on it. the last round was more than a year ago and i think it would be useful for folks to have them published on a regular, known schedule. imho, it doesn't need a writeup or anything like that for each published round. happy to raise this as a separate issue if you think it would be a good idea.
@billywhizz Thanks for looking into this and for the additional work checking the other top frameworks. Thanks for your comment about regular rounds. It's something we've always wanted to do, but it does require a lot of work on our end. The compromise we made on this years ago is the continuous reporting on https://tfb-status.techempower.com. Additionally, we underwent some changes within the company that also delayed things this year. Ideally, I'd like to see a round released every quarter, but that also includes doing the due diligence of checking the top performers during several preview runs before we release as "official", so in that sense you're definitely already helping, and it's much appreciated!
just to point something out on this - it is actually possible for fortunes RPS to be greater than single query. if the database is the bottleneck and we have spare cpu cycles on the web server and don't saturate the network interface, then fortunes could run faster than single query. this may be due to the fact that single query has to provide a parameter, which adds load on the outbound connection to the database and possibly requires a little more work inside postgres to parse and execute. fortunes is just a simple "select *" without a where clause. this tool is useful for seeing where cpu is not fully utilized on the web server, indicating some other bottleneck, likely network saturation (for plaintext) or database overload.
@fafhrd91 it looks like this improvement is due to a recent upgrade of the ntex framework in the TE repository. This upgrade pulled in an updated Rust postgres library, which you seem to maintain. The reason it's performing much faster now is that the sync on every command has been removed; it seems it now only syncs every 50 commands per connection. I just wanted to point this out because, as you can see in the discussion above, this behaviour will no longer be allowed, and you will need to change it if you want to be included in the next official run (please correct me if i am wrong on this @nbrady-techempower). also pinging the maintainers of the following frameworks, which i have checked - you will all need to submit PRs in order to be compliant with the new rules too, as far as i can see.
That's correct. Preview runs will be starting this week and I'll be removing tests that aren't resolved. PRs to add those tests back will need to address the issues above.
Seems to be a regressive move!!
hi @sumeetchhetri. i had a very long debate with the TE folks and others who proposed this change, but the consensus seems to be that it's necessary in order to ensure frameworks are using the most common, safe patterns rather than optimising for the specific tests in the benchmark suite. i can understand that POV even if i don't wholly agree with it. 🤷♂️ @nbrady-techempower is there any leeway to extend the deadline a little, considering that quite a few maintainers, including myself, are only now finding out about it? if not, could you clarify the exact UTC time of the cutoff for submitting PRs? thx.
Here is what I understand right now: only in the update test can we not batch update commands, since all of them will be rolled back if any one fails (this is what lithium-postgres-batch is doing). Batching the select queries of all tests is accepted, since there is no concept of transactions for selects. @billywhizz what I understand from your post is stricter: pgsql batching is forbidden in all tests for all queries (lithium-postgres already complies with this). So in the end I'm not sure I understand the new rule. Could you @nbrady-techempower make it clearer by saying explicitly in which tests and on which requests pgsql batching is forbidden? Thanks!
yes - this is the rule that is being enforced as i understand it - not my decision and not something i agree with (i would be fine with batching of syncs within the context of a single http request), but this is the new rule as explained above. also, from my testing, lithium currently breaks this rule on all tests on the master branch, so it will be excluded from the next official round unless you change it to have an explicit sync for every single query that is sent to postgres. i.e. postgres "pipelining" is not allowed on any of the tests. @nbrady-techempower should be able to confirm this is the case. 🙏 at this stage there seems to be so much confusion that, given the amount of effort maintainers put into these benchmarks, i would suggest postponing this deadline and giving maintainers who were not aware of it or misunderstood the new requirement the opportunity to make changes so they are not excluded from the next official round.
@nbrady-techempower I have noticed that the rust/ntex framework had not been appearing in the unofficial results since early March (about 4 months ago), and it did not build/run for the run which became Round 21. It turns out this was due to a config error in a commit to this repo back then, which evidently errored out the build process very early on in each run -- ntex did not even show up in the "Did not complete" status at the bottom of each result page for any of those runs. @fafhrd91 has submitted a PR to this repo (in the past 24 hours) at #7439 to address this problem. Is there any possibility, once this is merged, that an additional run could be arranged for the official "Round 21" results, so that the TechEmpower Benchmarks will not be missing results for this popular framework? As a reference, it managed to score 4th overall in the Composite results in Round 20. (Note: I am just a public user of this web framework; I have no affiliation with the development team.)
PHP Symfony was working without problems until the official run.
Thanks everyone. Unfortunately, we just had to bring the machines down; emergency maintenance in the IT room. I'm not sure when we'll be back up. I don't think we want to set the precedent of doing additional runs just because a framework or two failed. We could do another 6-day run and a different popular framework might fail, so we're going to leave it as is. Unless the results are off across the board for some reason, we're going to go with this run. What we can do is get Round 22 out in just a few months.
tfb-status / citrine are still experiencing some technical difficulties, which may not be resolved this weekend. Round 21 results will be officially posted on Monday. I'll open a new issue for Round 22, which I would like to publish in Oct/Nov.
@nbrady-techempower I value this benchmark and don't want to cause any suspicion about the results. As such, as the author, I am asking that FaF be excluded from this round of official results, as I mentioned in #7402, until we have a satisfactory explanation.
@errantmind I appreciate that very much. No problem.
Other people are using tricks and saying nothing, but I hope Round 22 will be more correct.
We need to clarify the rules further.
@joanhey i think the previous suggestion to add a random number on each request to the extra row for the fortunes test would be a good one to avoid any caching of results (see the sketch below). maybe we should create an issue with suggestions for further tightening of the rules so they are all in one place? there are also some changes that could be made to make it easier to automatically verify compliance with the rules - when i tried to do this across many frameworks for the pipelining rule, it was very difficult without manual work due to the current structure of the requirements and the various tricks different frameworks get up to. in doing this work i noticed a number of frameworks which warm up their caches before tests start (this is currently allowed, but i don't think it should be) and also ones that run profiling before tests start and re-compile themselves based on the profiling information. not sure that should be allowed either. it also makes it more difficult to have any rigour in verifying the expected number of db requests against the actual number as the tests run.
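(a minimal sketch of the random-row idea, assuming the usual shape of the fortunes handler; the helper name is hypothetical:)

```rust
use rand::Rng;

// hypothetical helper: the extra fortune row that implementations add at
// request time would carry a per-request random number, so a response
// cached from an earlier request would fail verification
fn extra_fortune_message() -> String {
    let n: u32 = rand::thread_rng().gen();
    format!("Additional fortune added at request time. [{n}]")
}

fn main() {
    // two requests should never produce the same extra row
    println!("{}", extra_fortune_message());
    println!("{}", extra_fortune_message());
}
```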
I'm not sure if it's intended, but the link on the Round 21 page that is supposed to go to the Blog is actually pointing here. I would very much be interested in a summary of results/changes as was done in the past. I'm not currently seeing that on the blog, which is still at Round 20.
@rcollette It was intended. The only real change for this round was the rules clarification listed atop the thread. In the future, we'd like some maintainers to write a small blurb about things they did/encountered when preparing for the next round. For now, there won't be a blog post.
I certainly look forward to that type of blog post in the future.
I am pretty sure you mean
i was thinking of lithium in particular, as that was the one i had noticed doing this - i am sure there are others too. https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/C%2B%2B/lithium/compile.sh#L22 i don't think it's an "unfair" advantage when this is a natural advantage JIT has in the real world, and it is a possible reason to choose JIT over static compilation, depending on the workload. what's the point of the benchmark if it doesn't give us an insight into which languages and platforms might have advantages over others, because everyone has heavily optimised their entry to fit the specific tests?
I suppose you have done that by accident, but what you wrote might give the impression that you consider feedback-directed/profile-guided optimization for static languages to be somehow not a real-world optimization and akin to "coding to the benchmark" - it is not. It is a well-understood generic optimization mechanism, and there is work to make it applicable to an even wider range of use cases (e.g. AutoFDO). In fact, the relative ease with which both
And yes, obviously I disagree that this is a "natural" advantage of JIT runtimes. The real advantages are:
In particular, the second advantage would be perfectly well reflected in the results even if FDO/PGO were allowed for languages that are compiled ahead of time.
@volyrique i'll leave it up to the TE folks to decide, but there's much confusion about what these benchmarks are for, and there seems to be a perception out there (on HN/twitter/reddit) that they are rendered meaningless/ridiculous by the extreme lengths maintainers go to in order to achieve the top scores. the reality, i think, is that only a tiny fraction of devs out there are interested in those kinds of micro-optimizations; most would prefer to see a realistic comparison of standard implementations handling a range of different workloads without specific optimizations. i myself was on the other side of this debate, wanting to see the more extreme optimizations and understand what would be possible if we optimized everything we could, but i have since been won over to the realistic side of the argument. it might be best to have two distinct categories for "realistic" and "optimized" and have two completely different rankings for them? it certainly seems the status quo is too difficult to police and leaves too much room for "gaming" the rules.
This is a false dichotomy - there are certainly generic micro-optimizations, and PGO for the C programs in this repository tends to gravitate towards this category IMHO.
I fail to see how that is going to help with the policing issue - it is just changing labels, unless I am missing something. Also, we kind of already have the same thing with the |
Is there an easy way to click through and see which versions of frameworks were used, which serializers, etc.?
Hello. How can I find out which versions of Node.js, Java, and .NET were used?
could someone tag the commit that was tested in round 21? it looks like it was 0db323061e4e258d1ce66a34ea2132f8beef5cc8.
Also, add a link to the details of the run. After searching by the commit id, I think it is this:
will Bun's web framework be included in the next round's benchmark? like elysia, or Bun's native http server - curious how fast it could be compared with other frameworks. Thanks.
I'd say the half-year mark is probably the cutoff for saying there is improvement on that front.
It's been a trying year with a big move and hopefully some new hardware very soon. Hopefully the continuous benchmarking we implemented is enough for now. If there's a reason you need one of these runs marked as official, I'd love to hear about it!
@nbrady-techempower actually I am more interested in the blog posts explaining the changes than in the actual raw results :) so if you were to name one run official and have an explanation of the changes, that would be super interesting.
@psmulovics Thanks for the insight! We actually didn't do a blog post for Round 21 😅 I would really love to collect some insights from contributors about what they did to improve their respective frameworks for the next round. The updates from us are the boring kernel and db updates. 😉
Yes - the Round 21 "blog post" was this github issue 😂
We would like to set our eyes on a new official Round release. Because there have been some rules clarifications, we're going to need the community's help in identifying frameworks whose maintainers need to be pinged for updates so they can be included in the official results.
We'll aim for mid-to-late May for an official preview run. I'll add dates here with a couple of weeks' notice before locking PRs.
Rules Clarification:
Dates: