Rationalise Python platforms benchmarks #8055
Conversation
Force-pushed from f43f870 to 095a86a
I like this rationale, but not every web framework calls WSGI and ASGI the same way. For example, Quart performs worse on socketify than on uvicorn, while pure socketify ASGI is far faster than uvicorn (almost double). So not every framework gets the same percentage/average uplift from using socketify. I can remove all the tests and only keep pure ASGI, WSGI, and socketify itself, but people should be able to compare different servers on popular Python web frameworks. As I said, not every framework uses ASGI/WSGI in the same way, and may not show the same average difference when leveraging the faster server.

PyPy is another matter: some web frameworks have their overhead reduced and can perform much better than on CPython, yet most servers do not run well (or at all) on PyPy. Take Django, which reaches 288,565 on PyPy vs 92,043 on CPython with socketify, and about 70k using meinheld, while meinheld on Falcon is equal to or faster than socketify on CPython. Raw socketify WSGI is 1,561,530 on PyPy and 697,312 on CPython; meinheld should be close to or better than socketify on CPython, but it is not compatible with PyPy.

We can limit the number of benchmarks for each web framework (only keep the fastest); I think this is fine, but people should be able to know which server it is running on and why. And composite scores should only be grouped when using the same server + runtime.
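To make the "pure ASGI vs. framework" distinction above concrete, here is a minimal sketch of a bare ASGI callable with no framework layer on top. The uvicorn call follows its documented API; the socketify invocation is shown only as a hedged illustration in a comment and should be checked against the current socketify.py documentation.

```python
# raw_asgi.py: a bare ASGI callable with no framework overhead.
async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, World!"})


if __name__ == "__main__":
    # The same callable can be handed to different servers; a framework
    # (Quart, Django, ...) adds its own routing/middleware cost on top of
    # whichever server runs it, which is why the uplift differs per framework.
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
    # With socketify.py the equivalent would be roughly:
    #   from socketify import ASGI
    #   ASGI(app).listen(8000).run()
    # (assumption: check socketify's current ASGI API before relying on this)
```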
Force-pushed from 095a86a to f21cbe6
Not so simple. Some WSGI/ASGI servers may have tweaks that allow them to work more efficiently with a particular framework. For example, look at my tweaks for specific frameworks; all of these tweaks give a significant speed boost in some use cases.
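As a hedged illustration of the kind of framework-specific server tweak being described (the `FastPathMiddleware` name and the single-bytes-body check are hypothetical, not taken from any particular server):

```python
# Hypothetical sketch: a WSGI wrapper that special-cases frameworks which
# return their body as a single bytes chunk, so the server can write it in
# one go instead of iterating over an arbitrary iterable on the hot path.
class FastPathMiddleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        result = self.wsgi_app(environ, start_response)
        # Many frameworks return a list with exactly one bytes element;
        # that shape is already the cheapest one for the server to send.
        if isinstance(result, list) and len(result) == 1 and isinstance(result[0], bytes):
            return result
        # Fallback: materialise the iterable into a single chunk.
        return [b"".join(result)]
```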
As I promised, the naming issue was addressed here: https://github.com/TechEmpower/FrameworkBenchmarks/pull/8058/files

My opinion about rationalizing the benchmarks is that, unfortunately, it is not possible: web frameworks diverge a lot in overhead and implementation, and if you include PyPy it is even harder to rationalize. Adding different frameworks helped me a lot in finding bugs in the WSGI and ASGI implementations (I even opened issues on granian using this information, but never posted granian on TFB with other frameworks, because I know you do not approve of this and I respect your decision).

I also want to state that I disagree with keeping old/dead projects like vibora and japronto in the benchmarks. Meinheld is not maintained either, but at least it is used by a lot of people. The only prize we get from improving over time in TFB is a better understanding of the behavior and scaling of our application, and the ability to compare other implementations on the same hardware, so keeping dead projects only hurts the benchmark run time.

I still want to create a cloud environment (12 vcores or more) to run different benchmarks, tracking CPU, memory and IO usage in each benchmark to identify bottlenecks, and also to add more types of payloads (different sizes). For payloads my idea was: 'Hello, World!' (13 bytes), avoiding HTTP/1.1 pipelining, plus POST data benchmarks, and in the future WebSockets and others. In that case I will not test JSON performance or databases; instead I will create separate benchmarks for different JSON serializers/deserializers and for database connectors. I will also only add Python benchmarks, with some other languages as references: Express and Fastify on node.js, Golang gnet, fiber and gin, ASP.NET Core, and Rust ntex. But for this I need more planning and time.
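A rough sketch of what a POST-payload test endpoint could look like for the idea above (the handler name and response shape are illustrative only, not part of any agreed TFB test specification): the handler echoes back the received body, so request parsing, body buffering and response writing are all exercised for different payload sizes.

```python
# Illustrative ASGI handler for a hypothetical POST-payload benchmark.
async def echo_app(scope, receive, send):
    if scope["type"] != "http":
        return
    # Drain the request body, which may arrive in several chunks.
    body = bytearray()
    while True:
        message = await receive()
        body.extend(message.get("body", b""))
        if not message.get("more_body"):
            break
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/octet-stream"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": bytes(body)})
```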
Given the comments, I am going to close this. @nbrady-techempower feel free to continue the discussion, re-open this, or extract parts from it.
The main rationale behind this is to avoid mixing framework tests and platform ones in Python.

We can match platforms with servers, and thus testing different servers on different frameworks makes little sense to me, as:

- since we measure RT and report avg(RT) in tests, we can actually compute RT = PT + FT, where PT is the platform (server) time and FT is the actual framework time (a worked example follows below)
- since RT is always a composition of the used server and the framework, the single benchmark won't add any useful information to the table when we also have platform benchmarks

Also, since we have "Composite scores" grouped by framework, it gets very complicated to understand such values, as they can come from different implementations.
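As a worked example of the RT = PT + FT decomposition, under the assumption stated above that server time and framework time compose additively (the latency numbers are made up for illustration only):

```python
# Worked example of RT = PT + FT with illustrative average latencies (ms).
avg_rt_framework_on_server = 0.95  # some framework running on a given server
avg_rt_bare_server = 0.40          # the same server running a bare ASGI/WSGI app

# Since avg(RT) = avg(PT) + avg(FT), the framework's own contribution is:
avg_ft = avg_rt_framework_on_server - avg_rt_bare_server
print(f"estimated framework time: {avg_ft:.2f} ms")  # -> 0.55 ms

# Swapping the server changes PT, which the platform benchmark already
# measures on its own; the framework-on-server combination adds no new column.
```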
Skipping these benchmarks will:
Details of changes:

- pypy as explicit and CPython as implicit, to align with common usage
- gunicorn test
- hypercorn test

This will stay draft until I have checked all the involved points.
In the meantime, a discussion can be started; I would like to have opinions from @cirospaciari, @remittor and @nbrady-techempower.