
refactor: check endpoint handler is async only once #2536

Merged (3 commits) on Apr 20, 2024

Conversation

@sk- (Contributor) commented Mar 7, 2024

Summary

We improve the dispatch in the routing module to check only once whether the handler is async. This gives an improvement of 2.5% (sync) and 1.82% (async) in requests/s. The average latency decreased by 1.6% (sync) and 1.5% (async).

Note that we had to use a cast in the helper function, as the typeguard does not work for the negative case. On the main branch the code works without a cast because the typeguard's return type is in practice `AwaitableCallable[Any]`, which ends up swallowing the other types in the union.
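The idea can be sketched as follows (a minimal sketch with hypothetical names; the PR's actual helper is called `_to_async`, and Starlette's real check, `is_async_callable`, also handles partials and objects with an async `__call__`, which `asyncio.iscoroutinefunction` alone does not). The point is that the sync/async decision happens once, at route construction, rather than on every request:

```python
import asyncio
import functools
import typing


async def run_in_threadpool(func, *args):
    # Stand-in for starlette.concurrency.run_in_threadpool,
    # to keep the sketch self-contained.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, functools.partial(func, *args))


def to_async(func: typing.Callable) -> typing.Callable[..., typing.Awaitable]:
    # Hypothetical helper: decide once whether `func` is async.
    # After this, the per-request path is a single await, with no branch.
    if asyncio.iscoroutinefunction(func):
        return func  # already a coroutine function; call it directly

    async def wrapper(*args):
        return await run_in_threadpool(func, *args)

    return wrapper
```

With this shape, the dispatch code simply does `await handler(request)` regardless of what the user wrote, and the `iscoroutinefunction` check is paid only once per route instead of once per request.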

Benchmarking

We use a simple JSON app, with both a sync and an async endpoint, and the wrk tool to take the measurements. We also show the profile produced by the snakeviz tool.

The measurements were done on a MacBook Pro with an M1 chip, 16 GB of memory, and macOS 12.3. The Python version used for the tests is 3.12.2, and the uvicorn version is 0.27.1.

Before

```
$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   733.77us   55.57us   3.21ms   78.35%
    Req/Sec     6.84k   147.06     7.15k    87.13%
  137474 requests in 10.10s, 18.35MB read
Requests/sec:  13610.69
Transfer/sec:      1.82MB

$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   717.14us   49.05us   1.83ms   71.11%
    Req/Sec     7.00k   112.14     7.36k    76.24%
  140613 requests in 10.10s, 18.77MB read
Requests/sec:  13922.97
Transfer/sec:      1.86MB
```
[Screenshots: snakeviz profiles, before the change]

After

```
$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   721.34us  202.40us  11.13ms   99.32%
    Req/Sec     7.01k   230.04     7.62k    94.00%
  139558 requests in 10.00s, 18.63MB read
Requests/sec:  13956.14
Transfer/sec:      1.86MB

$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   706.04us  109.90us   7.46ms   98.30%
    Req/Sec     7.12k   136.09     7.39k    90.59%
  143188 requests in 10.10s, 19.12MB read
Requests/sec:  14176.95
Transfer/sec:      1.89MB
```
[Screenshots: snakeviz profiles, after the change]

Test App

The app used for the test is as follows (note that the sync endpoint must be a plain `def`, otherwise both endpoints would take the async path):

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
import uvicorn

async def async_page(request):
    return JSONResponse({'status': 'ok'})

def sync_page(request):
    return JSONResponse({'status': 'ok'})

app = Starlette(routes=[
    Route('/async', async_page),
    Route('/sync', sync_page),
])

if __name__ == "__main__":
    uvicorn.run("app:app", port=8000, log_level="critical")
```

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

The last two points are not checked, as this is just a refactoring and does not require new tests or docs. If required, I could add tests for the added helper function `_to_async`.

@Kludex (Member) left a comment:

I'm not sure if those changes are worth it, but...

@sk- (Contributor, Author) commented Mar 16, 2024

I'm not sure if those changes are worth it, but...

Could you share why you think the changes are not worth it?

Any improvement that removes blocking code helps all applications: removing unneeded operations reduces the amount of blocking work, which affects (tail) latencies. This is a no-brainer when the code in question is in fact unneeded because it is doing the same thing over and over again.

@Kludex Kludex enabled auto-merge (squash) April 20, 2024 08:24
@Kludex Kludex merged commit 96c90f2 into encode:master Apr 20, 2024
5 checks passed
nixroxursox pushed a commit to nixroxursox/starlette that referenced this pull request Jul 20, 2024
* refactor: check endpoint handler is async only once

* Apply PR suggestion

---------

Co-authored-by: Marcelo Trylesinski <marcelotryle@gmail.com>
@Masynchin commented:

Same optimization can be applied here:

```python
if self.is_async:
    await self.func(*self.args, **self.kwargs)
else:
    await run_in_threadpool(self.func, *self.args, **self.kwargs)
```

But is this snippet executed more than once?
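Assuming the quoted branch lives in something like `BackgroundTask.__call__`, the same hoisting could in principle move the check into `__init__` (a sketch with a hypothetical class name, not Starlette's actual `BackgroundTask`; note that a background task typically runs only once, so the saving here would be at most one `iscoroutinefunction` check per task):

```python
import asyncio


class BackgroundTaskSketch:
    # Hypothetical sketch: resolve the sync/async branch once at
    # construction time instead of on every call.
    def __init__(self, func, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        if asyncio.iscoroutinefunction(func):
            self.func = func
        else:
            async def run_sync(*a, **kw):
                # Starlette would use run_in_threadpool here; a plain
                # executor call keeps the sketch self-contained.
                loop = asyncio.get_running_loop()
                return await loop.run_in_executor(None, lambda: func(*a, **kw))

            self.func = run_sync

    async def __call__(self):
        await self.func(*self.args, **self.kwargs)
```

Since `__call__` runs once per task, this trades a cheap per-call branch for a per-construction one, so the win (if any) is much smaller than in the routing hot path.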
