
Index resources in the UrlDispatcher to avoid linear search for most cases #7829

Merged (64 commits) on Nov 27, 2023

@bdraco (Member) commented Nov 12, 2023

What do these changes do?

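Previously, the UrlDispatcher searched its resources linearly, so resolving a URL that hit a route took O(n) time on average, where n is the number of registered resources.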

The UrlDispatcher now keeps an index of all resources and tries to resolve requests without a linear search. On a hit, resolution should take close to O(url_parts), where url_parts is the number of path segments in the request URL. On a miss, it takes O(domains) + O(url_parts), where domains is the number of sub-apps registered with add_domain.
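To make the complexity claim concrete, here is a minimal sketch in plain Python (no aiohttp import) of a prefix-keyed index. Resources are filed in a dict under a path key. Resolution probes the dict for the full path, then for each shorter prefix down to "/", so the number of probes tracks the number of path segments rather than the number of registered routes. ToyResource, ToyIndexedDispatcher, static() and prefixed() are hypothetical names for illustration only, not aiohttp's internals; prefixed() loosely stands in for a prefixed sub-app.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyResource:
    """Hypothetical stand-in for a registered route (not an aiohttp class)."""

    index_key: str                  # path prefix the resource is filed under
    matches: Callable[[str], bool]  # full check once a candidate is found
    name: str


class ToyIndexedDispatcher:
    """Simplified path index; a sketch, not aiohttp's UrlDispatcher."""

    def __init__(self) -> None:
        self._index: Dict[str, List[ToyResource]] = {}
        self.last_lookups = 0  # dict probes made by the latest resolve()

    def add(self, resource: ToyResource) -> None:
        self._index.setdefault(resource.index_key, []).append(resource)

    def resolve(self, path: str) -> Optional[str]:
        url_part = path.rstrip("/") or "/"
        self.last_lookups = 0
        while True:
            self.last_lookups += 1
            # Resources sharing a key are tried in registration order.
            for resource in self._index.get(url_part, ()):
                if resource.matches(path):
                    return resource.name
            if url_part == "/":
                return None  # miss: every prefix was probed once
            # Drop the last segment: "/a/b/c" -> "/a/b" -> "/a" -> "/".
            url_part = url_part.rpartition("/")[0] or "/"


def static(path: str) -> ToyResource:
    # A static route is filed under its full path and must match exactly.
    return ToyResource(path, lambda p: p == path, f"static:{path}")


def prefixed(prefix: str) -> ToyResource:
    # Loosely like a prefixed sub-app: the prefix itself or anything below it.
    return ToyResource(
        prefix,
        lambda p: p == prefix or p.startswith(prefix + "/"),
        f"prefix:{prefix}",
    )


dispatcher = ToyIndexedDispatcher()
for i in range(5000):
    dispatcher.add(static(f"/static/sub/path{i}"))
dispatcher.add(prefixed("/api"))

for path in ("/static/sub/path4999", "/api/v1/users", "/is/a/miss"):
    print(path, dispatcher.resolve(path), dispatcher.last_lookups)
```

With 5000 static routes registered, the last one is still found on the first probe, /api/v1/users reaches the /api prefix on the third probe, and the miss costs four probes, one per prefix down to "/".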

This also works for sub-apps mounted with a prefix.

Sub-apps added with add_domain (MatchedSubAppResource) fall back to a linear search, because their domain patterns can contain wildcards or regular expressions and therefore cannot be indexed.
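The fallback can be sketched the same way. In this hypothetical toy (not aiohttp's API), the path index is reduced to an exact-match dict to keep it short, and domain-matched apps live in a separate list. Because a host mask such as *.example.com has no fixed key to index on, a request that misses the index has to try each host pattern in turn, which mirrors the O(domains) term described above. The mask syntax, where * matches a single DNS label, is an assumption of the sketch, as are the class and method names.

```python
import re
from typing import Dict, List, Optional, Tuple


class ToyDomainFallbackDispatcher:
    """Hypothetical sketch: an indexed path lookup plus a list of
    domain-matched apps that can only be searched linearly."""

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}  # exact path -> route name
        self._domain_apps: List[Tuple["re.Pattern[str]", str]] = []
        self.domain_checks = 0  # host patterns tried by the latest resolve()

    def add_static(self, path: str, name: str) -> None:
        self._index[path] = name

    def add_domain_app(self, host_mask: str, name: str) -> None:
        # Assumed mask syntax: "*" matches one DNS label, e.g. "*.example.com".
        regex = re.escape(host_mask).replace(r"\*", r"[^.]+")
        self._domain_apps.append((re.compile(regex), name))

    def resolve(self, host: str, path: str) -> Optional[str]:
        self.domain_checks = 0
        # 1) Indexed lookup: cost does not grow with the number of routes.
        name = self._index.get(path)
        if name is not None:
            return name
        # 2) Fallback: a host pattern has no fixed key, so each domain app
        #    is tried in turn, O(number of domain apps).
        for pattern, app_name in self._domain_apps:
            self.domain_checks += 1
            if pattern.fullmatch(host):
                return app_name
        return None


dispatcher = ToyDomainFallbackDispatcher()
dispatcher.add_static("/health", "health")
dispatcher.add_domain_app("*.example.com", "tenant-app")
dispatcher.add_domain_app("admin.example.org", "admin-app")

print(dispatcher.resolve("a.example.com", "/health"), dispatcher.domain_checks)
print(dispatcher.resolve("admin.example.org", "/x"), dispatcher.domain_checks)
```

An indexed hit never touches the domain list, while a request for the second domain app pays for one failed pattern check before it matches.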

Are there changes in behavior for the user?

A fixed/static path will always be preferred over a dynamic path.
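One way to see why a static path wins: if a dynamic route is filed under the fixed segments before its first variable, while a static route is filed under its full path, then walking the request path from the longest prefix to the shortest reaches the static entry first, whatever the registration order. The sketch below assumes that key rule (index_key) and a simplified {name} syntax that matches exactly one path segment; ToyRouter, index_key and to_regex are hypothetical helpers, not aiohttp functions.

```python
import re
from typing import Dict, List, Optional, Tuple


def index_key(canonical: str) -> str:
    """Hypothetical key rule: a static path is filed under itself; a dynamic
    path is filed under the fixed segments before its first variable."""
    if "{" in canonical:
        # "/users/{id}"       -> "/users/"       -> "/users"
        # "/files/report-{n}" -> "/files/report-" -> "/files"
        canonical = canonical.partition("{")[0].rpartition("/")[0]
    return canonical.rstrip("/") or "/"


def to_regex(canonical: str) -> "re.Pattern[str]":
    # "{name}" becomes a named group matching one path segment.
    pieces = re.split(r"\{(\w+)\}", canonical)
    pattern = "".join(
        f"(?P<{piece}>[^/]+)" if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    return re.compile(pattern)


class ToyRouter:
    def __init__(self) -> None:
        self._index: Dict[str, List[Tuple[str, "re.Pattern[str]"]]] = {}

    def add(self, canonical: str) -> None:
        entry = (canonical, to_regex(canonical))
        self._index.setdefault(index_key(canonical), []).append(entry)

    def resolve(self, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
        url_part = path.rstrip("/") or "/"
        # Longest prefix first, so a full-path (static) key is probed first.
        while True:
            for canonical, regex in self._index.get(url_part, ()):
                match = regex.fullmatch(path)
                if match:
                    return canonical, match.groupdict()
            if url_part == "/":
                return None
            url_part = url_part.rpartition("/")[0] or "/"


router = ToyRouter()
router.add("/users/{id}")  # dynamic route registered first
router.add("/users/me")    # static route registered second
router.add("/files/report-{n}")

for path in ("/users/me", "/users/42", "/files/report-7"):
    print(path, router.resolve(path))
```

Here /users/me resolves to the static route even though /users/{id} was registered first, while /users/42 falls through to the /users key and matches the dynamic route.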

Related issue number

fixes #7828

benchmark

import asyncio
import timeit
from unittest.mock import MagicMock, AsyncMock

from yarl import URL

from aiohttp import web
from aiohttp.web_urldispatcher import (
    UrlDispatcher,
)

URL_COUNT = 5000


def make_mock_request(url: str) -> web.Request:
    return web.Request(
        MagicMock(url=URL(url), method="GET"),
        MagicMock(),
        protocol=MagicMock(),
        host="example.com",
        task=MagicMock(),
        loop=asyncio.get_running_loop(),
        payload_writer=MagicMock(),
    )


async def url_dispatcher_performance():
    """Benchmark UrlDispatcher.resolve() for hits and misses."""
    dispatcher = UrlDispatcher()
    for i in range(URL_COUNT):
        dispatcher.add_get(f"/static/sub/path{i}", AsyncMock())

    first_url = "/static/sub/path0"
    last_url = f"/static/sub/path{URL_COUNT-1}"
    long_url = "/static/lots/of/sub/path/segments/0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15"
    mid_size_url = "/static/midsize/of/sub/path/segments"
    miss_url = "/is/a/miss"
    long_miss_url = "/is/a/miss/with/lots/of/sub/path/segments/0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15"

    dispatcher.add_get(long_url, AsyncMock())
    dispatcher.add_get(mid_size_url, AsyncMock())

    request_first_url = make_mock_request(first_url)
    match_info = await dispatcher.resolve(request_first_url)
    assert match_info.route.resource.canonical == first_url

    request_last_url = make_mock_request(last_url)
    match_info = await dispatcher.resolve(request_last_url)
    assert match_info.route.resource.canonical == last_url

    request_long_url = make_mock_request(long_url)
    match_info = await dispatcher.resolve(request_long_url)
    assert match_info.route.resource.canonical == long_url

    request_mid_size_url = make_mock_request(mid_size_url)
    match_info = await dispatcher.resolve(request_mid_size_url)
    assert match_info.route.resource.canonical == mid_size_url

    request_miss_url = make_mock_request(miss_url)
    match_info = await dispatcher.resolve(request_miss_url)
    assert match_info.route.status == 404

    request_long_miss_url = make_mock_request(long_miss_url)
    match_info = await dispatcher.resolve(request_long_miss_url)
    assert match_info.route.status == 404

    async def resolve_times(request: web.Request, times: int) -> float:
        start = timeit.default_timer()
        for _ in range(times):
            await dispatcher.resolve(request)
        end = timeit.default_timer()
        return end - start

    run_count = 2000
    resolve_time_first_url = await resolve_times(request_first_url, run_count)
    resolve_time_last_url = await resolve_times(request_last_url, run_count)
    resolve_time_mid_size_url = await resolve_times(request_mid_size_url, run_count)
    resolve_time_long_url = await resolve_times(request_long_url, run_count)
    resolve_time_miss = await resolve_times(request_miss_url, run_count)
    resolve_time_long_miss = await resolve_times(request_long_miss_url, run_count)
    print(f"resolve_time_first_url: {resolve_time_first_url}")
    print(f"resolve_time_last_url: {resolve_time_last_url}")
    print(f"resolve_time_mid_size_url: {resolve_time_mid_size_url}")
    print(f"resolve_time_long_url: {resolve_time_long_url}")
    print(f"resolve_time_miss: {resolve_time_miss}")
    print(f"resolve_time_long_miss: {resolve_time_long_miss}")


asyncio.run(url_dispatcher_performance())

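Each timing below is the total elapsed time in seconds for 2000 resolve() calls (run_count = 2000).

before (5002 URLs, URL_COUNT = 5000)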

resolve_time_first_url: 0.0016546659753657877
resolve_time_last_url: 2.253484041953925
resolve_time_mid_size_url: 2.1529546250239946
resolve_time_long_url: 2.136738500033971
resolve_time_miss: 2.152159374963958
resolve_time_long_miss: 2.173428333015181

before (7 URLs, URL_COUNT = 5)

resolve_time_first_url: 0.0019221249967813492
resolve_time_last_url: 0.0038797500310465693
resolve_time_mid_size_url: 0.004664750013034791
resolve_time_long_url: 0.004066875029820949
resolve_time_miss: 0.00846966594690457
resolve_time_long_miss: 0.008330333046615124

after (5002 URLs, URL_COUNT = 5000)

resolve_time_first_url: 0.0019938750192523003
resolve_time_last_url: 0.0020484999986365438
resolve_time_mid_size_url: 0.002027958049438894
resolve_time_long_url: 0.0019290419877506793
resolve_time_miss: 0.0035507080028764904
resolve_time_long_miss: 0.007968166959472

after (7 URLs, URL_COUNT = 5)

resolve_time_first_url: 0.0019937920151278377
resolve_time_last_url: 0.002007792005315423
resolve_time_mid_size_url: 0.0019652920309454203
resolve_time_long_url: 0.0019985410035587847
resolve_time_miss: 0.003473666962236166
resolve_time_long_miss: 0.007722083013504744
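Since each number is a total over run_count = 2000 calls, dividing by 2000 gives the per-call cost. The short script below does that arithmetic for three of the 5002-URL cases; it only reprocesses the figures reported above and measures nothing itself. The dict names and the per_call_summary helper are illustrative.

```python
from typing import Dict, Tuple

RUN_COUNT = 2000  # run_count in the benchmark script above

# Total seconds for RUN_COUNT resolve() calls, copied from the 5002-URL runs.
BEFORE: Dict[str, float] = {
    "first_url": 0.0016546659753657877,
    "last_url": 2.253484041953925,
    "miss": 2.152159374963958,
}
AFTER: Dict[str, float] = {
    "first_url": 0.0019938750192523003,
    "last_url": 0.0020484999986365438,
    "miss": 0.0035507080028764904,
}


def per_call_summary(
    before: Dict[str, float], after: Dict[str, float], run_count: int
) -> Dict[str, Tuple[float, float, float]]:
    """Return {case: (us_per_call_before, us_per_call_after, speedup)}."""
    summary = {}
    for case, total_before in before.items():
        total_after = after[case]
        summary[case] = (
            total_before / run_count * 1e6,  # seconds -> microseconds per call
            total_after / run_count * 1e6,
            total_before / total_after,
        )
    return summary


for case, (us_before, us_after, speedup) in per_call_summary(
    BEFORE, AFTER, RUN_COUNT
).items():
    print(f"{case}: {us_before:.2f} us -> {us_after:.2f} us per call ({speedup:.1f}x)")
```

For last_url this works out to roughly 1127 us per call before versus about 1.0 us after, a speedup of about 1100x. The first_url ratio comes out below 1x, consistent with the old linear scan finding the first registered route immediately.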

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type>, for example 588.bugfix
    • if you don't have an issue_id, change it to the PR id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

Review thread on aiohttp/web_urldispatcher.py (outdated, resolved)
@psf-chronographer bot added the bot:chronographer:provided label (There is a change note present in this PR) Nov 12, 2023

codecov bot commented Nov 12, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (c0377bf) 97.41% compared to head (162a9ea) 97.41%.
Report is 1 commits behind head on master.

Files with missing coverage:
aiohttp/web_urldispatcher.py: patch coverage 97.14%, 0 lines missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7829   +/-   ##
=======================================
  Coverage   97.41%   97.41%           
=======================================
  Files         107      107           
  Lines       32263    32347   +84     
  Branches     3750     3753    +3     
=======================================
+ Hits        31428    31511   +83     
  Misses        632      632           
- Partials      203      204    +1     
Flag Coverage Δ
CI-GHA 97.33% <98.94%> (+<0.01%) ⬆️
OS-Linux 97.00% <98.94%> (+<0.01%) ⬆️
OS-Windows 95.50% <98.94%> (+<0.01%) ⬆️
OS-macOS 96.81% <98.94%> (-0.01%) ⬇️
Py-3.10.11 95.42% <98.94%> (+<0.01%) ⬆️
Py-3.10.13 96.80% <98.94%> (-0.01%) ⬇️
Py-3.11.5 96.39% <98.94%> (?)
Py-3.11.6 96.51% <98.94%> (+<0.01%) ⬆️
Py-3.12.0 96.61% <98.94%> (+<0.01%) ⬆️
Py-3.8.10 95.39% <98.94%> (+<0.01%) ⬆️
Py-3.8.18 96.73% <98.94%> (+<0.01%) ⬆️
Py-3.9.13 95.39% <98.94%> (+<0.01%) ⬆️
Py-3.9.18 96.77% <98.94%> (+<0.01%) ⬆️
Py-pypy7.3.13 96.23% <98.94%> (+<0.01%) ⬆️
VM-macos 96.81% <98.94%> (-0.01%) ⬇️
VM-ubuntu 97.00% <98.94%> (+<0.01%) ⬆️
VM-windows 95.50% <98.94%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@bdraco marked this pull request as ready for review November 12, 2023 23:54
@bdraco requested a review from asvetlov as a code owner November 12, 2023 23:54
@bdraco changed the title from "Index resources in the UrlDispatcher to avoid linear search on hit" to "Index resources in the UrlDispatcher to avoid linear search for most cases" Nov 14, 2023
@Dreamsorcerer merged commit 2a3eaa1 into aio-libs:master Nov 27, 2023
32 of 34 checks passed
patchback bot (Contributor) commented Nov 27, 2023

Backport to 3.10: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.10/2a3eaa11dc2f8e6150eede9337182f273e14c20a/pr-7829

Backported as #7917


patchback bot pushed a commit that referenced this pull request Nov 27, 2023
Dreamsorcerer added a commit that referenced this pull request Nov 27, 2023
…er to avoid linear search for most cases (#7917)

**This is a backport of PR #7829 as merged into master
(2a3eaa1).**

---------

Co-authored-by: J. Nick Koston <nick@koston.org>
Co-authored-by: Sam Bull <git@sambull.org>
bdraco pushed a commit that referenced this pull request Aug 1, 2024
…able is preceded by a fixed string after a slash (#8578)

Co-authored-by: J. Nick Koston <nick@koston.org>
Fix for a regression in 3.10.x (regressed in #7829). Fixes #8567.
bdraco pushed a commit that referenced this pull request Aug 1, 2024
…able is preceded by a fixed string after a slash (#8579)

Co-authored-by: J. Nick Koston <nick@koston.org>
Fix for a regression in 3.10.x (regressed in #7829). Fixes #8567.
@bdraco deleted the fast_url_dispatcher branch November 15, 2024 14:27
Labels
bot:chronographer:provided There is a change note present in this PR
Development

Successfully merging this pull request may close these issues.

UrlDispatcher improvements
3 participants