-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-126156: Improve performance of creating Morsel
objects
#126157
Conversation
Replaces the manually constructed loop with a call to `dict.update`
Misc/NEWS.d/next/Library/2024-10-30-00-12-22.gh-issue-126156.BOSqv0.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have benchmarks using pyperf instead of timeit please? In addition can you tell us whether benchmarks are using a debug build, a release build or a PGO build? (the last one would be the realistic one). Thanks!
I doubt we'll see much of a speedup on pyperformance, this is a tiny change. The timeit benchmark is more important to me here. |
I wasn't talking about pyperformance but with pyperf. I want to know whether it's a release or debug build as well (this matters a lot). In addition, this could help us in quantizing the precise improvements in a NEWS entry (using the geometric mean, not just max-comparisons). |
Python 3.14.0a1 (v3.14.0a1:8cdaca8b25e, Oct 15 2024, 15:51:57) [Clang 16.0.0 (clang-1600.0.26.3)] on darwin (python mac installer)
Python 3.13.0rc2 (v3.13.0rc2:ec610069637, Sep 6 2024, 17:57:29) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin` (python mac installer)
Python 3.11.8 (main, Feb 6 2024, 21:21:21) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin (homebrew)
Benchmark from http.cookies import Morsel
import timeit
import pyperf
class FastMorselWithUpdate(dict):
"""A class to hold ONE (key, value) pair.
In a cookie, each such pair may have several attributes, so this class is
used to keep the attributes associated with the appropriate key,value pair.
This class also includes a coded_value attribute, which is used to hold
the network representation of the value.
"""
# RFC 2109 lists these attributes as reserved:
# path comment domain
# max-age secure version
#
# For historical reasons, these attributes are also reserved:
# expires
#
# This is an extension from Microsoft:
# httponly
#
# This dictionary provides a mapping from the lowercase
# variant on the left to the appropriate traditional
# formatting on the right.
_reserved = {
"expires": "expires",
"path": "Path",
"comment": "Comment",
"domain": "Domain",
"max-age": "Max-Age",
"secure": "Secure",
"httponly": "HttpOnly",
"version": "Version",
"samesite": "SameSite",
}
_reserved_defaults = dict.fromkeys(_reserved, "")
_flags = {"secure", "httponly"}
def __init__(self):
# Set defaults
self._key = self._value = self._coded_value = None
# Set default attributes
dict.update(self, self._reserved_defaults)
runner = pyperf.Runner()
runner.bench_func(name="Morsel", func=Morsel)
runner.bench_func(name="FastMorsel", func=FastMorselWithUpdate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that those numbers are for a release + PGO build and they look great! Overall, using a class variable for defaults reduces the variation in the running time so it's better as well IMO. The numbers I got are as follows:
Using dict.update(self, self._reserved_defaults)
.....................
Morsel (ref): Mean +- std dev: 484 ns +- 11 ns
.....................
Morsel (new): Mean +- std dev: 124 ns +- 3 ns
+-----------+--------+----------------------+
| Benchmark | ref | new |
+===========+========+======================+
| Morsel | 484 ns | 124 ns: 3.89x faster |
+-----------+--------+----------------------+
Using self |= self._reserved_defaults
.....................
Morsel (ref): Mean +- std dev: 477 ns +- 13 ns
.....................
Morsel (new): Mean +- std dev: 106 ns +- 2 ns
+-----------+--------+----------------------+
| Benchmark | ref | new |
+===========+========+======================+
| Morsel | 477 ns | 106 ns: 4.52x faster |
+-----------+--------+----------------------+
So we can be a bit more faster if we use the |=
operator instead (I think we can also update the NEWS entry to include the improvement factor which is 4.5 in non-debug optimized builds).
Misc/NEWS.d/next/Library/2024-10-30-00-12-22.gh-issue-126156.BOSqv0.rst
Outdated
Show resolved
Hide resolved
Yes, they are
I made the adjustment and updated the NEWS entry |
Ah! I just wondered something. Is there any reason to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Typically, we don't like to accept changes for performance increases on pure-Python code, but I think this one is small enough that it's ok.
AFAICT it would only be a concern for Running the change as aio-libs/aiohttp#9594 as well |
Isn't it assuming that there is no C implementation (and I don't think there is, or is there?)?
Yes, but what about a user overriding it? maybe we should keep |
This reverts commit c8e3f36.
Misc/NEWS.d/next/Library/2024-10-30-00-12-22.gh-issue-126156.BOSqv0.rst
Outdated
Show resolved
Hide resolved
I guess someone might subclass it so I reverted to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that and sorry for the back and forth!
Misc/NEWS.d/next/Library/2024-10-30-00-12-22.gh-issue-126156.BOSqv0.rst
Outdated
Show resolved
Hide resolved
…OSqv0.rst Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
No worries. It was quite constructive. Thanks. |
@ZeroIntensity we do accept performance improvements in pure Python code. They just a) need to either be substantial improvements or in very hot code, b) not hurt readability, c) not meaningfully change semantics. We also consider ourselves to be not perfect judges of "not meaningfully change semantics" (this is e.g. an important reason why we reject cosmetic changes / refactors) This case is motivated by real-world profiling in addition to having a meaningful result on microbenchmark, and is arguably more readable. All to say, thanks for a nice change! :-) |
@hauntsaninja Yeah, agreeing with all of that. I more meant that it's common for performance increases in pure-Python to not meet those requirements, and this PR is small enough that it's not a problem. Sorry for the confusion! |
Thanks |
…thon#126157) Replaces the manually constructed loop with a call to `dict.update`
Replaces the manually constructed loop with using the
dict.update
on a pre-constructeddict
This works out to 3.8x faster, benchmark below
Morsel.__init__
loops reserved values and could be replaced with a simple dict update #126156