-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the caching strategy employed in utils.py
#84
Conversation
Many thanks to @ezio-melotti for pointing out that the current caching strategy was probably woefully inefficient, and to @hugovk for coming up with the idea of clearing the caches after checking every file! |
Wow! For me, 1.681s -> 1.004s on CPython docs = 40% faster! 🚀🚀 |
This reverts commit 7d63ef3.
In 715ffb2, I removed a more complex cache that I added in 7d63ef3. I think any speedup I measured from adding that cache must have been a total fluke. Local experiments indicated that the cache never had any successful hits. And that makes sense -- although it is called from two hooks in Since it was an unbounded cache, it was pretty problematic in terms of memory consumption. I can't measure any slowdown from removing it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar timings, 1.606->0.998s 👍
I just checked all the remaining caches. They all look useful to me: they all either have a huge amount of hits after running on a few |
Various hot functions in
utils.py
are cached, since they are called with the same arguments from multiple hooks incheckers.py
. However, the current cache size is far lower than it needs to be, at 128; many.rst
files in CPython have more than 128 paragraphs in them. It's also inefficient, since the cache entries from previous files are retained when checking a new file, but it's very rare that two different docs files have identical paragraphs between them.We can improve the caching strategy by changing the caches so that they are per-file caches: the cache is allowed to grow without limit while checking any one file, but is cleared after the file has been checked.
This PR cuts around 50% off the time it takes
sphinx-lint
to check cpython'sDoc/
directory on my machine.Part of #76