Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the caching strategy employed in utils.py #84

Merged
merged 3 commits into from
Oct 13, 2023

Conversation

AlexWaygood
Copy link
Collaborator

Various hot functions in utils.py are cached, since they are called with the same arguments from multiple hooks in checkers.py. However, the current cache size is far lower than it needs to be, at 128; many .rst files in CPython have more than 128 paragraphs in them. It's also inefficient, since the cache entries from previous files are retained when checking a new file, but it's very rare that two different docs files have identical paragraphs between them.

We can improve the caching strategy by changing the caches so that they are per-file caches: the cache is allowed to grow without limit while checking any one file, but is cleared after the file has been checked.

This PR cuts around 50% off the time it takes sphinx-lint to check cpython's Doc/ directory on my machine.

Part of #76

@AlexWaygood
Copy link
Collaborator Author

Many thanks to @ezio-melotti for pointing out that the current caching strategy was probably woefully inefficient, and to @hugovk for coming up with the idea of clearing the caches after checking every file!

@hugovk
Copy link
Collaborator

hugovk commented Oct 12, 2023

Wow! For me, 1.681s -> 1.004s on CPython docs = 40% faster! 🚀🚀

sphinxlint/utils.py Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Collaborator Author

AlexWaygood commented Oct 12, 2023

In 715ffb2, I removed a more complex cache that I added in 7d63ef3. I think any speedup I measured from adding that cache must have been a total fluke. Local experiments indicated that the cache never had any successful hits. And that makes sense -- although it is called from two hooks in checkers.py, from one hook it's called with hidden_block_cb=None, but from the other hook, a function is passed to the hidden_block_cb parameter. We only ever cache the results of the function where hidden_block_cb=None, so the cache is useless!

Since it was an unbounded cache, it was pretty problematic in terms of memory consumption. I can't measure any slowdown from removing it now.

Copy link
Collaborator

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar timings, 1.606->0.998s 👍

@hugovk hugovk merged commit 71f72a5 into sphinx-contrib:main Oct 13, 2023
15 checks passed
@AlexWaygood AlexWaygood deleted the improve-caching branch October 13, 2023 07:11
@AlexWaygood
Copy link
Collaborator Author

I just checked all the remaining caches. They all look useful to me: they all either have a huge amount of hits after running on a few .rst files in cpython, or they have a very good hit/miss ratio.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants