Improve the caching strategy employed in `utils.py` #84

AlexWaygood · 2023-10-12T22:25:13Z

Various hot functions in utils.py are cached, since they are called with the same arguments from multiple hooks in checkers.py. However, the current cache size is far lower than it needs to be, at 128; many .rst files in CPython have more than 128 paragraphs in them. It's also inefficient, since the cache entries from previous files are retained when checking a new file, but it's very rare that two different docs files have identical paragraphs between them.

We can improve the caching strategy by changing the caches so that they are per-file caches: the cache is allowed to grow without limit while checking any one file, but is cleared after the file has been checked.
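A minimal sketch of the per-file idea, assuming the caches are built on `functools.lru_cache`. The names `per_file_cache`, `clear_per_file_caches`, `count_words` and `check_file` are hypothetical and only illustrate the pattern; they are not taken from the PR's diff.

```python
from functools import lru_cache

# Every function wrapped by per_file_cache is remembered here so that all
# caches can be emptied together once a file has been checked.
_per_file_caches = []


def per_file_cache(func):
    """Memoize func without a size limit and register it for later clearing."""
    cached = lru_cache(maxsize=None)(func)
    _per_file_caches.append(cached)
    return cached


def clear_per_file_caches():
    """Drop every entry accumulated while checking the current file."""
    for cached in _per_file_caches:
        cached.cache_clear()


@per_file_cache
def count_words(paragraph: str) -> int:
    # Hypothetical stand-in for a hot helper called by several hooks.
    return len(paragraph.split())


def check_file(paragraphs):
    """Check one file, then clear the caches even if a check raises."""
    try:
        total = 0
        # Two passes mimic two hooks calling the helper with the same arguments.
        for _ in range(2):
            total += sum(count_words(p) for p in paragraphs)
        return total
    finally:
        clear_per_file_caches()
```

Within one file the second pass is served entirely from the cache, and nothing is carried over to the next file.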

This PR cuts around 50% off the time it takes sphinx-lint to check cpython's Doc/ directory on my machine.

Part of #76

AlexWaygood · 2023-10-12T22:31:06Z

Many thanks to @ezio-melotti for pointing out that the current caching strategy was probably woefully inefficient, and to @hugovk for coming up with the idea of clearing the caches after checking every file!

hugovk · 2023-10-12T22:36:45Z

Wow! For me, 1.681s -> 1.004s on CPython docs = 40% faster! 🚀🚀

sphinxlint/utils.py

This reverts commit 7d63ef3.

AlexWaygood · 2023-10-12T22:54:47Z

In 715ffb2, I removed a more complex cache that I added in 7d63ef3. I think any speedup I measured from adding that cache must have been a total fluke. Local experiments indicated that the cache never had any successful hits. And that makes sense -- although hide_non_rst_blocks() is called from two hooks in checkers.py, one hook calls it with hidden_block_cb=None, while the other passes a function to the hidden_block_cb parameter. We only ever cache the results when hidden_block_cb=None, so the cache is useless!

Since it was an unbounded cache, it was pretty problematic in terms of memory consumption. I can't measure any slowdown from removing it now.
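The reasoning above can be reproduced with a small simulation. This is only a sketch under stated assumptions: `hide_blocks_sketch` and `_hide_cached` are hypothetical stand-ins, the transformation is a placeholder, and the only behaviour modelled is that results are cached solely when `hidden_block_cb` is `None`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _hide_cached(lines):
    # Placeholder transformation; the real function's logic is not modelled.
    return tuple(line.upper() for line in lines)


def hide_blocks_sketch(lines, hidden_block_cb=None):
    """Use the cache only when no callback is supplied."""
    if hidden_block_cb is None:
        return _hide_cached(tuple(lines))
    # The callback path bypasses the cache entirely.
    result = tuple(line.upper() for line in lines)
    hidden_block_cb(result)
    return result


def simulate(files):
    """Call the function once per hook for each file and report cache stats."""
    _hide_cached.cache_clear()
    for lines in files:
        hide_blocks_sketch(lines)  # hook without a callback
        hide_blocks_sketch(lines, hidden_block_cb=lambda block: None)  # hook with one
    return _hide_cached.cache_info()
```

Because each file's contents reach the cached path only once, every lookup is a miss and the unbounded cache just accumulates entries.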

hugovk

Similar timings, 1.606->0.998s 👍

AlexWaygood · 2023-10-13T08:26:57Z

I just checked all the remaining caches. They all look useful to me: they all either have a huge amount of hits after running on a few .rst files in cpython, or they have a very good hit/miss ratio.
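One way to run this kind of check, assuming the caches are `functools.lru_cache` wrappers, is to read `cache_info()` after a run. The helper `hit_ratio` and the cached function `normalize` below are hypothetical and used only for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize(paragraph: str) -> str:
    # Hypothetical cached helper: collapse runs of whitespace.
    return " ".join(paragraph.split())


def hit_ratio(cached_func) -> float:
    """Return hits / (hits + misses) for an lru_cache-wrapped function."""
    info = cached_func.cache_info()
    calls = info.hits + info.misses
    return info.hits / calls if calls else 0.0


def demo() -> float:
    """Feed repeated paragraphs through the helper and report the hit ratio."""
    normalize.cache_clear()
    for paragraph in ["a  b", "c", "a  b", "a  b"]:
        normalize(paragraph)
    return hit_ratio(normalize)
```

A cache with few hits relative to its misses is a candidate for removal, as with the one reverted in 715ffb2.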

Improve the caching strategy employed in utils.py

a901d6a

hugovk reviewed Oct 12, 2023

hugovk approved these changes Oct 12, 2023

AlexWaygood added 2 commits October 13, 2023 00:48

address review

e3515d9

Revert "A more complex cache for hide_non_rst_blocks()"

715ffb2

This reverts commit 7d63ef3.

AlexWaygood requested a review from hugovk October 12, 2023 22:57

hugovk approved these changes Oct 12, 2023

ezio-melotti approved these changes Oct 12, 2023

hugovk merged commit 71f72a5 into sphinx-contrib:main Oct 13, 2023
15 checks passed

AlexWaygood deleted the improve-caching branch October 13, 2023 07:11
