Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary CMS DB hits for pages that will 404 #15628

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

stevejalim
Copy link
Collaborator

@stevejalim stevejalim commented Dec 3, 2024

This changeset adds a lookahead check before we call wagtail.views.serve to know whether it's worth asking the CMS to serve a page.

The idea is that if the page isn't in the lookahead's data source, we can avoid touching the DB just to ultimately return a 404 earlier.

This will save us DB load, particularly when we get drive-by scans that pepper the site with irrelevant URLs.

  • I used an AI to write some of this code - I bounced some ideas around with ChatGPT, and reworked some of the code suggestions to fit with Bedrock

Significant changes and points to review

Please be sceptical about this, particularly around cache invalidation/cache updating - e.g. when a page is published, unpublished or moved in the page tree.

Important: this change will need to go hand in hand with an infra update that does give us a networked cache (Redis or Memcached) - if we stick with LocMemCache, then while the pods can and will build their own lookahead in local cache, that cache will not be invalidated when a page is published, unpublished or moved.

Issue / Bugzilla link

#15505

Testing

Details to come

Copy link

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 83.89831% with 19 lines in your changes missing coverage. Please review.

Project coverage is 79.34%. Comparing base (25b38f7) to head (131d341).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
bedrock/cms/signal_handlers.py 51.35% 18 Missing ⚠️
bedrock/cms/apps.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15628      +/-   ##
==========================================
+ Coverage   79.28%   79.34%   +0.06%     
==========================================
  Files         159      161       +2     
  Lines        8343     8460     +117     
==========================================
+ Hits         6615     6713      +98     
- Misses       1728     1747      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stevejalim stevejalim force-pushed the 15505-bedrock-perf-pass branch from dc54a7d to dbd3748 Compare December 13, 2024 10:56
@stevejalim stevejalim force-pushed the 15505-add-precog-behaviour branch from 8be921d to c7873f7 Compare December 13, 2024 10:56
@stevejalim
Copy link
Collaborator Author

Talking with Pmac, we're gonna try usign postgres as a DB-backed networked cache, possibly/probably with a locmem cache on each pod.

@stevejalim stevejalim force-pushed the 15505-bedrock-perf-pass branch from dbd3748 to b42a955 Compare January 13, 2025 13:18
Base automatically changed from 15505-bedrock-perf-pass to main January 13, 2025 13:30
Saves 11 SQL queries on the releasnotes page by cacheing the country code
lookup for an hour.

Tested on /en-US/firefox/132.0.1/releasenotes/

Cold cache: 14 queries / 2066ms
Warm cache: 3 queries / 222ms
@stevejalim stevejalim force-pushed the 15505-add-precog-behaviour branch from c7873f7 to 845f97e Compare January 13, 2025 14:23
…CMS for a page

...because if the page isn't in the lookahead, we can avoid touching the DB
just to ultimately return a 404
@stevejalim stevejalim force-pushed the 15505-add-precog-behaviour branch from 845f97e to df5b15d Compare January 13, 2025 14:36
…cache

In order to balance the need for a distributed cache with the speed of a
local-memory cache, we've come up with a couple of helper functions that wrap
the following behaviour:

* If it's in the local-memory cache, return that immediately.
* If it's not, fall back to the DB cache, and if the key exists there,
return that, cacheing it in local memory again on the way through
* If the local memory cache and DB cache both miss, just return the
default value for the helper function

* Set the value in the local memory cache and DB cache at (almost) the same time
* If the DB cache is not reachable (eg the DB is a read-only replica), log
this loudly, as it's a sign the helper has not been used appropriately, but
still set the local-memory version for now, to prevent total failure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant