Add in-memory cache to DB cache to deal with DB write failures #9639
Conversation
includes/class-database-cache.php
    if ( isset( $this->db_cache_write_errors[ $key ] ) &&
        $this->db_cache_write_errors[ $key ] > 0 &&
This is fairly difficult to follow, but I think I know what it's doing. But if we get rid of the `maybe_` as outlined above and always write to memory first and then attempt a DB write later, then perhaps `db_cache_write_errors` will be redundant? Because `$cache_contents` will be whatever is in memory, so we'd catch things in the `is_expired` check below.
I'm hoping we could remove `db_cache_write_errors` and simplify things to be just one array.
This suggestion sounds sane to me. I cannot imagine a performance hit if we always use the in-memory cache fallback.
@marcinbot, I searched long and hard for a way to forego the use of `db_cache_write_errors`, but I haven't found one.
Now that we got rid of the cache size management, let me explain the current logic; maybe you'll spot something I missed:
- We always initialize the in-memory cache when reading (or writing) the cache.
- We always use the data in the in-memory cache (there will be only one `get_option` call per key, per request), unless the cache entry is deleted or we are forced to refresh mid-request.
- If a cache entry needs to be refreshed (missing, expired, invalid, etc.), we need to know if there were prior DB write failures:
  - If there were no failures, we proceed as usual and refresh the cached data.
  - If there were failures, we don't try the refresh and instead use what we have in the in-memory cache (data fetched from our platform but not written to the DB). Without knowing of the previous DB write failures, we cannot stop continuously refreshing the data on every call to `get_or_add`.

I don't think we have the option of making the in-memory cache sticky for an entire request because of the `force_refresh` logic.
Does this make sense?
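The flow described in this comment can be sketched roughly as follows. This is illustrative Python pseudocode, not the actual PHP class: the `db`/`fetcher` interfaces and the entry shape are invented for the sketch, and only the names `get_or_add` and `db_cache_write_errors` come from the PR.

```python
class DatabaseCache:
    """Sketch of the flow described above (hypothetical, not the PHP code)."""

    def __init__(self, db, fetcher):
        self.db = db                      # stand-in for get_option / update_option
        self.fetcher = fetcher            # stand-in for the platform fetch
        self.in_memory = {}               # one DB read per key, per request
        self.db_cache_write_errors = {}   # prior DB write failures, per key

    def get_or_add(self, key, force_refresh=False):
        # Always initialize the in-memory cache when reading (or writing).
        if key not in self.in_memory:
            self.in_memory[key] = self.db.get(key)

        entry = self.in_memory[key]
        needs_refresh = force_refresh or entry is None or entry["expired"]

        if needs_refresh:
            # Prior DB write failures? Serve the in-memory data instead of
            # re-fetching on every call (unless a refresh is forced).
            if (
                self.db_cache_write_errors.get(key, 0) > 0
                and not force_refresh
                and entry is not None
            ):
                return entry["data"]

            data = self.fetcher(key)
            entry = {"data": data, "expired": False}
            self.in_memory[key] = entry
            if not self.db.set(key, entry):
                # A failed write leaves the entry marked as needing a refresh,
                # which is why the error tracking is needed in this version.
                entry["expired"] = True
                self.db_cache_write_errors[key] = (
                    self.db_cache_write_errors.get(key, 0) + 1
                )
        return entry["data"]
```

With a failing DB, the first call fetches and records the write error; subsequent calls in the same request return the in-memory data without re-fetching.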
> If there were no failures, we will proceed as usual and refresh the cached data.

But looking at the code, whether or not there were failures, we then proceed to check the in-memory value and return based on that value's timestamp.
So let's say there's a DB error and we didn't have the errors array:
- The in-memory array already has an updated value.
- The value has not been propagated to the DB, but that's OK.
- On a subsequent call within the same request, we look at the in-memory value and see that it has been refreshed recently.

I think the problem might be this block, which I think we could remove since it's all about one request anyway:

    if ( ! $cache_written ) {
        // If the cache write failed, mark as not refreshed.
        $refreshed = false;
    }

Regarding the `force_refresh` flag mid-request: it seems it's being checked before the errors array, so it will force the refresh regardless of the errors, even in the current code.
But I haven't done any debugging. I think the way to do it would be to place some breakpoints and custom conditionals to simulate an error on `update_option` and see how the code behaves. You might have already done that, though, in which case I'm happy to back off :)
Oh, man! It was staring me in the face and I didn't see it - I got too caught up with errored responses, but they are not retried until the cache expires.
I removed the DB write errors handling, and it works like a charm! Tested it with correct platform responses, empty array, and errors (rate limited), all with simulated DB write failures.
Thank you for not backing off so easily, Marcin! Now, we have a truly elegant solution. Excellent!
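The agreed-upon simplification can be sketched like this (again, illustrative Python pseudocode with invented `db`/`fetcher` interfaces, not the shipped PHP): the in-memory cache is always written first and is authoritative for the request, the DB write is best-effort, and no error tracking is needed.

```python
class DatabaseCache:
    """Sketch of the simplified flow: memory first, DB write best-effort."""

    def __init__(self, db, fetcher):
        self.db = db
        self.fetcher = fetcher
        self.in_memory = {}

    def get_or_add(self, key, force_refresh=False):
        if key not in self.in_memory:
            self.in_memory[key] = self.db.get(key)  # single DB read per key

        entry = self.in_memory[key]
        if force_refresh or entry is None or entry["expired"]:
            data = self.fetcher(key)
            entry = {"data": data, "expired": False}
            self.in_memory[key] = entry  # memory is authoritative for this request
            self.db.set(key, entry)      # best effort: a failure is fine, memory has it
        return entry["data"]
```

Because a freshly fetched entry is marked as not expired in memory, a failed DB write no longer forces a re-fetch on subsequent calls within the same request, which is exactly why the errors array becomes redundant.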
Great work!
Review checklist
Parameter | Result
---|---
Approach | ✅ All good
Implementation | 👍 A couple of observations noted above.
Automated tests | 👍 Looking good, but we may need to revise them depending on whether we move away from conditional caching based on DB write errors.
Naming | ✅ All good
Typing | ✅ All good
Code comments | 🔝 Excellent!
Changelog | 👍 Almost good (left a comment)
Documentation | Not applicable
PR description | ✅ All good
Testing instructions | 🔝 Excellent!
Manual testing | ✅ Works as expected
Scenarios tested
Scenario | Result
---|---
Simulate the inability to write to the DB options table | ✅ Page loads became painfully slow (100+ seconds), with an equal number of `get_cached_account_data` and `get_cached_account_data_generator` calls
Switch to this PR's branch | ✅ Page loads became fast again, with only 1 call to `get_cached_account_data_generator`
Delete the `wcpay_account_data` option from the DB | ✅ Only 1 call to `get_cached_account_data_generator`
Turn off WP options simulation | ✅ Only 1 call to `get_cached_account_data_generator` on the first page load and 0 calls on subsequent page loads
Reset account cache from WCPay Dev Tools | ✅ Only 1 call to `get_cached_account_data_generator` on the first page load and 0 calls on subsequent page loads
Check the top-level Payments pages (Overview, Transactions, Disputes, Deposits, Settings) | ✅ No application errors related to missing account cache noted throughout my testing. All pages loaded without errors.
changelog/update-add-in-memory-cache-to-db-cache-to-deal-with-db-write-failures
    /**
     * In-memory cache for the duration of a single request.
     *
     * This is used to avoid multiple database reads for the same data and as a backstop in case the database write fails,
Cruel nitpick: It's the first time I've come across the term 'backstop'. IMO, 'fallback' is a more common term and we should use it here.
Nitpick reply: "Backstop" and "fallback" are not perfect synonyms. In our case, we always "pass through" the in-memory cache (as opposed to falling back on it), and, in certain cases, we don't move past it (hence being a backstop).
A true fallback is the system we have in core for Payment Gateways Recommendations: if we are allowed to fetch remotely, we will; but if we are not allowed or fail to, we fall back to using the hard-coded ones (defaults).
WDYT? 😜
Before (when the in-memory cache was used conditionally based on DB write errors), I thought it was a fallback to the WP options cache (not the WP cache object) in most cases.
After (now that in-memory cache is always used), 'backstop' feels more accurate. But that also means we're introducing a potentially unfamiliar term, so it's worth clarifying somewhere. Unless it's just me :)
One comment below, otherwise pre-approving!
LGTM
Related https://github.com/Automattic/woocommerce-payments-server/issues/2826
Changes proposed in this Pull Request
There are situations where a WordPress site will run out of disk space and, as a consequence, all DB writes will fail, but DB reads will continue to work. You end up with a pseudo-working site that seems fine, if left untouched.
The trouble is that our DB cache entries will expire, and the system will want to refresh/re-fetch the data from our platform and attempt to write it to the DB cache. That write will fail, causing each cache read to trigger a new fetch! You would think that the WP object cache would provide a backstop to this, but no: when writing a WP option with `update_option`, the DB query is attempted first, and only if it succeeds is the data written to the WP object cache (this behavior makes sense for the purposes of the WP object cache). In our scenario, the new data is never written to the WP object cache.
Of course, the site is in dire need of attention from the admin, but WooPayments needs to do its best not to "throw gasoline at the problem" and also manage its API requests.
To reduce the negative side effects of this scenario, this PR introduces an in-memory cache to hold the data for the duration of a request, regardless of whether it came from the DB or was fetched on the fly from our platform.
This in-memory cache functions as a backstop in case of failures to write DB options: for the remainder of the request we will use the data in the in-memory cache and not attempt further fetches and DB writes. As a bonus, the in-memory cache should introduce a slight performance increase on sites with a working object cache, since we avoid reaching the WP object cache, and a significant performance increase on sites without a working object cache, since we avoid reaching the DB for every cache read.
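The `update_option` pitfall described above can be modeled in a few lines. This is a deliberately tiny Python toy (not WordPress code) whose only purpose is to show why the object cache never sees the new value when the DB write fails:

```python
class WPOptions:
    """Toy model of the described update_option behavior: the object cache
    is only updated after a successful DB write."""

    def __init__(self, db_writes_ok=True):
        self.db = {}
        self.object_cache = {}
        self.db_writes_ok = db_writes_ok  # simulates e.g. a full disk

    def update_option(self, name, value):
        if not self.db_writes_ok:
            return False                 # DB write failed: object cache untouched
        self.db[name] = value
        self.object_cache[name] = value  # cached only after a successful write
        return True

    def get_option(self, name):
        if name in self.object_cache:
            return self.object_cache[name]
        return self.db.get(name)
```

Once writes start failing, every `get_option` keeps returning the stale value, so without an extra in-memory layer each expired cache read triggers a fresh platform fetch.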
Testing instructions
Prep work
- Check out the `develop` branch in your local repo.
- Create a `docker/mu-plugins/local-helpers/profiling.php` file with the following contents:
- Update the `docker/mu-plugins/0-local-helpers.php` file (create it if you don't have it) by adding this line:
- These helpers log how many times the `\WC_Payments_Account::get_cached_account_data` method and its generator are called:

Local testing
- Load some wp-admin pages and check the `docker/wordpress/wp-content/debug.log` file. You should see `get_cached_account_data` called many times (tens or even hundreds of times), but zero calls to `get_cached_account_data_generator`, since the account data is cached in the DB.
- Open the `docker/wordpress/wp-includes/option.php` file and add `return false;` in the `update_option()` function in two places (lines 909 and 944 in the latest WP version):
- Find the `wcpay_account_data` option in the `wp_options` table and change the `fetched` timestamp at the end to something days in the past; the following value will suffice:
- You should now see many `get_cached_account_data` and `get_cached_account_data_generator` calls (and each page loading terribly slowly, taking tens of seconds):
- Stash the changes by running the `git stash` command, check out this PR's branch, and reapply the changes by running the `git stash pop` command.
- You should see many `get_cached_account_data` calls but only one `get_cached_account_data_generator` call:
- Delete the `wcpay_account_data` option from the `wp_options` table. This will also cause the account data to be fetched since we have no data saved in the DB cache.
- You should see many `get_cached_account_data` calls but only one `get_cached_account_data_generator` call.
- Check out `trunk`, and simulate the `GET` accounts endpoint returning errors by adding one of the following lines at the beginning of the endpoint controller callback `\WCPay\Accounts_Controller::get_basic_account_info` (you can try all of them, one at a time):
- You should see many `get_cached_account_data` calls but only one `get_cached_account_data_generator` call.
- Remove the `return false;` lines added to the `docker/wordpress/wp-includes/option.php` file to bring your DB back into working order.
- You should see one `get_cached_account_data_generator` call since the cache was not set and the data needed to be fetched.
- On subsequent page loads, you should see zero `get_cached_account_data_generator` calls.
- Run `npm run changelog` to add a changelog file; choose `patch`. You can leave it empty if the change is not significant. You can add multiple changelog files in one PR by running this command a few times.

Post merge
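As an aside on the testing steps above, which revolve around counting profiled calls in `debug.log`: a small helper like this can tally them. It assumes the (elided) profiling mu-plugin logs one line per call containing the method name; the helper itself is hypothetical.

```python
import re
from collections import Counter


def tally_calls(log_text, methods):
    """Count whole-word occurrences of each method name in the log text.

    A word-boundary match is used so that `get_cached_account_data` does not
    also count the `get_cached_account_data_generator` lines (the underscore
    after "data" blocks the boundary).
    """
    counts = Counter()
    for method in methods:
        counts[method] = len(re.findall(rf"\b{re.escape(method)}\b", log_text))
    return counts


# Tiny stand-in for debug.log contents, just to show the shape of the output.
sample_log = (
    "get_cached_account_data called\n"
    "get_cached_account_data called\n"
    "get_cached_account_data_generator called\n"
)
counts = tally_calls(
    sample_log,
    ["get_cached_account_data", "get_cached_account_data_generator"],
)
```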