-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hash: A waiting list match is a formal revalidation #4032
Conversation
Rebased against master and force-pushed to solve a merge conflict introduced by 0835051. |
Rebased against master and force-pushed to solve a merge conflict introduced by 7aeca4d. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the overall goal to make rushes more specific. Also, the Vary check of the referenced oc upon waitinglist return seems to make a lot of sense to me.
But when it comes to the details of the changes, it seems I do not understand their motivation well enough, and until I understand the motivation, they seem questionable to me.
bin/varnishd/cache/cache_hash.c
Outdated
@@ -568,7 +570,7 @@ HSH_Lookup(struct req *req, struct objcore **ocp, struct objcore **bocp) | |||
return (HSH_MISS_EXP); | |||
} | |||
|
|||
AN(busy_found); | |||
CHECK_OBJ_NOTNULL(busy_oc, OBJCORE_MAGIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: the oc was already magic-checked within the VTAILQ_FOREACH()
(L426)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I did this to really mark the difference between a mere boolean and an actual struct pointer, but it's purely cosmetic.
bin/varnishd/cache/cache_hash.c
Outdated
case HSH_HITMISS_EXP: | ||
CHECK_OBJ_NOTNULL(oc, OBJCORE_MAGIC); | ||
AZ(busy_oc); | ||
xid = VXID(ObjGetXID(wrk, oc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
existed before, just noticed it here: If we gained a reference, we can move ObjGetXID()
(could be expensive) and EXP_Dttl
(not so much) outside the critical region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I definitely focused on preserving the same behavior during the split.
bin/varnishd/cache/cache_hash.c
Outdated
oh = hash->lookup(wrk, req->digest, &wrk->wpriv->nobjhead); | ||
} | ||
|
||
lr = hsh_objhead_lookup(oh, req, &oc, &busy_oc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above: I think it should be clear when oc
gained a reference and when not, and we might want to at least add a comment that busy_oc
never gains a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way gh displays the comments, this "above" actually turned into "below". It was above when I added it.
oh = oc->objhead; | ||
CHECK_OBJ(oh, OBJHEAD_MAGIC); | ||
|
||
Lck_Lock(&oh->mtx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take the oh->mtx
a second time and duplicate the rush code for what exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by second time?
The point here is to unbusy the objcore on behalf of the nonexistent fetch task that would normally have ensured it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finally seeing the double lock problem, which happens below the code referenced by your comment.
Addressed in ea4ba6f.
bin/varnishd/cache/cache_hash.c
Outdated
CHECK_OBJ_NOTNULL(req->hash_objhead, OBJHEAD_MAGIC); | ||
oh = req->hash_objhead; | ||
oh = req->objcore->objhead; | ||
(void)HSH_DerefObjCore(wrk, &req->objcore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds to oh mtx contention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and I didn't find a clean solution to avoid that. In theory, this is compensated by less contention when rushing cacheable objects.
edit: which should only be prevented by a vary mismatch.
bin/varnishd/cache/cache_hash.c
Outdated
@@ -731,7 +731,12 @@ hsh_rush1(const struct worker *wrk, struct objcore *oc, struct rush *r) | |||
Lck_AssertHeld(&oh->mtx); | |||
|
|||
AZ(oc->flags & OC_F_BUSY); | |||
max = oc->flags ? cache_param->rush_exponent : INT_MAX; | |||
if (oc->flags & OC_F_WITHDRAW) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have now removed the rush limit from HSH_DerefObjCore()
and replaced it with a new HSH_Withdraw()
function and a new flag, to basically just implement what was the rush limit before.
How are we not just shuffling stuff from left to right here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am driving this based on the objcore, so yes I'm shifting the rush policy from call sites to the objcore state, and no that's not just shuffling things around.
The "withdraw" logic should have been part of the "abandon" logic but I forgot about it by the time I "completed" the pull request. It was intended from the beginning though, as a couple commit message announce an increase of waiting list activity that should be temporary.
The good news I guess is that when I realized I had forgotten this detail I came up with "withdraw" that describes the use case better than "abandon" (and avoids overloading the latter).
Discussion on IRC, click to expand.
|
I looked at this again, and overall, I would like to ask for the following: For one, I would prefer to split refactoring and semantic changes: I am not convinced of the IIUC, the other changes can be summarized as:
I do not see how we need to refactor so much to achieve these goals, and I would prefer if we first addressed these based on the existing interfaces (with some adjustments, maybe). In particular, the But most importantly, there is something about an argument from 0a87e86 which I do not understand:
I really do not understand how this patch series achieves this, I don't even understand how it can be achieved at all for the general (Vary) case. Yes, this is true if the busy object has no |
That's a summary capturing the essential changes, but slightly off. The The And the new rush "latches" are not so much here to rush earlier but really to nail consistent semantics for
It does so as far as compatible variants go, so a Please check slide 12: https://github.com/varnishcache/varnish-cache/wiki/VDD23Q1#compliance
I think you missed it from 0a87e86's commit message:
Rewinding in your comment:
Some of the refactoring (like the lookup split in two steps) is me feeling uneasy about touching this part, and in particular the locking logic. Separating the lookup search from lookup outcome really helped me (and you seemed to appreciate one of the effects on I can try a patch series without the non-essential refactoring. You found a few things that require the change to be somehow amended anyway (like the overlooked VCF). Now regarding the
In the last case, the moment we know we won't trigger a fetch task there is no reason (besides hurting latency to sell more licenses) to keep the I hope this answers all of your questions, thank you for your patience! |
FTR, need to devour this:
|
Yesterday I rewrote the patch series without the offending refactoring, and taking changes in a slightly different order helped better lay out the moving parts. I tried to avoid confusing wording in the commit log, and tried to be more explicit about semantics, so I hope this will be overall easier to review. |
FTR, I understand you are waiting for my feedback, but this PR need a longer attention span and I need to tend to other work for the remainder of this week. Sorry. |
My goal was to answer everything before you'd take another dive, so in that sense I succeeded ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this PR has a lot of merit. It cleans up the code, making it less magic, and will optimize significantly the case where there are multiple popular VARYants on an OH.
The PR does touch on quite a few of the scary codepaths though. I do wonder about places that rely on OC_F_BUSY being set while under lock and if there are gotchas there. I didn't do complete search for that sort of fallout in this round.
bin/varnishd/cache/cache_hash.c
Outdated
const uint8_t *vary; | ||
|
||
oc = req->objcore; | ||
CHECK_OBJ_ORNULL(oc, OBJCORE_MAGIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be CHECK_OBJ_NOTNULL(). Callers should not call it if req->objcore == NULL
. That makes it obvious at the call sites that this function needs an OC to operate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection.
bin/varnishd/cache/cache_hash.c
Outdated
return (0); | ||
|
||
AZ(oc->flags & OC_F_BUSY); | ||
if (oc->flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test feels wrong to me. Are any (both present and future) OC flags always a reason for the rushing to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You caught me red handed.
As of today, yes, all OC flags imply that the object is not serviceable in the context of a rush. In the future, much uncertain. We could add an ObjRushable(oc)
function, I believe there's another spot where it would be relevant.
bin/varnishd/cache/cache_hash.c
Outdated
CHECK_OBJ_ORNULL(req->vcf, VCF_MAGIC); | ||
AN(hash); | ||
|
||
hsh_prealloc(wrk); | ||
if (DO_DEBUG(DBG_HASHEDGE)) | ||
hsh_testmagic(req->digest); | ||
|
||
if (req->hash_objhead != NULL) { | ||
if (hsh_rush_match(req)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be if (req->objcore != NULL && hsh_rush_match(req))
, ref comment above. Makes it obvious that this path is only taken if a req->objcore
reference exists before calling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection, this is basically the call site for your first comment.
bin/varnishd/cache/cache_hash.c
Outdated
* the objcore for pass objects. The other reference is held by | ||
* the current fetch task. | ||
*/ | ||
if (oh == private_oh) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should better be if (oc->flags & OC_F_PRIVATE)
. Paves the way for multiple private_oh
s to reduce lock contention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
bin/varnishd/cache/cache_hash.c
Outdated
|
||
if (oc->flags & (OC_F_WITHDRAW|OC_F_FAILED)) | ||
max = 1; | ||
else if (oc->flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has me concerned. Why is this the right choice for any OC_F_*
flag other than withdraw and failed? And that reasoning would also apply also to any future flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other candidate for ObjRushable(oc)
.
include/tbl/oc_flags.h
Outdated
@@ -30,7 +30,8 @@ | |||
|
|||
/*lint -save -e525 -e539 */ | |||
|
|||
OC_FLAG(BUSY, busy, (1<<1)) //lint !e835 | |||
OC_FLAG(WITHDRAW, withdraw, (1<<0)) //lint !e835 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be named WITHDRAWN
instead? When set I believe the past tense is the right one, as in it has already been withdrawn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I added 6 SQUASHME commits to address @mbgrydeland's review and one more to add an assertion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe most of my concerns have been addressed.
Though I do have a new concern that I'm uncertain of how will impact the changeset. What drives the rushing of HFP/HFM objects with this patch set? It looks to me like it will only rush once for this case, and not have those that was woken because of a HFP/HFM wake up new requests again. Or am I missing something?
@@ -369,7 +369,9 @@ hsh_rush_match(struct req *req) | |||
CHECK_OBJ_NOTNULL(oc, OBJCORE_MAGIC); | |||
|
|||
AZ(oc->flags & OC_F_BUSY); | |||
if (oc->flags) | |||
if (oc->flags & (OC_F_WITHDRAWN|OC_F_HFM|OC_F_HFP|OC_F_CANCEL| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are HFM and HFP being excluded here? Aren't they supposed to explicitly do the rushing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point the request has already been rushed, and it's failing to match an objcore with either HFP or HFM, so it will proceed with a regular lookup. In all likelihood it will proceed in either vcl_pass
or vcl_miss
accordingly, unless another compatible object landed in the cache between rush time and lookup time (when we grab the objhead lock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it now how that comes together, at least for HFP objects. The rushing cascade will continue in HSH_Lookup()
when the HFP flag is found and hsh_deref_objhead_unlock()
is called at https://github.com/Dridi/varnish-cache/blob/objcore_rush/bin/varnishd/cache/cache_hash.c#L556. hsh_deref_objhead_unlock()
does a wake of rush_exponent
tasks when finding a HFP object.
Though I worry that there is no similar effect for HFM with this patch set. For those, only rush_exponent
tasks will be woken during HSH_Unbusy()
. But these will not rush again during their HSH_Lookup()
(https://github.com/Dridi/varnish-cache/blob/objcore_rush/bin/varnishd/cache/cache_hash.c#L581), pausing the cascade. It will wake again when one of those does HSH_Unbusy()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HFM will cascade, but not as soon as HFP. Unlike HFP, HFM may break the exponential cascade.
Since we treat it as a miss, and retain the possibility of inserting a cacheable object on the next fetch, we will either have a wholesale rush (cacheable) or an exponential rush at "unbusy" time (if all rush_exponent
fetch tasks trigger rush_exponent
waiting list rushes of their own).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking this again, I guess this behaviour of HFM is the same as before this PR, and with that knowledge I don't have any objection to this PR.
Though I was caught by surprise wrt to the delay inherent to the HFM rushing. My understanding of this was that HFM would behave the same as HFP. The rushing is only used to space out the task scheduling internal to Varnish, and not a method of being gentle with the backend. I think it's worth revisiting this logic regardless of this PR, and seeing if we have the intended behaviour here.
I still did not find the hours which this PR requires for another review round, but I just tried it with one of my test setups and ran into this null pointer deref straight away:
|
It is embarrassing to find that I forgot to query the Taken care of in 141308d, could you please give it another try? |
This adds coverage for a non-delivery transition from vcl_hit, where the busy objcore would drop its sole reference in the event of a grace hit. The lack of coverage was visible in the gcov dashboard: 669 580 if (busy != NULL) { 670 0 (void)HSH_DerefObjCore(wrk, &busy, 0); 671 0 VRY_Clear(req); 672 0 } There should now be at least one pass inside this block. Refs #4032
Thanks to @simonvik who is helping me getting traffic through this pull request we found that a test case for vmod_saintmode would panic. It became quickly obvious to me that this case that slipped through the cracks shows that the model for this pull request is coherent. It's the new The one-liner fix is in cf932ca and coverage was added directly to the master branch in 3eeb8c4. |
FYI: For a multitude of reasons I still did not find the few hours I would like to have for quiet work to thoroughly review the new state of this PR. I could spend about half an hour with it, and I currently feel both under- and overwhelmed by it. Underwhelmed, because I still fail to see why the goals of this PR need so many changes (I see this as my own mistake), and I am overwhelmed by the fact that at this point the PR can effectively only be reviewed as one large diff due to the incremental changes. I am still worried about the change of the busy state model and the general impact this might have, I wonder if we really need the withdrawn facility and I am not convinced that we actually DTRT in all the different places. |
Just in case I will be away for more before-release bugwashing: I would really prefer to move this one to the next cycle to get the chance for a thorough review. See previous comments for reasons. |
I fixed an incorrect assertion that surfaced from production traffic. It otherwise ran successfully on Friday and over the weekend. |
From Friday's bugwash: I will squash all the commits that followed @mbgrydeland's review, close this pull request and open a new one based on this patch series to start a new discussion. |
This is not a nice caching outcome, but it is a good reason to trigger a new fetch if there are clients on the waiting list. If the fetch task failed before reaching HSH_Unbusy() we make sure to "unbusy" it anyway. The OC_F_FAILED flag will prevent it from being looked up, so there is no reason to delay the rush until the final objcore reference is dropped. This also means that once the client task that triggered the fetch task resumes it can only be operating on a non-busy object.
The BOS_PREP_STREAM state is retired. It was used as a stop-gap to "unbusy" objects before waking up the client task for good. Instead, the HSH_Unbusy() and HSH_Fail() functions are called once the gap is closed, depending on the outcome. This removes one unnecessary signal and consolidates multiple call sites, ensuring that objects always drop the OC_F_BUSY flag from a central location, for fetch tasks.
This is the counterpart of HSH_Unbusy() for the cases where the req task will not schedule a fetch task, at which point we know we can already wake up a request, if there is one on the waiting list. This encapsulates the "wake up only one request" semantic using a new OC_F_WITHDRAWN flag. Although this flag is not being used yet, it will when the state of the objcore determines the rush policy instead of the call site.
Objcores created in vcl_synth cannot be looked up, therefore there is never an object going on the waiting list because it sees a synthetic objcore's OC_F_BUSY flag.
Objcores created for request bodies cannot be looked up, therefore there is never an object going on the waiting list because it sees a req.body objcore's OC_F_BUSY flag.
Rushing the waiting list because an objcore was dropped was a source of spurious wakeups, including the case where the dropped objcore is not even relevant to the objhead waiting list. It is now guaranteed that an object either dropped its OC_F_BUSY flag once ready (otherwise failed or withdrawn) or never had this flag to begin with for objcores that can't be looked up.
Operating on an objcore allows to preserve the different rush strategies that exist so far: - rush one request for failed fetch tasks - rush one request for withdrawn objects - rush rush_exponent requests otherwise For the cases where no rush is expected, a null objcore is passed. For cacheable objects, all waiters are optimistically woken up at once.
From now on, the policy will be derived from the objcore.
This reduces a great deal of spurious activity on the private_oh waiting list and removes unncessary conditionals when dealing with cacheable (or at least searchable) objects. There is still some waiting list no-op activity via HSH_Fail() but it can be considered negligible compared to the happy path.
Not only does the waiting list operate on an objcore, it also passes a reference to requests before they reembark a worker. This way when an objcore is already present during lookup we can attempt a cache hit without holding the objhead lock. Instead of repurposing the req::hash_objhead field into an equivalent req::hash_objcore, the field is actually removed. In order to signal that a request comes back from its waiting list, the life time of the req::waitinglist flag is extended until cnt_lookup() is reentered. If the rushed objcore matches a request, the lookup can result in a hit without entering the regular lookup critical section. The objhead lock is briefly acquired to release req's reference on the objhead, to rely solely on the objcore's objhead reference like a normal hit. This shifts the infamous waiting list serialization phenomenon to the vary header match. The rush policy is applied wholesale on cacheable objects instead of exponentially. This improves waiting list latency when the outcome is a cache hit, but forces incompatible variants to reenter a lookup and potentially disembark into the waiting list again. Since most spurious rushes at every corner of objhead activity are gone, this change puts all the spurious activity on the incompatible variants alone instead of all objects on more occasions. If a cacheable object was inserted in the cache, but already expired, this behavior enables cache hits. This can be common with multi-tier Varnish setups where one Varnish server may serve a graced object to an other, but true of any origin server that may serve stale yet valid responses. The waiting list enables a proper response-wide no-cache behavior from now on, but the built-in VCL prevents it by default. This is also the first step towards implementing no-cache and private support at the header field granularity.
From the main commit message:
This is a mix bag of cherry-picks from #3992 and new original commits. See individual commit messages for more details.