Parent ESI fails even when VCL sets resp.status = 200 for the included ESI #4053
Comments
Github outrageously refuses to accept attachments with the
|
The VTC test passes, and the first The server response does not have As in In the first In the last two
Which made me think that the problem is related to streaming. Now the weird part. In the first
The server sets the Surrogate-Control header with matching contents, so in effect In the third
That is, In that case, we don't get the unexpected result. With the snippet in |
What I think is happening here: For the first client run, For the other two client runs, Possible solutions: So, for now, I think the only workaround is to always set From a Varnish-Cache perspective, two options come to my mind:
|
Could this be caused by 582ded6? Can you please try this commit and its parent? (pun intended) |
Without having tried it, I think this is originally caused by this change from 26097fb:
 	req->doclose = SC_REM_CLOSE;
 	req->acct.resp_bodybytes += VDP_Close(req->vdc);
+
+	if (i && ecx->abrt) {
+		req->top->topreq->vdc->retval = -1;
+		req->top->topreq->doclose = req->doclose;
+	}
 }
This is where we originally started aborting the parent when an include's delivery fails. |
@dridi cherry-picking that commit and its parent onto 7.4.2 results in conflicts. Instead of fiddling through that, I just ran the VTC in this issue against current master, and it passed. Meaning that master also exhibits the unexpected result. Does that answer what you were getting at? |
So... I must admit I didn't read the whole text initially. I introduced onerror support guarded by a feature flag in 26097fb and the default behavior changed in 582ded6. I still have not read everything, only the ticket description. I'd like to mention that ESI processing has to disable streaming, because we need to process the beresp body before we can pass the object to clients. So there are no latency savings on an ESI fetch. I'll read the third comment (after the VTC) later, but from the look of it we lack VCL control over ESI onerror support. We could copy the feature flag to a req flag exposed in VCL upon entering vcl_deliver, and only use this flag to control the ESI delivery. |
@nigoroll sounds like it might work, but the problem is that in Come to think of it, that's what |
@slimhazard what about something like this:
sub vcl_miss {
    if (req.esi_level > 0) {
        set req.http.no-stream = "1";
    } else {
        unset req.http.no-stream;
    }
}
sub vcl_backend_response {
    if (bereq.http.no-stream) {
        set beresp.do_stream = false;
    }
}
|
@nigoroll it looks like that works, this test passes against master:
Notice that there's no I'll try the same test against 7.4.2. |
Can confirm that the VCL workaround works in 7.4.2:
So we have a solution for now at the site, and presumably this helps to understand what's going on in Varnish. |
Now I understand the problem better. You have an ESI top request, but sub-requests may be streamed when they aren't ESI requests themselves. And a streaming-but-failing 200 response will still be caught by
Maybe, but this statement makes no difference in the test case:
You can remove it or comment it out and the test case will still pass. The whole construct only makes sense if You could do this instead:
|
I was wrong, but it doesn't mean that it was a bad idea, see #4054. |
@dridi thanks for the PR. I put on a thumbs up, so that will help it get merged, of course. If the PR is merged: it appears to me that, to also have ESI contents included even if |
The tricky part about #4054 is that So, you could need both fine-grained control of the flag and a status workaround, or just disable the flag when you know the backend doesn't specify One thing I'm also wondering is whether to always initialize this from the parameter or do that only for |
During bugwash, @dridi and myself had a discussion which was partly confusing to me. This comment is an attempt to clarify. Please correct me if you think that I am wrong: The case we are looking at goes back to 26097fb, where we would fail the top-level ESI request if delivery of an ESI sub-request's object resulted in an error and In 582ded6 this was changed: we would now fail the top-level ESI request when The idea was that the new behavior would be closer to the standard and could still be avoided by overwriting the status, as mentioned in the changes.rst entry. What we overlooked at the time is that, with streaming, we do not have the final status: it could be 200 as inspected by VCL, but be 503 as seen by the ESI VDP. While we never stream ESI objects, ESIs may include ordinary objects, for which streaming is supported. This is @slimhazard's initial test case. So the workaround is to disable streaming for non-ESI includes of ESIs. #4054 proposes a sensible thing, to give VCL control over (edit: bold) |
Pretty good summary, thanks.
The VTC I submitted does just that, hopefully disambiguating the current behavior.
As it states in the description, it was "inspired by" this ticket and never claimed to solve the problem.
And I believe that would amount to a partial revert of 582ded6. I think it's a good idea to consider non 200/204 status codes as errors, but that should have been the extent of the change. With 26097fb we already had this behavior of assuming |
How about:
(and same for the new vcl variable) |
As we are at it: The ESI TR specifies:
The first aspect to return a >400 would imply buffering the ESI response until all subrequests are done, which we certainly won't do, so I think we should decide to deliberately not follow the TR. But regarding the second aspect: This would, IMHO, imply to disable streaming and replace included objects with empty strings. So maybe we should document that, even with something like |
I wrote a shorter test case:
It fails since 582ded6, which really changed the interpretation of onerror (or the lack thereof). It should have only added "undesirable status codes" to the list of things considered an error and left the onerror handling as-is. If there is one thing to fix, it's that. |
FYI: We (@slimhazard, myself and a Varnish-Cache user) have now established that the VCL code to restore the Varnish-Cache 7.2 behavior is a little more involved than originally assumed. The required workaround is something along these lines (no news in here, just a summary):
if (req.esi_level > 0) {
    set req.http.no-stream = "nope";
} else {
    unset req.http.no-stream;
}

if (bereq.http.no-stream) {
    set beresp.do_stream = false;
}

if (req.esi_level > 0 && resp.status != 200 && resp.status != 204) {
    set resp.status = 200;
}
We are now working on a simplified solution for pESI, which we would like to give up once we have settled on a solution for Varnish-Cache. In my mind, this could be to
In addition, all solutions should come with VCL control. |
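For readers who want to try the workaround summarized above, here is a minimal sketch of how the three fragments might be placed in a complete VCL file. The subroutine placement is an assumption: the earlier version in this thread set the header in vcl_miss, while this sketch uses vcl_recv so that passed requests are marked too. The backend address is a placeholder, and no-stream is just the ad-hoc header name used in this thread.

```vcl
vcl 4.1;

# Placeholder backend so the file compiles; host and port are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Mark ESI sub-requests so the backend side can recognise them.
    # At the top level, drop any client-supplied copy of the header.
    if (req.esi_level > 0) {
        set req.http.no-stream = "nope";
    } else {
        unset req.http.no-stream;
    }
}

sub vcl_backend_response {
    # Do not stream objects fetched for ESI includes, so that a failed
    # fetch is no longer delivered with the 200 status VCL inspected.
    if (bereq.http.no-stream) {
        set beresp.do_stream = false;
    }
}

sub vcl_deliver {
    # "Bless" error responses of includes so the parent keeps going.
    if (req.esi_level > 0 && resp.status != 200 && resp.status != 204) {
        set resp.status = 200;
    }
}
```

Note that the marker header travels with the backend request, so the backend will see it. Whether that matters depends on the site.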
I agree that this needs to always be under VCL control. |
This is a partial revert of 582ded6 to restore the assumed onerror=continue behavior for ESI includes, unless the feature flag esi_include_onerror is raised. The part of the change that considers all status codes besides 200 and 204 to be errors for ESI includes remains. A test case covers VCL's ability to "bless" error responses by overriding resp.status, allowing ESI delivery to continue on this criterion. Fixes varnishcache#4053
I submitted my suggestion: #4065. |
Expected Behavior
What's New for Varnish 7.3 says of ESI includes:
So we expected that this snippet in vcl_deliver{} would prevent failure of the including parent in all cases:
Current Behavior
Even with the VCL snippet shown above, the parent ESI can evidently fail under certain circumstances, in fact apparently in default configuration. I'll add a VTC test case, based on Varnish's current e00035.vtc, to illustrate that.
The VTC suggests that beresp.do_stream may be related to the problem. If beresp.do_stream is set to false and the VCL snippet is included in vcl_deliver, then as expected, the parent response does not fail, and the client response body is not truncated. If beresp.do_stream == true, which is the default, then the parent response fails and the body is truncated, even with the vcl_deliver snippet.
That would at least make sense to me, since it's familiar that when streaming is on, a client response status may be seen as 200, even when the fetch actually failed (for example when there were too few bytes for the value in Content-Length), and would have been set to 503 if streaming had been off. On that logic, vcl_deliver doesn't "see" resp.status != 200 when that happens.
But the VTC shows a bit of strangeness, because it also seems to depend on whether or not beresp.do_esi is conditionally or unconditionally set to true in VCL. The problem happens if beresp.do_esi is set conditionally, as in e00035.vtc. If VCL just says set beresp.do_esi = true, then the problem doesn't happen in the VTC.
Obviously that takes some explanation. I'll go through the expectations when I add the VTC test case.
Possible Solution
In vcl_backend_response, set beresp.do_esi = true unconditionally. It may also be necessary to set beresp.do_stream = false (but in that case we lose the advantages of streaming).
Steps to Reproduce (for bugs)
I'll add a VTC test case to the issue, and describe what we expected to see.
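As a rough illustration of the possible solution described above, the two settings could look like this in VCL. This is only a sketch: the backend definition is a placeholder, and enabling ESI parsing for every response is the assumption stated in the text, not a recommendation.

```vcl
vcl 4.1;

# Placeholder backend so the file compiles; host and port are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Enable ESI parsing unconditionally, rather than only for
    # selected responses.
    set beresp.do_esi = true;

    # Possibly also needed: turn off streaming for the fetch, at the
    # cost of losing the advantages of streaming.
    set beresp.do_stream = false;
}
```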
Context
The site makes very extensive use of ESI, and has been prepared to handle errors in ESI subrequests for quite some time before the attempted migration to Varnish 7.x.
We know of onerror="continue" and -p feature=+esi_include_onerror. The Varnish team can control the -p parameter, but using the onerror attribute in responses depends on all of the developer teams (about 50 of them) always remembering to do so, without exception for every ESI include, now and in the future. One omission and the entire response fails. So the VCL snippet in vcl_deliver is intended to safeguard against that.
Varnish Cache version
varnishd (varnish-7.4.2 revision cd1d10a)
Operating system
Debian bookworm (12.4)
Source of binary packages used (if any)
Both packagecloud and build from github source