why do we not use 413 / 414? #3205

nigoroll · 2020-01-31T12:22:49Z

I wonder why we do not issue 413 Payload Too Large for too many or too long headers and 414 URI Too Long for exactly that, but rather only send a 400 response for either case.
For 413, I traced this back to #1367 and 9c9a990, the respective counter was removed in cbcfce5
For 414 (and 413), there was ticket #1608 which was concluded with Discussed during bugwash today. Conclusion is current behavior (closing session) is fine. Closing. - in other words, not much of an explanation with respect to the reason. Also, while it is possible to generate a 414 in VCL as long as it fits into the rxbuf, I wonder if we should actually have a max url limit paralleling http_req_hdr_len

The text was updated successfully, but these errors were encountered:

dridi · 2020-01-31T13:17:40Z

To me this belongs in an overarching standards compliance discussion.

See also #3016 that refines 5xx statuses for VBE and uses 500 for VRT_fail()/return (fail), but will require a VCL bump to take effect.

ehocdet · 2020-01-31T16:37:58Z

Issue #1913 also. Closing the TCP connection instead of sending an http error break POLA. It is not always possible (for the time being) to solve such problem with VCL.

bsdphk · 2020-02-03T06:00:24Z

Originally we decided "to not leak information" and limited error responses to 503 and one or two 40x responses.

I agree with @dridi that we need to decide how compliant we want to be going forward.

dridi · 2020-02-03T07:11:00Z

Maybe we could have compliance "profiles", and for example one opaque (503 everything) and one faithful (best effort compliance).

nigoroll · 2020-02-03T09:44:19Z

VCL is what makes Varnish so powerful, so IMHO we should try to implement the "profiles" in VCL.

bsdphk · 2020-02-03T15:03:05Z

Bugwash: Not a 2020-march-15 subject, needs either VCL4.1 or V7.0, possibly both.

@dridi to open VIP on subject of greater compliance for 2020-sep-15, everybody to help out.

@hermunn

The goal of this PoC is to enable improvement of the hit ratio through better standards compliance. It will serve as a reference for varnishcache#3205 to build a VIP on better standards compliance. Reviews and feedback are still welcome independently of that work. Another goal is to throw my raw notes away and refer to this PoC for a bunch of issues I ran into. Consider the use case where a resource can be cached, but the backend MAY give a cookie to the client if it's not presenting one: Cache-Control: must-revalidate, max-age=300[, private] [Set-Cookie: foo=bar] Before hit-for-miss was introduced, the default behavior in the scenario where Varnish (on behalf of the client) didn't present a `foo` cookie and the backend as a consequence added the Set-Cookie header, there would have been a 120s hit-for-pass during which all subsequent requests would be passed to the backend. After that the next cache miss would either lead to a cacheable response, or rinse-and-repeat if once again the client is missing the dreaded cookie (or worse, we unset the cookie in `vcl_recv` because the resource is "static"). With hit-for-miss, the current default, in the scenario where the response contains a Set-Cookie header all subsequent requests would get the opportunity to supersede the hit-for-miss object with a proper one. This PoC introduces a new default behavior that still leads to a hit-for-miss unless the backend responds with this variation in the Set-Cookie case: Cache-Control: must-revalidate, max-age=300, private="set-cookie" Set-Cookie: foo=bar https://tools.ietf.org/html/rfc7234#section-5.2.2.6 It is now possible to cache the response if no other condition triggers the `beresp.uncacheable` flag. Only the client that triggered the fetch will see the Set-Cookie header, cache hits will see the redacted object. This variation also works: Cache-Control: must-revalidate, max-age=300, no-cache="set-cookie" Set-Cookie: foo=bar https://tools.ietf.org/html/rfc7234#section-5.2.2.2 This PoC however treats the #field-name form of the no-cache and private directives identically, storing private headers even though the RFC clearly states that we MUST NOT. More on that later. To see it in action, here is a basic test case: varnishtest "private headers" server s1 { rxreq txresp -hdr {Cache-Control: no-cache="xkey", private="set-cookie"} \ -hdr "set-cookie: foo=bar" -hdr "xkey: abc, def" -hdr "foo: bar" } -start varnish v1 -vcl+backend { sub vcl_hit { if (obj.http.set-cookie || obj.http.xkey || !obj.http.foo) { return (fail); } } sub vcl_backend_response { if (!beresp.http.Set-Cookie.is_private) { return (fail); } } } -start logexpect l1 -v v1 -q "Debug:beresp.private" { expect * * Debug "beresp.private: xkey,set-cookie" } -start client c1 { txreq rxresp expect resp.status == 200 expect resp.http.set-cookie == foo=bar expect resp.http.foo == bar } -run logexpect l1 -wait client c2 { txreq rxresp expect resp.status == 200 expect resp.http.set-cookie == <undef> expect resp.http.foo == bar } -run This test case confirms that the Set-Cookie header only reaches c1. That's also true for the XKey header. It also checks the response status because of the VCL sanity checks: we can see that the private headers are not just hidden from the client, but from VCL itself for the hit case. We can also see that the HEADER type grew an `.is_private` property. The `.is_private` property is needed to have a VMOD-less solution to dismiss the Set-Cookie header in the built-in VCL. Here is the new `vcl_backend_response`: sub vcl_backend_response { if (bereq.uncacheable) { return (deliver); } else if (beresp.ttl <= 0s || (beresp.http.Set-Cookie && !beresp.http.Set-Cookie.is_private) || beresp.http.Surrogate-control ~ "(?i)no-store" || (!beresp.http.Surrogate-Control && beresp.http.Cache-Control ~ "(?i:(private|no-cache)(?!=)|no-store)") || beresp.http.Vary == "*") { # Mark as "Hit-For-Miss" for the next 2 minutes set beresp.ttl = 120s; set beresp.uncacheable = true; } return (deliver); } The notable changes are the handling of Set-Cookie and Cache-Control headers: we ignore private Set-Cookie headers and we also ignore no-cache and private Cache-Control directives if they take a value. In order to implement the `HEADER.is_private` property this PoC is built on top of varnishcache#3158. It highlights that it is impossible today to have a method or property attached to the `HEADER` type, because it is almost systematically turned into a `STRINGS`. This PoC spreads the conversion to multiple locations where it is *effectively* needed. After getting some input from @hermunn I managed to find a solution that wouldn't break response headers order. My initial idea was to create a new "transient" object attribute only visible the client transaction that triggerred the fetch. This would have allowed to comply with the wording from the Cache-Control private directive: > If the private response directive specifies one or more field-names, > this requirement is limited to the field-values associated with the > listed response header fields. That is, a shared cache MUST NOT > store the specified field-names(s), whereas it MAY store the > remainder of the response message. Again, with some input from @mbgrydeland I acknowledge that the storage infrastructure doesn't have the ability to do what I want, and don't want to engage that route. So I traded strict compliance with simplicity and merely changed how the headers pack from OA_HEADER is encoded. A single-byte marker is added, but only private headers pay this one byte overhead, maintaining forward compatibility of OA_HEADER's ABI for persistent caches. From an application point of view, this is compliant because we can never access private `obj.http.*` by design^Waccident in VCL. Only inline C code or a VMOD could craft a VCL_HEADER capable of finding a private header in `obj`. Because the private marker "corrupts" private headers they are virtually NOT in storage, but in practice they are resident and may be extracted from RAM, a core file, or a persisted storage. I could technically wipe private headers from OA_HEADER once they found their way to the relevant `resp`, but this is only a PoC and I didn't feel like smashing the read-only view of storage from the client's perspective. Also, could I even ensure that the client transaction wipes the private headers? What it it fails before reaching that step? Could the wipe happen too late for persistence? Maybe it should be the persistent storage's responsibility to wipe it on disk? It should be understood at this point that this PoC is by no means complete, and it should definitely not be a single patch. Many details were omitted and marked with XXX comments or assertions. As a result two test cases should fail and panic because of missing error handling. What the test case above doesn't show about this change is the introduction of a `beresp.private` variable in VCL. It can be read, but also set similar to how `beresp.filters` gets an initial value that can manually be overriden. The initial value of `beresp.private` is extracted from no-cache and private Cache-Control directives using http_GetHdrField(), but this function doesn't take quoted-strings into account and can as a result be confused if it reaches a comma in the middle of a header list: Cache-Control: public, private="Set-Cookie,XKey", max-age=300 (four fields) ###### ################### ##### ########### Ideas that I'm registering here for the lack of a better venue: 1) teach http_GetHdrField about quoted-strings We probably also need a plural version (please, not callback-based) called http_GetHdrFields() for example in case we have directives spread over multiple Cache-Control headers before we collect it. Currently we can only get the first directive for a given name. 2) add a new `reset` action in VCL For values that are computed, do the computation again: sub vcl_backend_response { set beresp.Cache-Control = <expr>; reset beresp.ttl; reset beresp.grace; reset beresp.private; } 3) stop adding SLTs for new variables Having a SLT_Filters dedicated to `beresp.filters` may have been a wasted slot. Instead we could have an SLT_Field for current and future variables to log them when their value change: Field beresp.filters: <string> Field beresp.private: <string> Field req.grace: <duration> EOF

nigoroll mentioned this issue Feb 3, 2020

cache_req_body should return 413 when body is too large #1913

Closed

bsdphk added a=Bugwash Today and removed a=Bugwash Today labels Feb 3, 2020

dridi mentioned this issue Feb 6, 2020

PoC: introduce beresp private headers #3211

Closed

bsdphk added the r=7.0 label Feb 24, 2020

bsdphk mentioned this issue Mar 9, 2020

8.0 Compliance improvements #3246

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

why do we not use 413 / 414? #3205

why do we not use 413 / 414? #3205

nigoroll commented Jan 31, 2020 •

edited

Loading

dridi commented Jan 31, 2020

ehocdet commented Jan 31, 2020

bsdphk commented Feb 3, 2020

dridi commented Feb 3, 2020

nigoroll commented Feb 3, 2020

bsdphk commented Feb 3, 2020

why do we not use 413 / 414? #3205

why do we not use 413 / 414? #3205

Comments

nigoroll commented Jan 31, 2020 • edited Loading

dridi commented Jan 31, 2020

ehocdet commented Jan 31, 2020

bsdphk commented Feb 3, 2020

dridi commented Feb 3, 2020

nigoroll commented Feb 3, 2020

bsdphk commented Feb 3, 2020

nigoroll commented Jan 31, 2020 •

edited

Loading