Skip to content

Commit

Permalink
PoC: introduce beresp private headers
Browse files Browse the repository at this point in the history
The goal of this PoC is to enable improvement of the hit ratio through
better standards compliance. It will serve as a reference for varnishcache#3205 to
build a VIP on better standards compliance. Reviews and feedback are
still welcome independently of that work. Another goal is to throw my
raw notes away and refer to this PoC for a bunch of issues I ran into.

Consider the use case where a resource can be cached, but the backend
MAY give a cookie to the client if it's not presenting one:

    Cache-Control: must-revalidate, max-age=300[, private]
    [Set-Cookie: foo=bar]

Before hit-for-miss was introduced, the default behavior in the scenario
where Varnish (on behalf of the client) didn't present a `foo` cookie
and the backend as a consequence added the Set-Cookie header, there
would have been a 120s hit-for-pass during which all subsequent requests
would be passed to the backend. After that the next cache miss would
either lead to a cacheable response, or rinse-and-repeat if once again
the client is missing the dreaded cookie (or worse, we unset the cookie
in `vcl_recv` because the resource is "static").

With hit-for-miss, the current default, in the scenario where the
response contains a Set-Cookie header all subsequent requests would get
the opportunity to supersede the hit-for-miss object with a proper one.

This PoC introduces a new default behavior that still leads to a
hit-for-miss unless the backend responds with this variation in the
Set-Cookie case:

    Cache-Control: must-revalidate, max-age=300, private="set-cookie"
    Set-Cookie: foo=bar

https://tools.ietf.org/html/rfc7234#section-5.2.2.6

It is now possible to cache the response if no other condition triggers
the `beresp.uncacheable` flag. Only the client that triggered the fetch
will see the Set-Cookie header, cache hits will see the redacted object.

This variation also works:

    Cache-Control: must-revalidate, max-age=300, no-cache="set-cookie"
    Set-Cookie: foo=bar

https://tools.ietf.org/html/rfc7234#section-5.2.2.2

This PoC however treats the #field-name form of the no-cache and private
directives identically, storing private headers even though the RFC
clearly states that we MUST NOT. More on that later.

To see it in action, here is a basic test case:

    varnishtest "private headers"

    server s1 {
    	rxreq
    	txresp -hdr {Cache-Control: no-cache="xkey", private="set-cookie"} \
    	    -hdr "set-cookie: foo=bar" -hdr "xkey: abc, def" -hdr "foo: bar"
    } -start

    varnish v1 -vcl+backend {
    	sub vcl_hit {
    		if (obj.http.set-cookie || obj.http.xkey || !obj.http.foo) {
    			return (fail);
    		}
    	}
    	sub vcl_backend_response {
    		if (!beresp.http.Set-Cookie.is_private) {
    			return (fail);
    		}
    	}
    } -start

    logexpect l1 -v v1 -q "Debug:beresp.private" {
    	expect * * Debug "beresp.private: xkey,set-cookie"
    } -start

    client c1 {
    	txreq
    	rxresp
    	expect resp.status == 200
    	expect resp.http.set-cookie == foo=bar
    	expect resp.http.foo == bar
    } -run

    logexpect l1 -wait

    client c2 {
    	txreq
    	rxresp
    	expect resp.status == 200
    	expect resp.http.set-cookie == <undef>
    	expect resp.http.foo == bar
    } -run

This test case confirms that the Set-Cookie header only reaches c1.
That's also true for the XKey header. It also checks the response status
because of the VCL sanity checks: we can see that the private headers
are not just hidden from the client, but from VCL itself for the hit
case. We can also see that the HEADER type grew an `.is_private`
property.

The `.is_private` property is needed to have a VMOD-less solution to
dismiss the Set-Cookie header in the built-in VCL. Here is the new
`vcl_backend_response`:

    sub vcl_backend_response {
        if (bereq.uncacheable) {
            return (deliver);
        } else if (beresp.ttl <= 0s ||
          (beresp.http.Set-Cookie && !beresp.http.Set-Cookie.is_private) ||
          beresp.http.Surrogate-control ~ "(?i)no-store" ||
          (!beresp.http.Surrogate-Control &&
            beresp.http.Cache-Control ~ "(?i:(private|no-cache)(?!=)|no-store)") ||
          beresp.http.Vary == "*") {
            # Mark as "Hit-For-Miss" for the next 2 minutes
            set beresp.ttl = 120s;
            set beresp.uncacheable = true;
        }
        return (deliver);
    }

The notable changes are the handling of Set-Cookie and Cache-Control
headers: we ignore private Set-Cookie headers and we also ignore
no-cache and private Cache-Control directives if they take a value.

In order to implement the `HEADER.is_private` property this PoC is built
on top of varnishcache#3158. It highlights that it is impossible today to have a
method or property attached to the `HEADER` type, because it is almost
systematically turned into a `STRINGS`. This PoC spreads the conversion
to multiple locations where it is *effectively* needed.

After getting some input from @hermunn I managed to find a solution that
wouldn't break response headers order. My initial idea was to create a
new "transient" object attribute only visible the client transaction
that triggerred the fetch. This would have allowed to comply with the
wording from the Cache-Control private directive:

> If the private response directive specifies one or more field-names,
> this requirement is limited to the field-values associated with the
> listed response header fields.  That is, a shared cache MUST NOT
> store the specified field-names(s), whereas it MAY store the
> remainder of the response message.

Again, with some input from @mbgrydeland I acknowledge that the storage
infrastructure doesn't have the ability to do what I want, and don't
want to engage that route. So I traded strict compliance with simplicity
and merely changed how the headers pack from OA_HEADER is encoded. A
single-byte marker is added, but only private headers pay this one byte
overhead, maintaining forward compatibility of OA_HEADER's ABI for
persistent caches.

From an application point of view, this is compliant because we can
never access private `obj.http.*` by design^Waccident in VCL. Only
inline C code or a VMOD could craft a VCL_HEADER capable of finding a
private header in `obj`. Because the private marker "corrupts" private
headers they are virtually NOT in storage, but in practice they are
resident and may be extracted from RAM, a core file, or a persisted
storage.

I could technically wipe private headers from OA_HEADER once they found
their way to the relevant `resp`, but this is only a PoC and I didn't
feel like smashing the read-only view of storage from the client's
perspective. Also, could I even ensure that the client transaction wipes
the private headers? What it it fails before reaching that step? Could
the wipe happen too late for persistence? Maybe it should be the
persistent storage's responsibility to wipe it on disk?

It should be understood at this point that this PoC is by no means
complete, and it should definitely not be a single patch. Many details
were omitted and marked with XXX comments or assertions. As a result
two test cases should fail and panic because of missing error handling.

What the test case above doesn't show about this change is the
introduction of a `beresp.private` variable in VCL. It can be read, but
also set similar to how `beresp.filters` gets an initial value that can
manually be overriden.

The initial value of `beresp.private` is extracted from no-cache and
private Cache-Control directives using http_GetHdrField(), but this
function doesn't take quoted-strings into account and can as a result be
confused if it reaches a comma in the middle of a header list:

    Cache-Control: public, private="Set-Cookie,XKey", max-age=300
    (four fields)  ######  ################### #####  ###########

Ideas that I'm registering here for the lack of a better venue:

1) teach http_GetHdrField about quoted-strings

We probably also need a plural version (please, not callback-based)
called http_GetHdrFields() for example in case we have directives
spread over multiple Cache-Control headers before we collect it.
Currently we can only get the first directive for a given name.

2) add a new `reset` action in VCL

For values that are computed, do the computation again:

    sub vcl_backend_response {
        set beresp.Cache-Control = <expr>;
        reset beresp.ttl;
        reset beresp.grace;
        reset beresp.private;
    }

3) stop adding SLTs for new variables

Having a SLT_Filters dedicated to `beresp.filters` may have been a
wasted slot. Instead we could have an SLT_Field for current and future
variables to log them when their value change:

    Field beresp.filters: <string>
    Field beresp.private: <string>
    Field req.grace: <duration>

EOF
  • Loading branch information
dridi committed Feb 5, 2020
1 parent 316e39d commit bd07815
Show file tree
Hide file tree
Showing 14 changed files with 314 additions and 25 deletions.
4 changes: 2 additions & 2 deletions bin/varnishd/builtin.vcl
Original file line number Diff line number Diff line change
Expand Up @@ -150,10 +150,10 @@ sub vcl_backend_response {
if (bereq.uncacheable) {
return (deliver);
} else if (beresp.ttl <= 0s ||
beresp.http.Set-Cookie ||
(beresp.http.Set-Cookie && !beresp.http.Set-Cookie.is_private) ||
beresp.http.Surrogate-control ~ "(?i)no-store" ||
(!beresp.http.Surrogate-Control &&
beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") ||
beresp.http.Cache-Control ~ "(?i:(private|no-cache)(?!=)|no-store)") ||
beresp.http.Vary == "*") {
# Mark as "Hit-For-Miss" for the next 2 minutes
set beresp.ttl = 120s;
Expand Down
7 changes: 5 additions & 2 deletions bin/varnishd/cache/cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -396,6 +396,7 @@ struct busyobj {

struct vfp_ctx *vfc;
const char *filter_list;
const char *private_headers;

struct ws ws[1];
uintptr_t ws_bo;
Expand Down Expand Up @@ -611,8 +612,9 @@ unsigned http_EstimateWS(const struct http *fm, unsigned how);
void http_PutResponse(struct http *to, const char *proto, uint16_t status,
const char *response);
void http_FilterReq(struct http *to, const struct http *fm, unsigned how);
void HTTP_Encode(const struct http *fm, uint8_t *, unsigned len, unsigned how);
int HTTP_Decode(struct http *to, const uint8_t *fm);
void HTTP_Encode(const struct http *fm, uint8_t *, unsigned len, unsigned how,
const char *privhdr);
int HTTP_Decode(struct http *to, const uint8_t *fm, unsigned len, unsigned hit);
void http_ForceHeader(struct http *to, const char *hdr, const char *val);
void http_PrintfHeader(struct http *to, const char *fmt, ...)
v_printflike_(2, 3);
Expand Down Expand Up @@ -648,6 +650,7 @@ int HTTP_IterHdrPack(struct worker *, struct objcore *, const char **);
for ((ptr) = NULL; HTTP_IterHdrPack(wrk, oc, &(ptr));)
const char *HTTP_GetHdrPack(struct worker *, struct objcore *, const char *hdr);
enum sess_close http_DoConnection(struct http *hp);
unsigned http_IsPrivHdr(const char *hd, const char *privhdr);

#define HTTPH_R_PASS (1 << 0) /* Request (c->b) in pass mode */
#define HTTPH_R_FETCH (1 << 1) /* Request (c->b) for fetch */
Expand Down
1 change: 1 addition & 0 deletions bin/varnishd/cache/cache_busyobj.c
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,7 @@ VBO_GetBusyObj(struct worker *wrk, const struct req *req)
WS_Init(bo->ws, "bo", p, bo->end - p);

bo->do_stream = 1;
bo->private_headers = "";

bo->director_req = req->director_hint;
bo->vcl = req->vcl;
Expand Down
116 changes: 115 additions & 1 deletion bin/varnishd/cache/cache_fetch.c
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,20 @@ vbf_beresp2obj(struct busyobj *bo)

l2 = http_EstimateWS(bo->beresp,
bo->uncacheable ? HTTPH_A_PASS : HTTPH_A_INS);

/* Ensure space for private header markers */
const char *ph = bo->private_headers;
if (ph != NULL && *ph != '\0') {
do {
ph = strchr(ph, ',');
if (ph != NULL) {
l2++;
ph++;
}
} while (ph != NULL);
l2++;
}

l += l2;

if (bo->uncacheable)
Expand Down Expand Up @@ -180,7 +194,8 @@ vbf_beresp2obj(struct busyobj *bo)
bp = ObjSetAttr(bo->wrk, bo->fetch_objcore, OA_HEADERS, l2, NULL);
AN(bp);
HTTP_Encode(bo->beresp, bp, l2,
bo->uncacheable ? HTTPH_A_PASS : HTTPH_A_INS);
bo->uncacheable ? HTTPH_A_PASS : HTTPH_A_INS,
bo->private_headers);

if (http_GetHdr(bo->beresp, H_Last_Modified, &b))
AZ(ObjSetDouble(bo->wrk, bo->fetch_objcore, OA_LASTMODIFIED,
Expand Down Expand Up @@ -286,6 +301,7 @@ vbf_stp_retry(struct worker *wrk, struct busyobj *bo)
bo->do_esi = 0;
bo->do_stream = 1;
bo->was_304 = 0;
bo->private_headers = "";

// XXX: BereqEnd + BereqAcct ?
VSL_ChgId(bo->vsl, "bereq", "retry", VXID_Get(wrk, VSL_BACKENDMARKER));
Expand Down Expand Up @@ -332,6 +348,102 @@ vbf_304_logic(struct busyobj *bo)
return (1);
}

/*--------------------------------------------------------------------
* Manage the beresp.private variable
*/

void
Beresp_Private(struct busyobj *bo, const char *s)
{

CHECK_OBJ_NOTNULL(bo, BUSYOBJ_MAGIC);
AN(bo->beresp);
AN(bo->vsl);
AN(s);

/* XXX: reject and log invalid list of private headers */

VSLb(bo->vsl, SLT_Debug, "beresp.private: %s", s);
bo->private_headers = s;
}

/*--------------------------------------------------------------------
* Collect private headers from Cache-Control directives
*/

static int
vbf_get_private_headers(struct busyobj *bo, const char *field,
struct vsb *vsb, const char **sep)
{
const char *b, *e;

if (!http_GetHdrField(bo->beresp, H_Cache_Control, field, &b))
return (0);

if (b == NULL || *b == '\0' || *b == ',')
return (0);

if (*b != '"') {
/* token form */
e = strchr(b, ',');
if (e == NULL)
e = strchr(b, '\0');
} else {
/* quoted string form */
b++;
e = strchr(b, '"');
if (b == e)
return (0);
if (e == NULL || e[-1] == '\\')
return (-1);
}
assert(b < e);
(void)VSB_cat(vsb, *sep);
(void)VSB_bcat(vsb, b, (e - b));
*sep = ",";
return (0);
}

static void
vbf_private_headers(struct busyobj *bo)
{
struct vsb vsb[1];
const char *sep;
unsigned r;

AN(bo->private_headers);
AZ(*bo->private_headers);

/* XXX: no way to bail out if syntax < 41 */

CHECK_OBJ_NOTNULL(bo->ws, WS_MAGIC);
r = WS_ReserveAll(bo->ws);
XXXAN(r);
AN(VSB_new(vsb, bo->ws->f, r, VSB_FIXEDLEN));
sep = "";

if (vbf_get_private_headers(bo, "no-cache", vsb, &sep) < 0 ||
vbf_get_private_headers(bo, "private", vsb, &sep) < 0)
INCOMPL();

if (VSB_finish(vsb)) {
VSB_delete(vsb);
WS_MarkOverflow(bo->ws);
INCOMPL();
return;
}

if (VSB_len(vsb) == 0) {
VSB_delete(vsb);
WS_Release(bo->ws, 0);
return;
}

WS_Release(bo->ws, VSB_len(vsb) + 1);
Beresp_Private(bo, VSB_data(vsb));
VSB_delete(vsb);
}

/*--------------------------------------------------------------------
* Setup bereq from bereq0, run vcl_backend_fetch
*/
Expand Down Expand Up @@ -435,6 +547,8 @@ vbf_stp_startfetch(struct worker *wrk, struct busyobj *bo)
);
}

vbf_private_headers(bo);

AZ(bo->do_esi);
AZ(bo->was_304);

Expand Down
63 changes: 61 additions & 2 deletions bin/varnishd/cache/cache_http.c
Original file line number Diff line number Diff line change
Expand Up @@ -910,15 +910,49 @@ http_EstimateWS(const struct http *fm, unsigned how)
* XXX: It could possibly be a good idea for later HTTP versions.
*/

unsigned
http_IsPrivHdr(const char *hd, const char *privhdr)
{
const char *c;
txt p, h;

if (privhdr == NULL)
return (0);

h.b = hd;
h.e = strchr(hd, ':');
Tcheck(h);

c = privhdr - 1;
do {
p.b = c + 1; /* XXX: make p.b skip blanks */
c = strchr(p.b, ',');
if (c == NULL)
c = strchr(p.b, '\0');
p.e = c; /* XXX: make p.e rewind blanks */
Tcheck(p);
if (Tlen(p) == Tlen(h) && !strncasecmp(p.b, h.b, Tlen(p)))
return (1);
} while (*c != '\0');

return (0);
}

void
HTTP_Encode(const struct http *fm, uint8_t *p0, unsigned l, unsigned how)
HTTP_Encode(const struct http *fm, uint8_t *p0, unsigned l, unsigned how,
const char *privhdr)
{
unsigned u, w;
uint16_t n;
uint8_t *p, *e;

AN(p0);
AN(l);
AN(privhdr);

if (how != HTTPH_A_INS || *privhdr == '\0')
privhdr = NULL;

p = p0;
e = p + l;
assert(p + 5 <= e);
Expand All @@ -938,6 +972,16 @@ HTTP_Encode(const struct http *fm, uint8_t *p0, unsigned l, unsigned how)
continue;
#include "tbl/http_headers.h"
http_VSLH(fm, u);
if (u >= HTTP_HDR_FIRST &&
http_IsPrivHdr(fm->hd[u].b, privhdr)) {
/* NB: the HTTP grammar guarantees that header names
* are only made of ASCII characters. We can use the
* MSB to mark the next header as private.
*/
assert(p < e);
*p = 0x80; /* XXX: magic constant */
p++;
}
w = Tlen(fm->hd[u]) + 1L;
assert(p + w + 1 <= e);
memcpy(p, fm->hd[u].b, w);
Expand All @@ -954,16 +998,21 @@ HTTP_Encode(const struct http *fm, uint8_t *p0, unsigned l, unsigned how)
*/

int
HTTP_Decode(struct http *to, const uint8_t *fm)
HTTP_Decode(struct http *to, const uint8_t *fm, unsigned len, unsigned hit)
{
const uint8_t *e;
uint8_t marker;

CHECK_OBJ_NOTNULL(to, HTTP_MAGIC);
AN(to->vsl);
AN(fm);
assert(len > 4);
e = fm + len;
if (vbe16dec(fm) <= to->shd) {
to->status = vbe16dec(fm + 2);
fm += 4;
for (to->nhd = 0; to->nhd < to->shd; to->nhd++) {
assert(fm < e);
if (to->nhd == HTTP_HDR_METHOD ||
to->nhd == HTTP_HDR_URL) {
to->hd[to->nhd].b = NULL;
Expand All @@ -972,10 +1021,20 @@ HTTP_Decode(struct http *to, const uint8_t *fm)
}
if (*fm == '\0')
return (0);
marker = *fm;
if (marker == 0x80)
fm++;
to->hd[to->nhd].b = (const void*)fm;
fm = (const void*)strchr((const void*)fm, '\0');
to->hd[to->nhd].e = (const void*)fm;
fm++;
/* XXX: I know, this branch needs polish */
if (marker == 0x80 && hit) {
to->hd[to->nhd].b = NULL;
to->hd[to->nhd].e = NULL;
to->nhd--;
continue;
}
http_VSLH(to, to->nhd);
}
}
Expand Down
7 changes: 6 additions & 1 deletion bin/varnishd/cache/cache_req_fsm.c
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,8 @@ Resp_Setup_Deliver(struct req *req)
{
struct http *h;
struct objcore *oc;
const uint8_t *hdr;
ssize_t len;

CHECK_OBJ_NOTNULL(req, REQ_MAGIC);
oc = req->objcore;
Expand All @@ -120,7 +122,10 @@ Resp_Setup_Deliver(struct req *req)

HTTP_Setup(h, req->ws, req->vsl, SLT_RespMethod);

if (HTTP_Decode(h, ObjGetAttr(req->wrk, oc, OA_HEADERS, NULL)))
hdr = ObjGetAttr(req->wrk, oc, OA_HEADERS, &len);
AN(hdr);
assert(len > 0);
if (HTTP_Decode(h, hdr, (unsigned)len, req->is_hit))
return (-1);

http_ForceField(h, HTTP_HDR_PROTO, "HTTP/1.1");
Expand Down
1 change: 1 addition & 0 deletions bin/varnishd/cache/cache_varnishd.h
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,7 @@ void VBF_Fetch(struct worker *wrk, struct req *req,
struct objcore *oc, struct objcore *oldoc, enum vbf_fetch_mode_e);
const char *VBF_Get_Filter_List(struct busyobj *);
void Bereq_Rollback(struct busyobj *);
void Beresp_Private(struct busyobj *, const char *);

/* cache_fetch_proc.c */
void VFP_Init(void);
Expand Down
12 changes: 12 additions & 0 deletions bin/varnishd/cache/cache_vrt.c
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,18 @@ VRT_GetHdr(VRT_CTX, VCL_HEADER hs)
return (p);
}

VCL_BOOL
VRT_IsPrivHdr(VRT_CTX, VCL_HEADER hs)
{

CHECK_OBJ_NOTNULL(ctx, VRT_CTX_MAGIC);
if (hs->where != HDR_BERESP)
return (0);

CHECK_OBJ_NOTNULL(ctx->bo, BUSYOBJ_MAGIC);
return (http_IsPrivHdr(hs->what + 1, ctx->bo->private_headers));
}

/*--------------------------------------------------------------------
* Build STRANDS from what is essentially a STRING_LIST
*/
Expand Down
Loading

0 comments on commit bd07815

Please sign in to comment.