Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8.0 Compliance improvements #3246

Open
bsdphk opened this issue Mar 9, 2020 · 5 comments
Open

8.0 Compliance improvements #3246

bsdphk opened this issue Mar 9, 2020 · 5 comments

Comments

@bsdphk
Copy link
Contributor

bsdphk commented Mar 9, 2020

We have agreed to improve RFC compliance in the next "big" release (7.0/vcl x.y)

This ticket is meant to be the coordination point for this, in particular for the necessary documentation updates.

Each of the subtasks will have their own tickets, listed here:

Completed Subtasks for 7.0:

Discarded Subtasks:

Todo list:

  • RFC9110 7.6.1 says that "Proxy-Connection" header should be filtered.
@dridi
Copy link
Member

dridi commented Jun 3, 2020

Registering the idea here: we should try to improve RFC coverage in the code as we make progress in this area.

By that I mean comments like:

if (status < 200 || status == 204) {
// rfc7230,l,1691,1695
http_Unset(req->resp, H_Content_Length);
} else if (status == 304) {
// rfc7230,l,1675,1677
http_Unset(req->resp, H_Content_Length);
} else if (clval >= 0 && clval == req->resp_len) {

@dridi
Copy link
Member

dridi commented Aug 3, 2020

I'm wondering whether VIP24 also falls under the compliance umbrella.

@ThijsFeryn
Copy link
Contributor

ThijsFeryn commented Aug 3, 2020

I had a look at the Edge Architecture Specification, and noticed something that is not properly implemented in our vcl_backend_response logic.

The Surrogate-Control response header contains several directives that influence entity cacheability; specifically, "no-store", "no-store-remote", and "max-age" (see "Surrogate-Control Header" for more information). Collectively, these directives and their behaviors are described by the capability token

Surrogate/1.0

This token should be included in all requests sent by compliant surrogates (see "Surrogate-Capabilities Header").

When any of these directives are present, they override any HTTP cacheability information present in the response.

Surrogate-Control: no-store

In https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl#L154, we unconditionally allow Surrogate-Control: no-store to have precedence over Cache-Control.

According to the spec Surrogate-Capability: varnish="Surrogate/1.0" should be set first.

If we want to improve compliance, we could change the Surrogate check as follows:

bereq.http.Surrogate-Capability  ~ "(?i)Surrogate\/1\.0" && beresp.http.Surrogate-control ~ "(?i)no-store"

Surrogate-Control: max-age

The Surrogate-Control header also supports max-age. Setting the TTL of an object based on this, is a bit more complicated, as it is not part of builtin.vcl.

This would be a change in the core. But the same check on Surrogate-Capability: varnish="Surrogate/1.0" should happen before allowing Surrogate-Control: max-age=30 to take precedence over Cache-Control: max-age=50, for example.

Setting a Surrogate-Capability header in builtin.vcl

If we really want to support surrogates, it would also make sense to already set a Surrogate-Capability header in the vcl_backend_fetch subroutine of the builtin.vcl:

set bereq.http.Surrogate-Capability = "varnish=\"Surrogate/1.0 ESI/1.0\"";

That would allow the origin to know that we support both the surrogate caching syntax and ESI.

Processing ESI based on the Surrogate-Control header

If we send a proper Surrogate-Capability header announcing ESI support, we can just as wel process ESI when the origin sends us a Surrogate-Control header.

We could add the following to vcl_backend_response:

if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\"") {
    set beresp.do_esi = true;
}

If this would undermine any other proxy or CDN that has ESI capabilities, we can use device targeting to make it more specific. The corresponding VCL code would then look like this:

if(beresp.http.Surrogate-Control ~ "content=\"ESI/1\.0\";varnish") {
    set beresp.do_esi = true;
}

Next steps

  • Is this level of compliance in regards to surrogates of interest here?
  • Should I pursue this?
  • Should this be part of PoC: introduce beresp private headers #3211?
  • Should I split this up into 2 different issues? One for builtin.vcl, and one for the TTL?
  • Does it make sense to announce our own surrogate capabilities using the proper header?
  • Does it make sense to process ESI automatically if the right Surrogate-Control header is set?

@dridi
Copy link
Member

dridi commented Oct 8, 2020

I added a list of parsing mistakes in the issue description, two items I'm aware of so far.

dridi added a commit to dridi/varnish-cache that referenced this issue Jun 30, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 30, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 13, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 13, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 15, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 16, 2021
They are meant to convey and abstract the comparison of components of
the HTTP grammar. They are also used as anchors to reference what the
RFCs have to say about case sensitivity for comparisons.

Refs varnishcache#3246
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 16, 2021
This should hopefully signal that we aren't merely comparing strings,
but specific kinds of strings, and prevent regressions in that area.

Refs varnishcache#3246
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 16, 2021
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 16, 2021
dridi added a commit to dridi/varnish-cache that referenced this issue Aug 13, 2021
We should avoid loose parsing of HTTP/1 header line delimiters. This
could potentially create a smuggling vector if Varnish was stacked with
a different server that would interpret the delimiters differently.

Refs varnishcache#3246
dridi added a commit to dridi/varnish-cache that referenced this issue Aug 17, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs varnishcache#3246
Refs varnishcache#3379
dridi added a commit that referenced this issue Aug 17, 2021
Because we funnel HTTP header names through the symbol table they have
to be valid VCL identifiers. It means that we can't support all valid
header names, which are tokens in the HTTP grammar. To finally close this
loophole without the help of a VMOD we allow header names to be quoted:

    req.http.regular-header
    req.http."quoted.header"

However we don't want to allow any component of a symbol to be quoted:

    req."http".we-dont-want-this

So we teach the symbol table that wildcard symbols may be quoted. There
used to be several use cases for wildcards but it is now limited to HTTP
headers.

Refs #3246
Refs #3379
dridi added a commit to dridi/varnish-cache that referenced this issue Aug 23, 2021
That is, only when http_range_support is on, which is the default.

Refs varnishcache#3246
dridi added a commit to dridi/varnish-cache that referenced this issue Aug 31, 2021
That is, only when http_range_support is on, which is the default.

Refs varnishcache#3246
dridi added a commit that referenced this issue Aug 31, 2021
That is, only when http_range_support is on, which is the default.

Refs #3246
@bsdphk bsdphk changed the title 7.0 Compliance improvements 8.0 Compliance improvements Sep 20, 2021
@dridi dridi added the r=8.0 label Sep 28, 2021
@karptonite
Copy link

karptonite commented Jan 18, 2022

@ThijsFeryn

  • Is this level of compliance in regards to surrogates of interest here?
  • Should I pursue this?
  • Does it make sense to process ESI automatically if the right Surrogate-Control header is set?

I'm super late to this, but I would certainly make use of at least two of these improved use of the Surrogate-Control headers. In particular, we use ESI, and have custom handling in our vcl to process ESI based on that header. It's only a few lines, but it would be simpler for everyone using ESI if those few lines were unnecessary.

Also, I sometimes run into an issue when I want to set Cache-control no-cache for the browse, but cache on Varnish. Being able to override the no-cache in Surrogate-Control with a max-age would simplify things, again removing the need for custom VCL.

nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 23, 2024
Since 4ab1100 (see also varnishcache#3246), we fail backend
responses with an unexpected Content-Range.

Yet RFC9110 states:

	The Content-Range header field has no meaning for status codes that do
	not explicitly describe its semantic.

	https://www.rfc-editor.org/rfc/rfc9110.html#section-14.4

The semantic is described for status codes 206 and 416, so the obvious
implementation change would be for core code to only consider Content-Range for
these status codes.

But there might be scenarios where a stricter-than-RFC check is intended, so we
keep that in core code and change builtin.vcl to remove the header where it has
no semantic.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 31, 2024
Since 4ab1100 (see also varnishcache#3246), we fail backend
responses with an unexpected Content-Range.

Yet RFC9110 states:

	The Content-Range header field has no meaning for status codes that do
	not explicitly describe its semantic.

	https://www.rfc-editor.org/rfc/rfc9110.html#section-14.4

The semantic is described for status codes 206 and 416, so the obvious
implementation change would be for core code to only consider Content-Range for
these status codes.

But there might be scenarios where a stricter-than-RFC check is intended, so we
keep that in core code and change builtin.vcl to remove the header where it has
no semantic.
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Aug 12, 2024
Since 4ab1100 (see also varnishcache#3246), we fail backend
responses with an unexpected Content-Range.

Yet RFC9110 states:

	The Content-Range header field has no meaning for status codes that do
	not explicitly describe its semantic.

	https://www.rfc-editor.org/rfc/rfc9110.html#section-14.4

The semantic is described for status codes 206 and 416, so the obvious
implementation change would be for core code to only consider Content-Range for
these status codes.

But there might be scenarios where a stricter-than-RFC check is intended, so we
keep that in core code and change builtin.vcl to remove the header where it has
no semantic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants