
v2: Content negotiation #2665

Closed
mholt opened this issue Jul 11, 2019 · 12 comments
Labels
discussion 💬 The right solution needs to be found help wanted 🆘 Extra attention is needed

@mholt
Member

mholt commented Jul 11, 2019

Caddy 2 needs a generic, powerful way to do content negotiation.

Currently, the http.handlers.encode module negotiates its encoding from the Accept-Encoding header. However, for pre-compressed assets or other kinds of content negotiation, different facilities are required.

Content negotiation can be based on MIME type (Accept header), encoding (Accept-Encoding header), or language (Accept-Language header).

I propose a content negotiation request matcher:

"content_negotiation": {
    "mime": ["text/plain", "text/html"],
    "encoding": ["br", "gzip", "flate"],
    "language": ["en-US"]
}
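For each dimension, the proposed matcher would have to pick the best match between the configured list and the client's header, honoring q-values. The Go sketch below illustrates only that selection step; negotiate is a hypothetical helper, not Caddy's API, and it handles exact tokens plus the * wildcard only (MIME ranges such as text/* are left out).

```go
package main

import (
	"strconv"
	"strings"
)

// negotiate returns the offered value the client prefers most according to
// an Accept-style header (Accept-Encoding, Accept-Language, ...), or "" if
// no offer is acceptable. Tokens are compared case-insensitively; "*" in the
// header applies to any offer not listed explicitly. q=0 means "not acceptable".
func negotiate(header string, offers []string) string {
	prefs := map[string]float64{}
	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(part, ";")
		token := strings.ToLower(strings.TrimSpace(fields[0]))
		if token == "" {
			continue
		}
		q := 1.0 // default weight when no q parameter is given
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if v, err := strconv.ParseFloat(param[2:], 64); err == nil {
					q = v
				}
			}
		}
		prefs[token] = q
	}

	best, bestQ := "", 0.0
	for _, offer := range offers {
		q, ok := prefs[strings.ToLower(offer)]
		if !ok {
			q, ok = prefs["*"]
		}
		// Strictly greater: ties keep the earlier offer (server order),
		// and q=0 can never win because bestQ starts at 0.
		if ok && q > bestQ {
			best, bestQ = offer, q
		}
	}
	return best
}
```

Because ties resolve to the earlier entry in the offered list, the order of the configured list effectively acts as the server's preference.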

For each negotiated dimension, a placeholder would be added indicating what was negotiated, for example: {http.matchers.content_negotiation.mime}

TODO: We would also need a way to add a Vary header to the response. Maybe a matcher is not the right solution. This might have to be a middleware... hm.

mholt added the Caddy2 label Jul 11, 2019
mholt added this to the 2.0 milestone Jul 11, 2019
mholt removed the v2 label Mar 23, 2020
@mholt
Member Author

mholt commented Mar 26, 2020

As discussed in https://caddy.community/t/how-to-serve-gzipped-files-automatically-in-caddy-v2/7311?u=matt it would be useful for Caddy to be able to serve pre-compressed files as well. If enabled, this would involve using the Accept-Encoding header to see if there is a matching file on disk, and if so, serve it with the proper headers. Maybe as a guest module to the file_server handler.

francislavoie added the discussion 💬 The right solution needs to be found label Apr 16, 2020
@averri

averri commented Jun 21, 2020

What is the workaround for implementing this in the current version of v2? Suppose the following scenario:

The browser requests the resource /js/app.js with the header accept-encoding: gzip. If a version of the resource with a gz extension exists, like /js/app.js.gz, then the HTTP response should contain the compressed file /js/app.js.gz with the appropriate response header set: content-encoding: gzip.
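The scenario above can be sketched in plain Go with net/http. This is an illustration under assumptions, not Caddy's file_server: findPrecompressed and serveStatic are hypothetical names, the .br/.gz suffix convention is assumed, and the server's list order (br before gzip) decides between acceptable encodings instead of the client's q-values.

```go
package main

import (
	"mime"
	"net/http"
	"os"
	"path"
	"path/filepath"
	"strconv"
	"strings"
)

// Assumed on-disk convention: Content-Encoding token -> file suffix.
// List order is the server's preference.
var precompressed = []struct{ encoding, suffix string }{
	{"br", ".br"},
	{"gzip", ".gz"},
}

// accepts reports whether encoding is listed in Accept-Encoding with q > 0.
func accepts(header, encoding string) bool {
	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(part, ";")
		if !strings.EqualFold(strings.TrimSpace(fields[0]), encoding) {
			continue
		}
		for _, p := range fields[1:] {
			p = strings.TrimSpace(p)
			if strings.HasPrefix(p, "q=") {
				q, err := strconv.ParseFloat(p[2:], 64)
				return err == nil && q > 0
			}
		}
		return true
	}
	return false
}

// findPrecompressed returns the file to serve and the encoding to advertise
// ("" means serve the original). exists is injected to keep it testable.
func findPrecompressed(file, acceptEncoding string, exists func(string) bool) (string, string) {
	for _, pc := range precompressed {
		if accepts(acceptEncoding, pc.encoding) && exists(file+pc.suffix) {
			return file + pc.suffix, pc.encoding
		}
	}
	return file, ""
}

// serveStatic serves files under root, preferring a precompressed sibling.
func serveStatic(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Clean the URL path so it cannot escape root.
		file := filepath.Join(root, filepath.FromSlash(path.Clean("/"+r.URL.Path)))
		onDisk, enc := findPrecompressed(file, r.Header.Get("Accept-Encoding"), func(p string) bool {
			fi, err := os.Stat(p)
			return err == nil && !fi.IsDir()
		})
		// The response depends on Accept-Encoding, so caches must know.
		w.Header().Add("Vary", "Accept-Encoding")
		if enc != "" {
			// Content-Type describes the original file, not the .gz/.br name;
			// ServeFile keeps a Content-Type that is already set.
			if ct := mime.TypeByExtension(filepath.Ext(file)); ct != "" {
				w.Header().Set("Content-Type", ct)
			}
			w.Header().Set("Content-Encoding", enc)
		}
		http.ServeFile(w, r, onDisk)
	})
}
```

The key detail is that Content-Type still describes /js/app.js, while Content-Encoding describes the compression of the bytes actually read from disk.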

@polarathene

It'd be nice for Caddy to be able to support pre-compressed brotli .br files.

From what I've read, unlike gzip, brotli isn't advised for on-demand compression. Could Caddy compress static assets when an encoding is requested, but write the compressed file to disk to serve in the future? Or is this something that should be handled outside of Caddy, such as a scheduled compression pass whenever compressible files are added to the server (e.g. via user uploads)?

@mholt
Member Author

mholt commented Aug 1, 2020

It'd be nice for Caddy to be able to support pre-compressed brotli .br files.

I think so too (see my reply above)! It just needs to be discussed, designed, and implemented...

@ueffel
Contributor

ueffel commented Sep 1, 2020

An important point I'm missing here is that the decision to use compression at all should depend on the type of data being served. Compressing images, web fonts (woff2), or any other content that is already compressed wastes CPU cycles.
Maybe there is a way to delay the decision to compress until the Content-Type of the response is known, meaning the MIME type of a file or the Content-Type header of a reverse-proxied response.
Caddy should then have a way to configure which Content-Types get compressed, something like nginx's gzip_types, but applying across all encoders rather than just one.
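Deferring the decision until the response's Content-Type is known could look like the Go sketch below, which matches the media type against glob patterns such as text/*. shouldCompress is a hypothetical name; the sketch relies only on the standard library's path.Match, whose * does not cross a /.

```go
package main

import (
	"mime"
	"path"
)

// shouldCompress reports whether a response's Content-Type header value
// matches one of the configured glob patterns. Parameters such as
// "; charset=utf-8" are stripped first; an unparsable or empty header
// means "don't compress".
func shouldCompress(contentType string, patterns []string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	for _, p := range patterns {
		// path.Match: "*" matches any run of non-'/' characters, so
		// "text/*" matches "text/html" and "application/*+xml"
		// matches "application/rss+xml". Malformed patterns never match.
		if ok, _ := path.Match(p, mediaType); ok {
			return true
		}
	}
	return false
}
```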

Regarding pre-compression:
I think encoder modules should optionally implement an interface that provides a file extension to append to a file path. This is then used to check for pre-compressed files. The question is which module does the checking. I'm not familiar enough with the Caddy source code to make a good suggestion here. Maybe also via an interface implemented by modules, starting with file_server?

The next question would be how to determine the order of the encodings. In #3692 I suggested an implementation of a preferred order of encodings, but this should only apply to dynamic content. The preferred order of encodings for pre-compressed content may (or should) be different: Brotli for pre-compression because of its better compression ratio, and gzip for dynamic content because of its better speed. Maybe a separate setting?
This raises a further question: what to do first?

  • Determine the encoding to use according to the Accept-Encoding header, then check for pre-compressed files of that encoding, or
  • Check for pre-compressed files first (for all accepted encodings), then determine the encoding according to the results

(I think, as a first step, pre-compression should be done outside of Caddy.
Transparently pre-compressing assets would be a nice feature, but there are open questions: When should it happen? At startup? After a file has been requested some number of times? As a built-in extra step before Caddy starts? How do we determine when a pre-compressed asset is out of date and needs to be recompressed?)

mholt added the help wanted 🆘 Extra attention is needed label Sep 1, 2020
@ueffel
Contributor

ueffel commented Mar 1, 2021

I tried my hand at an implementation for this. Let me know what you think.
master...ueffel:file_server-precompressed

Things implemented on this branch:

  • What I already mentioned in #3692, "Prefer order for encodings and minimum length via Caddyfile" (minimum length configurable via the Caddyfile, server-side preferred encoding)

  • a types setting for the encode directive, which determines which Content-Types should be encoded (glob patterns supported)

    encode zstd gzip {
        minimum_length 256
        prefer zstd gzip
        types text/* application/json application/*+xml application/javascript image/svg+xml
    }
  • a precompress setting for file_server, which determines the encoders used to look for precompressed files (tried in definition order if the client expresses no preference via q-factors)

    file_server {
        precompress zstd gzip
    }
    

    Encoders have to implement the new encode.Precompress interface, which means implementing a Suffix method that returns the file extension to look for (gz for gzip, zst for zstd). When filename is requested and the encoding is accepted, the file_server module then looks for filename + "." + Suffix. For example, for /static/style.css, if gzip is accepted, it looks for /static/style.css.gz.

What I'm unsure about:

  • the file server module depends on the encoder (AcceptedEncodings function, Precompress interface). I don't know if these kinds of cross-dependencies are undesirable
  • adding a "Vary: Accept-Encoding" header when serving precompressed files
  • removing the "Accept-Ranges" header when serving precompressed files

Edit: Creating compressed versions of the files should be done outside of Caddy.

@francislavoie
Member

I'm not sure I'm convinced by the types subdirective, because that can be done via request matching with https://caddyserver.com/docs/caddyfile/matchers#header or https://caddyserver.com/docs/caddyfile/matchers#header-regexp. Generally, we want to keep request matching and handling separate. Maybe a header_glob matcher would be appropriate if needed (but I'm not convinced it is necessary yet). Edit: never mind, I realized that this is response header matching. 🤔 I guess this is probably fine then. Tricky.

I think we should change precompress to precompressed, because the former implies a verb, or that Caddy might be doing the precompressing. Using past tense helps imply the files were compressed before Caddy runs.

I'm wondering whether the Precompressed interface should not be bound to http.encoders.* modules, since it doesn't actually do any encoding. I think it should probably be a separate module namespace like http.precompressed.*. The reason is that we'd probably want br for precompressed files but not as an actual encoder (at least not yet, because of performance etc.). If we implemented only the precompressed part in Caddy's standard modules but let a third-party plugin provide actual brotli encoding (should someone want Caddy to do it on the fly, such as via your https://github.com/ueffel/caddy-brotli plugin), there would be a module conflict.

@francislavoie
Member

@ueffel you can open a PR for this, it'll probably be easier to discuss specific aspects of the changes. I like where you're going with this! ❤️

@mholt
Member Author

mholt commented Mar 2, 2021

@ueffel Yeah, that looks great. I might suggest renaming the precompress subdirective to precompressed to make it clear that pre-compressing must have already happened. (I suppose if we're pedantic about shorter names, precomp is fine too, but lately I'm in the habit of just spelling things out.)

the file server module depends on the encoder (AcceptedEncodings function, Precompress interface). I don't know if these kinds of cross-dependencies are undesirable

It is OK for the staticfiles package to import the encode package, if that's what you mean?

Speaking of dependencies though, I do want to see if we can get away without https://github.com/gobwas/glob or any other new external dependencies for this.

adding a "Vary: Accept-Encoding" header when serving precompressed files
removing the "Accept-Ranges" header when serving precompressed files

Manipulate headers as necessary to serve the content properly. 👍

Create a pull request and let's review!

@mholt
Member Author

mholt commented Jun 7, 2021

Going to close this since @ueffel's PR #4045 seems to address most of the points, and I'm not sure there are any specific, actionable requests left in this issue. Any more specific content negotiation requests can go in a new issue.

@scy

scy commented Jul 15, 2024

Just a quick heads-up that this issue has been linked in several places (examples 1, 2) as the issue tracking content negotiation, especially regarding Accept headers and q-values.

In the end, it was closed in favor of #4045, which basically only cares about negotiating encoding (compression) for the encode directive.

There's the conneg plugin by @mpilhlt, which is supposed to provide flexible content negotiation for Content-Type, languages, charsets, and encodings, but it doesn't look terribly mature (12 commits, the last one two years ago).

Other than that, is there anything available or planned that I've missed that would support this kind of content negotiation, in particular handling q-values?

@mholt
Member Author

mholt commented Jul 22, 2024

Except for fine-tuning some related headers recently, I haven't really done much more with content negotiation because it's a vast area and there haven't been specific requests (in particular from sponsors).

To see something more, we'll probably need specific requests and use cases to justify the effort -- since there are many ways to do it.
