This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Make it easier to set long-lived cache headers on assets with hashed filenames #1172

Closed
edmorley opened this issue Oct 12, 2018 · 1 comment

edmorley commented Oct 12, 2018

One of the things the @neutrinojs/web preset tries to do is make the build output suitable for long-term caching (e.g. Cache-Control: max-age=315360000, public, immutable), by using hashed filenames and setting webpack options such that the filenames are as deterministic as possible.

To actually make use of this, deployments need to set the Cache-Control header for the correct subset of the build output. It should be set for anything that has a hashed filename, but not for files like index.html, robots.txt or favicon.ico. Accidentally matching a file whose filename does not contain a hash can be disastrous, since clients will keep using the stale cached copy until the user force-reloads the page or manually clears their cache.

For generated assets, Neutrino currently uses filenames of the form:

[name].[hash:8].[ext]
[name].[contenthash:8].[ext]

Which gives filenames like:

index.1d85033a.js
index.1d85033a.js.map
about.d7fea1e4.css
fontawesome-webfont.af7ae505.woff2
1.0cb55c2c.js

There are a few things that make matching the files hard:

  1. A blacklist approach (eg "match everything but .html") is risky, since if people forget to add additional types when new unhashed files are added to the build later (eg favicon.ico), then they'll be incorrectly treated as immutable.

  2. Whitelisting by file extension is also not reliable, since it's not guaranteed that all files with that extension will have a hashed filename. For example someone might use copy-webpack-plugin to copy in favicon.ico, or a plugin/loader might not use the hashed filenames we set (this is the case when using html-webpack-plugin's favicon option; I'm going to file an upstream issue but I'm not sure if we'll get anywhere).

  3. Whitelisting by matching against filenames that appear to have the name.hash.ext pattern can also be error prone. For example a lenient regex like this:

    /\.[a-f0-9]{8}\./

    ...could incorrectly match against foo.faded100.bar-baz.min.js (yes somewhat contrived and the list of 8-character hex words is pretty short, but still doesn't seem ideal to rely on luck).

    And anything stricter then has to take into account that (a) file extensions might be longer than three characters, contain digits and not necessarily be lowercase (e.g. .WOFF2), (b) some file types can have optional .map suffixes, and (c) there may also be compressed variants (e.g. .br, .gz) for people that use tools/web server plugins to pre-generate them for static assets.

    For Treeherder I was thinking of using the Python equivalent of:

    /\.[a-f0-9]{8}\.[A-Za-z0-9]{2,5}(\.map)?(\.br|\.gz)?$/

    ...and even that still has false-positive potential.

  4. Not all hosting options support regex (for example Netlify header rules), and wildcards are not adequate to match against the hashed filenames in a safe way.
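
To make point 3 concrete, here is a minimal sketch in Python of the two regexes quoted above, run against the example filenames from this issue. The `matches` helper, the `.br` variant and `report.deadbeef.pdf` are hypothetical names made up for illustration; they do not come from any real build.

```python
import re

# The two patterns quoted above, translated to Python's re syntax.
LENIENT = re.compile(r"\.[a-f0-9]{8}\.")
STRICT = re.compile(r"\.[a-f0-9]{8}\.[A-Za-z0-9]{2,5}(\.map)?(\.br|\.gz)?$")

# Hashed filenames taken from the examples in this issue.
hashed = [
    "index.1d85033a.js",
    "index.1d85033a.js.map",
    "about.d7fea1e4.css",
    "fontawesome-webfont.af7ae505.woff2",
    "1.0cb55c2c.js",
    "index.1d85033a.js.br",  # hypothetical pre-compressed variant
]

# Files that must NOT be treated as immutable.
unhashed = [
    "index.html",
    "robots.txt",
    "favicon.ico",
    "foo.faded100.bar-baz.min.js",  # the contrived example from point 3
]


def matches(pattern, names):
    """Return the names that the given compiled pattern matches anywhere."""
    return [name for name in names if pattern.search(name)]


print("lenient, unhashed:", matches(LENIENT, unhashed))
print("strict, unhashed: ", matches(STRICT, unhashed))
print("strict, hashed:   ", matches(STRICT, hashed))

# Hypothetical unhashed file whose name just happens to contain an
# 8-character hex word: the strict pattern still accepts it.
print("strict false positive:", bool(STRICT.search("report.deadbeef.pdf")))
```

The lenient pattern accepts the contrived `foo.faded100.bar-baz.min.js`, while the stricter one rejects it but still accepts any unhashed file whose name happens to fit the name.hash.ext shape, which is the residual false-positive risk mentioned above.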

A possible way to avoid all of this is to output the generated assets under a subdirectory, which could then be whitelisted in its entirety for the cache header. For example:

index.html
static/index.1d85033a.js
static/index.d7fea1e4.css
static/fontawesome-webfont.af7ae505.woff2
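
With that layout, the deployment-side rule reduces to a directory prefix check. A rough sketch in Python, assuming the directory ends up being called `static` (the naming is still the open question below) and assuming `no-cache` as the policy for everything else; `cache_control_for`, `HASHED_DIR` and `REVALIDATE` are hypothetical names, not part of Neutrino or any web server:

```python
from pathlib import PurePosixPath

# Header value from the top of this issue, for files with hashed filenames.
IMMUTABLE = "max-age=315360000, public, immutable"
# Assumption: everything else must be revalidated on each request.
REVALIDATE = "no-cache"

# Placeholder directory name; the final naming is undecided.
HASHED_DIR = "static"


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based only on the top-level directory."""
    parts = PurePosixPath(path.lstrip("/")).parts
    # Only files *inside* the hashed-assets directory are immutable.
    if len(parts) > 1 and parts[0] == HASHED_DIR:
        return IMMUTABLE
    return REVALIDATE


for example in ("index.html", "static/index.1d85033a.js", "favicon.ico"):
    print(example, "->", cache_control_for(example))
```

No regex or filename inspection is involved, so the same rule can be written as a single path-prefix entry on hosts that only support simple path matching. The sketch does not normalise `..` segments, which a real server rule would need to handle.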

If we do this, what naming/structure should we use?

  1. Everything under static/
  2. Everything under assets/
  3. Split according to file-type (ie: js/ css/ media/)
  4. Split according to file-type but also nest under a directory (eg static/js/ static/css/ static/media/)
  5. ...something else?

Considerations:

  • the filepath is output in the yarn build final summary, and having too long of a directory name (such as with (4)) causes annoying wrapping.
  • option (3) would mean needing multiple duplicate rules for hosting options that don't support regex (such as Netlify)
  • for projects that only have one entrypoint and few assets, it might be overkill to have separate js/, css/ etc directories containing only one file. That said for projects with multiple entrypoints or lots of code-splitting, there can be many many assets (example).
  • whilst static/ is probably more conventional than assets/, it feels slightly wrong to be calling only some of the build output "static" when really it all is?
  • CRA does (4) (see here), but then they customise the build output summary removing most of the information that would wrap
  • vue-cli does (3) (see here), but gives the user the option to add additional prefixes prior to those, using assetsDir
  • ultimately the exact naming is somewhat unimportant, since most of the time it will be invisible to users (given builds on remote machines, and not really exposed when using webpack-dev-server)

I think my preference is for (1) or (2).
(And either way, this would be a breaking change)

Thoughts?

@edmorley edmorley added this to the v9 milestone Oct 12, 2018
@edmorley edmorley self-assigned this Oct 12, 2018
@eliperelman

My vote is for 2.

edmorley added a commit that referenced this issue Oct 15, 2018
This makes it easier to write `Cache-Control` header rules for files
with hashed filenames, since the web server rule can now just match
the entire `assets/` directory rather than having to use false-positive
prone regex to match the hash in the filename.

In addition, this PR removes some redundant configuration:
* The `@neutrinojs/node` and `@neutrinojs/library` presets no longer
  set `output.filename` / `output.chunkFilename`, since they were
  previously only being set to the defaults anyway.
* The `@neutrinojs/web` preset no longer sets `output.chunkFilename`
  since by default it inherits from `output.filename`, so setting both
  to the same value is redundant:
  https://github.com/webpack/webpack/blob/v4.20.2/lib/WebpackOptionsDefaulter.js#L102-L112

Fixes #1172.