Make it easier to set long-lived cache headers on assets with hashed filenames #1172

edmorley · 2018-10-12T17:20:25Z

One of the things the @neutrinojs/web preset tries to do is make the build output suitable for long-term caching (eg Cache-Control: max-age=315360000, public, immutable), by using hashed filenames and setting webpack options such that the filenames are as deterministic as possible.

To actually make use of this, deployments need to set the Cache-Control header for the correct subset of the build output. It should be set for anything that has a hashed filename, but not files like index.html, robots.txt or favicon.ico. Accidentally matching against a file whose filename does not contain a hash can be disastrous, since each client will then have to force-reload the page or manually clear their cache.

For generated assets, Neutrino currently uses filenames of the form:

[name].[hash:8].[ext]
[name].[contenthash:8].[ext]

Which gives filenames like:

index.1d85033a.js
index.1d85033a.js.map
about.d7fea1e4.css
fontawesome-webfont.af7ae505.woff2
1.0cb55c2c.js

There are a few things that make matching the files hard:

- A blacklist approach (eg "match everything but .html") is risky, since if people forget to add additional types when new unhashed files are added to the build later (eg favicon.ico), then they'll be incorrectly treated as immutable.
- Whitelisting by file extension is also not reliable, since it's not guaranteed that all files with that extension will have a hashed filename. For example someone might use copy-webpack-plugin to copy in favicon.ico, or a plugin/loader might not use the hashed filenames we set (this is the case when using html-webpack-plugin's favicon option; I'm going to file an upstream issue but not sure if we'll get anywhere).
- Whitelisting by matching against filenames that appear to have the name.hash.ext pattern can also be error prone. For example a lenient regex like this:
```
/\.[a-f0-9]{8}\./
```
...could incorrectly match against foo.faded100.bar-baz.min.js (yes somewhat contrived and the list of 8-character hex words is pretty short, but still doesn't seem ideal to rely on luck).

And anything stricter then has to take into account that (a) file extensions might be longer than three characters, contain digits and not necessarily be lowercase (eg .WOFF2), (b) some file types can have optional .map suffixes, and (c) there may also be compressed variants (eg .br, .gz) for people that use tools/web server plugins to pre-generate them for static assets.

For Treeherder I was thinking of using the Python equivalent of:
```
/\.[a-f0-9]{8}\.[A-Za-z0-9]{2,5}(\.map)?(\.br|\.gz)?$/
```
...and even that still has false-positive potential.
- Not all hosting options support regex (for example Netlify header rules), and wildcards are not adequate to match against the hashed filenames in a safe way.
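To make the regex trade-offs above concrete, here is a minimal sketch that runs both patterns from this issue against the example filenames. The `.gz` variant and the date-stamped `report.20181012.pdf` are hypothetical names added for illustration; the latter is one way the stricter pattern could still hit an unhashed file, since eight decimal digits are also eight valid hex characters.

```javascript
// Patterns copied from the discussion above.
const lenient = /\.[a-f0-9]{8}\./;
const strict = /\.[a-f0-9]{8}\.[A-Za-z0-9]{2,5}(\.map)?(\.br|\.gz)?$/;

// Filenames from the example build output, plus a hypothetical
// pre-compressed variant.
const hashed = [
  'index.1d85033a.js',
  'index.1d85033a.js.map',
  'about.d7fea1e4.css',
  'fontawesome-webfont.af7ae505.woff2',
  '1.0cb55c2c.js',
  'index.1d85033a.js.gz',
];

// Unhashed files that must never be marked immutable.
const unhashed = ['index.html', 'robots.txt', 'favicon.ico'];

// The contrived name from above, and a hypothetical date-stamped file.
const contrived = 'foo.faded100.bar-baz.min.js';
const dateStamped = 'report.20181012.pdf';

const results = {
  strictMatchesAllHashed: hashed.every((f) => strict.test(f)),
  strictMatchesAnyUnhashed: unhashed.some((f) => strict.test(f)),
  lenientMatchesContrived: lenient.test(contrived),
  strictMatchesContrived: strict.test(contrived),
  strictMatchesDateStamped: strict.test(dateStamped),
};
console.log(results);
```

The stricter pattern accepts every hashed name and rejects the contrived one, but it still accepts the date-stamped file, which is the kind of residual false positive mentioned above.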

A possible way to avoid all of this, is to have the generated assets be output under a subdirectory, which could then be whitelisted entirely for the cache-header. For example:

index.html
static/index.1d85033a.js
static/index.d7fea1e4.css
static/fontawesome-webfont.af7ae505.woff2
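With that layout, the header decision needs only a path prefix check rather than a regex. A minimal sketch, assuming the subdirectory is named `static/` (the name is still an open question in this issue); `cacheControlFor` is a hypothetical helper, and the `no-cache` value for everything else is an assumption, not something Neutrino sets:

```javascript
// Value quoted at the top of this issue for hashed, immutable assets.
const IMMUTABLE = 'max-age=315360000, public, immutable';
// Assumption: unhashed files (index.html, favicon.ico, ...) are
// revalidated on every request.
const REVALIDATE = 'no-cache';

// Hypothetical helper: pick a Cache-Control value from the URL path alone.
function cacheControlFor(urlPath, hashedDir = '/static/') {
  return urlPath.startsWith(hashedDir) ? IMMUTABLE : REVALIDATE;
}

console.log(cacheControlFor('/static/index.1d85033a.js'));
console.log(cacheControlFor('/index.html'));
```

The same prefix match can be expressed with a plain wildcard on hosts that do not support regex in their header rules.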

If we do this, what naming/structure should we use?

1. Everything under static/
2. Everything under assets/
3. Split according to file-type (ie: js/ css/ media/)
4. Split according to file-type but also nest under a directory (eg static/js/ static/css/ static/media/)
5. ...something else?

Considerations:

- the filepath is output in the yarn build final summary, and having too long of a directory name (such as with (4)) causes annoying wrapping.
- option (3) would mean needing multiple duplicate rules for hosting options that don't support regex (such as Netlify)
- for projects that only have one entrypoint and few assets, it might be overkill to have separate js/, css/ etc directories containing only one file. That said, for projects with multiple entrypoints or lots of code-splitting, there can be a great many assets.
- whilst static/ is probably more conventional than assets/, it feels slightly wrong to be calling only some of the build output "static" when really it all is?
- CRA does (4), but then they customise the build output summary, removing most of the information that would wrap
- vue-cli does (3), but gives the user the option to add additional prefixes prior to those, using assetsDir
- ultimately the exact naming is somewhat unimportant, since most of the time it will be invisible to users (given builds on remote machines, and not really exposed when using webpack-dev-server)

I think my preference is for (1) or (2).
(And either way, this would be a breaking change)

Thoughts?


eliperelman · 2018-10-12T17:43:24Z

My vote is for 2.

This makes it easier to write `Cache-Control` header rules for files with hashed filenames, since the web server rule can now just match the entire `assets/` directory rather than having to use false-positive prone regex to match the hash in the filename.

In addition, this PR removes some redundant configuration:

* The `@neutrinojs/node` and `@neutrinojs/library` presets no longer set `output.filename` / `output.chunkFilename`, since they were previously only being set to the defaults anyway.
* The `@neutrinojs/web` preset no longer sets `output.chunkFilename` since by default it inherits from `output.filename`, so setting both to the same value is redundant: https://github.com/webpack/webpack/blob/v4.20.2/lib/WebpackOptionsDefaulter.js#L102-L112

Fixes #1172.
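For readers unfamiliar with how a subdirectory ends up in the output, here is a rough sketch of a webpack 4 `output` section with the directory written into `output.filename`. It is an illustration under assumptions, not the actual Neutrino source: the variable name `webpackConfig` is hypothetical, and the filename template simply combines the `assets/` directory with the `[name].[contenthash:8]` pattern quoted in the issue.

```javascript
// Sketch only: a webpack 4 `output` section, not Neutrino's real config.
const webpackConfig = {
  output: {
    // Hashed bundles are emitted under assets/. `chunkFilename` is left
    // unset, so webpack derives it from `filename` as described above.
    filename: 'assets/[name].[contenthash:8].js',
  },
};

// In a real webpack.config.js this object would be exported
// with `module.exports`.
console.log(webpackConfig.output.filename);
```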

edmorley added the breaking change label Oct 12, 2018

edmorley added this to the v9 milestone Oct 12, 2018

edmorley self-assigned this Oct 12, 2018

edmorley mentioned this issue Oct 13, 2018

Output generated files under an assets/ subdirectory #1174

Merged

edmorley added the feature label Oct 15, 2018

edmorley closed this as completed in #1174 Oct 15, 2018
