Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asset fingerprinting and SRI #519

Closed
RealOrangeOne opened this issue Nov 15, 2018 · 28 comments
Closed

Asset fingerprinting and SRI #519

RealOrangeOne opened this issue Nov 15, 2018 · 28 comments

Comments

@RealOrangeOne
Copy link
Contributor

It'd be nice if there were a way of accessing a fingerprint or hash of static files, for a more reliable way of caching assets.

I've given this a bit of thought, and probably need some input.

Ideally, it'd just be an additional function, say get_static, which returned a structure which contained the path, fingerprinted URL, hash, things like that. Because this requires the static files to exist before running, it would also serve as a link checker. However, What do we do about live reloading? When editing a static file, we'd either need to keep track of the linked resources per page, and only reload the linked pages, or just reload the whole site. (The latter is so much easier!)

I see no reason why this couldnt work for any static file in the site, eg images. It wouldnt play well with images after they've been processed, but static images should work fine!

@Keats
Copy link
Collaborator

Keats commented Nov 16, 2018

The current timestamp is not a very good solution I agree.
I would probably just change the cachebust attribute of get_url to hash it though rather than having a new function

@RealOrangeOne
Copy link
Contributor Author

RealOrangeOne commented Nov 16, 2018

That would work for fingerprinting the URL, but wouldnt generate the required SRI data. I could change the return type to an object, but not sure I want to make a breaking change to something so fundamental to existing sites. Hence the new function.

If you're happy for me to add a breaking change, I can do that too. Or, I could just remove the cachebust argument to get_url, so people have to use get_static when they want their cache busted

@Keats
Copy link
Collaborator

Keats commented Nov 16, 2018

Do you have a link for SRI?

@RealOrangeOne
Copy link
Contributor Author

https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity is a pretty good one. Or the spec at https://w3c.github.io/webappsec-subresource-integrity/, which is incredibly dry!

@dancek
Copy link
Contributor

dancek commented May 16, 2020

Is there still interest for this? I'd like hash-based cache busting. SRI is nice in theory too, but practically only useful for external resources (which zola couldn't hash during compilation).

@tcurdt
Copy link

tcurdt commented May 16, 2020

IMO this is still relevant.

@Keats
Copy link
Collaborator

Keats commented May 18, 2020

I would like a PR for that.

@dancek
Copy link
Contributor

dancek commented May 18, 2020

I'll try to make a PR within the next couple of days.

dancek added a commit to dancek/zola that referenced this issue May 18, 2020
Cache-busting was previously done with a compile-time timestamp. Change
to the SHA-256 hash of the file to avoid refreshing unchanged files.

The implementation could be used to add a new global fn (say,
get_file_hash) for subresource integrity use, but that's for another
commit.

Fixes getzola#519.
dancek added a commit to dancek/zola that referenced this issue May 18, 2020
Cache-busting was previously done with a compile-time timestamp. Change
to the SHA-256 hash of the file to avoid refreshing unchanged files.

The implementation could be used to add a new global fn (say,
get_file_hash) for subresource integrity use, but that's for another
commit.

Fixes getzola#519.
@Keats Keats closed this as completed in 36ec33f May 25, 2020
@mattly
Copy link

mattly commented May 28, 2020

You know, I got all excited because I saw some activity on this ticket and punted on doing some work to my hugo site because I thought, yay, they're tackling this finally! Hugo's template system is just as insane as golang, and migrating my fledgling site would be a worthwhile yak shave.

But the PR that closed this issue does not provide the desired feature. It changes the cache busting mechanism. There's still no way for me to fill out the value of an integrity='{{ get_file_hash("styles.css")}}' attribute.

@Keats
Copy link
Collaborator

Keats commented May 28, 2020

{{ get_file_hash("styles.css")}} if that's all that's needed then it's very easy to add! Does it need to be a specific hash (I'm thinking sha256) or anything is fine?

@Keats Keats reopened this May 28, 2020
@dancek
Copy link
Contributor

dancek commented May 28, 2020

The SRI spec says SHA-256, -384 and -512 MUST be supported by browsers IIRC. But SHA-384 is generally recommended and used for some reason. I think it'd make sense to let the user choose between those three hashes.

@dancek
Copy link
Contributor

dancek commented May 28, 2020

I can make another PR for this. I didn't include it in the original one because I wasn't whether anyone would use it (and whether the project would want it).

@Keats
Copy link
Collaborator

Keats commented May 28, 2020

Let's start with just SHA-384, unless they are all coming from the same crate.

@dancek
Copy link
Contributor

dancek commented May 28, 2020

They're all from the sha-2 crate which we already depend on.

@dancek
Copy link
Contributor

dancek commented Jun 1, 2020

While working on SRI support I realized I basically broke all cache-busting in my previous PR, as now get_url expects to find the file in content_path when it's actually in static/. Oops, sorry for that!

I've got get_file_hash implemented with the same issue. I'll send a PR to show what I'm up to, but I'll fix the static/ path once I'm sure what the best way to do it is.

@dancek dancek mentioned this issue Jun 1, 2020
3 tasks
@dancek
Copy link
Contributor

dancek commented Jun 10, 2020

get_file_hash has been merged and will be in the next release.

@Keats Keats closed this as completed Sep 4, 2020
@Absolucy
Copy link

get_file_hash can't be used for SRI: it returns hex, the integrity attribute expects base64.

@Keats
Copy link
Collaborator

Keats commented Jan 17, 2021

There is a base64_encode filter, would {{ get_file_hash(..) | base64_encode }} work?

@Absolucy
Copy link

get_file_hash returns hex tho, so you'd get the base64 of the hex?

@SkypLabs
Copy link
Contributor

SkypLabs commented Feb 3, 2021

get_file_hash returns hex tho, so you'd get the base64 of the hex?

Correct, so get_file_hash cannot be used for SRI. This section in the documentation should be updated accordingly.

I think the current implementation of get_file_hash can still be useful in some situations so I would be in favour of adding another function like get_file_b64_hash for SRI. If you agree with me, I'm willing to work on the related PR.

@Absolucy
Copy link

Absolucy commented Feb 3, 2021

Yeah, I agree.

@Keats
Copy link
Collaborator

Keats commented Feb 3, 2021

Just make it an argument of the get_file_hash function?

@SkypLabs
Copy link
Contributor

SkypLabs commented Feb 3, 2021

Just make it an argument of the get_file_hash function?

Why not!

Given that I don't see any scenario in which returning the raw binary value of the hash could be exploitable in Markdown / HTML, I suggest that the additional parameter, when set to true, makes the function return directly the base64-encoded binary value of the hash.

Something like:

// Returns 'fa6bad40f388d8f12525971a9d60a05797ee69b34bdb929b08a3c8ce1dc1c39116088286bf62e43b2435394f501964b1'.
get_file_hash(path="js/app.js", sha_type=384)

// Returns '+mutQPOI2PElJZcanWCgV5fuabNL25KbCKPIzh3Bw5EWCIKGv2LkOyQ1OU9QGWSx'.
get_file_hash(path="js/app.js", sha_type=384, base64=true)

What do you think?

@Keats
Copy link
Collaborator

Keats commented Feb 3, 2021

Yep sounds good

@SkypLabs
Copy link
Contributor

SkypLabs commented Feb 4, 2021

I'm working on the PR but there is something I don't get about the can_get_file_hash_* unit tests.

create_file(&dir.join("app.css"), "// Hello world!")

According to this piece of code, the test functions compute the hash values of a file named app.css and containing the string // Hello world!. The problem is that when I create a file having the same name and content, I get different hash values.

For example with SHA256:

$ cat app.css
// Hello world!
$ sha256sum app.css
72d204026937df55cf1eec5fc77d8dbc323b7103b37b788b9a03e9357eddca5a  app.css

And according to the related unit test:

"572e691dc68c3fcd653ae463261bdb38f35dc6f01715d9ce68799319dd158840"

Am I missing something here?

@Absolucy
Copy link

Absolucy commented Feb 4, 2021

@SkypLabs

❯ printf '// Hello world!' | sha256sum -
572e691dc68c3fcd653ae463261bdb38f35dc6f01715d9ce68799319dd158840  -

❯ printf "// Hello world\!\n" | sha256sum -
72d204026937df55cf1eec5fc77d8dbc323b7103b37b788b9a03e9357eddca5a  -

it's just the newline that changes the hash

@SkypLabs
Copy link
Contributor

SkypLabs commented Feb 4, 2021

I actually thought about it but strangely, when I added a new line with Vim, it didn't get the right hash value.

Thanks anyway. I know the issue comes from me now.

@SkypLabs
Copy link
Contributor

SkypLabs commented Feb 4, 2021

PR created: #1339.

I'm updating the documentation.

Keats pushed a commit that referenced this issue Feb 20, 2021
…1339)

* Add support for base64-encoded hash values

The global template function 'get_file_hash' can now return a
base64-encoded hash value when its 'base64' parameter is set to true.

See discussion in #519.

* Fix integrity attribute's value in test site

SRI hash values must be base64-encoded.

* Update documentation about 'get_file_hash'

* Fix 'can_get_hash_for_static_files' unit test
adworacz added a commit to adworacz/zerm that referenced this issue Mar 20, 2021
This allows users to set extremely high caching values for their CSS and
assets without worrying about poor UX whenever an update to said
assets is made.

Note: Can't add this for JS assets yet, the cachebusting logic is
broken, see getzola/zola#1416

Ref:
* https://www.getzola.org/documentation/templates/overview/#get-url
* getzola/zola#519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants