Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist in-memory cache, change hash. #116

Merged
merged 11 commits into from
Aug 30, 2021
Merged

Persist in-memory cache, change hash. #116

merged 11 commits into from
Aug 30, 2021

Conversation

zeroby0
Copy link
Contributor

@zeroby0 zeroby0 commented Jul 19, 2021

Fixes #51
Fixes #115

If the processed file exists in the output folder, processing it is skipped. Thus persisting the cache.
Output images are reprocessed if useCache: false

Because we are using cache across runs, we can't use just the file name for calculating hash:

  • Sharp options can change between runs
  • The image content can change

So I had to change hash to use image content + sharp options. This hashing is done with the built-in crypto library, using SHA1 because it seemed to have the highest throughput.

This PR was meant to solve #115, but it solves #51 too by accident 😂

For testing, please use:

npm r @11ty/eleventy-img 
npm i zeroby0/eleventy-img#cache

See summary of changes: #116 (comment)
See it in action: https://github.com/zeroby0/demo-eleventy-img-cache

@zeroby0 zeroby0 changed the title Add FS cache, change hashing Persist in-memory cache, change hash. Jul 19, 2021
@zeroby0
Copy link
Contributor Author

zeroby0 commented Jul 19, 2021

Benchmarks

Run No cache persistence with cache persistence
1st run (empty cache) 11.74 seconds 11.52 seconds
2nd run (filled cache) 11.32 seconds 131.82 milli seconds

These aren't 'true' benchmarks, just the times taken to run the sample/sample.js file. Further, I ran them only once, ideally you would run each case several times and take the median value.

All the values are taken with the remote files cache filled, so my Internet connection doesn't influence the results.

Also, Huge thanks to everyone who wrote the tests, they were super helpful in identifying the bugs!

@zeroby0 zeroby0 marked this pull request as ready for review July 19, 2021 21:26
@zeroby0
Copy link
Contributor Author

zeroby0 commented Jul 19, 2021

I simplified the hashing and fixed a bug in it in f265638

Images using svgShortCircuit are not cached because it seems to complicate the patch. If you're using svgShortcircuit and want it implemented, let me know.

With this, I think it's now ready for review. I do expect some bugs, especially in metadata, but everything seems fine so far (fixed in 4332dec). Please review :)

@zeroby0
Copy link
Contributor Author

zeroby0 commented Aug 4, 2021

svgShortCircuit is now supported in 4332dec! Turns out it wasn't that complicated after all.

stat.sizes is calculated from the filesystem directly, and stat.buffer is set if dry-run. The previous commits didn't set these properties. There were the metadata bugs I had a hunch about, and now they're fixed.

Please review and merge.

@zachleat zachleat self-requested a review August 5, 2021 21:29
@zachleat
Copy link
Member

zachleat commented Aug 5, 2021

Reviewing!

img.js Outdated Show resolved Hide resolved
img.js Show resolved Hide resolved
img.js Show resolved Hide resolved
@zachleat
Copy link
Member

zachleat commented Aug 6, 2021

Did some more testing this morning and this is looks like it’s working great! Super exciting PR, thank you!

@zachleat zachleat added the breaking change This will have to be included with a major version as it breaks backwards compatibility. label Aug 6, 2021
@zachleat zachleat added this to the Next Major Version milestone Aug 6, 2021
img.js Outdated Show resolved Hide resolved
@zeroby0
Copy link
Contributor Author

zeroby0 commented Aug 7, 2021

Thank you so much for reviewing, and for the incredible feedback! :D

Here is a summary of changes (will be kept up to date), to help with documentation:

Caching

If the output files exist in the output directory, they will not be reprocessed. We could cache the output images in .cache and copy them over on every build, but writing to disk can be really slow, so let's not do that.

The flipside is that some people like to clear the _site directory before building, and that clears the cache too, so the image output directory must be excluded:

  • Excluding _site/img/: rimraf '_site/!(img)'
  • Excluding _site/assets/images/: rimraf '_site/!(assets)' '_site/assets/!(images)'

Images will be reprocessed if useCache: false.

Hashes

The hash is calculated from image content and the options sent to Sharp. The default length is 10 url-safe base64 letters (60 bits). The hashing algorithm is SHA256 from the crypto module built into nodejs, and works on Glitch and Stackblitz too.

As the hash truly reflects the image contents, and is included in the name, the images generated can be safely HTTP cached forever. cache-control: public, max-age=31536000, immutable

The hash function is also available in the public API as getHash(src, imgOptions={}, length=10).

src can be either a filepath, a url, or a string buffer.

  • If it's a file path, the content of the file is used for hashing.
  • If it's a URL, the remote asset content should be passed in via imgOptions.remoteAssetContent, and that is used for hashing.
  • If not, the value of src is used for hashing.

imgOptions should contain the Sharp options used for the current image.

  • imgOptions.userOptions: {} can be used to include other data the hash should depend on.
  • All other keys in imgOptions are ignored.

length controls the length of the hash.

  • The default value is 10.
  • length=45 can be used to get the complete SHA256 hash.

The hashing algorithm and the encoding are not configurable by the user yet, but we can let them be, if it seems like a good idea.

statsSync and statsByDimensionsSync Breaking change

if src is a URL, the file content at the URL should be passed in via options.remoteAssetContent for calculating hash.

Yup, this rabbit hole went really deep really quick. 😂

const fileContent = fs.readFileSync(src);
hash.update(fileContent);
} else {
hash.update(src);
Copy link
Contributor Author

@zeroby0 zeroby0 Aug 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in 091f430

This line means the hash for urls will always be the same, even if the remote content changes.

So even if eleventy-cache-assets fetches the new asset, it will not be replaced in the output because the old version with the same hash exists.

queueImage downloads the remote asset so we can use the content for hashing, but statsSync and statsByDimensionsSync will be broken for urls.

Unless

  1. The remote asset content be passed in by the user as options.remoteAssetContent (091f430 implements this). or
  2. We can somehow synchronously fetch assets with eleventy-cache-assets, or
  3. Use the options.__validAssetCache and regenerate images if cache is invalid. But the user will have to treat remote images and local images differently, because remote images are generated from url, not content. For example, such remote images cannot be http cached.

The first option is the most flexible of the three: If the user is unable to pass in the remote content, they can pass in the url as options.remoteAssetContent and the build will not fail. Or they can pass in a random string in production, and be sure that the hash is unique.

@zachleat zachleat merged commit ffef727 into 11ty:master Aug 30, 2021
@zachleat
Copy link
Member

Shipping with 1.0, thanks! I did remove the remoteAssetContent piece for hashing, though. We can’t rely on the remote content for the synchronous hash methods, as the content fetching is asynchronous

@tusamni
Copy link

tusamni commented Oct 12, 2021

Just wanted to note that I tested this in 1.0.0-beta.2 with a lot of local images and it's incredibly awesome. Makes rebuilds happen lightning quick.

Thanks!

@edvind
Copy link

edvind commented Oct 14, 2021

Just wanted to note that I tested this in 1.0.0-beta.2 with a lot of local images and it's incredibly awesome. Makes rebuilds happen lightning quick.

Thanks!

Came here to say the same! Very nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change This will have to be included with a major version as it breaks backwards compatibility.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persist cache in Netlify Persist in-memory cache
4 participants