
Add Dockerfile and pre-commit hook config #132

Open · wants to merge 1 commit into master

Conversation

wlritchi

User story

I'd like to run ECT automatically on images that are newly added to a git repository. pre-commit is a relatively common tool for running a configurable set of git hooks on changed files. It supports arbitrary hooks using Docker (among other ways).

Proposal

This PR adds a Dockerfile for ECT, and a .pre-commit-hooks.yaml file describing how pre-commit should run the hook. An example user might configure the following in their .pre-commit-config.yaml:

repos:
- repo: https://github.com/fhanau/Efficient-Compression-Tool
  rev: v0.10.0
  hooks:
  - id: optimize-files
    args: ['--strict', '-9']

With this config, pre-commit would use ECT to optimize newly added or changed PNGs, JPEGs, ZIPs, and GZIPs.
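For context, pre-commit discovers a repository's hooks through a `.pre-commit-hooks.yaml` file at its root. A definition along these lines would produce the behavior described (this is an illustrative sketch; the actual file contents in this PR may differ):

```yaml
- id: optimize-files
  name: optimize files with ECT
  entry: ect
  language: docker
  files: '\.(png|jpe?g|zip|gz)$'
```

Here `language: docker` tells pre-commit to build the image from the repository's Dockerfile and run the `entry` command inside it, and `files` restricts the hook to the file types ECT can optimize.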

Notably, this PR does not include any automation for pushing Docker images to a repository such as Docker Hub, so there's no additional infrastructure or maintenance burden (other than updating the Dockerfile if you add new compilation dependencies). On first use, pre-commit clones the repo and builds the image from source.

Assumptions

In order to use ECT as an effective pre-commit hook, I assume that once a file has been optimized, re-running ECT on that file will not change the file, and will be reasonably fast. In my testing, this appears to be the case, but it would be nice to have some theoretical backing for this claim.
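The assumption can also be checked empirically: run the optimizer on a copy of a file, record a hash, run it again, and verify nothing changed. A minimal sketch of such a check (the `converges_after_one_pass` helper and the example `ect` invocation below are illustrative, not part of this PR):

```python
import hashlib
import os
import shutil
import subprocess
import tempfile

def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def converges_after_one_pass(cmd, path):
    """Run `cmd` (an argv list; the file path is appended) twice on a
    copy of `path`. Return True if the second run left the file
    byte-for-byte unchanged, i.e. one pass reached a fixed point."""
    with tempfile.TemporaryDirectory() as tmp:
        work = os.path.join(tmp, os.path.basename(path))
        shutil.copy(path, work)
        subprocess.run(cmd + [work], check=True)
        first = file_hash(work)
        subprocess.run(cmd + [work], check=True)
        return file_hash(work) == first
```

Assuming `ect` is on the PATH, one could then test a sample image with `converges_after_one_pass(["ect", "-9", "--strict"], "photo.png")`; per the discussion below, this holds in most but not all cases for PNG.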

@fhanau
Owner

fhanau commented Sep 24, 2023

As I understand it, the changes in this PR should be useful for this use case, and optimizing images automatically upon commit certainly sounds useful. However, I'm not familiar with pre-commit and am hesitant to add new CI or related tooling while I'm not actively developing ECT. Are pre-commit hooks generally designed to live in the repository of the tools they use? I do think this is a useful feature, though, so I'll certainly revisit this.

As for your assumption at the end: it is usually, but not always, true, at least for PNG. ECT performs both PNG color encoding optimizations (e.g. greyscale vs. true color, palette, the filters being used) and optimization of the actual deflated image data. At the less aggressive compression levels, heuristics are also used, e.g. the filtering approach is chosen early. For the color optimizations, the output depends on the input to at least some extent. For example, ECT might decide to filter an image a certain way and then apply some color optimizations, yielding a different input and potentially a better filtering decision on the next run, so in rare cases another small improvement is possible.

@wlritchi
Author

> Are pre-commit hooks generally designed to live in the repository of the tools they are using?

That seems to be the most common approach, though other options are supported. I can think of at least one reason to prefer it: pre-commit can check for updates to tools based on new tags in the git repo. If the hooks lived elsewhere, auto-update (and versioning in general) would only be possible if some process bumped the tag versions on the hook repo, which adds infrastructure overhead for the hook maintainer. I think that's what's going on here.

> in rare cases another small improvement is possible.

Hmm, that's maybe not ideal. Here's my thinking: another of pre-commit's main design decisions is that, if a hook changes files, the commit is aborted and it's the user's responsibility to review the changes, stage them, and re-run the commit. Of course, the hooks then run again. If it's a relatively uncommon case (and rerunning will never make the compression worse), I think it's still reasonable, and I'll still be using this hook myself.

@fhanau
Owner

fhanau commented Oct 23, 2023

Thought about it some more – this certainly feels like a useful thing to have. I'd prefer not to have a Dockerfile at the top level of the repository though – can it be moved to a new directory, perhaps pre_commit? I was also thinking it could be done with just a script instead of a Docker container, but I'd imagine that would limit cross-platform portability.

Regarding the compression issue you mentioned – ECT is designed to never make compression worse when optimizing existing files (as opposed to creating new ZIP/GZIP files from uncompressed data, which can come out larger for incompressible input). Since better compression on a consecutive run is rare, I think this should still work well for your use case.
