
Add Dockerfile and pre-commit hook config #132

Open · wants to merge 1 commit into master

Conversation

wlritchi

User story

I'd like to run ECT automatically on images that are newly added to a git repository. pre-commit is a relatively common tool for running a configurable set of git hooks on changed files. It supports arbitrary hooks using Docker (among other ways).

Proposal

This PR adds a Dockerfile for ECT, and a .pre-commit-hooks.yaml file describing how pre-commit should run the hook. An example user might configure the following in their .pre-commit-config.yaml:

repos:
- repo: https://github.com/fhanau/Efficient-Compression-Tool
  rev: v0.10.0
  hooks:
  - id: optimize-files
    args: ['--strict', '-9']

With this config, pre-commit would use ECT to optimize newly added or changed PNGs, JPEGs, ZIPs, and GZIPs.
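For context, pre-commit discovers a repository's hooks through a `.pre-commit-hooks.yaml` file at its root. A definition along these lines would produce the behavior described (this is an illustrative sketch; the actual file contents in this PR may differ):

```yaml
- id: optimize-files
  name: optimize files with ECT
  entry: ect
  language: docker
  files: '\.(png|jpe?g|zip|gz)$'
```

Here `language: docker` tells pre-commit to build the image from the repository's Dockerfile and run the `entry` command inside it, and `files` restricts the hook to the file types ECT can optimize.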

Notably, this PR does not include any automation for pushing Docker images to a repository such as Docker Hub, so there's no additional infrastructure or maintenance burden (other than updating the Dockerfile if you add new compilation dependencies). On first use, pre-commit clones the repo and builds the image from source.

Assumptions

In order to use ECT as an effective pre-commit hook, I assume that once a file has been optimized, re-running ECT on that file will not change the file, and will be reasonably fast. In my testing, this appears to be the case, but it would be nice to have some theoretical backing for this claim.
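The assumption can also be checked empirically: run the optimizer on a copy of a file, record a hash, run it again, and verify nothing changed. A minimal sketch of such a check (the `converges_after_one_pass` helper and the example `ect` invocation below are illustrative, not part of this PR):

```python
import hashlib
import os
import shutil
import subprocess
import tempfile

def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def converges_after_one_pass(cmd, path):
    """Run `cmd` (an argv list; the file path is appended) twice on a
    copy of `path`. Return True if the second run left the file
    byte-for-byte unchanged, i.e. one pass reached a fixed point."""
    with tempfile.TemporaryDirectory() as tmp:
        work = os.path.join(tmp, os.path.basename(path))
        shutil.copy(path, work)
        subprocess.run(cmd + [work], check=True)
        first = file_hash(work)
        subprocess.run(cmd + [work], check=True)
        return file_hash(work) == first
```

Assuming `ect` is on the PATH, one could then test a sample image with `converges_after_one_pass(["ect", "-9", "--strict"], "photo.png")`; per the discussion below, this holds in most but not all cases for PNG.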

@fhanau
Owner

fhanau commented Sep 24, 2023

As I understand it, the changes in this PR should be useful for this use case, and optimizing images automatically upon commit certainly sounds useful. However, I'm not familiar with pre-commit and am hesitant to add new CI or related tooling while I'm not actively developing ECT. Are pre-commit hooks generally designed to live in the repository of the tools they use? I do think this is a useful feature, though, so I'll certainly revisit this.

As for your assumption at the end: it is usually, but not always, true, at least for PNG. ECT performs both PNG color encoding optimizations (e.g. greyscale vs. true color, palette, the filters being used) and optimization of the actual deflated image data. At the less aggressive compression levels, heuristics are also used, e.g. the filtering approach is chosen early. For the color optimizations, the output depends on the input to at least some extent. For example, ECT might decide to filter an image a certain way and then apply some color optimizations, yielding a different input and potentially a better filtering decision on the next run, so in rare cases another small improvement is possible.

@wlritchi
Author

> Are pre-commit hooks generally designed to live in the repository of the tools they are using?

That seems to be the most common approach, though other options are supported. I can think of at least one reason to prefer it: pre-commit can check for updates to tools based on new tags in the git repo. If the hooks lived elsewhere, auto-update (and versioning in general) would only be possible if some process bumped the tag versions on the hook repo, which adds infrastructure overhead for the hook maintainer. I think that's what's going on here.

> in rare cases another small improvement is possible.

Hmm, that's maybe not ideal. Here's my thinking: another of pre-commit's main design decisions is that, if a hook changes files, the commit is aborted and it's the user's responsibility to review the changes, stage them, and re-run the commit. Of course, the hooks then run again. If it's a relatively uncommon case (and rerunning will never make the compression worse), I think it's still reasonable, and I'll still be using this hook myself.

@fhanau
Owner

fhanau commented Oct 23, 2023

Thought about it some more – this certainly feels like a useful thing to have. I'd prefer not to have a Dockerfile at the top level of the repository though – can it be moved to a new directory, perhaps pre_commit? I was also thinking it could be done with just a script instead of a Docker container, but I'd imagine that would limit cross-platform portability.

Regarding the compression issue you mentioned – ECT is designed to never make compression worse when optimizing existing files (as opposed to creating new ZIP/GZIP files from uncompressed data, which can come out larger for incompressible input). Since better compression on a consecutive run is rare, I think this should still work well for your use case.
