
standardization around compression params #1145

Closed
rchincha opened this issue Oct 25, 2023 · 5 comments

Comments

@rchincha

rchincha commented Oct 25, 2023

For a given source tar, compression parameters matter even when the same compression algorithm is used. Layers are often produced as tar.gz, but different tools set buffer sizes, compression levels, etc. differently, so the compressed layer ends up being different even though the source tar is the same, and consequently so does its sha256sum.

As a result, reproducibility and deduplication suffer. Either clients use identical tooling end-to-end (which is unrealistic) or the standards evolve to encode this variability in the spec.
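
To make the problem concrete, here is a minimal sketch that compresses one in-memory tar at several gzip levels and hashes each result. The file name, its contents, and the helper names `build_tar` and `gzip_digests` are made up for illustration; the tar metadata and the gzip timestamp are pinned so that the compression level is the only thing that varies.

```python
import gzip
import hashlib
import io
import tarfile


def build_tar(files):
    """Build an in-memory tar with fixed metadata so the tar bytes are stable."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp, otherwise the tar itself would vary
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def gzip_digests(tar_bytes, levels=(1, 6, 9)):
    """Map each compression level to the sha256 of the resulting tar.gz bytes."""
    digests = {}
    for level in levels:
        # mtime=0 keeps the gzip header timestamp out of the comparison
        gz = gzip.compress(tar_bytes, compresslevel=level, mtime=0)
        digests[level] = hashlib.sha256(gz).hexdigest()
    return digests


# Hypothetical layer content: varied text so the levels make different choices
content = "".join(f"line {i}: {i * i % 97}\n" for i in range(5000)).encode()
source_tar = build_tar({"etc/example.txt": content})

digests = gzip_digests(source_tar)
print("tar sha256:", hashlib.sha256(source_tar).hexdigest())
for level, digest in digests.items():
    print(f"gzip level {level}: sha256:{digest}")
```

The tar digest is the same on every run, while the tar.gz digests depend on the level passed to the compressor.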

@rchincha
Author

rchincha commented Oct 26, 2023

https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1

"application/vnd.oci.image.layer.v1.tar+gzip"

-->

"application/vnd.oci.image.layer.v1.tar+gzip; param1=x; param2=y" etc?
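
As a rough sketch of what a client would have to do with such a value, the snippet below splits a media type into its base type and parameters in the style of RFC 7231 section 3.1.1.1. The `param1`/`param2` names are the placeholders from the comment above, not anything defined by the OCI image-spec, and `parse_media_type` is a hypothetical helper that ignores edge cases such as quoted values containing `;`.

```python
def parse_media_type(value):
    """Split 'type/subtype; k=v; k2=v2' into (type/subtype, {k: v})."""
    base, *rest = [part.strip() for part in value.split(";")]
    params = {}
    for part in rest:
        if not part:
            continue  # tolerate a trailing ';'
        name, _, val = part.partition("=")
        # parameter names are case-insensitive; strip simple surrounding quotes
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


media_type, params = parse_media_type(
    "application/vnd.oci.image.layer.v1.tar+gzip; param1=x; param2=y"
)
print(media_type)
print(params)
```

A client that only compares the full string would treat the parameterized and bare forms as different media types, so existing tooling would need to parse before matching.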

@jonjohnsonjr
Contributor

There is not one standard algorithm for encoding DEFLATE. Even with the same parameters, different gzip implementations will produce different archives. If you want to record exactly how a layer was produced, the place for that would be in something like an SBOM, not in the mediaType.

@sudo-bmitch
Contributor

This might be another case for pushing uncompressed content in the descriptor, and compressing at the transport level.
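
A small sketch of why hashing the uncompressed content sidesteps the issue: the digest is computed over the decompressed tar stream, so it stays the same no matter how the bytes were compressed in transit. The payload and the `uncompressed_digest` helper are illustrative assumptions; the idea mirrors how the image config's `diff_ids` are digests over uncompressed layer tars.

```python
import gzip
import hashlib
import io


def uncompressed_digest(gz_bytes, chunk_size=64 * 1024):
    """Return the sha256 digest of the decompressed content of a gzip stream."""
    hasher = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


# Stand-in for an uncompressed layer tar (hypothetical content)
payload = "".join(f"entry {i}: {i * 7 % 101}\n" for i in range(5000)).encode()

fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

print("compressed sizes:", len(fast), len(best))
print("uncompressed digest (fast):", uncompressed_digest(fast))
print("uncompressed digest (best):", uncompressed_digest(best))
```

Both compressed blobs yield the same uncompressed digest, which is what a descriptor would reference under this approach.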

@rchincha
Author

rchincha commented Oct 26, 2023

> There is not one standard algorithm for encoding DEFLATE

Yup.

Also requiring another optional thing (SBOM) for reproducibility?

@rchincha
Author

Closing this due to the following:

  1. The OCI image-spec is a spec and a standard, but there are limits to what it can pin down

  2. The spec itself is a lot of JSON and there is no canonical JSON to begin with

  3. The output of compression depends on the algorithm, its parameters, and the implementation, so you really have to pick one tool to produce the OCI layers (tar and tar.gz) and stick with it. Also make sure you record the tool and version used in order to reproduce the bits.
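
Point 2 can be illustrated with a short sketch: the same manifest content, serialized two valid ways, hashes to two different digests. The manifest below is a trimmed-down example, and the annotation key recording the tool and version is a hypothetical name chosen for this sketch, not something the spec defines.

```python
import hashlib
import json

# Trimmed-down manifest for illustration only
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "layers": [],
    "annotations": {
        # hypothetical key and value recording the tool that built the layers
        "example.invalid/layer-tool": "some-tool 1.2.3",
    },
}


def sha256_digest(data):
    """Return an OCI-style 'sha256:<hex>' digest string for raw bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Two equally valid JSON encodings of the same content
pretty = json.dumps(manifest, indent=2).encode()
compact = json.dumps(manifest, separators=(",", ":"), sort_keys=True).encode()

print("pretty :", sha256_digest(pretty))
print("compact:", sha256_digest(compact))
print("same content:", json.loads(pretty) == json.loads(compact))
```

Since the digest is taken over the serialized bytes, a registry or client has to store and forward the exact bytes it received rather than re-serialize them.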
