-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(gzip): change compression block size #508
Conversation
Several tools set this as the default value. image/copy/copy.go 56:var compressionBufferSize = 1048576 skopeo/vendor/github.com/containers/image/v5/copy/compression.go 24: compressionBufferSize = 1048576 Choosing a different value causes the final gzip'ed layer to become different although the source tar is identical, causing an content-addressable invariants to break. Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
Unfortunately, a lot of things can cause the output blobs to change despite having the same input -- updates to the gzip compression library will cause changes, for instance (this is why we have tests for this in CI, and is the reason I haven't updated the gzip library we use -- but I think the data is too small to detect the difference in the buffer size).
I'm not sure what invariants you're referring to -- there is no guarantee that you can get the same gzip blob from the same input with different programs/libraries (like all compression algorithms, gzip does not have a canonical representation). The best we've been able to do with umoci is try our best to guarantee that we will not produce different blobs for the same input across versions. (IMHO it seems pretty clear at this point that baking in compression to the image blob format was a mistake.) Or are you running into this because you use umoci as a library as well as containers/image, and the different settings cause the gzip stream of the same input to have different outputs? That's annoying, but it seems possible for (FWIW, I don't mind merging this, though it will mean that we will start outputting streams with different hashes to previous versions...) |
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #508 +/- ##
==========================================
- Coverage 73.53% 73.48% -0.05%
==========================================
Files 60 60
Lines 4881 4884 +3
==========================================
Hits 3589 3589
- Misses 933 935 +2
- Partials 359 360 +1
|
Without question - anything that affects the final output should have been considered. In the OCI world, unfortunately no single tool does all jobs and one has to be careful that they interoperate well. Our use case was to deduplicate content and unfortunately changing compression params changes the output although the input was identical (across tools of course) As you may be already aware, we use umoci as a library in https://github.com/project-stacker/stacker.
At least speaking just for us, we are ok with this ^ |
I think this might be better off as a configuration, there may indeed be cases where this will break assumptions in tools using this. |
containers/image@5b23c4b |
Closing this in favor of PR #509 |
Several tools set this as the default value.
image/copy/copy.go
56:var compressionBufferSize = 1048576
skopeo/vendor/github.com/containers/image/v5/copy/compression.go
24: compressionBufferSize = 1048576
Choosing a different value causes the final gzip'ed layer to become different although the source tar is identical, causing an content-addressable invariants to break.