
Conversation

@brandonc (Contributor) commented Sep 9, 2022

We've noticed that the default gzip compression level (6) can take a long time to pack large slugs for transmission, and we would like to experiment with different compression level options, especially when go-slug is used with Archivist, which supports at-rest encryption.

I wrote a quick benchmark that packs a very large target directory (~800 MB) at different levels, and it indicates that the default compression level (6) takes about 300% longer to compress only about 8% better than "best speed" (level 1).

$ time ./bench 6 testdir
Created slug, compression 878787615 -> 231762298 (73.6%)

real	0m23.843s
user	0m23.678s
sys	0m0.387s

$ time ./bench 1 testdir
Created slug, compression 878787615 -> 272378609 (69.0%)

real	0m7.146s
user	0m7.040s
sys	0m0.511s

$ time ./bench 0 testdir
Created slug, compression 878787615 -> 879579192 (-0.1%)

real	0m1.912s
user	0m1.782s
sys	0m1.125s

$
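A minimal sketch of a benchmark along these lines (this is not the exact `bench` program used above, which drove go-slug directly; this version tars the directory through compress/gzip itself, and names like `countingWriter` are just for illustration):

```go
// bench.go: tar a directory through a gzip writer at the level given on the
// command line and report raw vs. compressed byte counts.
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
)

// countingWriter counts compressed bytes without keeping them in memory.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

func main() {
	level, err := strconv.Atoi(os.Args[1]) // e.g. 0, 1, or 6
	if err != nil {
		panic(err)
	}
	dir := os.Args[2]

	out := &countingWriter{}
	gzw, err := gzip.NewWriterLevel(out, level)
	if err != nil {
		panic(err)
	}
	tw := tar.NewWriter(gzw)

	var rawBytes int64
	err = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.Mode().IsRegular() {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		hdr.Name = rel
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		n, err := io.Copy(tw, f)
		rawBytes += n
		return err
	})
	if err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	if err := gzw.Close(); err != nil {
		panic(err)
	}

	ratio := 100 * (1 - float64(out.n)/float64(rawBytes))
	fmt.Printf("Created slug, compression %d -> %d (%.1f%%)\n", rawBytes, out.n, ratio)
}
```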

@ryanuber (Member) left a comment

Given that gzip is an internal implementation detail, it seems to me that we should just change the default to whatever we believe is the best balance of performance and compression. Making this configurable is secondary to setting the default correctly for our use case. IMO we should only expand our API surface area when that is clearly the right solution to the problem at hand, and in this case I'm not convinced this should be configurable.

@brandonc (Contributor, Author) commented

@ryanuber I take your point about the packing abstraction leaking gzip details into the API; it would be bad to break that if we switched compression algorithms altogether. But, as you pointed out, there are situations where more compression might be better, for instance over slower links. Because go-slug is not used in just one place in the run pipeline (we use it in terraform when uploading the configuration version as well as in tfc-agent when storing the planned configuration version), we may not be able to choose a single balance that satisfies those two vastly different network topologies. It's also worth considering that Archivist has an option to compress files at rest, but enabling it is not advised and is redundant when go-slug already compresses during packing, so leaving this unconfigurable limits the overall compression strategy.

I experimented with #25 and got good results for large working directories as well, but I thought this change might be less controversial because the overhead is less pronounced for smaller archives. Would you be open to reviewing #25? Alternatively, I could reopen this as a "don't compress" option (instead of a gzip level) that works in tandem with Archivist's compression-at-rest option; see the sketch below for the rough shape of that idea.
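A hypothetical sketch of what such an option could look like. This is not go-slug's actual API; the package name, `Option`, `WithoutCompression`, and this `Pack` signature are illustrative only, and the directory walk is elided to show just the writer wiring:

```go
package slugpack

import (
	"archive/tar"
	"compress/gzip"
	"io"
)

// Option configures packing behavior.
type Option func(*config)

type config struct {
	compress bool
}

// WithoutCompression skips gzip entirely, leaving compression (or encryption)
// to whatever stores the artifact at rest, e.g. Archivist.
func WithoutCompression() Option {
	return func(c *config) { c.compress = false }
}

// Pack writes a tar stream of src to w, gzip-compressed unless disabled.
func Pack(src string, w io.Writer, opts ...Option) error {
	cfg := &config{compress: true}
	for _, opt := range opts {
		opt(cfg)
	}

	dst := w
	var gzw *gzip.Writer
	if cfg.compress {
		gzw = gzip.NewWriter(w)
		dst = gzw
	}
	tw := tar.NewWriter(dst)

	// ... walk src and write tar entries to tw (elided) ...
	_ = src

	if err := tw.Close(); err != nil {
		return err
	}
	if gzw != nil {
		return gzw.Close()
	}
	return nil
}
```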

@ryanuber (Member) commented

At the moment, swapping out the stdlib for a 3rd-party package isn't something I'm able to commit to evaluating. I think a "don't compress" option would be sub-optimal, since you'd need to send the uncompressed bytes over the wire. That means more bandwidth and transfer time, and it requires an assumption about how Archivist will store the uploaded data (compressed or raw).

I still feel that the most pragmatic solution here is to set a new default compression level and continue to adjust it if needed. That provides predictable artifact characteristics across all of our known scenarios, and at the numbers you quoted (300% longer to compress 8% better), it seems pretty clear that moving to the "best speed" level is an easy win.
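For reference, a minimal sketch of that kind of change (simplified; not the actual go-slug code or a real diff from this PR): construct the gzip writer with an explicit level instead of the default.

```go
package slugpack

import (
	"compress/gzip"
	"io"
)

// newSlugWriter sketches the proposed change: create the gzip writer with an
// explicit level instead of gzip.NewWriter's default (equivalent to level 6).
func newSlugWriter(w io.Writer) (*gzip.Writer, error) {
	// gzip.BestSpeed is level 1: roughly 8% less compression for about 3x
	// faster packing on the large slug benchmarked above.
	return gzip.NewWriterLevel(w, gzip.BestSpeed)
}
```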

@brandonc closed this Sep 14, 2022
@brandonc deleted the brandonc/compression_level_option branch September 14, 2022