
Conversation

@brandonc (Contributor) commented Sep 9, 2022

We've noticed that the default gzip compression level (6) can take a long time to pack large slugs for transmission, and we would like to experiment with different compression level options, especially when go-slug is used with Archivist, which supports at-rest encryption.

I wrote a quick benchmark that packs a very large target directory (~800 MB) at different levels, and it indicates that the default compression level (6) takes about 300% longer to compress only about 8% better than "best speed" (level 1).

$ time ./bench 6 testdir
Created slug, compression 878787615 -> 231762298 (73.6%)

real	0m23.843s
user	0m23.678s
sys	0m0.387s

$ time ./bench 1 testdir
Created slug, compression 878787615 -> 272378609 (69.0%)

real	0m7.146s
user	0m7.040s
sys	0m0.511s

$ time ./bench 0 testdir
Created slug, compression 878787615 -> 879579192 (-0.1%)

real	0m1.912s
user	0m1.782s
sys	0m1.125s

$
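A minimal sketch of a benchmark along these lines (this is not the exact `bench` program used above, which drove go-slug directly; this version tars the directory through compress/gzip itself, and names like `countingWriter` are just for illustration):

```go
// bench.go: tar a directory through a gzip writer at the level given on the
// command line and report raw vs. compressed byte counts.
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
)

// countingWriter counts compressed bytes without keeping them in memory.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

func main() {
	level, err := strconv.Atoi(os.Args[1]) // e.g. 0, 1, or 6
	if err != nil {
		panic(err)
	}
	dir := os.Args[2]

	out := &countingWriter{}
	gzw, err := gzip.NewWriterLevel(out, level)
	if err != nil {
		panic(err)
	}
	tw := tar.NewWriter(gzw)

	var rawBytes int64
	err = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.Mode().IsRegular() {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		hdr.Name = rel
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		n, err := io.Copy(tw, f)
		rawBytes += n
		return err
	})
	if err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	if err := gzw.Close(); err != nil {
		panic(err)
	}

	ratio := 100 * (1 - float64(out.n)/float64(rawBytes))
	fmt.Printf("Created slug, compression %d -> %d (%.1f%%)\n", rawBytes, out.n, ratio)
}
```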

@ryanuber (Member) left a comment

Given that gzip is an internal implementation detail, it seems to me that we should just change the default to whatever we believe is the best balance of performance and compression. Making this configurable is secondary to setting the default correctly for our use case. IMO we should only expand our API surface area when that is clearly the right solution to the problem at hand, and in this case I'm not convinced this should be configurable.

@brandonc (Contributor, Author) commented

@ryanuber I take your point about the packing abstraction leaking gzip details into the API; it would be bad to break that if we switched compression algorithms altogether. But, as you pointed out, there are situations where more compression might be better, for instance over slower links. Because go-slug is not used in just one place in the run pipeline (we use it in terraform when uploading the configuration version as well as in tfc-agent when storing the planned configuration version), we may not be able to choose a single balance that satisfies those two vastly different network topologies. It's also worth considering that Archivist has an option to compress files at rest, but enabling it is not advised and is redundant when go-slug already compresses during packing, so leaving this unconfigurable limits the overall compression strategy.

I experimented with #25 and got good results for large working directories as well, but I thought this change might be less controversial because the overhead is less pronounced for smaller archives. Would you be open to reviewing #25? Alternatively, I could reopen this as a "don't compress" option (instead of a gzip level) that works in tandem with Archivist's compression-at-rest option; see the sketch below for the rough shape of that idea.
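A hypothetical sketch of what such an option could look like. This is not go-slug's actual API; the package name, `Option`, `WithoutCompression`, and this `Pack` signature are illustrative only, and the directory walk is elided to show just the writer wiring:

```go
package slugpack

import (
	"archive/tar"
	"compress/gzip"
	"io"
)

// Option configures packing behavior.
type Option func(*config)

type config struct {
	compress bool
}

// WithoutCompression skips gzip entirely, leaving compression (or encryption)
// to whatever stores the artifact at rest, e.g. Archivist.
func WithoutCompression() Option {
	return func(c *config) { c.compress = false }
}

// Pack writes a tar stream of src to w, gzip-compressed unless disabled.
func Pack(src string, w io.Writer, opts ...Option) error {
	cfg := &config{compress: true}
	for _, opt := range opts {
		opt(cfg)
	}

	dst := w
	var gzw *gzip.Writer
	if cfg.compress {
		gzw = gzip.NewWriter(w)
		dst = gzw
	}
	tw := tar.NewWriter(dst)

	// ... walk src and write tar entries to tw (elided) ...
	_ = src

	if err := tw.Close(); err != nil {
		return err
	}
	if gzw != nil {
		return gzw.Close()
	}
	return nil
}
```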

@ryanuber (Member) commented

At the moment, swapping out the stdlib for a 3rd-party package isn't something I'm able to commit to evaluating. I think a "don't compress" option would be sub-optimal, since you'd need to send the uncompressed bytes over the wire. That means more bandwidth and transfer time, and it requires an assumption about how Archivist will store the uploaded data (compressed or raw).

I still feel that the most pragmatic solution here is to set a new default compression level and continue to adjust it if needed. That provides predictable artifact characteristics across all of our known scenarios, and at the numbers you quoted (300% longer to compress 8% better), it seems pretty clear that moving to the "best speed" level is an easy win.
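For reference, a minimal sketch of that kind of change (simplified; not the actual go-slug code or a real diff from this PR): construct the gzip writer with an explicit level instead of the default.

```go
package slugpack

import (
	"compress/gzip"
	"io"
)

// newSlugWriter sketches the proposed change: create the gzip writer with an
// explicit level instead of gzip.NewWriter's default (equivalent to level 6).
func newSlugWriter(w io.Writer) (*gzip.Writer, error) {
	// gzip.BestSpeed is level 1: roughly 8% less compression for about 3x
	// faster packing on the large slug benchmarked above.
	return gzip.NewWriterLevel(w, gzip.BestSpeed)
}
```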

@brandonc closed this Sep 14, 2022
@brandonc deleted the brandonc/compression_level_option branch September 14, 2022