
Enabling Compression Level configuration of ZSTD and ZSTD (No Dict) #7555

Closed
sarthakaggarwal97 opened this issue May 14, 2023 · 10 comments · Fixed by #8312
Labels: enhancement (Enhancement or improvement to existing feature or request), v2.9.0 (Issues and PRs related to version v2.9.0)

Comments

@sarthakaggarwal97
Contributor

sarthakaggarwal97 commented May 14, 2023

Is your feature request related to a problem? Please describe.
In the present implementation, zstd and zstd_no_dict use the default compression level of 6, as mentioned here, here, and here. According to the documentation, these algorithms support compression levels in the range 1-22. Theoretically, configurable compression levels can help customers tune the trade-off between storage and throughput according to their requirements.
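The storage/throughput trade-off described above can be illustrated with Python's standard-library `zlib` (levels 1-9) as a stand-in, since zstd bindings are not in the standard library; zstd behaves analogously over its 1-22 range. This is a minimal sketch of the general level trade-off, not OpenSearch's actual codec code.

```python
# Illustrative only: zlib as an analogy for the zstd compression-level
# trade-off. Lower levels favor throughput, higher levels favor storage.
import zlib

data = b"".join(str(i).encode() for i in range(5000)) * 4

fast = zlib.compress(data, level=1)   # favors indexing throughput
small = zlib.compress(data, level=9)  # favors store size

# Both levels decompress to identical original bytes; only the
# compressed size and the CPU cost of compressing differ.
assert zlib.decompress(fast) == data
assert zlib.decompress(small) == data
assert len(small) <= len(fast)
print(len(data), len(fast), len(small))
```

The same shape of trade-off is what the benchmarks later in this thread quantify for zstd levels 3 and 6.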

Describe the solution you'd like
Introduce a configurable compression level as an index setting for zstd and zstd_no_dict.
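A hypothetical shape for such a setting is sketched below. The setting name `index.codec.compression_level` and the request body are illustrative assumptions only; the actual setting name and bounds were decided in the linked PR.

```json
PUT /my-index
{
  "settings": {
    "index": {
      "codec": "zstd_no_dict",
      "codec.compression_level": 3
    }
  }
}
```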

Additional context
#3354

cc: @mgodwan @backslasht @shwetathareja

@sarthakaggarwal97 sarthakaggarwal97 added the enhancement and untriaged labels on May 14, 2023
@shwetathareja
Member

@sarthakaggarwal97 OpenSearch users can use some of the existing benchmarks for reference when choosing a compression level. It would be great if you could also share some of your benchmark runs across the different levels.

@dblock
Member

dblock commented May 16, 2023

We've been collaborating with the Intel team and benchmarking compression levels in additional real-world scenarios with @backslasht. Stay tuned @shwetathareja!

@dblock
Member

dblock commented May 17, 2023

Related, in #7475 we are proposing to make the block size configurable as well.

@sarthakaggarwal97
Contributor Author

sarthakaggarwal97 commented Jul 4, 2023

Currently, we have set the default compression level for zstd and zstd_no_dict to level 6, here.

In the experiments, I'm observing an improvement of roughly 5-6% in average indexing throughput with level 3 as the default compression level.

Benchmarks

NYC Taxis Dataset

*(benchmark chart)*

HTTP Logs Dataset

*(benchmark chart)*

I think we should switch to level 3 as the default compression level. Moreover, zstd itself internally uses level 3 as its default, as mentioned here.

cc: @mgodwan @shwetathareja @backslasht @dblock @reta

@mgodwan
Member

mgodwan commented Jul 4, 2023

Thanks for sharing these numbers.
Given that level 3 is the default level in the zstd implementation as well, and yields better throughput in the majority of cases, it should be alright to proceed with it.

Is there any comparison of storage used as well across these to get insights around the trade-off?

@sarthakaggarwal97
Contributor Author

sarthakaggarwal97 commented Jul 4, 2023

> Thanks for sharing these numbers. Given level 3 is the default level used in zstd implementation as well and yields better results in terms of throughput in majority of the cases, it should be alright to proceed with the same.
>
> Is there any comparison of storage used as well across these to get insights around the trade-off?

@mgodwan Adding the store-size comparison alongside indexing throughput. We see roughly a 4% increase in store size with level 3, using level 6 as the baseline.
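For clarity, the percentage deltas quoted in this thread (the 5-6% throughput gain and the ~4% store-size increase) come from comparing each level-3 measurement against the level-6 baseline. The sketch below shows that computation; the sample numbers are hypothetical stand-ins, not values from the actual benchmark runs.

```python
# Hypothetical helper illustrating how the relative deltas are derived
# from a baseline (level 6) and candidate (level 3) measurement.
def pct_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

# Made-up example: mean indexing throughput (docs/s), level 6 vs level 3
print(round(pct_change(100_000, 105_500), 1))  # -> 5.5 (throughput gain)
# Made-up example: store size (GB), level 6 vs level 3
print(round(pct_change(50.0, 52.0), 1))        # -> 4.0 (storage increase)
```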

NYC Taxis

*(benchmark chart)*

HTTP Logs

*(benchmark chart)*

@backslasht
Contributor

Thanks @sarthakaggarwal97. This seems like a safe bet, as the gain in indexing throughput is proportional to the increase in storage size.

Have we run the tests for a sufficiently long time? I am wondering whether it will also increase background merge time, since segments might be larger with compression level 3.

@sarthakaggarwal97
Contributor Author

@backslasht yes, I observed background merges happening during the runs. The segments produced with level 3 should not be large enough to start affecting background merge time compared to level 6. Level 3 looks like a good sweet spot.
I will raise a PR to change the default compression level to 3.

@dblock
Member

dblock commented Jul 6, 2023

Can this be closed with #8471?

@backslasht
Contributor

@dblock - I think this will require #8312 to be closed.
