Enabling Compression Level configuration of ZSTD and ZSTD (No Dict) #7555
Comments
@sarthakaggarwal97 OpenSearch users can use some of the existing benchmarks as a reference when choosing the compression level. It would be great if you could also share some benchmark runs across different levels.
We've been collaborating with the Intel team and benchmarking compression levels in additional real-world scenarios with @backslasht. Stay tuned @shwetathareja!
Related: in #7475 we are proposing to make the block size configurable as well.
Currently, we have set the default compression level for zstd and zstd_no_dict to 6. In the experiments, I'm observing an improvement of roughly 5-6% in the average indexing throughput with level 3 as the default compression level. Benchmarks: NYC Taxis Dataset [figure], HTTP Logs Dataset [figure]. I think we should switch to level 3 as the default compression level. Moreover,
Thanks for sharing these numbers. Is there also a comparison of the storage used across these levels, to get insight into the trade-off?
@mgodwan Adding a comparison of store size along with indexing throughput. We see roughly a 4% increase in store size with level 3, using level 6 as the baseline. NYC Taxis [figure], HTTP Logs [figure].
Thanks @sarthakaggarwal97. This seems like a safe bet, as the increases in indexing throughput and storage size are proportional. Have we run the tests for a sufficiently long time? I'm wondering whether it will also increase background merge time, since the segments might be larger with compression level 3.
@backslasht Yes, I observed background merges happening during the runs. The segments with level 3 should not be large enough to start affecting background merge time when compared to level 6. Level 3 looks like a good sweet spot.
Can this be closed with #8471? |
Is your feature request related to a problem? Please describe.
In the present implementation, zstd and zstd_no_dict use the default compression level, i.e. 6, as mentioned here, here and here. According to the documentation, these algorithms support compression levels in the range 1-22. In theory, configurable compression levels can help customers tune the trade-off between storage and throughput according to their requirements.
Describe the solution you'd like
Introduce a configurable compression level as an index setting for zstd and zstd_no_dict.
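As a rough illustration of what such a setting could look like from a client's point of view, here is a minimal Python sketch that builds and validates an index-creation body. The setting key `index.codec.compression_level` and the helper `build_index_settings` are assumptions made for this example, not something this issue defines; the 1-22 range is the one quoted from the documentation above.

```python
import json

# Codec names come from this issue; the level setting key is an assumption.
SUPPORTED_CODECS = {"zstd", "zstd_no_dict"}
MIN_LEVEL, MAX_LEVEL = 1, 22  # range quoted in the issue description
HYPOTHETICAL_LEVEL_SETTING = "index.codec.compression_level"


def build_index_settings(codec: str, level: int) -> dict:
    """Return an index-creation body with a codec and a compression level.

    Hypothetical helper: it only validates inputs and builds a dict,
    it does not talk to a cluster.
    """
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(
            f"compression level must be in [{MIN_LEVEL}, {MAX_LEVEL}], got {level}"
        )
    return {
        "settings": {
            "index.codec": codec,
            HYPOTHETICAL_LEVEL_SETTING: level,
        }
    }


if __name__ == "__main__":
    # Level 3 is the value discussed in the comments as a possible default.
    print(json.dumps(build_index_settings("zstd", 3), indent=2))
```

Validating the range on the client side is only a convenience here; the actual bounds and setting name would be whatever the implementation ends up enforcing.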
Additional context
#3354
cc: @mgodwan @backslasht @shwetathareja