Add hardware-accelerated codecs for DEFLATE and LZ4 #122
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@mulugetam thanks for raising the PR. Could you please share some performance numbers for these modes?
licenses/qat-java-LICENSE.txt
In case anyone's wondering, it looks like there is already BSD software in the project: https://github.com/search?q=repo%3Aopensearch-project%2FOpenSearch+bsd+license&type=code
Here are some performance numbers for indexing using stack overflow workload
Thank you @asonje. @sarthakaggarwal97 we will also share the performance numbers for search when they're ready.
thanks @mulugetam @asonje for initial numbers. Out of curiosity, if the underlying algorithm is still the same (in this case lz4, zlib), how are we seeing differences in store size?
This is expected. As you know, the store size will vary from run to run, but it is still a good approximation of compression ratio.
src/main/java/org/opensearch/index/codec/customcodecs/CustomCodecPlugin.java
src/main/java/org/opensearch/index/codec/customcodecs/Lucene99CustomCodec.java
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
src/main/java/org/opensearch/index/codec/customcodecs/Lucene99QatStoredFieldsFormat.java
src/main/java/org/opensearch/index/codec/customcodecs/QatLz4CompressionMode.java
src/test/java/org/opensearch/index/codec/customcodecs/QatDeflateCompressorTests.java
@mulugetam it looks pretty cool, could you please share which arch/OS combinations it is available on? (The arch part is somewhat clear, but not the arch + OS combinations: Windows / Linux / Intel Macs, ...)
…xception. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
The QAT built-in accelerator is available on 4th and 5th gen Intel (R) Xeon Processors. This version requires amd64/Linux. For all other systems, the |
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@mulugetam when you have a chance, could you please resolve the conflicts? Thank you
Looks like it's asking me to insert new lines because the existing code is not spotless formatted.
Signed-off-by: mulugetam <mulugeta.mammo@intel.com>
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
src/main/java/org/opensearch/index/codec/customcodecs/Lucene99QatCodec.java
src/main/java/org/opensearch/index/codec/customcodecs/Lucene99QatStoredFieldsFormat.java
src/main/java/org/opensearch/index/codec/customcodecs/CustomCodecService.java
Thanks. I believe some of us discussed similar ideas on Slack. An issue entry was also created: #130. |
@reta do we have any pending issues that need to be resolved? |
@mulugetam we still need a signoff #122 (comment) |
@sarthakaggarwal97 I am not sure we could pull it off, for |
Adding the suggested test in the current implementation is not sufficient and probably will not work, as the My thinking was that we should treat |
Certainly +1 to that |
If this is the case, I would highly advocate for this to be behind a plugin setting. #148 |
But it is a separate codec already, which users have to opt in to use (not a default one)? So users have to pick a codec AND set a setting to use it? Sounds like an unreasonably complicated process to me
So was the case when zstd was added to OpenSearch in the first iteration, but it was done via a sandbox plugin. Since the plugin is no longer sandbox, I think it would benefit to have a way to denote this is experimental and innovate with time on the settings, codec management, issue handling, etc. for these new codecs without worrying about breaking changes. The only reason I'm saying this is that the codec impacts storage of data and in case of issues, just disabling may not bring users out of any issues unless already written data is also fixed. That said, I'm okay with the call you and @sarthakaggarwal97 take on this. |
Given the precedent we have had with Zstd, I think it's okay to keep the new QAT codecs as experimental for now. With that, it would be nice if we can come up with a plan as well to make new codecs, here QAT, generally available in the future. Let me think back on it. One of the lists I created earlier for Zstd correctness was this, for reference: opensearch-project/OpenSearch#9502
@sarthakaggarwal97 we have #148 to address the problem for every custom codec. |
@reta @sarthakaggarwal97 Are we on hold now until #148 is implemented? I think we should not be, as it is a separate codec that users would have to opt in to use. |
@mulugetam I don't think we need to wait for #148, we could address that right after (before 2.15.0)
@reta I think we would need to introduce the experimental settings in OpenSearch, since we do the validation of index codecs in EngineConfig. Ideally, we would want to stop the creation of an index only if the codec is not experimental. I feel we would need two changes: a new method in CodecSettings can tell us if the codec is experimental or not, and a feature flag setting will tell us whether we should make the experimental codecs available or not.
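One possible reading of the two changes proposed above, sketched in plain Java. The `CodecInfo` interface, the `SimpleCodec` class, and `ExperimentalCodecGate` are hypothetical stand-ins, not the real `CodecSettings` or `EngineConfig` APIs, and the rule that an experimental codec is rejected unless the feature flag is on is an assumption about the intent.

```java
// Sketch only: plain Java with hypothetical names, no OpenSearch classes.
final class ExperimentalCodecGate {
    /** Hypothetical stand-in for a codec that can report whether it is experimental. */
    interface CodecInfo {
        String name();
        boolean isExperimental();
    }

    /** Minimal CodecInfo implementation used only for illustration. */
    static final class SimpleCodec implements CodecInfo {
        private final String name;
        private final boolean experimental;

        SimpleCodec(String name, boolean experimental) {
            this.name = name;
            this.experimental = experimental;
        }

        @Override public String name() { return name; }
        @Override public boolean isExperimental() { return experimental; }
    }

    /**
     * Rejects index creation when the codec is experimental and the
     * (hypothetical) feature flag is off; all other combinations pass.
     */
    static void validate(CodecInfo codec, boolean experimentalCodecsEnabled) {
        if (codec.isExperimental() && !experimentalCodecsEnabled) {
            throw new IllegalArgumentException(
                "codec [" + codec.name() + "] is experimental and experimental codecs are disabled");
        }
    }

    public static void main(String[] args) {
        CodecInfo qat = new SimpleCodec("qat_lz4", true);
        validate(qat, true); // passes: the flag is on
        try {
            validate(qat, false); // rejected: experimental codec, flag off
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

As noted later in the thread, a check of this kind would only cover settings validation; codecs registered through the Lucene SPI are outside its reach.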
Thank you @mulugetam for this change. |
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/custom-codecs/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/custom-codecs/backport-2.x
# Create a new branch
git switch --create backport/backport-122-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c8b0d80a8286459857f2db2c0e9d3c1c076ada9d
# Push it to GitHub
git push --set-upstream origin backport/backport-122-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/custom-codecs/backport-2.x
Then, create a pull request where the |
@mulugetam would you please help with the backport as well? Thank you |
@sarthakaggarwal97 it would be great, but the validation logic won't help us here, I think: the codecs are registered by Apache Lucene SPI (the validation logic you are referring to only helps with ensuring the validity of the codec settings).
That's one of the problems: we could extend |
I think a check like this will definitely help to disable its usage in the write path. For the write path, the NamedSPI interface is not used.
…ct#122)
* Add QAT accelerated compression. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Use own classes for QAT codec. Apply SpotlessJavaCheck. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Declare fields final, unless required not to. Throw a valid type of exception. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Use assumeThat in the Qat test classes. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Add more QAT availability check in QatCodecTests. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Make LZ4 the default algorithm for QAT. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Make 'auto' the default execution mode for QAT. Also, minor clean up work. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Revert compression level for ZSTD to 3. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Replace QatLz4/DeflateCompressionMode classes with QatCompressionMode. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Fix a MultiCodecMergeIT test fail. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
* Remove hard-coded values for default compression level. Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
---------
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
Signed-off-by: mulugetam <mulugeta.mammo@intel.com>
Co-authored-by: Mulugeta Mammo <cppx86@gmail.com>
(cherry picked from commit c8b0d80)
Description
Adds hardware-accelerated DEFLATE and LZ4 compression codecs for stored fields. The hardware in focus here is Intel (R) QAT, which is an integrated, built-in accelerator on the latest 4th and 5th Gen Intel Xeon processors. The implementation relies on the Qat-Java library.
The PR adds two additional valid values for index.codec: qat_deflate and qat_lz4. It also introduces a new setting, index.codec.qatmode, that specifies the mode of execution for QAT.
Two values are supported for index.codec.qatmode: hardware and auto. A hardware execution mode uses only the QAT hardware, while an auto execution mode may switch to software if hardware resources are not available.
Closes #130
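To make the two settings concrete, here is a minimal plain-Java sketch of how the documented values could be validated and how the two execution modes differ. It uses no OpenSearch or Qat-Java classes; `QatSettingsSketch`, `Backend`, and `resolve` are hypothetical names. Defaulting to `auto` follows the commit message in this PR, while throwing when `hardware` mode finds no device is an assumption, since the description only says that mode uses the QAT hardware exclusively.

```java
// Sketch only: plain Java, hypothetical names, not the plugin's implementation.
final class QatSettingsSketch {
    // Setting keys as named in the PR description.
    static final String CODEC_KEY = "index.codec";
    static final String MODE_KEY = "index.codec.qatmode";

    /** Hypothetical label for where compression ends up running. */
    enum Backend { QAT_HARDWARE, SOFTWARE }

    /** True for the two codec values this PR adds. */
    static boolean isQatCodec(String codec) {
        return "qat_deflate".equals(codec) || "qat_lz4".equals(codec);
    }

    /**
     * Resolves the backend for a QAT codec. "auto" may fall back to software;
     * "hardware" never does (failing here is an assumption of this sketch).
     */
    static Backend resolve(java.util.Map<String, String> settings, boolean hardwareAvailable) {
        String codec = settings.get(CODEC_KEY);
        if (!isQatCodec(codec)) {
            throw new IllegalArgumentException("not a QAT codec: " + codec);
        }
        String mode = settings.getOrDefault(MODE_KEY, "auto");
        if (!"hardware".equals(mode) && !"auto".equals(mode)) {
            throw new IllegalArgumentException("unsupported " + MODE_KEY + ": " + mode);
        }
        if (hardwareAvailable) {
            return Backend.QAT_HARDWARE;
        }
        if ("auto".equals(mode)) {
            return Backend.SOFTWARE;
        }
        throw new IllegalStateException(
            "QAT hardware is unavailable and " + MODE_KEY + " is 'hardware'");
    }

    public static void main(String[] args) {
        java.util.Map<String, String> settings = new java.util.HashMap<>();
        settings.put(CODEC_KEY, "qat_lz4");
        settings.put(MODE_KEY, "auto");
        // With no device present, auto mode resolves to the software path.
        System.out.println(resolve(settings, false));
    }
}
```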
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.