Removing the vec file extension from INDEX_STORE_HYBRID_NIO_EXTENSIONS to ensure no performance degradation for vector search via the Lucene engine #9528
Conversation
@msfroh, vec and vem are Lucene files only. Lucene uses these files to store vector-related information.
Yes, we should always have had it. Previously, starting with OpenSearch 2.5, the k-NN plugin added them (the .vec and .vex files) to the mmap extension list. After the release of OpenSearch 2.4, we found that k-NN search with the Lucene engine had become slower; please refer to these graphs for Lucene: opensearch-project/k-NN#576 (comment). A customer reported the same problem after upgrading from OpenSearch 2.3 to OpenSearch 2.4 in opensearch-project/k-NN#637 (comment), which contains a detailed analysis. We provided customers with a workaround and fixed the issue in the OpenSearch 2.5 release. This is why I am pushing for a better solution: removing .vec from the NIO file extension list and removing the logic from the k-NN plugin entirely. Otherwise, the k-NN plugin will have to keep overriding these settings, which does not scale, because settings can be deprecated and new ones introduced in the future. I am adding @martin-gaievski from the k-NN plugin maintainers, who did the deep dive, in case he wants to add anything.
For reference, here are the quick benchmarks I ran after removing the k-NN logic that adds these files to the mmap list.
(Benchmark table: results if .vec and .vex files are mmapped.)
Right, my question was less about the k-NN plugin vector files and more about the Lucene vector files (where both implementations decided to use …). I'm pretty sure the Lucene vector files should be mmapped; do we have numbers to confirm?
@msfroh, what I am saying is that the k-NN plugin uses the Lucene vector field when customers use the Lucene engine for vector search. So .vec and .vex are Lucene files only; they are not specific to the k-NN plugin. Here are the quick benchmarks I ran: #9528 (comment)
Okay, thanks! Now I (almost) get it. I'm a little slow. (So what's up with the …
Here is the official documentation: https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsFormat.html. The .vex file holds the HNSW graph, so it is an important file during vector search. Per the last deep dive done by @martin-gaievski, .vec and .vex need to be mmapped for performance, while .vem had no impact on latency. Hence .vec and .vex need to be mmapped. @martin-gaievski, please add anything I have missed.
Confirming, as per our testing.
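As a compact summary of the file roles from the Lucene95HnswVectorsFormat documentation and the latency findings discussed above, here is a minimal Java sketch. The `VectorFile` enum, its `mmapMatters` flag, and `extensionsToMmap()` are hypothetical names used only for illustration; they are not Lucene or OpenSearch APIs, and the flag records the testing described in this thread rather than anything Lucene itself declares.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class LuceneVectorFiles {

    // Hypothetical model: one entry per file extension written by the HNSW vectors format.
    enum VectorFile {
        VEC("vec", "vector values", true),
        VEX("vex", "HNSW graph", true),
        VEM("vem", "metadata", false);

        final String extension;
        final String contents;
        // True when, per the deep dive in this thread, mmapping the file affects search latency.
        final boolean mmapMatters;

        VectorFile(String extension, String contents, boolean mmapMatters) {
            this.extension = extension;
            this.contents = contents;
            this.mmapMatters = mmapMatters;
        }

        // Looks up a file role by its extension (without the leading dot).
        static Optional<VectorFile> fromExtension(String extension) {
            for (VectorFile file : values()) {
                if (file.extension.equals(extension)) {
                    return Optional.of(file);
                }
            }
            return Optional.empty();
        }
    }

    // Collects the extensions that should stay memory-mapped, in sorted order.
    static Set<String> extensionsToMmap() {
        Set<String> result = new TreeSet<>();
        for (VectorFile file : VectorFile.values()) {
            if (file.mmapMatters) {
                result.add(file.extension);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (VectorFile file : VectorFile.values()) {
            System.out.println("." + file.extension + " -> " + file.contents
                    + (file.mmapMatters ? " (mmap for latency)" : " (mmap not latency-critical)"));
        }
        System.out.println("Keep mmapped: " + extensionsToMmap());
    }
}
```

Running `main` lists the three files and ends with `Keep mmapped: [vec, vex]`, which is the set this PR aims to keep out of the NIO list.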
Awesome! Thanks for clearing all of that up. Feel free to @ me as soon as you've fixed the changelog conflict and I'm happy to merge this. |
…S, to ensure the no performance degradation for vector search via Lucene Engine. Signed-off-by: Navneet Verma <navneev@amazon.com>
@msfroh, I fixed the conflicts. Please check.
…S, to ensure the no performance degradation for vector search via Lucene Engine. (#9528) (#9540) (cherry picked from commit a4024e7) Signed-off-by: Navneet Verma <navneev@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
Removing the vec file extension from INDEX_STORE_HYBRID_NIO_EXTENSIONS to ensure no performance degradation for vector search via the Lucene engine.
This PR: https://github.com/opensearch-project/OpenSearch/pull/8508/files added the .vec file extension to INDEX_STORE_HYBRID_NIO_EXTENSIONS and deprecated the INDEX_STORE_HYBRID_MMAP_EXTENSIONS setting, so .vec files were no longer mmapped. This resulted in the following problems:
This change ensures that the k-NN plugin no longer needs any custom logic for this, and that if a new extension list is introduced in the future, the k-NN plugin will not require any change.
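To illustrate why listing `vec` in INDEX_STORE_HYBRID_NIO_EXTENSIONS keeps .vec files from being mmapped, here is a simplified sketch of extension-based routing, loosely modeled on the idea behind the `hybridfs` store type: files whose extension is in the NIO list are read with NIO, and everything else is memory-mapped. The class `HybridRoutingSketch`, its methods, the two-entry extension lists, and the file names `_0.vec`/`_0.vex` are all hypothetical and illustrative; this is not OpenSearch's actual implementation or its real default list.

```java
import java.util.Locale;
import java.util.Set;

public class HybridRoutingSketch {

    enum Access { NIO, MMAP }

    // Extensions (without the leading dot) that are read through NIO instead of mmap.
    private final Set<String> nioExtensions;

    HybridRoutingSketch(Set<String> nioExtensions) {
        this.nioExtensions = nioExtensions;
    }

    // Returns the lower-cased text after the last '.', or "" when the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Listed extensions go through NIO; every other file is memory-mapped.
    Access accessFor(String fileName) {
        return nioExtensions.contains(extensionOf(fileName)) ? Access.NIO : Access.MMAP;
    }

    public static void main(String[] args) {
        // Illustrative lists only: "before" includes vec, "after" drops it as this PR does.
        HybridRoutingSketch before = new HybridRoutingSketch(Set.of("fnm", "vec"));
        HybridRoutingSketch after = new HybridRoutingSketch(Set.of("fnm"));

        for (String file : new String[] {"_0.vec", "_0.vex"}) {
            System.out.println(file + ": before=" + before.accessFor(file)
                    + ", after=" + after.accessFor(file));
        }
    }
}
```

Running `main` shows `_0.vec` moving from NIO to MMAP once `vec` is dropped from the list, while `_0.vex` is mmapped in both cases. Because the decision lives in the core list itself, a plugin does not need to override whichever extension setting happens to be current.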
Related Issues
NA
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.