Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement our own index builder functionality for faiss instead of index_description #1894

Open
jmazanec15 opened this issue Jul 29, 2024 · 0 comments
Labels
Refactoring Improve the design, structure, and implementation while preserving its functionality

Comments

@jmazanec15
Copy link
Member

Description

In faiss right now, we use something called "index description" that is passed to the faiss index factory to build our index. See https://github.com/facebookresearch/faiss/blob/main/faiss/index_factory.cpp and https://github.com/facebookresearch/faiss/wiki/The-index-factory. While this is nice if you want to try out a lot of different index combinations, it has a couple downsides:

  1. the grammar is pretty confusing
  2. it is unclear which indices we may be building and how we are building them - when debugging, to figure out the different components of the index, it requires walking through with a debugger
  3. building the index description adds complexity to the java side code. It creates a branch from other engines and requires following their defined grammar

For those reasons, I think moving away from building this index description will make the code more maintainable and allow us to more easily extend the code in the future. Instead of the index factory, I think we should just pass a map of the parameters as is to the JNI library and let the JNI figure out how to configure the index.

From a migration perspective, we add the index description to the field type attributes (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/MethodFieldMapper.java#L62-L63), but this will not impact new segments being created because these attributes are created for each segment. Because the mapping passed by the user has all of the parameters, it should be seemless from this perspective. Similarly, for training, it will not have any compatibility issues either because the description is just created at the time of model creation and not serialized. On the JNI side, we will need to basically duplicate how we are creating the indices in the faiss index_factory so that there are no regressions. This will need some investigation.

@jmazanec15 jmazanec15 added Refactoring Improve the design, structure, and implementation while preserving its functionality and removed untriaged labels Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactoring Improve the design, structure, and implementation while preserving its functionality
Projects
Status: Backlog
Development

No branches or pull requests

1 participant