Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reloading the SPI for KnnVectorsFormat class #13393

Closed
navneet1v opened this issue May 20, 2024 · 16 comments
Closed

Add support for reloading the SPI for KnnVectorsFormat class #13393

navneet1v opened this issue May 20, 2024 · 16 comments

Comments

@navneet1v
Copy link
Contributor

navneet1v commented May 20, 2024

Description

Description

Lucene uses SPI to get the instance for various classes like Codec, KNNVectorsFormat etc.

Currently Codec class provide a way to reload the SPIs by providing an interface which takes a ClassLoader and reload the SPIs. Ref: https://github.com/apache/lucene/blob/branch_9_10/lucene/core/src/java/org/apache/lucene/codecs/Codec.java#L126-L137

but similar functionality is not present in the KNNVectorsFormat.(I checked main branch and branch_9_10 too). Ref: https://github.com/apache/lucene/blob/branch_9_10/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsFormat.java

I am not sure if this is a miss or there is some other way to reload the SPI of KNNVectorsFormat class.

Solution

What I am looking for here is to add the support for reload SPI function in KNNVectorsFormat class too so that external libraries/application can load their own KNNVectorsFormat.

I am willing to pick up this issue and contribute back.

@benwtrent please let me know your thoughts on this

@navneet1v
Copy link
Contributor Author

I have raised a PR for the fix: #13394

@benwtrent
Copy link
Member

My only concern is why is this necessary?

Is feature parity the only reason?

@navneet1v
Copy link
Contributor Author

navneet1v commented May 24, 2024

My only concern is why is this necessary?

Is feature parity the only reason?

Currently in my application I want to extend the KNNVectorsFormat just like Codec, to plug in a completely different implementation of Vector Search Algorithm(which not in Java).

As KNNVectorsFormat is part of Codec which allows applications on top of Lucene to extend the codec and their format. But due to lack of support of reloading of SPI in KNNVectorsFormat this not possible.

The change is a pretty straightforward change and adding extensibility support on KNNVectorsFormat only.

@benwtrent please let me know if you have further questions.

@benwtrent
Copy link
Member

I still don't understand.

If the SPIs are there, it should work. If the SPI extension didn't work, none of the current formats would work either as they all rely on it.

Why doesn't your application work?

@navneet1v
Copy link
Contributor Author

navneet1v commented May 24, 2024

@benwtrent
Thats correct, if the classes are in the main application the SPI will work.

But in my case during runtime I need to load some other jars(consider them as plugin) which has the KNNVectorFormat classes. And these classes as they are getting loaded at runtime when instance of KNNVectorsFormat class(an implementation present in loaded plugins) is getting created via SPI, the SPI is not able to find the class as they are not present in the Holder object.

This is the same reloading of SPI via Classpaths is present for Codec and DocValuesformat too as I have referenced in description.

Ref:

  1. https://github.com/apache/lucene/blob/branch_9_10/lucene/core/src/java/org/apache/lucene/codecs/Codec.java#L126-L137
  2. https://github.com/apache/lucene/blob/branch_9_10/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsFormat.java

@ChrisHegarty
Copy link
Contributor

The dynamic addition of formats from other class loaders seems reasonable, though I don't have a use case for it myself. Maybe I'm missing something but before adding a new API, is it not possible to implement the functionality through the existing Codec and reload ( possibly using FilterCodec ). And/or PerFieldKnnVectorsFormat. OR is it a more straightforward way that is being suggested?

@navneet1v
Copy link
Contributor Author

navneet1v commented May 24, 2024

@ChrisHegarty

The dynamic addition of formats from other class loaders seems reasonable, though I don't have a use case for it myself. Maybe I'm missing something

A use case of this is Opensearch where Opensearch loads the plugins jars during runtime, and plugins have classes(Codecs, DocValuesFormat etc) thats get init using SPI. For similar kind of usecases a public API to reload the SPI using class loaders is very useful. Opensearch uses this for DocValuesFormat and Codec just like this. Ref: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/plugins/PluginsService.java#L763-L765

is it not possible to implement the functionality through the existing Codec and reload ( possibly using FilterCodec ). And/or PerFieldKnnVectorsFormat. OR is it a more straightforward way that is being suggested?

Actually I implemented this exact same way, where I have my codec that extends the FilterCodec. In that codec I have override the knnVectorsFormat() function to return a perFieldKnnVectorsFormat. This perFieldKnnVectorsFormat is my own implementation which based on some condition either return Lucene99HnswVectorsFormat or a new VectorsFormat say NavneetVectorsFormat. NavneetVectorsFormat has its own writer and reader classes.

Now during indexing everything works well. But when the IndexReader is getting opened, it needs to create an Instance of NavneetVectorsFormat using SPI. This is where exception start to happen because KNNVectorsFormat SPI Holder object doesn't know that NavneetVectorsFormat class is present in the class path, as the jar where this class was present is loaded during the application runtime.

I hope this clarifies.

@msfroh
Copy link
Contributor

msfroh commented May 24, 2024

Is it fair to say that this change just brings KnnVectorsFormat in line with:

  1. PostingsFormat:
    public static void reloadPostingsFormats(ClassLoader classloader) {
  2. DocValuesFormat:
    public static void reloadDocValuesFormats(ClassLoader classloader) {
  3. SortFieldProvider:
    public static void reloadSortFieldProviders(ClassLoader classLoader) {

These other formats are dynamically pluggable, and this change is just making KnnVectorsFormat consistent with them, right?

@navneet1v
Copy link
Contributor Author

These other formats are dynamically pluggable, and this change is just making KnnVectorsFormat consistent with them, right?

Yes that is correct.

@benwtrent
Copy link
Member

Thanks for the clarification @navneet1v I didn't know folks were dynamically loading jars for different vector formats.

The idea sounds good to me. I haven't reviewed the PR.

I just wanted to make sure we weren't adding code without a practical and current use.

@navneet1v
Copy link
Contributor Author

Thanks for the clarification @navneet1v I didn't know folks were dynamically loading jars for different vector formats.

The idea sounds good to me. I haven't reviewed the PR.

I just wanted to make sure we weren't adding code without a practical and current use.

@benwtrent Sounds good to me. Will wait for your review on the PR.

@ChrisHegarty
Copy link
Contributor

Thanks for the explanation - sounds like a reasonable idea.

@navneet1v
Copy link
Contributor Author

navneet1v commented May 25, 2024

@benwtrent , @ChrisHegarty as we are inclined that this is a valid usecase, can you guys please review the PR #13394 I would like to get this feature in the 9.11 version of Lucene.

@uschindler
Copy link
Contributor

Thanks for bringing this up.

Unfortunately we have no way to automatically enforce this for our NamedSPI provides (as it is static methods), but we should keep all of them in sync.

At moment there's also missing the getter for list of names. I argued about this in the other PR already. The testframework uses expensive extra walk though ServiceLoader to do get those. This should be also brought in line with the other codec/analysis components. I will open a PR for this. The code in test-framework is inconsistent and a duplicates existing logic in a different way (the one that randomly chooses a KNN format).

@uschindler
Copy link
Contributor

see #13428

This also fixed the concern in other PR with SPI named similarity functions

@navneet1v
Copy link
Contributor Author

At moment there's also missing the getter for list of names.

@uschindler, when raising the PR I saw this and thought is it needed or not. To ensure the scope of the PR was minimal I didn't add the getter method. I see that you have raised the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants