Support Multiple Vectors per Field #379

akoumjian · 2020-11-11T20:16:25Z

akoumjian
Nov 11, 2020

I don't know how difficult this would be, since I haven't investigated the elastiknn field implementation. Most fields in Elasticsearch can accept an array of values that you can match and sort against, and we could use the same thing for the dense vector field. Here is our use case:

We have long form descriptions of objects we want to index at the sentence-vector level. Each object description therefore has multiple vectors associated with it. When I perform a knn query, I want to get back a document if any of those sentence-vectors is close.

We considered having a separate document for every sentence-vector along with the object id, but this creates a lot of challenges if we want to filter against other object properties.

Thank you for the truly excellent software!

alexklibisz · 2020-11-11T22:05:11Z

alexklibisz
Nov 11, 2020
Maintainer

Hi @akoumjian. Thanks for the kind words and for trying out the plugin.

Very interesting use-case!

I've only thought about it a couple minutes, but here's one idea that might work with the current functionality:

Have a separate field per vector. So your docs would look like this:

{
  "vec_0": [1,2,3,...],
  "vec_1": [1,2,3,...],
  ...
  "vec_n": [1,2,3,...]
  ...
}

Use the dynamic mapping feature to specify that any field called "vec_*" should be an elastiknn vector.
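A minimal sketch of what such a mapping could look like, written as a Python dict that would be sent as the index-creation body. The `dynamic_templates` / `match` structure is standard Elasticsearch. The `elastiknn_dense_float_vector` type and the `elastiknn` settings block are written from memory of the plugin docs, and the template name `elastiknn_vectors`, the `"model": "exact"` setting, and the value of `dims` are assumptions for illustration. Whether the plugin's field type behaves well under dynamic templates should be verified against your version.

```python
import json


def vector_template_mapping(dims: int, pattern: str = "vec_*") -> dict:
    """Build an index body whose dynamic template maps every field whose
    name matches `pattern` to an Elastiknn dense float vector."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    # Template name is arbitrary (hypothetical).
                    "elastiknn_vectors": {
                        "match": pattern,
                        "mapping": {
                            # Assumed plugin field type and settings;
                            # check the Elastiknn docs for your version.
                            "type": "elastiknn_dense_float_vector",
                            "elastiknn": {"dims": dims, "model": "exact"},
                        },
                    }
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(vector_template_mapping(dims=3), indent=2))
```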

Of course that's not ideal because you have to keep track of your n in your application code. If your docs can have different values of n, you'd have to always query for vec_0...vec_{largest n in corpus}, which could be wasteful depending on your distribution of ns.
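To make the cost concrete, here is a rough sketch of how the application could build that query: one nearest-neighbors clause per field, combined with `dis_max` so that a document's score is its best per-field score. The helper name `multi_field_knn_query` is hypothetical, the clause shape follows the `elastiknn_nearest_neighbors` query as I understand it, and I'm assuming the plugin's query can be nested inside `dis_max`. The similarity name depends on the plugin version, so it is left as a required argument.

```python
def multi_field_knn_query(query_vec, max_n, similarity):
    """Build one kNN clause per field vec_0..vec_{max_n}.

    dis_max scores a document by its highest-scoring clause, so a
    document matches if any one of its vectors is close to the query.
    """
    clauses = [
        {
            # Assumed Elastiknn query shape; verify against your version.
            "elastiknn_nearest_neighbors": {
                "field": f"vec_{i}",
                "vec": {"values": list(query_vec)},
                "model": "exact",
                "similarity": similarity,
            }
        }
        for i in range(max_n + 1)
    ]
    return {"query": {"dis_max": {"queries": clauses}}}
```

Note that the number of clauses grows with the largest n in the corpus, even if most documents only have a handful of vectors.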

Accepting a single vector or an array of vectors is interesting. I think I could make it work just in terms of pure mechanics. I'd have to think harder about how scoring should behave when you have > 1 vector. Should it return the highest score, the average score, or be configurable by a parameter? I won't get to work on this for at least a couple more weeks. But I'd appreciate any more details you might have about how you'd expect it to behave.
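To illustrate the scoring question, here is a small plain-Python sketch (not plugin code) of a per-document score that is configurable between "highest" and "average". Cosine similarity and the names `score_document` and `AGGREGATORS` are my own choices for the example.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Candidate ways to collapse per-vector scores into one document score.
AGGREGATORS = {"max": max, "mean": mean}


def score_document(query_vec, doc_vecs, mode="max"):
    """Score a document holding several vectors against one query vector."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return AGGREGATORS[mode](scores)
```

With `mode="max"` a document is returned if any one of its vectors is close, which is the behavior the original request describes. With `mode="mean"` a single close vector is diluted by the unrelated ones.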

If you or anyone in your org want to take a pass at it, LMK and I'll write up a getting-started guide for devs. I've been meaning to do that anyways.

akoumjian · 2020-11-11T23:58:06Z

akoumjian
Nov 11, 2020
Author

Thank you for taking the time to think it over, and for such a quick response. Our current workaround was precisely your suggestion, but I hadn't been clever enough to think of using dynamic mapping combined with matching on the field name to set the type. Instead we had decided to hard-code a limited number of sentence vectors (vector_0, vector_1, etc.) and simply accept not capturing every sentence from longer object descriptions.

As for the scoring, I believe scoring is cumulative when you have multiple matches, but I can't recall and preliminary searches in the documentation are not helping.

I'm not sure yet if we would be available to take a full try at it, but a getting started guide for devs would be appreciated for sure.

alexklibisz · 2020-11-12T14:52:06Z

alexklibisz
Nov 12, 2020
Maintainer

> Our current workaround was precisely your suggestion, but I hadn't been clever enough to think of using dynamic mapping combined with matching on the field name to set the type. Instead we had decided to hard-code a limited number of sentence vectors (vector_0, vector_1, etc.) and simply accept not capturing every sentence from longer object descriptions.

Good to hear this at least partially works. I won't have time to work on it for at least another week. I'll keep the issue open though. I think it would be a neat feature.

> As for the scoring, I believe scoring is cumulative when you have multiple matches, but I can't recall and preliminary searches in the documentation are not helping.

Cumulative would be tricky if you can have different numbers of vectors. What happens when one doc has 100 and another doc has 2? Not really a fair comparison.
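A toy illustration of that unfairness, using made-up per-vector scores rather than anything computed by the plugin: a document with 100 weak matches outranks a document with 2 strong matches when scores are summed, and the order flips when the maximum is used.

```python
def rank(docs, aggregate):
    """Return document names ordered best-first under `aggregate`."""
    return sorted(docs, key=lambda name: aggregate(docs[name]), reverse=True)


# Hypothetical per-vector similarity scores for two documents.
docs = {
    "long_doc": [0.1] * 100,   # 100 vectors, each a weak match
    "short_doc": [0.9, 0.8],   # 2 vectors, both strong matches
}

by_sum = rank(docs, sum)  # cumulative scoring favors the long document
by_max = rank(docs, max)  # best-match scoring favors the short document
```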

> I'm not sure yet if we would be available to take a full try at it, but a getting started guide for devs would be appreciated for sure.

I just added a developer guide here: https://github.com/alexklibisz/elastiknn/blob/master/developer-guide.md

Feel free to ask questions on Gitter or email me at aklibisz@gmail.com if you or someone on your team ends up digging into it. I'm sure there are things I haven't documented thoroughly.

alexklibisz · 2020-12-19T19:52:14Z

alexklibisz
Dec 19, 2020
Maintainer

Another interesting use-case for this is when you have multiple images, and thus multiple image vectors, for a single product.

muzaluisa · 2022-02-21T13:20:02Z

muzaluisa
Feb 21, 2022

Hey, @alexklibisz! We have a similar need: storing paragraph-level embeddings for documents. One option to calculate the score would be to take the average of the max value and the mean of the top X matches (since max value and mean value both matter). Have there been any updates on this feature? Would you have some examples of writing custom scoring for this case?
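For concreteness, one reading of the scoring idea above as plain Python, operating on per-paragraph similarity scores that have already been computed. The function name `combined_score` and the parameter `top_x` are hypothetical, and this is not an Elastiknn scoring hook.

```python
import heapq


def combined_score(scores, top_x):
    """Average of the best score and the mean of the top_x best scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    if top_x < 1:
        raise ValueError("top_x must be at least 1")
    # nlargest returns fewer than top_x items when scores is short.
    top = heapq.nlargest(top_x, scores)
    top_mean = sum(top) / len(top)
    return (max(scores) + top_mean) / 2
```

With `top_x=1` this reduces to the max score, and larger values of `top_x` pull the result toward the mean of the document's best paragraphs.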

alexklibisz · 2022-02-21T16:20:23Z

alexklibisz
Feb 21, 2022
Maintainer

Hi Luiza, no - no progress on this feature. Also no examples. I don't have this on my roadmap for the project right now, but would be happy to review and provide guidance on PRs. There's a developer guide in the repo.

muzaluisa · 2022-02-22T13:02:30Z

muzaluisa
Feb 22, 2022

Thank you. If we decide to go with this approach later using elastiknn, I will share updates here.

alexklibisz · 2022-07-17T20:04:47Z

alexklibisz
Jul 17, 2022
Maintainer

I don't plan on implementing this. Will happily review if someone else takes a pass at it.
