[Feature Request] Hybrid Search for Keywords #50

Open
GregorBiswanger opened this issue May 18, 2024 · 2 comments

@GregorBiswanger

I have a feature request for the Vectra library. Currently, I have difficulty getting accurate results from my data when searching for occurrences of a single word. Other vector databases support a hybrid search that adds keyword matching to achieve more precise results.

Feature Request:
I would like to see a hybrid search implemented that includes both traditional vector search and keyword search. This would significantly improve the accuracy of search results, especially for data containing frequently occurring words.

Benefit:
Such a feature would enable more precise and relevant search results by combining the strengths of both search methods. This is particularly useful in cases where vector search alone is insufficient to find relevant results.

Examples:

  • When searching for the word "apple," I want to see results that are not only vector-similar but also explicitly contain the keyword "apple."
  • When searching for specific topics or terms that frequently appear in my data, the hybrid search would greatly improve relevance and accuracy.

Thank you for your great work on Vectra and for considering this feature request!

Cheers,
Gregor

@Stevenic Stevenic self-assigned this May 25, 2024
@Stevenic Stevenic added the enhancement New feature or request label May 25, 2024
@Stevenic
Owner

I don't disagree, but this is a pretty big add... It would mean adding a keyword index, which would mean loading two indexes into memory. I've thought a lot about this myself and I've just been reluctant to add the additional complexity (and memory hit). I'm open to ideas for simple ways to implement hybrid search that don't involve a big memory hit or add a lot of complexity.
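
One possible low-overhead direction, sketched here only as an illustration: skip the separate keyword index and re-rank the top vector hits by checking their stored text for the query terms. The `Candidate` shape, the `rerankWithKeywords` function, and the `keywordWeight` parameter are all hypothetical and are not part of Vectra's API; the sketch assumes each hit already carries its text and a similarity score in the range 0 to 1.

```typescript
// Hypothetical shape of a vector search hit; NOT Vectra's actual result type.
interface Candidate {
  text: string;
  vectorScore: number; // assumed to be a similarity in [0, 1]
}

// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Blend each candidate's vector score with the fraction of distinct
// query terms that literally appear in its text, then sort descending.
function rerankWithKeywords(
  query: string,
  candidates: Candidate[],
  keywordWeight = 0.3 // assumed blend factor; would need tuning
): Array<Candidate & { hybridScore: number }> {
  const queryTerms = Array.from(new Set(tokenize(query)));
  return candidates
    .map((c) => {
      const words = new Set(tokenize(c.text));
      const matched = queryTerms.filter((t) => words.has(t)).length;
      const keywordScore =
        queryTerms.length > 0 ? matched / queryTerms.length : 0;
      const hybridScore =
        (1 - keywordWeight) * c.vectorScore + keywordWeight * keywordScore;
      return { ...c, hybridScore };
    })
    .sort((a, b) => b.hybridScore - a.hybridScore);
}
```

Because only the candidates returned by the vector query are scanned, nothing extra has to stay resident in memory. The trade-off is that a document containing the keyword but ranked outside the vector top-K can never be recovered this way.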

@GaureeshAnvekar

@Stevenic, thanks for the clarification! One option is to integrate the BM25 keyword-matching algorithm, but we would also need to store the text extracted from documents/URLs. We can always keep hybrid search optional, both during indexing and querying. I'm looking into it currently. Thanks again for this cool repo!
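
For reference, a minimal sketch of standard Okapi BM25 scoring over a small set of stored texts. It is not taken from Vectra or from any pull request; the function names `splitWords` and `bm25Scores` are hypothetical, and the defaults `k1 = 1.2` and `b = 0.75` are common choices rather than values proposed in this thread. It recomputes term statistics on every call, so it assumes a corpus small enough to scan.

```typescript
// Split text into lowercase alphanumeric tokens.
function splitWords(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Return one BM25 score per document for the given query.
// k1 controls term-frequency saturation; b controls length normalization.
function bm25Scores(
  query: string,
  docs: string[],
  k1 = 1.2,
  b = 0.75
): number[] {
  const tokenized = docs.map(splitWords);
  const avgLen =
    tokenized.reduce((sum, d) => sum + d.length, 0) /
    Math.max(docs.length, 1);

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>();
  tokenized.forEach((doc) => {
    Array.from(new Set(doc)).forEach((term) => {
      df.set(term, (df.get(term) ?? 0) + 1);
    });
  });

  const queryTerms = Array.from(new Set(splitWords(query)));
  return tokenized.map((doc) => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    doc.forEach((term) => tf.set(term, (tf.get(term) ?? 0) + 1));

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue; // term absent: contributes nothing
      const n = df.get(term) ?? 0;
      // Smoothed IDF variant that stays non-negative.
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return score;
  });
}
```

Keeping this behind an opt-in flag, as suggested above, would mean the extra text storage and scoring cost are paid only by users who enable hybrid search.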
