[FEATURE] Hybrid search using keyword matching and kNN #717

Closed
rhvaz opened this issue Jan 11, 2023 · 5 comments
Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement)

@rhvaz

rhvaz commented Jan 11, 2023

Is your feature request related to a problem?
I would like to search against multiple text fields and a kNN vector. Currently I can only filter the results based on keyword matching against the text fields and then re-rank them using vector metrics such as cosine similarity or l2 norm. This means I miss out on many relevant candidates.
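To make the problem concrete, here is a minimal sketch of the filter-then-re-rank flow described above. It runs on a tiny in-memory list rather than against OpenSearch; the documents, field names, and helper functions are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_then_rerank(docs, keyword, query_vector):
    # Step 1: keep only documents whose text contains the keyword.
    candidates = [d for d in docs if keyword in d["text"].lower()]
    # Step 2: re-rank the survivors by vector similarity.
    return sorted(candidates,
                  key=lambda d: cosine(d["vector"], query_vector),
                  reverse=True)

# Hypothetical toy data: "b" is semantically close but lacks the keyword.
docs = [
    {"id": "a", "text": "Smart TV 55 inch", "vector": [1.0, 0.0]},
    {"id": "b", "text": "Television with streaming apps", "vector": [0.9, 0.1]},
    {"id": "c", "text": "TV stand", "vector": [0.0, 1.0]},
]
ranked_ids = [d["id"] for d in filter_then_rerank(docs, "tv", [1.0, 0.0])]
print(ranked_ids)
```

Document "b" never reaches the re-ranking step even though its vector is nearly identical to the query vector, which is the kind of missed candidate this request is about.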

What solution would you like?
The approach Elasticsearch (ES) documents for combining approximate kNN with other features: https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features

What alternatives have you considered?
I could do two distinct queries to the same index and add logic to rank the final results, but this is quite messy and not good for latency.
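A rough sketch of what that client-side ranking logic might look like, assuming each query returns a list of hits shaped like `{"_id": ..., "_score": ...}`. The `merge_hits` helper and the sample scores are hypothetical, and simply adding raw scores is only one naive option.

```python
def merge_hits(text_hits, knn_hits, size=10):
    """Merge two hit lists by document ID, adding the raw scores together."""
    combined = {}
    for hit in list(text_hits) + list(knn_hits):
        combined[hit["_id"]] = combined.get(hit["_id"], 0.0) + hit["_score"]
    # Highest combined score first; keep only the requested number of results.
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]

# Hypothetical results of the two separate queries.
text_hits = [{"_id": "1", "_score": 7.2}, {"_id": "2", "_score": 3.1}]
knn_hits = [{"_id": "2", "_score": 0.9}, {"_id": "3", "_score": 0.8}]

merged_ids = [doc_id for doc_id, _ in merge_hits(text_hits, knn_hits)]
print(merged_ids)
```

Besides the second round trip, the raw keyword scores here are much larger than the vector scores, so the keyword side dominates the final order.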

@navneet1v
Collaborator

navneet1v commented Jan 11, 2023

@rhvaz
Hi,
Thanks for putting up the feature request and providing the example. Yes, this feature is not present in OpenSearch as of now (I am editing my comment). I will look into it in more detail to make sure I am not missing anything.

@navneet1v
Collaborator

@rhvaz After a deep dive into the proposed solution, I found that ES and OpenSearch implement k-NN search very differently. In ES, the k-NN clause sits outside the main query clause, and ES uses the DFS search type to first gather the k-NN query results at the coordinator node level and then pass them along for text search at the shard level.

In OpenSearch, however, we implemented the k-NN clause inside the query clause, so the full query object is passed to the shards and filtering happens at the shard level.

New Proposed Solution:
I am currently working on a generic solution that solves the problems you mentioned as well as those described in opensearch-project/OpenSearch#4557. I will update the RFC once I have a concrete approach.

In the meantime, please add more details about your use case; that will help us prioritize the work.

@navneet1v navneet1v self-assigned this Jan 19, 2023
@navneet1v navneet1v added Features Introduces a new unit of functionality that satisfies a requirement and removed untriaged labels Jan 19, 2023
@rhvaz
Author

rhvaz commented Jan 19, 2023

Sure @navneet1v! I would say one of the main use cases is to provide good search results even for short queries.

When a consumer starts typing the name of a product or category, the results are poor if we simply do semantic search using kNN. If we combine that with keyword matching, we can provide good recommendations for any query length.

I think this has applications in multiple e-commerce sites if people are looking to build a search product using OpenSearch.

@navneet1v
Collaborator

navneet1v commented Jan 19, 2023

Hi @rhvaz thanks for the support.

Have you tried using this type of query:

{
  "query": {
    "bool": {
      "should": [
        {
          // text based query
        },
        {
          // k-NN query
        }
      ]
    }
  }
}

Example:

{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "summary": "smart tv"
          }
        },
        {
          "knn": {
            "text_vector": {
              "vector": [2, 3, 5, 6],
              "k": 10,
              "_name": "knn"
            }
          }
        }
      ]
    }
  }
}
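For anyone building this request body from application code, here is a small sketch that assembles the same bool/should structure as a Python dict. The `build_hybrid_query` helper is hypothetical, the field names `summary` and `text_vector` are taken from the example above, and the `_name` entry is left out for brevity.

```python
import json

def build_hybrid_query(text, vector, k=10, size=10,
                       text_field="summary", vector_field="text_vector"):
    """Build a bool query with one text clause and one k-NN clause."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Text-based clause (keyword matching).
                    {"match": {text_field: text}},
                    # k-NN clause against the vector field.
                    {"knn": {vector_field: {"vector": list(vector), "k": k}}},
                ]
            }
        },
    }

query = build_hybrid_query("smart tv", [2, 3, 5, 6])
body = json.dumps(query, indent=2)
print(body)
```

The resulting `body` string can be sent as the search request body. A document that matches both should clauses gets the sum of the two clause scores, so the relative scale of the two scores matters.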

If yes, I would like to know what problems you saw with the above query, where the scores of the k-NN and text-match queries are combined by the bool query at the per-shard level. Also, as per your comments, you can change the function score via a sigmoid function. Link: opensearch-project/OpenSearch#4557 (comment)
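As a rough illustration of the sigmoid idea, the sketch below squashes an unbounded text-match score into the range (0, 1) so it is closer in scale to a bounded k-NN score. The `midpoint` and `steepness` parameters and the sample scores are assumptions for illustration, not values from the linked comment.

```python
import math

def sigmoid(score, midpoint=0.0, steepness=1.0):
    """Map an unbounded score to (0, 1); higher input gives higher output."""
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))

# Hypothetical raw text-match scores, squashed around a midpoint of 5.0.
raw_scores = [1.5, 5.0, 12.0]
squashed = [sigmoid(s, midpoint=5.0, steepness=0.5) for s in raw_scores]
print(squashed)
```

The ordering of the scores is preserved; only their scale changes, which limits how much a single very large text score can dominate the sum.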

@navneet1v
Collaborator

Hi @rhvaz, I am closing this issue because the hybrid search work, which includes:

  1. Combining scores at the corpus level.
  2. Normalizing scores before combining the scores from two different queries.

is being handled in this feature request: opensearch-project/neural-search#123. I will add more details on that issue going forward. Please +1 the feature.
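The two steps listed above could look roughly like the following sketch. It assumes min-max normalization and a weighted arithmetic mean, which are only one possible choice and are not confirmed by this thread as the approach taken in the linked feature request. All names and sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the top value.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def combine(text_scores, knn_scores, text_weight=0.5):
    """Normalize each query's scores, then take a weighted sum per document."""
    text_norm, knn_norm = min_max(text_scores), min_max(knn_scores)
    combined = {
        doc_id: text_weight * text_norm.get(doc_id, 0.0)
        + (1.0 - text_weight) * knn_norm.get(doc_id, 0.0)
        for doc_id in set(text_norm) | set(knn_norm)
    }
    # Highest combined score first; ties are broken by document ID.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical scores gathered across the whole corpus (all shards).
text_scores = {"a": 12.0, "b": 8.0, "c": 4.0}
knn_scores = {"b": 0.9, "c": 0.7, "d": 0.5}
final_ids = [doc_id for doc_id, _ in combine(text_scores, knn_scores)]
print(final_ids)
```

Because both score sets are rescaled before they are added, a document that ranks well in both queries ("b" here) can outrank one that only has a large raw text score ("a").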
