Deduplicate after merging lexical results #1161

Merged
ggordonhall merged 2 commits into main from gabriel/code-search-parity
Dec 8, 2023

Conversation

ggordonhall
Contributor

No description provided.

@rmuller-ml
Contributor

Running MMR over the merged results will affect the RRF (hybrid) ranking, because MMR basically ranks by cosine similarity plus a diversity term, so the lexical results will sink to the bottom.

What exactly are we trying to add to the lexical results? Filtering of overlapping chunks? File path diversity? Programming language diversity?
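To make the concern concrete, here is a minimal sketch of how an MMR pass can undo an RRF ordering. It is not taken from this PR: the function names, the toy two-dimensional embeddings, `k=60`, and `lam=0.7` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmr(query, candidates, lam=0.7):
    """Greedy MMR: lam * relevance - (1 - lam) * similarity to already-picked docs."""
    remaining = dict(candidates)
    selected = []
    while remaining:
        def score(doc_id):
            relevance = cosine(query, remaining[doc_id])
            redundancy = max(
                (cosine(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical toy data: "lex_1" matched on keywords but is far from the
# query in embedding space.
query = [1.0, 0.0]
embeddings = {"sem_1": [1.0, 0.1], "sem_2": [0.9, 0.3], "lex_1": [0.1, 1.0]}

semantic_ranking = ["sem_1", "sem_2"]
lexical_ranking = ["lex_1", "sem_1"]

fused = reciprocal_rank_fusion([semantic_ranking, lexical_ranking])
reranked = mmr(query, {doc_id: embeddings[doc_id] for doc_id in fused})

print(fused)     # ['sem_1', 'lex_1', 'sem_2']
print(reranked)  # ['sem_1', 'sem_2', 'lex_1']
```

In this toy case RRF places the lexical hit second, but MMR scores it almost entirely on its low cosine similarity to the query, so it falls to last place.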

@ggordonhall
Contributor Author

ggordonhall commented Dec 8, 2023

> Running MMR over the merged results will affect the RRF (hybrid) ranking, because MMR basically ranks by cosine similarity plus a diversity term, so the lexical results will sink to the bottom.
>
> What exactly are we trying to add to the lexical results? Filtering of overlapping chunks? File path diversity? Programming language diversity?

We're trying to ensure that the final result list has both path and language diversity.
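One way to get that diversity without re-scoring (and so without disturbing the fused order) is an order-preserving filter over the merged list. The sketch below is an assumption about how such a filter could look, not the code in this PR: the `Chunk` type, `deduplicate`, and the `max_per_path` / `max_per_lang` caps are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    # Hypothetical result shape; the real result type may differ.
    path: str
    lang: str
    start_line: int
    end_line: int


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True when two chunks come from the same file and share at least one line."""
    return (
        a.path == b.path
        and a.start_line <= b.end_line
        and b.start_line <= a.end_line
    )


def deduplicate(ranked, max_per_path=2, max_per_lang: Optional[int] = None):
    """Walk the merged list in rank order and keep a chunk only if it does not
    overlap a kept chunk and does not exceed the per-path or per-language cap."""
    kept = []
    per_path = {}
    per_lang = {}
    for chunk in ranked:
        if any(overlaps(chunk, k) for k in kept):
            continue  # overlapping chunk from the same file
        if per_path.get(chunk.path, 0) >= max_per_path:
            continue  # too many results from this file already
        if max_per_lang is not None and per_lang.get(chunk.lang, 0) >= max_per_lang:
            continue  # too many results in this language already
        kept.append(chunk)
        per_path[chunk.path] = per_path.get(chunk.path, 0) + 1
        per_lang[chunk.lang] = per_lang.get(chunk.lang, 0) + 1
    return kept


# Hypothetical merged (semantic + lexical) results, already in fused rank order.
merged = [
    Chunk("src/a.rs", "Rust", 1, 20),
    Chunk("src/a.rs", "Rust", 10, 30),   # overlaps the first chunk
    Chunk("web/b.ts", "TypeScript", 1, 15),
    Chunk("src/a.rs", "Rust", 40, 60),
    Chunk("src/a.rs", "Rust", 70, 90),   # third chunk from src/a.rs
    Chunk("src/c.rs", "Rust", 1, 10),
]

final = deduplicate(merged, max_per_path=2)
print([(c.path, c.start_line) for c in final])
# [('src/a.rs', 1), ('web/b.ts', 1), ('src/a.rs', 40), ('src/c.rs', 1)]
```

Because the filter only drops items and never reorders them, lexical hits keep whatever position the fusion step gave them.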

@ggordonhall ggordonhall merged commit 9f1b7b3 into main Dec 8, 2023
1 check passed
@ggordonhall ggordonhall deleted the gabriel/code-search-parity branch December 8, 2023 15:56

2 participants