Normalize BM25 and TF-IDF scores #401

Closed
davidmezzetti opened this issue Dec 27, 2022 · 0 comments
BM25 and TF-IDF scores are unbounded: they keep growing as the match improves. This change adds a flag to normalize scores to the range 0 to 1.

This method will calculate an average score using the average document length, average frequency and average IDF score. A max score will then be calculated as 4 * average score. Lastly, scores will be scaled between 0 and 1 by dividing each score by the max score.

The formulas are shown below.

average = score(average frequency, average idf, average document length)
max = 4 * average
score = score / max
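
The formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `CorpusStats`, `bm25_term_score`, `max_score` and `normalize` are hypothetical, the score function is a textbook BM25 term score with assumed defaults `k1=1.2` and `b=0.75`, and clipping values above 1 is an assumption added here because a score can exceed 4 * average.

```python
from dataclasses import dataclass


@dataclass
class CorpusStats:
    """Hypothetical container for the corpus-wide averages named in the issue."""
    avgfreq: float  # average frequency
    avgidf: float   # average idf score
    avgdl: float    # average document length


def bm25_term_score(freq, idf, length, avgdl, k1=1.2, b=0.75):
    """Textbook BM25 score for a single term (assumed form, not taken from the issue)."""
    return idf * (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length / avgdl))


def max_score(stats, k1=1.2, b=0.75):
    """max = 4 * average, where average is the score of an 'average' term and document."""
    average = bm25_term_score(stats.avgfreq, stats.avgidf, stats.avgdl, stats.avgdl, k1, b)
    return 4 * average


def normalize(score, stats, clip=True):
    """score = score / max; clipping to 1.0 is an assumption, not stated in the issue."""
    value = score / max_score(stats)
    return min(value, 1.0) if clip else value


stats = CorpusStats(avgfreq=2.0, avgidf=1.5, avgdl=100.0)
print(normalize(2.0625, stats))
```

With these example averages, the average score is about 2.0625 and the max score about 8.25, so an average-strength match normalizes to roughly 0.25. A raw score above the max would land above 1 without the clipping step.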