Thresholds for all quality signals #92

Open
torshie opened this issue Dec 19, 2023 · 2 comments

Comments

@torshie

torshie commented Dec 19, 2023

After all quality signals have been generated, what thresholds are used to classify a document as good or bad for each quality signal?

@mauriceweber
Collaborator

Hi @torshie, that's a great question. I don't think the community has a settled answer yet, but as a starting point you can study the thresholds used in the literature (e.g. the Gopher rules or the rules used in RefinedWeb). Studying data quality and data mixes is an active area of research, and building that understanding is one of the core motivations behind RPv2.
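
To make "thresholds from the literature" concrete, here is a minimal sketch of how per-signal rules could be applied to a document's precomputed signals. It is not the RPv2 maintainers' method. The signal names are hypothetical placeholders rather than the dataset's actual field names, and the numeric bounds are Gopher-style values quoted from memory, so check them against the paper before relying on them.

```python
from typing import Callable, Dict, List

# Hypothetical signal names mapped to pass/fail predicates.
# The bounds are illustrative, Gopher-style values (an assumption,
# not reference thresholds published for RPv2).
RULES: Dict[str, Callable[[float], bool]] = {
    "word_count": lambda v: 50 <= v <= 100_000,
    "mean_word_length": lambda v: 3 <= v <= 10,
    "symbol_to_word_ratio": lambda v: v <= 0.1,
    "frac_lines_end_with_ellipsis": lambda v: v <= 0.3,
    "frac_lines_start_with_bullet": lambda v: v <= 0.9,
    "frac_words_without_alpha": lambda v: v <= 0.2,
}


def failed_rules(signals: Dict[str, float]) -> List[str]:
    """Return the names of all rules that the document violates.

    Signals missing from the input are skipped rather than treated as failures.
    """
    return [
        name
        for name, passes in RULES.items()
        if name in signals and not passes(signals[name])
    ]


def is_good(signals: Dict[str, float]) -> bool:
    """A document is kept only if it violates none of the rules."""
    return not failed_rules(signals)


if __name__ == "__main__":
    doc = {"word_count": 12, "mean_word_length": 4.7}
    print(is_good(doc), failed_rules(doc))
```

Keeping the rules in a plain dictionary makes it easy to swap in a different threshold set (for example, one derived from RefinedWeb) and to log which rule rejected each document, which helps when tuning the bounds on your own data.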

@ZhenweiAn

I am eager to know the reference thresholds of all signals too!
