Hi @torshie, that's a great question. I don't think the community has a settled answer, but as a starting point you can study the thresholds used in the literature (e.g., the Gopher rules or the rules used in RefinedWeb). Understanding data quality and data mixes is an active area of research, and building that understanding is one of the core motivations behind RPv2.
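As a rough illustration of "starting from the literature", here is a minimal sketch that applies a subset of the Gopher quality-filter thresholds (Rae et al., 2021) to precomputed per-document signals. It assumes each signal has already been reduced to a single scalar per document and stored in a plain dict. The signal names (`word_count`, `frac_words_with_alpha`, etc.) are hypothetical placeholders, not RPv2's actual field names, and the thresholds are the Gopher paper's values, not RPv2 recommendations.

```python
import math
from typing import Mapping

# (hypothetical signal name, inclusive lower bound, inclusive upper bound).
# Bounds follow a subset of the Gopher rules; names are placeholders only.
GOPHER_THRESHOLDS = [
    ("word_count", 50, 100_000),                  # 50 to 100k words
    ("mean_word_length", 3, 10),                  # 3 to 10 chars per word
    ("symbol_to_word_ratio", 0.0, 0.1),           # hash/ellipsis symbols per word
    ("frac_lines_start_with_bullet", 0.0, 0.9),   # at most 90% bullet lines
    ("frac_lines_end_with_ellipsis", 0.0, 0.3),   # at most 30% ellipsis lines
    ("frac_words_with_alpha", 0.8, 1.0),          # at least 80% alphabetic words
    ("stop_word_count", 2, math.inf),             # at least 2 English stop words
    ("frac_chars_top_2gram", 0.0, 0.2),           # top 2-gram covers at most 20%
]


def failed_gopher_rules(signals: Mapping[str, float]) -> list[str]:
    """Return the names of the signals whose values fall outside their bounds."""
    return [
        name
        for name, lo, hi in GOPHER_THRESHOLDS
        if not lo <= signals[name] <= hi
    ]


def passes_gopher_rules(signals: Mapping[str, float]) -> bool:
    """A document is kept ("good") only if no rule is violated."""
    return not failed_gopher_rules(signals)


# Made-up signal values for a single document, for illustration only.
doc = {
    "word_count": 420,
    "mean_word_length": 4.7,
    "symbol_to_word_ratio": 0.01,
    "frac_lines_start_with_bullet": 0.1,
    "frac_lines_end_with_ellipsis": 0.0,
    "frac_words_with_alpha": 0.95,
    "stop_word_count": 37,
    "frac_chars_top_2gram": 0.05,
}
print(passes_gopher_rules(doc))  # True
```

Returning the list of failed rules, rather than just a boolean, makes it easier to see which thresholds drive most rejections when you experiment with stricter or looser bounds on your own data.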
After all quality signals are generated, what thresholds are used to classify a document as good or bad for each quality signal?