In #88, @erikyao implemented a very useful new parameter for minimum_should_match. It allows users to submit a list of entities and get back documents that match at least some minimum number of them.
Based on this comment from @erikyao, scoring is currently based only on the number of matched entities. I can imagine two ways of improving the scoring based on specificity.
Specificity of the query term: Consider the situation where the gene list in my query includes both TP53 (a very commonly studied gene) and ANKRD37 (an almost completely uncharacterized gene). Right now, a match to TP53 is scored the same as a match to ANKRD37, even though matches to TP53 are much more common. It would be reasonable to weight matches to query terms differently based on how commonly they are found in the PFOCR dataset, so that rarer terms count for more.
Specificity of the matched pathway: Consider the situation where we have two pathways that match the exact same query terms. Currently, those would be scored the same. But if one pathway has 100 genes overall and the second pathway has just 10 genes, then the second pathway is probably more relevant to the input query set, so it should be scored higher.
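To make the two ideas concrete, here is a rough sketch in Python. It is only an illustration of the scoring I have in mind, not how the current API computes scores: the function names, the toy data, the `log(1 + N / df)` weighting, and the square-root size penalty are all assumptions on my part, loosely modeled on inverse document frequency and length normalization from text retrieval.

```python
import math
from collections import Counter


def idf_weights(pathways):
    """Weight each gene by how rare it is across all pathways.

    `pathways` maps a pathway id to the set of genes it contains
    (a hypothetical in-memory stand-in for the PFOCR dataset).
    """
    n_pathways = len(pathways)
    # Number of pathways each gene appears in (its document frequency).
    doc_freq = Counter(gene for genes in pathways.values() for gene in genes)
    # Rare genes get larger weights; the form log(1 + N / df) is an assumption.
    return {gene: math.log(1 + n_pathways / df) for gene, df in doc_freq.items()}


def specificity_score(query_genes, pathway_genes, weights):
    """Sum the weights of matched genes, then penalize large pathways."""
    matched = set(query_genes) & set(pathway_genes)
    if not matched:
        return 0.0
    raw = sum(weights.get(gene, 0.0) for gene in matched)
    # Square-root damping is one arbitrary choice of size penalty.
    return raw / math.sqrt(len(pathway_genes))


# Toy data, invented purely for illustration.
toy_pathways = {
    "A": {"TP53", "MDM2"},
    "B": {"TP53", "ANKRD37"},
    "C": {"TP53", "BRCA1", "ATM", "CHEK2"},
}
toy_weights = idf_weights(toy_pathways)
```

With these toy pathways, ANKRD37 (found in one pathway) gets a larger weight than TP53 (found in all three), and a TP53-only query scores pathway A (2 genes) above pathway C (4 genes). The exact formulas would need tuning against real queries.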