Improve Nerd standoff #37
Comments
Each tool has its dedicated usage and should not be used for another purpose:
Sure, but so far we're taking the first paragraphs (not necessarily the title and the abstract).
We were taking the first paragraphs just because of time constraints for the demo last year! We should take the whole document for NERD… I thought I changed it at some point to take the whole document.

NERD does not weight the concepts in terms of significance; it's grobid-keyterm that does that, using various distributional information. NERD disambiguates locally and tries to disambiguate all mentions. We can set a different threshold while indexing NERD annotations, for instance, if we want to improve precision, but there will always be some noise at this level. The point is that for semantic search it's the accumulation of the matches that sets the scores (tf/idf or BM25), so it should be robust to noise from a ranking perspective.

It may be a bit different for query disambiguation: there is less context, so it is more sensitive to noise. Currently the pruning thresholds are the same, but they could be refined based on experiments, depending on the mode of usage…

For the facets, concepts and categories from the keyterm annotator make more sense than NERD annotations, because they are already a selection of the key aspects of a document.
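To make the idea of mode-dependent pruning concrete, here is a minimal sketch, not the project's actual code. The `Annotation` class, the `prune` helper, the mode names, and the threshold values are all hypothetical and only illustrate keeping a looser threshold for indexing (where ranking absorbs noise) and a stricter one for queries (where there is less context).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for one disambiguated mention."""
    mention: str
    concept_id: str
    confidence: float


# Assumed values for illustration only; real thresholds would come from experiments.
PRUNING_THRESHOLDS = {
    "indexing": 0.3,  # tolerate noise, the ranking function accumulates matches
    "query": 0.6,     # short queries give less context, so prune harder
}


def prune(annotations, mode):
    """Keep only annotations whose confidence reaches the threshold for `mode`."""
    threshold = PRUNING_THRESHOLDS[mode]
    return [a for a in annotations if a.confidence >= threshold]


annotations = [
    Annotation("first mention", "concept-A", 0.92),
    Annotation("second mention", "concept-B", 0.45),
    Annotation("third mention", "concept-C", 0.20),
]

kept_for_index = prune(annotations, "indexing")
kept_for_query = prune(annotations, "query")
```

With these made-up scores, the indexing mode keeps the first two annotations while the query mode keeps only the first one.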
Actually, we select the first paragraphs, but it would be more fruitful to calculate the most significant concepts rather than pick them randomly.