These duplicates are a result of the annotation quality step. Exact duplicates can be accounted for in measurements by using sets as the basis, e.g. a set of (start, end, concept ID) tuples. Having the same span refer to multiple entities will still cause a problem if the model assumes a single entity per span, but from your analysis the number of such cases looks quite small.
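A minimal sketch of the set-based measurement suggested above, assuming each annotation has already been parsed into a (document ID, start, end, concept ID) tuple. The document ID is added here (an assumption) so that identical offsets in different abstracts are not merged, and the helper name `set_prf` is hypothetical. Because gold and predicted annotations are collected into sets, an exact duplicate row is counted only once.

```python
from typing import Iterable, Tuple

# (document ID, start offset, end offset, concept ID); the document ID is an
# assumed extra key so equal offsets in different abstracts stay distinct.
Annotation = Tuple[str, int, int, str]


def set_prf(gold: Iterable[Annotation], pred: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over sets of annotations (hypothetical helper)."""
    gold_set = set(gold)  # exact duplicate rows collapse here
    pred_set = set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [
    ("26316050", 783, 790, "C0441655"),
    ("26316050", 783, 790, "C0441655"),  # exact duplicate row
    ("27259326", 525, 529, "C0010408"),
]
pred = [("26316050", 783, 790, "C0441655"), ("27259326", 525, 529, "C0010408")]
print(set_prf(gold, pred))  # (1.0, 1.0, 1.0)
```

With list-based counting instead, the duplicated gold row would cap recall at 2/3 even for a system that gets every mention right.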
Many entries are exact duplicates: I count 175 in my parsed version of the dataset, and I manually confirmed the first few against the original text. For example:
26316050 783 790 filling T052 C0441655
26316050 783 790 filling T052 C0441655
27259326 525 529 AHCS T061 C0010408
27259326 525 529 AHCS T061 C0010408
27262362 730 738 increase T169 C0442805
27262362 730 738 increase T169 C0442805
Removing these exact duplicates makes the mention count 352,321 instead of the documented 352,496.
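One way to reproduce this kind of count is sketched below. It assumes the annotation rows are tab-separated with six fields (document ID, start, end, mention text, semantic types, concept ID), matching the columns of the rows quoted above; the parsing rule and the helper names `parse_rows` and `count_exact_duplicates` are illustrative assumptions, not code from the repo.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def parse_rows(lines: Iterable[str]) -> List[Row]:
    """Split tab-separated annotation lines into tuples, skipping other lines."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 6:  # assumed shape of an annotation line
            rows.append(tuple(fields))
    return rows


def count_exact_duplicates(rows: List[Row]) -> int:
    """Number of extra copies beyond the first occurrence of each row."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values())


lines = [
    "26316050\t783\t790\tfilling\tT052\tC0441655",
    "26316050\t783\t790\tfilling\tT052\tC0441655",
    "27259326\t525\t529\tAHCS\tT061\tC0010408",
    "27259326\t525\t529\tAHCS\tT061\tC0010408",
]
rows = parse_rows(lines)
dups = count_exact_duplicates(rows)
print(len(rows), dups, len(rows) - dups)  # 4 2 2
```

Applied to the full corpus, `len(rows) - dups` is the deduplicated mention count; 175 duplicates account exactly for the gap between 352,496 and 352,321.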
There are also rows with the same span referring to different UMLS concepts, which I don't see documented in the repo. For example:
28548949 1809 1812 XPA T028 C1337030
28548949 1809 1812 XPA T116,T123 C1506534
Presumably, if an entity linking model predicts either of these concepts, it is marked as correct, rather than the span being scored twice and the model always being wrong in one of the two cases?
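The lenient scoring described above could be sketched as follows: gold concepts are grouped by span, and a single prediction for a span counts as correct if its concept is any of that span's gold concepts. This is an assumption about how evaluation might handle such spans, not documented behaviour of the repo, and the helper names `gold_by_span` and `lenient_accuracy` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[str, int, int]  # (document ID, start offset, end offset)
Gold = Tuple[str, int, int, str]  # span plus concept ID


def gold_by_span(gold: Iterable[Gold]) -> Dict[Span, Set[str]]:
    """Map each gold span to the set of concept IDs annotated on it."""
    spans: Dict[Span, Set[str]] = defaultdict(set)
    for doc, start, end, cui in gold:
        spans[(doc, start, end)].add(cui)
    return dict(spans)


def lenient_accuracy(gold: Iterable[Gold], pred: Dict[Span, str]) -> float:
    """Fraction of gold spans whose single predicted concept is among the gold concepts."""
    spans = gold_by_span(gold)
    if not spans:
        return 0.0
    # A missing prediction yields None, which is never in the gold set.
    correct = sum(1 for span, cuis in spans.items() if pred.get(span) in cuis)
    return correct / len(spans)


gold = [
    ("28548949", 1809, 1812, "C1337030"),
    ("28548949", 1809, 1812, "C1506534"),
]
spans = gold_by_span(gold)
ambiguous = [s for s, cuis in spans.items() if len(cuis) > 1]
print(len(ambiguous))  # 1
print(lenient_accuracy(gold, {("28548949", 1809, 1812): "C1506534"}))  # 1.0
```

Under strict per-row scoring, the same prediction would be wrong against the other row, so a single-concept model could reach at most 50% recall on such a span.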
Could this be caused by the annotation quality step?