Hello Boris, Thank you for your post and interest in Truvari. I’ve reviewed your code and believe it’s a reasonable approach. However, I have some reservations.
I understand the motivation behind this, and I believe you've implemented it well enough. In order to keep the tool simple, I've tried to keep the functionality minimal around "This is how comparison should be done". The Have a great day,
-
Thank you @ACEnglish, I appreciate your detailed reply and your point of view. I agree that these edge cases need to be addressed somehow. In my evaluations, 10-20% of false negative calls arise because the matching variant in the comparison set is just outside the high confidence region. Simply extending the regions by 500 bases is not the same as using the
So do you think one could introduce the Or alternatively one could think of adding a post-processing script flagging variants in the Many thanks and best wishes,
-
That proposal definitely solves most of my concerns. I think I'm becoming convinced. I'll review the code further to check whether your implementation is the best approach. But something we'll still need to address is point number 5. When the So
-
I would like to suggest "softening" the boundaries of the regions used to prefilter variants in `truvari bench`.

`truvari bench` optionally takes a BED file of regions of interest (the `--includebed` argument). Only truth and comparison set variants that are fully contained in the regions are considered for matching. As a result, a variant is often called a false positive or false negative even though it would match a variant in the other set, because that match lies just outside the region. A typical situation where this problem arises is a comparison of structural variants called from HG002 long-read data against the truth set of variants in high confidence regions described here.

I am suggesting a strategy similar to the one `truvari bench` already uses for size matching: only truth set variants longer than `sizemin` (50 by default) are considered, but they are allowed to match shorter comparison set variants of size `sizefilt` (30 by default) or longer.

I suggest introducing a parameter `extend` that extends the regions on each side. By default it is equal to the `refdist` parameter. Variants that are not fully included in the extended regions are filtered out at the start. After that, if one of two matching variants is fully covered by the original, unextended region, the other is allowed a more relaxed overlap: as long as it is included in the extended region, both variants are called true positives. If neither of the matching variants is fully covered by the original region, both are skipped. A variant that has no match and is not fully covered by the original region is skipped as well. To be counted as FP or FN, a variant has to be fully included in the unextended region and have no match in the extended region.

The approach is implemented in the fork of Truvari here:
bnoyvert@6465454
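To make the rules above concrete, here is a minimal sketch of the decision logic for a single region. It is an illustration only, not the code in the fork: `Interval`, `classify`, and the labels `"filtered"` and `"skip"` are hypothetical names, and it assumes half-open coordinates and that the best match from the other set has already been found.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) on one chromosome (assumed convention)."""
    start: int
    end: int

    def contains(self, other: "Interval") -> bool:
        # True when `other` lies fully inside this interval.
        return self.start <= other.start and other.end <= self.end

    def extended(self, extend: int) -> "Interval":
        # Grow the interval by `extend` bases on each side, clamped at 0.
        return Interval(max(0, self.start - extend), self.end + extend)


def classify(variant: Interval, match: Optional[Interval],
             region: Interval, extend: int, is_truth: bool) -> str:
    """Label one variant as 'TP', 'FP', 'FN', 'skip' or 'filtered'.

    `match` is the matching variant from the other call set, or None.
    """
    wide = region.extended(extend)
    if not wide.contains(variant):
        return "filtered"  # removed by the prefilter on extended regions

    in_core = region.contains(variant)
    # A match outside the extended region was prefiltered, so it does not count.
    if match is not None and wide.contains(match):
        if in_core or region.contains(match):
            return "TP"    # one side is inside the original region
        return "skip"      # both sides are only in the extension
    if in_core:
        return "FN" if is_truth else "FP"
    return "skip"          # unmatched and only in the extension
```

For example, with `region = Interval(1000, 2000)` and `extend = 500`, a truth variant at `Interval(1100, 1200)` whose match sits at `Interval(1950, 2100)` is counted as a true positive, whereas with the plain `--includebed` filtering that match would have been dropped.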
I would like to submit a merge request; please let me know if you would consider merging it.
Thank you,
Boris