-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds VcfComparator tool #8933
Adds VcfComparator tool #8933
Conversation
post-reblocking when many deletion "cleanup" changes occur
Github actions tests reported job failures from actions build 10169605080
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code I gave you to start with is very much gradware that we used to be able to hide in private classes in GATK3, but that era is over. Maybe @droazen will let us create a different walker category to hide it in?
@@ -469,6 +469,13 @@ static List<Genotype> cleanupGenotypeAnnotations(final VariantContext vc, final | |||
builder.AD(AD); | |||
} | |||
|
|||
//convert GQ0s that were reblocked back to no-calls for better AN and InbreedingCoeff annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there shouldn't be any edits in this class on this branch if the rebase went properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* | ||
* Client tools must implement apply(List<VariantContext> variantContexts, ReferenceContext referenceContext) | ||
*/ | ||
public abstract class MultiVariantWalkerGroupedByOverlap extends MultiVariantWalker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not gonna lie, I have no memory of where this class comes from. Apparently this was two years ago, so maybe it shouldn't be a surprise. I think it's a modified version of MultiVariantWalkerGroupedOnStart. I suspect we needed it because I wanted the star alleles to be returned with their deletions? I just talked myself into keeping this, although we probably don't need it for this exact implementation if you turned off the spandel checks.
@@ -83,6 +85,16 @@ public static boolean isAlleleSpecific(final VariantAnnotation annotation) { | |||
return annotation instanceof AlleleSpecificAnnotation; | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be refactored to use the method below: isAlleleSpecificGatkKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/main/java/org/broadinstitute/hellbender/tools/walkers/variantutils/ReblockGVCF.java
Show resolved
Hide resolved
} | ||
|
||
@Test(dataProvider = "getTestFiles") | ||
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that puts a huge delta on the VQSLOD (such that -4.8149 and -3.2178 from the existing test are acceptable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an argument to adjust that threshold for VQSLOD currently (am I just missing it?). I see tolerances for other attributes though. Should I add one for VQSLOD and AS_VQSLOD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if that arg isn't there then we don't need to test it. :-/ I mostly want to demonstrate usage of the inexact match that's already there. I guess you could manually edit a QUAL score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, I added a QUAL test
} | ||
} | ||
|
||
private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to a particular GATK version update, but I can't remember which one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine that future updates will need adjustments to this tool to account for expected differences. Are we hoping that those changes get committed into master each time? Or is it ok to leave in some of these examples for future users to model their own after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine leaving old stuff, but it would probably be good to commit whatever we end up using for WARP test comparisons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll add a note about this to the "how to update gatk in warp" doc.
throw new UserException("Alleles are mismatched at " + actual.getContig() + ":" + actual.getStart() + ": actual has " | ||
+ actual.getAlternateAlleles() + " and expected has " + expected.getAlternateAlleles()); | ||
|
||
} else if (allowNewStars && hasNewStar(actual, expected)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be !allowNewStars...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is hard to write a test for without some example input, but I agree with you so I changed it here. If you know of an example VCF with some changed star alleles let me know.
continue; | ||
} | ||
final Object actualValue = actual.get(key); | ||
if (expectedValue instanceof List && actualValue instanceof List || key.equals(GATKVCFConstants.AS_SB_TABLE_KEY)) { //|| (key.contains("RAW") joint in OR with AS_SB_TABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember what this comment means -- we can probably remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
for (int i = 0; i < iterationEnd; i++) { | ||
if (i >= actualList.size() || i >= expectedList.size()) { | ||
//TODO: the toStrings here don't do a great job | ||
//throw makeVariantExceptionFromDifference(key, actualValue.toString(), expectedValue.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should throw, but maybe I commented it out for testing for some reason...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (for now at least)
} else { | ||
alleleBalance = ad[1] / ((double) ad[0] + ad[1]); | ||
} | ||
if (alleleBalance >= 0.2 && alleleBalance <= 0.8) { //TODO: this isn't super robust |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this TODO. This is what the gnomAD criteria state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@ldgauthier Well, there is always the |
@ldgauthier back to you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine leaving this as an ExperimentalFeature with no refactor. I worry a little about there being logic in there to ignore certain differences by default. If you could just do a quick skim with that in mind the I think this is good enough to merge for now.
} | ||
|
||
@Test(dataProvider = "getTestFiles") | ||
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if that arg isn't there then we don't need to test it. :-/ I mostly want to demonstrate usage of the inexact match that's already there. I guess you could manually edit a QUAL score.
} | ||
} | ||
|
||
private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine leaving old stuff, but it would probably be good to commit whatever we end up using for WARP test comparisons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked in WARP and everywhere that we call this tool we also check that the VCFs are identical with diff (after removing the header). So I think it's ok if we're skipping some differences by default with this tool since the raw diff will catch them. I also skimmed the tool to see what we're skipping and other than some of the defaults for QUAL and inbreedingCoeff, I think most checks are fairly robust.
} | ||
|
||
@Test(dataProvider = "getTestFiles") | ||
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, I added a QUAL test
} | ||
} | ||
|
||
private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I'll add a note about this to the "how to update gatk in warp" doc.
Tool is useful for WARP testing so that expected differences can be removed from the check.