-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix no data hom refs #8715
fix no data hom refs #8715
Conversation
Github actions tests reported job failures from actions build 8161782218
|
Github actions tests reported job failures from actions build 8164035876
|
@@ -306,7 +306,7 @@ public static void makeGenotypeCall(final int ploidy, | |||
final GenotypeAssignmentMethod assignmentMethod, | |||
final double[] genotypeLikelihoods, | |||
final List<Allele> allelesToUse, | |||
final List<Allele> originalGT, | |||
final Genotype originalGT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this method signature so that we could use the GQ info to distinguish hom-ref genotypes that we want to no-call from hom-refs that stay hom-refs because with reblocking there are no PLs to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to method doc
@@ -588,7 +588,7 @@ public void testCombineReblockedGVCFs() { | |||
Assert.assertFalse(g0.hasPL()); | |||
Assert.assertFalse(g0.hasGQ()); | |||
Assert.assertFalse(g0.hasExtendedAttribute(VCFConstants.DEPTH_KEY)); | |||
Assert.assertTrue(g1.isHomRef()); | |||
Assert.assertTrue(g1.isNoCall()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a lot of these tests I was explicitly testing the behavior of the GQ0 hom-ref, but that's exactly what we're trying to change here
Assert.assertTrue(vc1.hasAttribute(GATKVCFConstants.INBREEDING_COEFFICIENT_KEY)); | ||
Assert.assertEquals(vc1.getAttributeAsInt(VCFConstants.ALLELE_NUMBER_KEY, 0), 362); | ||
Assert.assertEquals(vc1.getAttributeAsInt(VCFConstants.ALLELE_NUMBER_KEY, 0), 300); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes because of the new no-calls
final VariantContext vc1 = outputVCs.get(1); | ||
Assert.assertTrue(vc1.getAttributeAsDouble(GATKVCFConstants.EXCESS_HET_KEY, 1000.0) < 10.0); //will be ~72 if homRefs aren't counted | ||
Assert.assertTrue(vc1.getAttributeAsDouble(GATKVCFConstants.EXCESS_HET_KEY, 0.0) > 50.0); //will be ~72 if homRefs aren't counted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some hom-refs in previous test result go to no call, so InbreedingCoeff and ExcessHet also change
@@ -976,4 +982,39 @@ public void dbSNPError() { | |||
//make sure it doesn't throw an error | |||
runCommandLine(args); | |||
} | |||
|
|||
@Test | |||
public void testNoReadsOutputAsNoCall() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input data for this test comes from a GVCF produced from an empty region of one of the NA12878 test bams and a GVCF that was edited to have a variant so we can force that position to be output. (I should probably put this test description in the code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very minor documentation/comment requests. Without getting very deep into the weeds of those integration test annotation changes (which look fine as per your comments) i think this looks good on the surface.
There are two more abstract questions I will pose:
Do we want to preserve the ability for anybody ever to output the intermidate outputs? We could create a new mode in GenotypeAssignmentMethod for that and change this default one to use the new behavior to satisfy twitter...
The second one is just to ask if these are the only two places (GnarlyGenotyper and CombineGVCFs) that need to change? Its only at the combine step we are throwing these out so the genomicsDB path shouldn't be a problem no? Do we want HC outputting ./. genotypes?
@@ -306,7 +306,7 @@ public static void makeGenotypeCall(final int ploidy, | |||
final GenotypeAssignmentMethod assignmentMethod, | |||
final double[] genotypeLikelihoods, | |||
final List<Allele> allelesToUse, | |||
final List<Allele> originalGT, | |||
final Genotype originalGT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to method doc
if (assignmentMethod == GenotypeAssignmentMethod.PREFER_PLS) { | ||
if (originalGT == null) { | ||
throw new IllegalArgumentException("original GT cannot be null if assignmentMethod is PREFER_PLS"); | ||
} else if (originalGT.hasGQ() && originalGT.getGQ() > 0){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. this makes sense and was certainly better than my hack attempt to fix this... so its now "prefer_PLs EXCEPT for nocalls" Can you add a note about this in GenotypeAssignmentMethod
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to preserve the intermediate behavior at all? We could split PREFER_PLS to also have a mode that treats the 0/0s like they used to be.
@@ -158,6 +158,7 @@ public Object[][] gvcfsToGenotype() { | |||
|
|||
//23 highly multi-allelic sites across 54 1000G exomes to test allele subsetting and QUAL calculation | |||
//plus one 10-allele WGS variant that's all hom-ref with one GT that has unnormalized PLs from some sort of GenomicsDB corner case | |||
//this VCF still has the haploid-looking GDB no-calls as in sample NA21137 at position chr1:3836468 -- allegedly GATK 4.2.5.0 from February 7, 2022, possibly due to --call-genotypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔍
@@ -804,6 +805,9 @@ public void testWithReblockedGVCF() { | |||
final List<VariantContext> actualVC = VariantContextTestUtils.getVariantContexts(output); | |||
Assert.assertFalse(actualVC.stream().anyMatch(vc -> vc.getGenotype(1).isHomRef() && vc.getGenotype(1).hasPL())); //second sample has a bunch of 0/0s -- shouldn't have PLs | |||
|
|||
//this comes from a callset of NYGC 1000G samples plus syndip | |||
//it seems likely that there's a variant that wasn't discovered in the graph because a bunch of samples are hom-ref with PL=[0,0,X] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a minor comment... but its possible to manufacture a sample with lots of assembly failures if you need to test this. You can set the kmer size to 1 and disallow stepped increases to get a sample that can assemble nothing.... Doesn't look necessary for you here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, that is good to know.
@@ -976,4 +982,39 @@ public void dbSNPError() { | |||
//make sure it doesn't throw an error | |||
runCommandLine(args); | |||
} | |||
|
|||
@Test | |||
public void testNoReadsOutputAsNoCall() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
.../java/org/broadinstitute/hellbender/tools/walkers/gnarlyGenotyper/GnarlyGenotyperEngine.java
Show resolved
Hide resolved
src/test/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsIntegrationTest.java
Outdated
Show resolved
Hide resolved
M2 WDL tests failed due to a transient 429 "too many requests" error -- re-running them. |
No description provided.