Sf mito fix #4751

lucidtronix · 2018-05-09T15:01:04Z

Small bugfix to prevent crashes on the ends of the mitochondrial contig.

codecov-io · 2018-05-09T16:31:59Z

Codecov Report

Merging #4751 into master will increase coverage by 0.117%.
The diff coverage is n/a.

@@              Coverage Diff               @@
##             master     #4751       +/-   ##
==============================================
+ Coverage     80.07%   80.187%   +0.117%     
- Complexity    17420     18035      +615     
==============================================
  Files          1080      1083        +3     
  Lines         63131     66108     +2977     
  Branches      10200     10906      +706     
==============================================
+ Hits          50549     53010     +2461     
- Misses         8590      8963      +373     
- Partials       3992      4135      +143

Impacted Files	Coverage Δ	Complexity Δ
...pleNovelAdjacencyAndChimericAlignmentEvidence.java	`62.963% <0%> (-24.537%)`	`14% <0%> (+4%)`
...overy/inference/NovelAdjacencyAndAltHaplotype.java	`75.556% <0%> (-3.81%)`	`46% <0%> (+17%)`
...iscoverFromLocalAssemblyContigAlignmentsSpark.java	`76.048% <0%> (-0.794%)`	`5% <0%> (+3%)`
...lbender/tools/spark/sv/discovery/SimpleSVType.java	`86% <0%> (-0.667%)`	`3% <0%> (ø)`
.../discovery/inference/ImpreciseVariantDetector.java	`80.952% <0%> (-0.298%)`	`6% <0%> (ø)`
...ols/spark/sv/discovery/alignment/StrandSwitch.java	`100% <0%> (ø)`	`2% <0%> (+1%)`	⬆️
.../tools/spark/sv/discovery/BreakEndVariantType.java	`0% <0%> (ø)`	`0% <0%> (ø)`	⬇️
...ools/spark/sv/evidence/ExtractSVEvidenceSpark.java	`0% <0%> (ø)`	`0% <0%> (ø)`	⬇️
...e/hellbender/tools/spark/sv/utils/SVVCFReader.java	`0% <0%> (ø)`	`0% <0%> (ø)`	⬇️
...ce/AssemblyContigAlignmentSignatureClassifier.java	`83.871% <0%> (ø)`	`40% <0%> (?)`
... and 38 more

cmnbroad

Question inline for Sam.

cmnbroad · 2018-05-10T16:44:10Z

src/main/python/org/broadinstitute/hellbender/vqsr_cnn/vqsr_cnn/inference.py

@@ -103,6 +103,8 @@ def reference_string_to_tensor(reference):
            dna_data[i, defines.DNA_SYMBOLS[b]] = 1.0
        elif b in defines.AMBIGUITY_CODES:
            dna_data[i] = defines.AMBIGUITY_CODES[b]
+        elif b == '\x00':


@lucidtronix - question on this - it seems like this issue (zeros in the ref bases) can happen whenever the position of a variant is within windowsize/2 of the end of the reference contig, since that's what causes the tool to pad out the ref base string. This makes me wonder if the reference tensor is being constructed correctly when a SNP occurs near the edges of a contig. Normally the reference base that corresponds to the variant base in question is in the middle of the read tensor, and padded out on each side. But not so at the edges, where it will be offset. Is that expected, or should we be padding out from the middle on both sides of the base? I think the answer to that will determine whether it's OK to just break out at the first 0.

Also, if this really isn't mito-specific, I don't think we need to add a mito reference for the test case - we might be able to get away with creating a test vcf that uses an existing reference, as long as it has a variant near the edge of a contig.

Yes, you're right: the reference tensors for variants at the very beginning of contigs (i.e. within 64bp of the start) are not constructed correctly. Since these variants are probably only going to show up in humans in the mitochondria, their CNN scores are pretty meaningless anyway. After chatting with @ldgauthier, we think it's best to get this fix in now to prevent crashes, and then revisit if necessary when mitochondrial best practices are ready.

I removed the mitochondrial reference files and updated the test. Back to you @cmnbroad
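For readers following the thread, here is a minimal sketch of the behavior under discussion. It is not the code in inference.py: `encode_reference`, the `PAD` constant, and the two lookup tables are hypothetical stand-ins for `reference_string_to_tensor` and the `defines` module, and plain lists are used in place of a tensor. It only illustrates what "break out at the first 0" means for a null-padded reference window.

```python
# Hypothetical stand-ins for the lookup tables in the real `defines` module.
DNA_SYMBOLS = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
AMBIGUITY_CODES = {'N': [0.25, 0.25, 0.25, 0.25]}  # only one code, for illustration
PAD = '\x00'  # null character seen in reference windows that run past a contig end


def encode_reference(reference):
    """One-hot encode a reference window, one row per base.

    Encoding stops at the first null pad character, so every row from
    that position onward keeps its initial all-zero value.
    """
    rows = [[0.0] * len(DNA_SYMBOLS) for _ in reference]
    for i, b in enumerate(reference):
        if b in DNA_SYMBOLS:
            rows[i][DNA_SYMBOLS[b]] = 1.0
        elif b in AMBIGUITY_CODES:
            rows[i] = list(AMBIGUITY_CODES[b])
        elif b == PAD:
            break  # padding reached: leave the remaining rows as zeros
        else:
            raise ValueError('Error! Unknown code:', b)
    return rows


# Window that runs off the end of a contig: four real bases, two pads.
rows = encode_reference('ACGN' + PAD * 2)
```

In this sketch, trailing padding simply yields trailing zero rows. If the padding came first instead (for example `encode_reference(PAD + 'ACG')`), the loop would stop at index 0 and every row would stay zero, which is why the question of where the padding sits matters for whether breaking at the first null is safe.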

cmnbroad

One minor request then this looks good.

cmnbroad · 2018-05-14T18:17:37Z

...t/java/org/broadinstitute/hellbender/tools/walkers/vqsr/CNNScoreVariantsIntegrationTest.java

-    public void testOnMitochondria() throws IOException{
-        final String mitoVcf = largeFileTestDir + "VQSR/errorM.vcf";
+    public void testOnContigEdge() throws IOException{
+        final String edgeVcf = largeFileTestDir + "VQSR/errorM.vcf";


OK, sounds good. I should have mentioned this last time: can you name this file something distinguishing, like "variantNearContigEdge.vcf" or some such thing, and move it out of the large folder and into the regular test files, since it's small? Maybe even into a CNN test folder. Otherwise this looks good.

@lucidtronix Also you can squash this down to one commit.

Moved and renamed the test file, rebased, and squashed. If tests pass, good to merge?

cmnbroad

One more change I didn't notice before. Then 👍 to merge once tests pass.

cmnbroad · 2018-05-14T19:59:35Z

...t/java/org/broadinstitute/hellbender/tools/walkers/vqsr/CNNScoreVariantsIntegrationTest.java

+        final ArgumentsBuilder argsBuilder = new ArgumentsBuilder();
+        argsBuilder.addArgument(StandardArgumentDefinitions.VARIANT_LONG_NAME, edgeVcf)
+                .addArgument(StandardArgumentDefinitions.REFERENCE_LONG_NAME, hg19MiniReference)
+                .addArgument("architecture", architecture1D)


@lucidtronix I just noticed this test is using the 1D architecture, which doesn't exercise the changed code path.

I think the users who encountered this error were running the 1D model, but either way both models will call the reference_string_to_tensor function in inference.py where the null character check was added.

Oh right, I was thinking reads, but this was reference.

lucidtronix requested a review from cmnbroad May 9, 2018 15:01

droazen assigned cmnbroad May 9, 2018

cmnbroad reviewed May 10, 2018

cmnbroad mentioned this pull request May 14, 2018

[CNNScoreVariants] ValueError('Error! Unknown code:', '\x00') #4727

Closed

cmnbroad requested changes May 14, 2018

lucidtronix force-pushed the sf_mito_fix branch from 5bca112 to 4309a50 May 14, 2018 19:11

bugfix for variants on contig edges

4309a50

cmnbroad reviewed May 14, 2018

cmnbroad approved these changes May 14, 2018

cmnbroad merged commit 2d24a01 into master May 14, 2018

cmnbroad deleted the sf_mito_fix branch May 14, 2018 22:24
