-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure SelectVariants outputs variants in correct order (assuming input vcf is correctly sorted) #6444
Conversation
…y sorted variants
…ection and allele trimming causes crash
@kachulis do you need a reviewer? |
@davidbenjamin yes that’d be great, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you, @kachulis. It's amusing to see a pending variants priority queue that is not so ill-fated as the one in LeftAlignAndTrimVariants
.
@@ -451,6 +451,8 @@ | |||
|
|||
private final Map<Integer, Integer> ploidyToNumberOfAlleles = new LinkedHashMap<Integer, Integer>(); | |||
|
|||
private PriorityQueue<VariantContext> pendingVariants; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make this final
by moving its constructor up to the declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -533,6 +537,13 @@ public void onTraversalStart() { | |||
@Override | |||
public void apply(VariantContext vc, ReadsContext readsContext, ReferenceContext ref, FeatureContext featureContext) { | |||
|
|||
/*check for pending variants to write out | |||
since variant starts will only be moved further right, we can write out a pending variant if the current variant start is after the pending variant start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving variants to the right may seem like a non sequitur to the average reader so specify that it can occur if {@code preserveAlleles}
is false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -617,11 +632,28 @@ public void apply(VariantContext vc, ReadsContext readsContext, ReferenceContext | |||
(!selectRandomFraction || Utils.getRandomGenerator().nextDouble() < fractionRandom)) { | |||
//remove annotations being dropped and write variantcontext | |||
final VariantContext variantContextToWrite = buildVariantContextWithDroppedAnnotationsRemoved(filteredGenotypeToNocall); | |||
vcfWriter.add(variantContextToWrite); | |||
if (variantContextToWrite.getStart() != vc.getStart()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just add every variant to pendingVariants
. This check is redundant with the while loop at the beginning of apply()
. It's true that written as is the priority queue is usually empty, but I don't think the optimization is worth the added complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right, I was trying to avoid doing too many comparisons in the priority queue, but since it is nearly always empty this is an essentially useless optimization.
} | ||
} | ||
} | ||
|
||
/** | ||
* write out all remaining pending variants | ||
* @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the return javadoc
@@ -85,6 +85,30 @@ public void testComplexSelection() throws IOException { | |||
spec.executeTest("testComplexSelection--" + testFile, this); | |||
} | |||
|
|||
@Test | |||
public void testUntrimmedVariants() throws IOException { | |||
final String testFile = getToolTestDataDir() + "untrimmed.vcf"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to be opinionated here and argue against using IntegrationTestSpec
. I think the intent of these two tests would be better expressed in code:
final ArgumentsBuilder args = new ArgumentsBuilder()
.addVcf(testFile)
.addOutput(output)
.addArgument(StandardArgumentDefinitions.SAMPLE_NAME_SHORT_NAME, "SAMPLE_01")
.addBoolean("preserve-alleles", false); // this may be overkill since it's the default
runCommandLine(args);
final List<VariantContext> vcs = VariantContextTestUtils.readEntireVCFIntoMemory(output).getRight();
Assert.assertTrue(Comparators.isInOrder(vcs, Comparator.comparingInt(VariantContext::getStart)));
The virtues I see in this approach are 1) Only one test file needs to be added/maintained. 2) Doesn't require exact match and therefore less brittle. 3) Intent is explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well argued, I'm convinced!
@@ -85,6 +85,30 @@ public void testComplexSelection() throws IOException { | |||
spec.executeTest("testComplexSelection--" + testFile, this); | |||
} | |||
|
|||
@Test | |||
public void testUntrimmedVariants() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test needs some javadoc explaining the potential output ordering pitfall of the input untrimmed variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@davidbenjamin agreed, it took me a sec to convince myself that it's ok in this situation. back to you! |
Merge away once tests pass. |
Partially addresses #6443
Also addresses bug where combination of sample selection, allele trimming, and
--set-filtered-gt-to-nocall
would cause crash.Fix assumes input vcf is correctly sorted, which is not checked anywhere, but is required by vcf spec.