Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure SelectVariants outputs variants in correct order (assuming input vcf is correctly sorted) #6444

Merged
merged 6 commits into from
Feb 7, 2020

Conversation

kachulis
Copy link
Contributor

@kachulis kachulis commented Feb 6, 2020

Partially addresses #6443
Also addresses bug where combination of sample selection, allele trimming, and --set-filtered-gt-to-nocall would cause crash.
Fix assumes input vcf is correctly sorted, which is not checked anywhere, but is required by vcf spec.

@davidbenjamin
Copy link
Contributor

@kachulis do you need a reviewer?

@kachulis
Copy link
Contributor Author

kachulis commented Feb 7, 2020

@davidbenjamin yes that’d be great, thanks!

@davidbenjamin davidbenjamin self-assigned this Feb 7, 2020
@davidbenjamin davidbenjamin self-requested a review February 7, 2020 13:44
Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you, @kachulis. It's amusing to see a pending variants priority queue that is not so ill-fated as the one in LeftAlignAndTrimVariants.

@@ -451,6 +451,8 @@

private final Map<Integer, Integer> ploidyToNumberOfAlleles = new LinkedHashMap<Integer, Integer>();

private PriorityQueue<VariantContext> pendingVariants;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this final by moving its constructor up to the declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -533,6 +537,13 @@ public void onTraversalStart() {
@Override
public void apply(VariantContext vc, ReadsContext readsContext, ReferenceContext ref, FeatureContext featureContext) {

/*check for pending variants to write out
since variant starts will only be moved further right, we can write out a pending variant if the current variant start is after the pending variant start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving variants to the right may seem like a non sequitur to the average reader so specify that it can occur if {@code preserveAlleles} is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -617,11 +632,28 @@ public void apply(VariantContext vc, ReadsContext readsContext, ReferenceContext
(!selectRandomFraction || Utils.getRandomGenerator().nextDouble() < fractionRandom)) {
//remove annotations being dropped and write variantcontext
final VariantContext variantContextToWrite = buildVariantContextWithDroppedAnnotationsRemoved(filteredGenotypeToNocall);
vcfWriter.add(variantContextToWrite);
if (variantContextToWrite.getStart() != vc.getStart()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just add every variant to pendingVariants. This check is redundant with the while loop at the beginning of apply(). It's true that written as is the priority queue is usually empty, but I don't think the optimization is worth the added complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, I was trying to avoid doing too many comparisons in the priority queue, but since it is nearly always empty this is an essentially useless optimization.

}
}
}

/**
* write out all remaining pending variants
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the return javadoc

@@ -85,6 +85,30 @@ public void testComplexSelection() throws IOException {
spec.executeTest("testComplexSelection--" + testFile, this);
}

@Test
public void testUntrimmedVariants() throws IOException {
final String testFile = getToolTestDataDir() + "untrimmed.vcf";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to be opinionated here and argue against using IntegrationTestSpec. I think the intent of these two tests would be better expressed in code:

final ArgumentsBuilder args = new ArgumentsBuilder()
   .addVcf(testFile)
   .addOutput(output)
   .addArgument(StandardArgumentDefinitions.SAMPLE_NAME_SHORT_NAME, "SAMPLE_01")
   .addBoolean("preserve-alleles", false); // this may be overkill since it's the default

runCommandLine(args);

final List<VariantContext> vcs = VariantContextTestUtils.readEntireVCFIntoMemory(output).getRight();

Assert.assertTrue(Comparators.isInOrder(vcs, Comparator.comparingInt(VariantContext::getStart)));

The virtues I see in this approach are 1) Only one test file needs to be added/maintained. 2) Doesn't require exact match and therefore less brittle. 3) Intent is explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well argued, I'm convinced!

@@ -85,6 +85,30 @@ public void testComplexSelection() throws IOException {
spec.executeTest("testComplexSelection--" + testFile, this);
}

@Test
public void testUntrimmedVariants() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test needs some javadoc explaining the potential output ordering pitfall of the input untrimmed variants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kachulis
Copy link
Contributor Author

kachulis commented Feb 7, 2020

@davidbenjamin agreed, it took me a sec to convince myself that it's ok in this situation. back to you!

@davidbenjamin
Copy link
Contributor

Merge away once tests pass.

@kachulis kachulis merged commit 79e1e67 into master Feb 7, 2020
@kachulis kachulis deleted the ck_SelectVariantsOrderChange branch February 7, 2020 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants