Added regularization to covariance in GMM maximization step to fix convergence issues in VQSR. #7709
Conversation
Force-pushed from 612bcc3 to 9eb886c
Force-pushed from 9eb886c to d9a5468
Force-pushed from d9a5468 to 5bf0578
Resolved review threads on:
- ...est/java/org/broadinstitute/hellbender/tools/walkers/vqsr/GatherTranchesIntegrationTest.java
- ...st/resources/org/broadinstitute/hellbender/tools/walkers/VQSR/expected.AS.alternate.tranches
- ...ava/org/broadinstitute/hellbender/tools/walkers/vqsr/VariantRecalibratorIntegrationTest.java
@samuelklee I will build the docker and run the warp tests, but if we do this again I'm going to teach you to fish. Wasn't there a VariantRecalibrator integration test that had some sort of random-seed hijinks? Hopefully your fix means we don't need any shenanigans anymore -- can you see if there are any other tests we can clean up?
Thanks for the quick review, @ldgauthier! I don't think my fix will address any non-determinism in the integration tests. I'm inclined to just do better with the new tools---there does seem to be enough duct tape in the integration tests around re/setting the RNG that the exact-match tests consistently pass. As for learning how to run the WARP tests, I think that would indeed be pretty useful---for anyone who might have to update code for VQSR or the new tools in the future! Can we teach everyone to fish? Isn't this what CARROT is for?
See "smart tests" here: broadinstitute/warp#639 Should finish in maybe an hour and a half? Then I'll look at the VerifyNA12878 results. |
This is the line I'm hoping we can take out: Line 229 in b097f75
Kevin has kindly volunteered to help me port the warp tests to Carrot, if I can ever make time. :-(
Warp exome JG tests had slightly different metrics, so they failed, but the NA12878 test-versus-truth results (over the exome) show a negligible change in SNPs and a slight recall decrease in indels, which I would say is "in the noise" given the number of truth indels in the exome.
Thanks @ldgauthier! That canary certainly looks alive to me. Happy to merge whenever you and/or the Variants team approve.
I will approve after you take a look at the integration test I linked above.
Sorry, I guess I didn't see your edit pointing out that line of code. I've indeed looked at that test and more---there's a lot of similar duct tape and inconsistent resetting of the RNG throughout the entire test suite. But since I think we can reasonably assume there's enough duct tape to make things deterministic overall, I don't think it's worth cleaning it up just to get neater, but equally deterministic, behavior. (Or can you point to instances of persistent non-determinism, e.g. random failures in Travis?) In any case, I think it makes more sense to focus effort on writing cleaner tests for the new tools, rather than make an 11th-hour effort to revamp these existing tests. Do you agree? See e.g. #6112 for some related discussion. I also added a note mentioning that the original GATK3 expected results have been updated, although now, looking back at the commit history, I'm not sure if that was already true.
Force-pushed from 0985caa to 0431b77
I just want to know if the need for that RNG draw is obviated by your convergence fix. If it is, then we can remove that line. If not, then we know there are still some issues yet to be addressed.
As far as I can tell, even master passes when that line is removed. So the comments in the integration test are already misleading or out of date. Do you see differently?
Just for future reference, note that comments in …
So again, the tests were already "broken." But still, rather than attempt to fix them, I think it's best to follow the principle of not changing both production and test code, to the extent possible in this scenario. We've already updated enough exact-match expected results to make me a bit uncomfortable! Someone else may want to tackle fixing the tests in a separate push, but I think it makes sense for me to focus on avoiding these sorts of issues when writing tests for the new tools. EDIT: For the record, I confirmed that the undesired behavior in this test that the RNG hack was trying to avoid was fixed (and hence, the test was "broken") in #6425. This probably wasn't noticed because this is the only non-exact-match test, and the test isn't strict enough to check that attempts 1-3 fail; it only checks that we succeed by attempt 4. Again, someone else may feel free to examine the actual coverage of this test and whether it's safe to remove it and/or clean up all the duct tape---but at some point, it becomes difficult to tell which pieces of duct tape are load-bearing!
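To make the failure mode concrete, here is a minimal, hypothetical Java sketch (not GATK code; the class, methods, and shared generator are assumptions for illustration) of why scattered RNG resets become load-bearing: when tests share one global generator, a test's draws depend on which tests ran before it, so each reset silently pins down that ordering.

```java
import java.util.Random;

// Hypothetical illustration (not GATK code) of why scattered RNG resets are
// fragile: two "tests" share one global generator, so the second test's draws
// depend on whether the first test ran and how many draws it consumed.
public final class SharedRngExample {
    private static final Random GLOBAL_RNG = new Random(42);  // shared across tests

    static double testA() {
        return GLOBAL_RNG.nextDouble();  // consumes one draw from the shared stream
    }

    static double testB() {
        // Without this reset, testB's value changes if testA ran first;
        // with it, the reset is a load-bearing piece of "duct tape".
        GLOBAL_RNG.setSeed(42);
        return GLOBAL_RNG.nextDouble();
    }

    public static void main(final String[] args) {
        System.out.println(testA());
        System.out.println(testB());  // same value as testA(), but only because of the reset
    }
}
```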
A quick patch to help out the Variants team, which is struggling with a problematic callset.
Note that a similar regularization of the effective number per component probably should have been applied to solve the issue in #6425. I'm not sure if the lack of this regularization will still lead to convergence issues, but I would hope that the fix that was implemented instead (treating vanishing components as a special case and skipping computation) suffices. As discussed there, we may also want to eventually remove the idiosyncratic finalize step; it's likely that this step is the source of the issues here, since the correct Bayesian M step is already regularized by the prior.
The covariance regularization term added here is standard (cf., e.g., https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09bcc2eaeba98f7e737aac2ac782f0e5f1/sklearn/mixture/_gaussian_mixture.py#L154), but it may result in non-negligible changes to VQSLODs. As just discussed with the Variants team, we can probably use the WARP validation to convince ourselves that the results are functionally equivalent.
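For concreteness, here is a minimal Java sketch of the regularized M-step covariance update in the spirit of the scikit-learn code linked above; the method name, signature, and the epsilon floor on the effective count are illustrative assumptions, not the actual GATK implementation.

```java
// Sketch of the regularized M-step covariance update for one full-covariance
// GMM component with responsibilities resp[n] over data points data[n]:
//   Sigma_k = (1 / N_k) * sum_n resp[n] * (x_n - mu_k)(x_n - mu_k)^T + regCovar * I
static double[][] regularizedCovariance(final double[][] data,   // N x D data points
                                        final double[] resp,     // length-N responsibilities for this component
                                        final double[] mean,     // length-D component mean
                                        final double regCovar) { // e.g. 1E-6, as in scikit-learn
    final int dim = mean.length;

    // Effective number of points assigned to this component; flooring it with
    // a tiny epsilon is the "effective number per component" regularization
    // mentioned above (scikit-learn adds 10 * machine epsilon).
    double nk = 0.;
    for (final double r : resp) {
        nk += r;
    }
    nk += 10. * Math.ulp(1.);

    final double[][] cov = new double[dim][dim];
    for (int n = 0; n < data.length; n++) {
        for (int i = 0; i < dim; i++) {
            final double di = data[n][i] - mean[i];
            for (int j = 0; j < dim; j++) {
                cov[i][j] += resp[n] * di * (data[n][j] - mean[j]) / nk;
            }
        }
    }

    // The regularization discussed in this PR: add regCovar to the diagonal so
    // the covariance stays positive definite even when a component collapses
    // onto very few points.
    for (int i = 0; i < dim; i++) {
        cov[i][i] += regCovar;
    }
    return cov;
}
```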
I updated the exact-match tests without much close examination (by simply forcing IntegrationTestSpec.assertEqualTextFiles to overwrite the old expected files), so someone may want to sanity-check them. There were also a few more interactions between the integration tests for different tools than I anticipated; some tests use output generated by an upstream tool as input, breaking encapsulation.
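For anyone repeating the exercise, here is a hypothetical sketch of that overwrite trick; the class, helper, and system-property names are assumptions, and the actual change was a temporary local edit to IntegrationTestSpec.assertEqualTextFiles rather than a committed helper.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical sketch of the overwrite trick: when a system property is set,
// copy the actual output over the expected file instead of comparing; rerun
// with the property unset to verify the regenerated expectations.
final class ExpectedFileUpdater {
    private static final boolean UPDATE_EXPECTED =
            Boolean.getBoolean("update.exact.match.expected");  // assumed property name

    static void assertOrUpdate(final File actual, final File expected) throws Exception {
        if (UPDATE_EXPECTED) {
            Files.copy(actual.toPath(), expected.toPath(),
                       java.nio.file.StandardCopyOption.REPLACE_EXISTING);
        } else if (!Arrays.equals(Files.readAllBytes(actual.toPath()),
                                  Files.readAllBytes(expected.toPath()))) {
            throw new AssertionError("Exact-match failure: " + actual + " vs. " + expected);
        }
    }
}
```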