Better error bars for samples with small contamination in CalculateContamination #7003

davidbenjamin · 2020-12-16T16:34:54Z

@fleharty this is for you.

gatk-bot · 2020-12-16T17:15:01Z

Travis reported job failures from build 32382
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
unit	openjdk11	32382.13	logs

fleharty

Very nice change, almost done, just a small ask on binary search.

fleharty · 2021-01-28T18:29:06Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/contamination/ContaminationModel.java

-        final double contamination = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency;
+        final double contaminationEstimate = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency;
+
+        final double coeff1 = homs.stream().mapToDouble(ps -> oppositeAlleleFrequency.applyAsDouble(ps) * ps.getTotalCount()).sum();


Could you specify what equation this is constructing by putting the tex in javadoc?
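Reading the diff hunks in this thread together, the quantities appear to fit together as below. This is reconstructed from the quoted code alone. f_i and n_i are our labels for oppositeAlleleFrequency.applyAsDouble(ps) and ps.getTotalCount() at homozygous site i. C is contaminationOppositeDepth and D is totalDepthWeightedByOppositeFrequency. coeff2 is defined outside the quoted lines, so it stays the symbol a_2. The σ(c) line transcribes the errorFunc lambda quoted in the next review comment. The code returns σ(c) = 1 when there are no hom sites. Note that the expression under the square root is quadratic in c.

```latex
\hat{c} = \frac{C}{D}, \qquad
a_1 = \sum_{i \in \text{homs}} f_i \, n_i, \qquad
\sigma(c) = \frac{\sqrt{a_1 \, c \, (1 - c) + a_2 \, c^2}}{D}
```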

fleharty · 2021-01-28T18:31:19Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/contamination/ContaminationModel.java

+        final DoubleUnaryOperator errorFunc = c -> homs.isEmpty() ? 1 : Math.sqrt(coeff1*c*(1-c) + coeff2*c*c) / totalDepthWeightedByOppositeFrequency;
+
+        // we're going to binary search to find the largest contamination whose expected standard error brings it within range of
+        // our estimate.  That is, suppose we estimate a contamination of 0.03 and the standard error of 0.05 is 0.02.  Then 0.05 is


standard error of 0.05

Could you make this clearer?
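One reading of the comment's example, in the notation of the code, follows. "The standard error of 0.05" means the error function σ (errorFunc) evaluated at a hypothesized true contamination c = 0.05, not the standard error of the estimate. The reported upper bound is the largest c whose 1-sigma lower edge does not exceed the estimate ĉ:

```latex
c_{\text{upper}} = \max\{\, c \in [0, 1] : c - \sigma(c) \le \hat{c} \,\}

\hat{c} = 0.03, \quad \sigma(0.05) = 0.02
\;\Rightarrow\; 0.05 - \sigma(0.05) = 0.03 = \hat{c}
```

So 0.05 sits exactly on the boundary of the inequality. It is the upper end of the interval provided no larger c also satisfies the inequality.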

fleharty · 2021-01-28T18:34:00Z

src/main/java/org/broadinstitute/hellbender/tools/walkers/contamination/ContaminationModel.java

+        // the upper end of our 1-sigma confidence interval.
+        // this binary search is far from optimized (in fact we could solve this explicitly with a messy closed-form solution)
+        // but it converges to a precision of 1e-6 in 20 iterations of the square root of a linear function.  This is fast enough.
+        double top = 1.0;


Could you either find a standard binary search, or generalize this in MathUtils?

If you add it in MathUtils, create a simple test for it.
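The block below is a minimal sketch of what a generalized MathUtils-style bisection helper could look like, together with the kind of simple test the reviewer asks for. The class and method names (BisectionSketch, iterationsFor, largestSatisfying, upperBound) are hypothetical and are not GATK's actual API. The toy error function σ(c) = 0.4·c is an assumption chosen only so that σ(0.05) = 0.02, reproducing the code comment's example. Bisection on the predicate c − σ(c) ≤ ĉ is valid only if the set of c satisfying it is an interval starting at 0. That is assumed here, not established by the excerpt. For a "standard" alternative, Apache Commons Math ships a BisectionSolver for root finding.

```java
import java.util.function.DoublePredicate;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical MathUtils-style helper; not GATK's actual API. */
public final class BisectionSketch {
    private BisectionSketch() {}

    /** Number of halvings needed to shrink [lo, hi] to a width of at most tolerance. */
    public static int iterationsFor(final double lo, final double hi, final double tolerance) {
        return (int) Math.ceil(Math.log((hi - lo) / tolerance) / Math.log(2.0));
    }

    /**
     * Approximates the boundary x* of a predicate assumed true on [lo, x*] and false on (x*, hi].
     * Under that assumption the result is within tolerance / 2 of x*.
     */
    public static double largestSatisfying(final DoublePredicate predicate, final double lo,
                                           final double hi, final double tolerance) {
        double bottom = lo;
        double top = hi;
        final int iterations = iterationsFor(lo, hi, tolerance);
        for (int i = 0; i < iterations; i++) {
            final double mid = (bottom + top) / 2;
            if (predicate.test(mid)) {
                bottom = mid; // boundary is at or above mid
            } else {
                top = mid;    // boundary is below mid
            }
        }
        return (bottom + top) / 2;
    }

    /** Upper end of the 1-sigma interval: the largest c in [0, 1] with c - sigma(c) <= estimate. */
    public static double upperBound(final DoubleUnaryOperator errorFunc, final double estimate) {
        return largestSatisfying(c -> c - errorFunc.applyAsDouble(c) <= estimate, 0.0, 1.0, 1e-6);
    }

    public static void main(final String[] args) {
        // Toy error function (assumption), chosen only so that sigma(0.05) = 0.02.
        final DoubleUnaryOperator toySigma = c -> 0.4 * c;
        System.out.println(iterationsFor(0.0, 1.0, 1e-6));
        System.out.println(upperBound(toySigma, 0.03));
        // Test-style check on a predicate with a known boundary, sqrt(2).
        System.out.println(largestSatisfying(x -> x * x <= 2.0, 0.0, 2.0, 1e-9));
    }
}
```

With the toy σ, upperBound(toySigma, 0.03) lands within the 1e-6 tolerance of 0.05, matching the comment's example. iterationsFor(0.0, 1.0, 1e-6) returns 20, consistent with the "1e-6 in 20 iterations" figure, since 2^-20 ≈ 9.5e-7. The real errorFunc lambda could be passed to upperBound in place of the toy function.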

davidbenjamin · 2021-02-24T05:03:42Z

Back to you @fleharty.

fleharty · 2021-05-11T17:05:00Z

@davidbenjamin
Looks great.
👍

davidbenjamin · 2021-05-12T15:00:39Z

@fleharty GitHub wants your official approving review to merge.

fleharty · 2021-05-12T15:29:25Z

@davidbenjamin Sorry about that, should work now.

davidbenjamin added the Mutect label Dec 16, 2020

davidbenjamin requested a review from fleharty December 16, 2020 16:34

davidbenjamin assigned fleharty Dec 16, 2020

fleharty reviewed Jan 28, 2021

davidbenjamin added 2 commits February 24, 2021 00:00

improved error bars in CalculateContamination when contamination is very small

59c2147

integration tests

d037588

davidbenjamin force-pushed the db_contam_error_bars branch from c442c90 to 3b5fd8a February 24, 2021 05:01

edits

6dc8638

davidbenjamin force-pushed the db_contam_error_bars branch from 3b5fd8a to 6dc8638 February 24, 2021 05:03

davidbenjamin mentioned this pull request May 5, 2021

CalculateContamination bug report #7177 (Open, 2 tasks)

fleharty approved these changes May 12, 2021

davidbenjamin merged commit f408270 into master May 12, 2021

davidbenjamin deleted the db_contam_error_bars branch May 12, 2021 17:31

davidbenjamin commented Dec 16, 2020

gatk-bot commented Dec 16, 2020

fleharty left a comment

fleharty Jan 28, 2021

davidbenjamin Feb 23, 2021

fleharty Jan 28, 2021

davidbenjamin Feb 23, 2021

fleharty Jan 28, 2021

davidbenjamin Feb 24, 2021

davidbenjamin commented Feb 24, 2021

fleharty commented May 11, 2021

davidbenjamin commented May 12, 2021

fleharty commented May 12, 2021