-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better error bars for samples with small contamination in CalculateContamination #7003
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice change, almost done, just a small ask on binary search.
final double contamination = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency; | ||
final double contaminationEstimate = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency; | ||
|
||
final double coeff1 = homs.stream().mapToDouble(ps -> oppositeAlleleFrequency.applyAsDouble(ps) * ps.getTotalCount()).sum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you specify what equation this is constructing by putting the tex in javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final DoubleUnaryOperator errorFunc = c -> homs.isEmpty() ? 1 : Math.sqrt(coeff1*c*(1-c) + coeff2*c*c) / totalDepthWeightedByOppositeFrequency; | ||
|
||
// we're going to binary search to find the largest contamination whose expected standard error brings it within range of | ||
// our estimate. That is, suppose we estimate a contamination of 0.03 and the standard error of 0.05 is 0.02. Then 0.05 is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standard error of 0.05
Could you make this clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// the upper end of our 1-sigma confidence interval. | ||
// this binary search is far from optimized (in fact we could solve this explicitly with a messy closed-form solution) | ||
// but it converges to a precision of 1e-6 in 20 iterations of the square root of a linear function. This is fast enough. | ||
double top = 1.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you either find a standard binary search, or generalize this in MathUtils?
If you add it in MathUtils, create a simple test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
c442c90
to
3b5fd8a
Compare
3b5fd8a
to
6dc8638
Compare
Back to you @fleharty. |
@davidbenjamin |
@fleharty github wants your official approving review to merge. |
@davidbenjamin Sorry about that, should work now. |
@fleharty this is for you.