Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error bars for samples with small contamination in CalculateContamination #7003

Merged
merged 3 commits into from
May 12, 2021

Conversation

davidbenjamin
Copy link
Contributor

@fleharty this is for you.

@gatk-bot
Copy link

Travis reported job failures from build 32382
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 32382.13 logs

Copy link
Contributor

@fleharty fleharty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice change, almost done, just a small ask on binary search.

final double contamination = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency;
final double contaminationEstimate = contaminationOppositeDepth / totalDepthWeightedByOppositeFrequency;

final double coeff1 = homs.stream().mapToDouble(ps -> oppositeAlleleFrequency.applyAsDouble(ps) * ps.getTotalCount()).sum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you specify what equation this is constructing by putting the tex in javadoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final DoubleUnaryOperator errorFunc = c -> homs.isEmpty() ? 1 : Math.sqrt(coeff1*c*(1-c) + coeff2*c*c) / totalDepthWeightedByOppositeFrequency;

// we're going to binary search to find the largest contamination whose expected standard error brings it within range of
// our estimate. That is, suppose we estimate a contamination of 0.03 and the standard error of 0.05 is 0.02. Then 0.05 is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standard error of 0.05

Could you make this clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// the upper end of our 1-sigma confidence interval.
// this binary search is far from optimized (in fact we could solve this explicitly with a messy closed-form solution)
// but it converges to a precision of 1e-6 in 20 iterations of the square root of a linear function. This is fast enough.
double top = 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you either find a standard binary search, or generalize this in MathUtils?

If you add it in MathUtils, create a simple test for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@davidbenjamin
Copy link
Contributor Author

Back to you @fleharty.

@fleharty
Copy link
Contributor

@davidbenjamin
Looks great.
👍

@davidbenjamin
Copy link
Contributor Author

@fleharty github wants your official approving review to merge.

@fleharty
Copy link
Contributor

@davidbenjamin Sorry about that, should work now.

@davidbenjamin davidbenjamin merged commit f408270 into master May 12, 2021
@davidbenjamin davidbenjamin deleted the db_contam_error_bars branch May 12, 2021 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants