Added numerical-stability tests and updated test data for all ModelSegments single-sample and multiple-sample modes. (#7652)

* Added an explicit check that minimum-total-allele-count-case is set to zero when running ModelSegments in matched-normal mode.

* Added exact-match tests and updated test data for ModelSegments single-sample modes.

* Added exact-match tests and updated test data for ModelSegments multiple-sample modes, including new tests of downstream single-sample scatters using joint segmentation.

* Relaxed exact match to comparison of doubles at delta of 1E-6 to account for Java 8 and Java 11 differences on Travis.

* Added a comment documenting numerical issues with log10factorial in AlleleFractionLikelihoods.

* Moved assertFilesEqualUpToAllowedDeltaForDoubleValues to CopyNumberTestUtils.

* Added toggle to update expected outputs.
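
The relaxed comparison described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the actual `assertFilesEqualUpToAllowedDeltaForDoubleValues` in CopyNumberTestUtils: the names `DeltaCompareSketch`, `tokensMatch`, and `linesMatch`, and the assumption of tab-delimited fields, are hypothetical. Fields that parse as doubles are compared at the given delta, with NaN treated as equal to NaN; all other fields must match exactly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the CopyNumberTestUtils implementation.
final class DeltaCompareSketch {

    // Compares two fields: numerically (up to delta) if both parse as doubles, otherwise as strings.
    static boolean tokensMatch(final String expected, final String actual, final double delta) {
        try {
            final double e = Double.parseDouble(expected);
            final double a = Double.parseDouble(actual);
            // Double.compare treats NaN as equal to NaN and handles equal infinities,
            // which a plain |e - a| <= delta check would reject.
            return Double.compare(e, a) == 0 || Math.abs(e - a) <= delta;
        } catch (final NumberFormatException ex) {
            return expected.equals(actual);
        }
    }

    // Compares two files' worth of lines, field by field (tab-delimited by assumption).
    static boolean linesMatch(final List<String> expected, final List<String> actual, final double delta) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            final String[] e = expected.get(i).split("\t", -1);
            final String[] a = actual.get(i).split("\t", -1);
            if (e.length != a.length) {
                return false;
            }
            for (int j = 0; j < e.length; j++) {
                if (!tokensMatch(e[j], a[j], delta)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final List<String> expected = Arrays.asList("SM-74P4M-1\t20\t138125\t62871232\t827\t0.372342");
        final List<String> actual = Arrays.asList("SM-74P4M-1\t20\t138125\t62871232\t827\t0.3723424");
        System.out.println(linesMatch(expected, actual, 1E-6));
    }
}
```

The explicit NaN handling matters for expected outputs that contain `NaN` means for empty segments, since `Math.abs(NaN - NaN) <= delta` is always false in Java.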
samuelklee authored Jun 25, 2022
1 parent a28dfff commit 8341ae5
Showing 127 changed files with 12,793 additions and 1,158 deletions.
Original file line number Diff line number Diff line change
@@ -746,6 +746,12 @@ private void setModesAndValidateArguments() {
"Must provide at least one denoised-copy-ratios file or allelic-counts file.");
Utils.validateArg(!(inputAllelicCountsFiles.isEmpty() && inputNormalAllelicCountsFile != null),
"Must provide an allelic-counts file for the case sample to run in matched-normal mode.");
Utils.validateArg(!(inputNormalAllelicCountsFile != null && genotypingArguments.minTotalAlleleCountCase > 0),
"The minimum total count for filtering allelic counts in case samples must be set to zero in matched-normal mode. " +
"If the effect of statistical noise due to low depth in case samples on segmentation is a concern, " +
"consider using only denoised copy ratios or externally preprocessing allelic-count files " +
"to remove sites that are poorly covered across all samples.");


runMode = (inputDenoisedCopyRatiosFiles.size() > 1 || inputAllelicCountsFiles.size() > 1)
? RunMode.MULTIPLE_SAMPLE
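
To make the effect of the new check in this hunk concrete, here is a small self-contained sketch of which argument combinations it accepts. The `validateArg` method below is a stand-in assumed to behave like GATK's `Utils.validateArg` (throwing `IllegalArgumentException` when the condition is false); the class name, the `validate` method, and its boolean/int parameters are hypothetical simplifications of the tool's fields.

```java
// Hypothetical sketch of the matched-normal check; not the ModelSegments code itself.
final class MatchedNormalValidationSketch {

    // Stand-in for Utils.validateArg: rejects the arguments when the condition is false.
    static void validateArg(final boolean condition, final String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // hasNormalAllelicCounts models inputNormalAllelicCountsFile != null.
    static void validate(final boolean hasNormalAllelicCounts, final int minTotalAlleleCountCase) {
        validateArg(!(hasNormalAllelicCounts && minTotalAlleleCountCase > 0),
                "The minimum total count for filtering allelic counts in case samples "
                        + "must be set to zero in matched-normal mode.");
    }

    public static void main(final String[] args) {
        validate(false, 30);   // case-only mode: a positive threshold is allowed
        validate(true, 0);     // matched-normal mode with threshold zero: allowed
        try {
            validate(true, 30); // matched-normal mode with a positive threshold: rejected
        } catch (final IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```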
@@ -87,6 +87,8 @@ static double hetLogLikelihood(final AlleleFractionGlobalParameters parameters,
- n * log(majorFraction + minorFraction * lambda0RefMinor);
final double refMinorLogLikelihood = logNotPi + logcRefMinor + Gamma.logGamma(rhoRefMinor) - rhoRefMinor * log(tauRefMinor);

// changing the factorial implementation below may introduce non-negligible numerical differences;
// note https://github.com/broadinstitute/gatk/pull/7652
final double outlierLogLikelihood = logPi + log10ToLog(log10Factorial(a) + log10Factorial(r) - log10Factorial(a + r + 1));

return NaturalLogUtils.logSumExp(altMinorLogLikelihood, refMinorLogLikelihood, outlierLogLikelihood);
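
The factorial expression in the outlier term is the log of a! r! / (a + r + 1)!, which equals the Beta function B(a + 1, r + 1). The sketch below shows why the comment in this hunk warns about changing the factorial implementation: two mathematically equivalent routes to the same quantity can differ in the last few floating-point digits. It is an illustration only. The summation-based `log10Factorial` and `logFactorial` here are assumptions for the sake of a dependency-free example and are not claimed to match GATK's implementations, and the `logPi` term is omitted.

```java
// Hypothetical sketch; the factorial helpers are simple summations, not GATK's implementations.
final class OutlierLogLikelihoodSketch {

    // log10(n!) as a sum of base-10 logs.
    static double log10Factorial(final int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log10(k);
        }
        return sum;
    }

    // ln(n!) as a sum of natural logs.
    static double logFactorial(final int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k);
        }
        return sum;
    }

    // Converts a base-10 log to a natural log.
    static double log10ToLog(final double log10Value) {
        return log10Value * Math.log(10.0);
    }

    // Route 1: work in base 10 and convert, mirroring the shape of the expression in the diff.
    static double viaLog10(final int a, final int r) {
        return log10ToLog(log10Factorial(a) + log10Factorial(r) - log10Factorial(a + r + 1));
    }

    // Route 2: work in natural logs throughout.
    static double viaNaturalLog(final int a, final int r) {
        return logFactorial(a) + logFactorial(r) - logFactorial(a + r + 1);
    }

    public static void main(final String[] args) {
        final int a = 30;
        final int r = 20;
        final double x = viaLog10(a, r);
        final double y = viaNaturalLog(a, r);
        System.out.println("via log10:       " + x);
        System.out.println("via natural log: " + y);
        System.out.println("difference:      " + (x - y));
    }
}
```

Any difference between the two routes is tiny, but it is the kind of discrepancy that can break an exact-match test while passing a comparison at a delta of 1E-6.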

Large diffs are not rendered by default.

Git LFS file not shown
@@ -0,0 +1,2 @@
Sample Chromosome Start End Num_Probes Segment_Mean
SM-74P4M-1 20 138125 62871232 827 0.372342
@@ -0,0 +1,2 @@
Sample Chromosome Start End Num_Probes Segment_Mean
SM-74P4M-1 20 138125 62871232 0 NaN
@@ -0,0 +1,5 @@
@HD VN:1.6
@SQ SN:20 LN:63025520 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:0dec9660ec1efaaf33281c0d5ea2560f SP:Homo Sapiens
@RG ID:GATKCopyNumber SM:SM-74P4M-1
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO
20 138125 62871232 0 NaN