Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote leftAlignIndels code -- it now always works, even for multiple indels #6427

Merged
merged 2 commits into from
Feb 9, 2020

Conversation

davidbenjamin
Copy link
Contributor

@kachulis I hope you find this entertaining.

Closes #150.

Closes #4919.

Also fixes a bug we didn't know about where LeftAlignAndTrimVariants didn't do anything when --dont-trim was set.

Also fixes a bug where LeftAlignAndTrimVariants couldn't left align an indel past a SNP.

Also fixes a bug where LeftAlignAndTrimVariants didn't left align multiallelic variants when splitting was turned off.

Probably fixes a few other bugs since the old code failed a bunch of my unit tests.

Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, what could be more exciting than some left aligning!

Some initial comments. I generally like where this is going. The changes to AlignmentUtils are great; being able to left align cigars with multiple indels or mutliallelic sites is a major improvement.

I do think the treatment of multiple nearby variants in LeftAlignAndTrimVariants needs to be handled more carefully. Simply left aligning each variant separately and then placing them wherever they fall will introduce problems. For example, left aligning a deletion past a snp, without adjusting the snp's location, will change the represented haplotype. And I can construct scenarios where left aligning an insertion past a snp leaves it impossible to recover the initially represented haplotype no matter how I attempt to adjust the downstream snp.

While left aligning a single variant record on its own is well defined, I don't think the same is true for a set of variants in each others presence. In the end, I'm not convinced that simply refusing to left align past earlier variants isn't the best solution. Probably merging abutting variants at that point is reasonable. (I think this is effectively what AlignmentUtils.LeftAlignIndels is now doing). But I think attempting to left align past an upstream variant raises too many pitfalls and potential representation problems make it a good option.

@@ -69,6 +70,9 @@ public static OptionalDouble getReadPosition(final GATKRead read, final int refL
final int leadingHardClips = firstElement.getOperator() == CigarOperator.HARD_CLIP ? firstElement.getLength() : 0;
final int trailingHardClips = lastElement.getOperator() == CigarOperator.HARD_CLIP ? lastElement.getLength() : 0;

if (offset >= cigar.getReadLength()) {
return OptionalDouble.empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unrelated change, did you intend to include in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional. A test was failing because correct left-alignment exposed an edge case. I'm about to issue a PR related to ReadPosRankSumTest that will re-do this code.

final Pair<Integer, Integer> shifts = leftAlignAlleles(Arrays.asList(ref, read), Arrays.asList(refIndelRange, readIndelRange), maxShift);

// account for new match alignments on the right due to left-alignment
resultRightToLeft.add(new CigarElement(shifts.getRight(), CigarOperator.MATCH_OR_MISMATCH));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use element.getOperator() (like you do below) instead of CigarOperator.MATCH_OR_MISMATCH to avoid throwing out information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different here because below we're re-emitting a shorter version of a cigar element that already existed. Here we're creating something new resulting from the left-alignment.

@davidbenjamin
Copy link
Contributor Author

@kachulis back to you

Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think there is just a small bug in LeftAlignAndTrimVariants::leftAlignAndTrim. I think it's an easy fix, and it would be nice to have a test of the bug-related case it as well.

return vc;
}

// get the indel length
final int indelLength = Math.abs(vc.getIndelLengths().get(0));
for(int leadingBases = Math.min(maxLeadingBases, 10); leadingBases <= maxLeadingBases; leadingBases *= 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is a bug here (quite possibly originating prior to this PR). If maxLeadingBases is greater than 10, but not a multiple of 10, there can be problems. Suppose maxLeadingBases=15, and the indel should be left aligned that full amount. leadingBases will initialize to 10, and the indel will left align the full 10 bases. leadingBases will then double to 20, the for loop condition will be false, and the variant won't be shifted left at all. I think this is easy to fix, just needs a slight adjustment to the control flow logic.

Also would be nice to have a test of this type of case, to ensure it is handled correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@davidbenjamin
Copy link
Contributor Author

@kachulis Back to you. I fixed the bug and wrote a test for it. I also put in a commit replacing some trimming code with the new normalization method.

Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbenjamin Looks good! One very minor suggestion you can accept or ignore as you prefer.

@@ -279,7 +279,8 @@ static VariantContext leftAlignAndTrim(final VariantContext vc, final ReferenceC
return vc;
}

for(int leadingBases = Math.min(maxLeadingBases, 10); leadingBases <= maxLeadingBases; leadingBases *= 2) {

for(int leadingBases = Math.min(maxLeadingBases, 10); leadingBases <= maxLeadingBases; leadingBases = Math.min(2*leadingBases, maxLeadingBases)) {
final int refStart = Math.max(vc.getStart() - Math.min(leadingBases, maxLeadingBases), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the Math.min here, since it's the condition in the for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@gatk-bot
Copy link

gatk-bot commented Feb 7, 2020

Travis reported job failures from build 29034
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling openjdk8 29034.4 logs
variantcalling openjdk8 29034.4 logs

@gatk-bot
Copy link

gatk-bot commented Feb 7, 2020

Travis reported job failures from build 29042
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling openjdk8 29042.4 logs
variantcalling openjdk8 29042.4 logs

@davidbenjamin davidbenjamin merged commit 8612288 into master Feb 9, 2020
@davidbenjamin davidbenjamin deleted the db_left_align branch February 9, 2020 00:41
jamesemery added a commit that referenced this pull request Mar 25, 2020
… multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.
jamesemery added a commit that referenced this pull request Jun 1, 2020
… multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.
jamesemery added a commit that referenced this pull request Sep 28, 2020
… multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.
droazen pushed a commit that referenced this pull request Sep 30, 2020
DRAGstr first working approximation to matlab scripts

fist go at rebasing everything correctly, now have to deal with the buggy refactor code

Revert "Rewrote leftAlignIndels code -- it now always works, even for multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.

bringing modernity to this branch

added a consistency test for DRAGEN mode

a bunch of code cleanup

more cleanup and tests?

Added the matlab files as test

Strengthened trust in SoftclipBases Behavior for overlap base quality adjustments

Fixing a crash in alleleliklehoods and adjusting bq annotation after the quality for overlapping bases change

minor fix and expanded debug output

removing excess bases before hmm scoring seems to improve the resutlts at complex sites... the FRD issue is still relevant...

Fixed genotype posterior calculator to not honor the unphase permutation number factor

more bugfixes to read edge cases at sites with many low quality ends and low mapping quality combined

including reads that only overlap in hardclips (despite the obvious concerns with that)

fixing an unexpected error

flipping

hardclippingsoftclippedBases

creating a frankenbranch with a very intersting change involving realignment of reads and apparent DRAGEN behavior

fixing a bug in 'getAllVariationEvents()' code

flipping the values so it actually works

Changed BQD feather end calculation to account for indels in the read relative to the reference

reverted the non-realignment code

fixing the horrible regression I just wrought on the BQD calculations...

disabling bad debug squack

Parameterizing the option to use original base qualities

threading the truly original alignment through for BQD/FRD to match the bad dragen behavior we are now aware of

cleaning up the branch

if you like me then you should have put a pin in it

responding to reveiw comments, (still havent fixed the debug streams)

deleting a test that was for local files

fixing a silly bug

fixing the logic yet one more time

fixing an issue with the disable-spanning-event-genotyping code that was causing false positivles (or perhaps reasonable behavior)

jumping the gun on takutos fix because it was causing me trouble for this test

refactored the debug output to be something very simple that should/could/maybe is better handled by making a debug level in the debugger. Anyway this is fine for now and probably could be expanded to handle the other debug streams in HC

adding a scary error message to deter people from using contamination downsampling with BQD/FRD (it should be an exception but somebody would complain and it really isn't a sound thing to do unless the disqualified reads are included but they can't be inherently because they don't have scores)

partial val review response work checkpoint
:

responded to most comments for my code in val PR review

Fixed a couple of rare bugs in DRAGEN port code.
Now we can reads dragstr-params values out of gcloud storage directly.

taking the changes from valentin with the removed OQ tag

pulling out the --use-original-alignments-for-genotyping-overlap as if it were so many weeds in my garden.

Revision changes thus far

Fix issues with how we assign STR period and repeat lenghts to positions on a read.
Added more robust testing using Illumina's Matlab script to generate use cases.

Impose a maximum of 40 for the GOP in DRAGstr pairhmm as seem to be the case in DRAGEN.

Changes to meet recommendation except for the param estimation part that is on a separate branch

reverting a mistake where i replaced every every instance of the word 'exists' with 'isEnabled'

Unifying changes with the refactoring of the ReadsDataSource class

A few updates on the revision:

* Solved a couple of self-evaluation javadoc PR comments.
* Added a integration test for ComposeSTRTableFile based on the DRAGEN output on some of the mini references under src/test/resources/large
* Fix an issue that made ComposeSTRTableFile quite slow with reference with a very large number of contigs (eg. funcotator gencode trascripts ~100_000).
* Found and fixed a bug introduced during refactoring making EstimateDragstrParameters produce "garbage" if run in parallel.
* Silience a tedious debug message in SAMReaderQueryingIterator that comes out too often when the traversal intervals are just a few. Now it requires
          like a 1000+ intervals to produce that debug, might be still too low but it works in practice.

Solves james'  comment on DragstrParam toString deficiency

fixed a few dangling issues with the rebase

added the argument to end all arguments (and also completely remove the users ability to interact with our toolkit as they please)

Fixes bug reported by Cwhelan

Addresses some of James' comments and fixes a bug in a unit test for DragstrLocus class

Fixes Dragstr Read Str analyzer failing unit test

allowing nulls in the genotyper because its actually okay
jamesemery added a commit that referenced this pull request Oct 8, 2020
…even for multiple indels (#6427)""

This reverts commit 3347761. which hopefully works better now and will be somewhat functional
jamesemery added a commit that referenced this pull request Oct 8, 2020
…uggy refactor code

Revert "Rewrote leftAlignIndels code -- it now always works, even for multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.

bringing modernity to this branch

added a consistency test for DRAGEN mode

a bunch of code cleanup

more cleanup and tests?

Added the matlab files as test

Strengthened trust in SoftclipBases Behavior for overlap base quality adjustments

Fixing a crash in alleleliklehoods and adjusting bq annotation after the quality for overlapping bases change

minor fix and expanded debug output

removing excess bases before hmm scoring seems to improve the resutlts at complex sites... the FRD issue is still relevant...

Fixed genotype posterior calculator to not honor the unphase permutation number factor

more bugfixes to read edge cases at sites with many low quality ends and low mapping quality combined

including reads that only overlap in hardclips (despite the obvious concerns with that)

fixing an unexpected error

flipping

hardclippingsoftclippedBases

creating a frankenbranch with a very intersting change involving realignment of reads and apparent DRAGEN behavior

fixing a bug in 'getAllVariationEvents()' code

flipping the values so it actually works

Changed BQD feather end calculation to account for indels in the read relative to the reference

reverted the non-realignment code

fixing the horrible regression I just wrought on the BQD calculations...

disabling bad debug squack

Parameterizing the option to use original base qualities

threading the truly original alignment through for BQD/FRD to match the bad dragen behavior we are now aware of

cleaning up the branch

if you like me then you should have put a pin in it

responding to reveiw comments, (still havent fixed the debug streams)

deleting a test that was for local files

fixing a silly bug

fixing the logic yet one more time

fixing an issue with the disable-spanning-event-genotyping code that was causing false positivles (or perhaps reasonable behavior)

jumping the gun on takutos fix because it was causing me trouble for this test

refactored the debug output to be something very simple that should/could/maybe is better handled by making a debug level in the debugger. Anyway this is fine for now and probably could be expanded to handle the other debug streams in HC

adding a scary error message to deter people from using contamination downsampling with BQD/FRD (it should be an exception but somebody would complain and it really isn't a sound thing to do unless the disqualified reads are included but they can't be inherently because they don't have scores)

partial val review response work checkpoint
:

responded to most comments for my code in val PR review

Fixed a couple of rare bugs in DRAGEN port code.
Now we can reads dragstr-params values out of gcloud storage directly.

taking the changes from valentin with the removed OQ tag

pulling out the --use-original-alignments-for-genotyping-overlap as if it were so many weeds in my garden.

Revision changes thus far

Fix issues with how we assign STR period and repeat lenghts to positions on a read.
Added more robust testing using Illumina's Matlab script to generate use cases.

Impose a maximum of 40 for the GOP in DRAGstr pairhmm as seem to be the case in DRAGEN.

Changes to meet recommendation except for the param estimation part that is on a separate branch

reverting a mistake where i replaced every every instance of the word 'exists' with 'isEnabled'

Unifying changes with the refactoring of the ReadsDataSource class

A few updates on the revision:

* Solved a couple of self-evaluation javadoc PR comments.
* Added a integration test for ComposeSTRTableFile based on the DRAGEN output on some of the mini references under src/test/resources/large
* Fix an issue that made ComposeSTRTableFile quite slow with reference with a very large number of contigs (eg. funcotator gencode trascripts ~100_000).
* Found and fixed a bug introduced during refactoring making EstimateDragstrParameters produce "garbage" if run in parallel.
* Silience a tedious debug message in SAMReaderQueryingIterator that comes out too often when the traversal intervals are just a few. Now it requires
          like a 1000+ intervals to produce that debug, might be still too low but it works in practice.

Solves james'  comment on DragstrParam toString deficiency

fixed a few dangling issues with the rebase

added the argument to end all arguments (and also completely remove the users ability to interact with our toolkit as they please)

Fixes bug reported by Cwhelan

Addresses some of James' comments and fixes a bug in a unit test for DragstrLocus class

Fixes Dragstr Read Str analyzer failing unit test

allowing nulls in the genotyper because its actually okay

Fixed a bug causing the wrong priors to be taken (indel's instead of snp's in snp sites)
Reverted a couple of changes in DragstrPairHMMInputScoreImputator to make it match better
the corresponding HC integration-tests

fixing a test i broke

adding a test for the easy dragen calling mode

Roll forward some of the rolled back changes Dragstr PairHMM code:

* Limit GOP to 40 max.
* byte-round rather than byte-truncate gop, gcp scores

I Address most outstanding review comments

Fix treatment of star alleles in use genotype posteriors for qual mode (#6856)

* use genotype posteriors for qual spanning deletion fix
jamesemery added a commit that referenced this pull request Oct 8, 2020
…even for multiple indels (#6427)""

This reverts commit 3347761. which hopefully works better now and will be somewhat functional
jamesemery added a commit that referenced this pull request Oct 14, 2020
…uggy refactor code

Revert "Rewrote leftAlignIndels code -- it now always works, even for multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.

bringing modernity to this branch

added a consistency test for DRAGEN mode

a bunch of code cleanup

more cleanup and tests?

Added the matlab files as test

Strengthened trust in SoftclipBases Behavior for overlap base quality adjustments

Fixing a crash in alleleliklehoods and adjusting bq annotation after the quality for overlapping bases change

minor fix and expanded debug output

removing excess bases before hmm scoring seems to improve the resutlts at complex sites... the FRD issue is still relevant...

Fixed genotype posterior calculator to not honor the unphase permutation number factor

more bugfixes to read edge cases at sites with many low quality ends and low mapping quality combined

including reads that only overlap in hardclips (despite the obvious concerns with that)

fixing an unexpected error

flipping

hardclippingsoftclippedBases

creating a frankenbranch with a very intersting change involving realignment of reads and apparent DRAGEN behavior

fixing a bug in 'getAllVariationEvents()' code

flipping the values so it actually works

Changed BQD feather end calculation to account for indels in the read relative to the reference

reverted the non-realignment code

fixing the horrible regression I just wrought on the BQD calculations...

disabling bad debug squack

Parameterizing the option to use original base qualities

threading the truly original alignment through for BQD/FRD to match the bad dragen behavior we are now aware of

cleaning up the branch

if you like me then you should have put a pin in it

responding to reveiw comments, (still havent fixed the debug streams)

deleting a test that was for local files

fixing a silly bug

fixing the logic yet one more time

fixing an issue with the disable-spanning-event-genotyping code that was causing false positivles (or perhaps reasonable behavior)

jumping the gun on takutos fix because it was causing me trouble for this test

refactored the debug output to be something very simple that should/could/maybe is better handled by making a debug level in the debugger. Anyway this is fine for now and probably could be expanded to handle the other debug streams in HC

adding a scary error message to deter people from using contamination downsampling with BQD/FRD (it should be an exception but somebody would complain and it really isn't a sound thing to do unless the disqualified reads are included but they can't be inherently because they don't have scores)

partial val review response work checkpoint
:

responded to most comments for my code in val PR review

Fixed a couple of rare bugs in DRAGEN port code.
Now we can reads dragstr-params values out of gcloud storage directly.

taking the changes from valentin with the removed OQ tag

pulling out the --use-original-alignments-for-genotyping-overlap as if it were so many weeds in my garden.

Revision changes thus far

Fix issues with how we assign STR period and repeat lenghts to positions on a read.
Added more robust testing using Illumina's Matlab script to generate use cases.

Impose a maximum of 40 for the GOP in DRAGstr pairhmm as seem to be the case in DRAGEN.

Changes to meet recommendation except for the param estimation part that is on a separate branch

reverting a mistake where i replaced every every instance of the word 'exists' with 'isEnabled'

Unifying changes with the refactoring of the ReadsDataSource class

A few updates on the revision:

* Solved a couple of self-evaluation javadoc PR comments.
* Added a integration test for ComposeSTRTableFile based on the DRAGEN output on some of the mini references under src/test/resources/large
* Fix an issue that made ComposeSTRTableFile quite slow with reference with a very large number of contigs (eg. funcotator gencode trascripts ~100_000).
* Found and fixed a bug introduced during refactoring making EstimateDragstrParameters produce "garbage" if run in parallel.
* Silience a tedious debug message in SAMReaderQueryingIterator that comes out too often when the traversal intervals are just a few. Now it requires
          like a 1000+ intervals to produce that debug, might be still too low but it works in practice.

Solves james'  comment on DragstrParam toString deficiency

fixed a few dangling issues with the rebase

added the argument to end all arguments (and also completely remove the users ability to interact with our toolkit as they please)

Fixes bug reported by Cwhelan

Addresses some of James' comments and fixes a bug in a unit test for DragstrLocus class

Fixes Dragstr Read Str analyzer failing unit test

allowing nulls in the genotyper because its actually okay

Fixed a bug causing the wrong priors to be taken (indel's instead of snp's in snp sites)
Reverted a couple of changes in DragstrPairHMMInputScoreImputator to make it match better
the corresponding HC integration-tests

fixing a test i broke

adding a test for the easy dragen calling mode

Roll forward some of the rolled back changes Dragstr PairHMM code:

* Limit GOP to 40 max.
* byte-round rather than byte-truncate gop, gcp scores

I Address most outstanding review comments

Fix treatment of star alleles in use genotype posteriors for qual mode (#6856)

* use genotype posteriors for qual spanning deletion fix
jamesemery added a commit that referenced this pull request Oct 14, 2020
…even for multiple indels (#6427)""

This reverts commit 3347761. which hopefully works better now and will be somewhat functional
jamesemery added a commit that referenced this pull request Oct 15, 2020
…uggy refactor code

Revert "Rewrote leftAlignIndels code -- it now always works, even for multiple indels (#6427)"

This reverts commit 8612288. Which brakes the softclip aware cigar building code and has a horrible bug.

bringing modernity to this branch

added a consistency test for DRAGEN mode

a bunch of code cleanup

more cleanup and tests?

Added the matlab files as test

Strengthened trust in SoftclipBases Behavior for overlap base quality adjustments

Fixing a crash in alleleliklehoods and adjusting bq annotation after the quality for overlapping bases change

minor fix and expanded debug output

removing excess bases before hmm scoring seems to improve the resutlts at complex sites... the FRD issue is still relevant...

Fixed genotype posterior calculator to not honor the unphase permutation number factor

more bugfixes to read edge cases at sites with many low quality ends and low mapping quality combined

including reads that only overlap in hardclips (despite the obvious concerns with that)

fixing an unexpected error

flipping

hardclippingsoftclippedBases

creating a frankenbranch with a very intersting change involving realignment of reads and apparent DRAGEN behavior

fixing a bug in 'getAllVariationEvents()' code

flipping the values so it actually works

Changed BQD feather end calculation to account for indels in the read relative to the reference

reverted the non-realignment code

fixing the horrible regression I just wrought on the BQD calculations...

disabling bad debug squack

Parameterizing the option to use original base qualities

threading the truly original alignment through for BQD/FRD to match the bad dragen behavior we are now aware of

cleaning up the branch

if you like me then you should have put a pin in it

responding to reveiw comments, (still havent fixed the debug streams)

deleting a test that was for local files

fixing a silly bug

fixing the logic yet one more time

fixing an issue with the disable-spanning-event-genotyping code that was causing false positivles (or perhaps reasonable behavior)

jumping the gun on takutos fix because it was causing me trouble for this test

refactored the debug output to be something very simple that should/could/maybe is better handled by making a debug level in the debugger. Anyway this is fine for now and probably could be expanded to handle the other debug streams in HC

adding a scary error message to deter people from using contamination downsampling with BQD/FRD (it should be an exception but somebody would complain and it really isn't a sound thing to do unless the disqualified reads are included but they can't be inherently because they don't have scores)

partial val review response work checkpoint
:

responded to most comments for my code in val PR review

Fixed a couple of rare bugs in DRAGEN port code.
Now we can reads dragstr-params values out of gcloud storage directly.

taking the changes from valentin with the removed OQ tag

pulling out the --use-original-alignments-for-genotyping-overlap as if it were so many weeds in my garden.

Revision changes thus far

Fix issues with how we assign STR period and repeat lenghts to positions on a read.
Added more robust testing using Illumina's Matlab script to generate use cases.

Impose a maximum of 40 for the GOP in DRAGstr pairhmm as seem to be the case in DRAGEN.

Changes to meet recommendation except for the param estimation part that is on a separate branch

reverting a mistake where i replaced every every instance of the word 'exists' with 'isEnabled'

Unifying changes with the refactoring of the ReadsDataSource class

A few updates on the revision:

* Solved a couple of self-evaluation javadoc PR comments.
* Added a integration test for ComposeSTRTableFile based on the DRAGEN output on some of the mini references under src/test/resources/large
* Fix an issue that made ComposeSTRTableFile quite slow with reference with a very large number of contigs (eg. funcotator gencode trascripts ~100_000).
* Found and fixed a bug introduced during refactoring making EstimateDragstrParameters produce "garbage" if run in parallel.
* Silience a tedious debug message in SAMReaderQueryingIterator that comes out too often when the traversal intervals are just a few. Now it requires
          like a 1000+ intervals to produce that debug, might be still too low but it works in practice.

Solves james'  comment on DragstrParam toString deficiency

fixed a few dangling issues with the rebase

added the argument to end all arguments (and also completely remove the users ability to interact with our toolkit as they please)

Fixes bug reported by Cwhelan

Addresses some of James' comments and fixes a bug in a unit test for DragstrLocus class

Fixes Dragstr Read Str analyzer failing unit test

allowing nulls in the genotyper because its actually okay

Fixed a bug causing the wrong priors to be taken (indel's instead of snp's in snp sites)
Reverted a couple of changes in DragstrPairHMMInputScoreImputator to make it match better
the corresponding HC integration-tests

fixing a test i broke

adding a test for the easy dragen calling mode

Roll forward some of the rolled back changes Dragstr PairHMM code:

* Limit GOP to 40 max.
* byte-round rather than byte-truncate gop, gcp scores

I Address most outstanding review comments

Fix treatment of star alleles in use genotype posteriors for qual mode (#6856)

* use genotype posteriors for qual spanning deletion fix
jamesemery added a commit that referenced this pull request Oct 15, 2020
…even for multiple indels (#6427)""

This reverts commit 3347761. which hopefully works better now and will be somewhat functional
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

leftAlignIndel only works for reads with one indel add tests for LeftAlignIndels
3 participants