Indel revamp #1824

Merged
merged 24 commits into from
Nov 22, 2022

@pd3 pd3 commented Nov 22, 2022

No description provided.

pd3 added 24 commits June 29, 2022 10:53
This is just an initial commit so that we can make incremental changes
- proper struct for pileup client data
- flesh out the indel type finding routine. This is identical to the original code,
  the only difference is caching qlen in pileup client data
This version is able to make simple calls and correctly creates up to two
reference consensus templates. The code is based on the original indel
calling but does not contain the heuristics introduced later by jkb.
It is now possible to call from long reads.

To do: consider allowing a window smaller than 2*110bp for trimming
ref and qry sequences in realignment. Currently it is possible to
increase --indel-size but it is not possible to decrease it.
Remove the heuristics introduced by e4e1610, which appear unnecessary
and even harmful on documented test cases.
…nd check return values when there are no indel types (-1 is returned)
The errors were:
- pass correct position to cigar INS matching operator
- set the correct offset in the consensus template
by counting badly matching reads as low quality reference reads.

In the original code, reads matching badly to any indel type or to the reference had indelQ set to 0 and were thus
effectively removed from calling. This leads to problems when there are many soft-clipped reads and a few well-matching
indel reads (see noisy-softclips.bam in mpileup-tests): only the few good-quality indel reads would
become visible to the caller, and the indel would be called with high quality. This commit changes the logic to make
the badly matching reads low-quality reference reads instead of removing them completely. The threshold was
set so that the test case is still called as an indel, but with very low quality.
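
The change described above can be illustrated with a minimal sketch. This is not the bcftools code: the `Read` fields, the `mismatch` score, the `max_mismatch` threshold and the `low_q` value are all hypothetical names chosen for illustration. It only shows why dropping badly matching reads leaves the indel reads unopposed, while recasting them as low-quality reference reads keeps them visible to the caller.

```python
from dataclasses import dataclass

REF = 0  # in this sketch, type 0 stands for the reference; types > 0 are indel types


@dataclass
class Read:
    best_type: int   # best-matching type after realignment (hypothetical field)
    mismatch: float  # hypothetical badness of that match, lower is better
    indel_q: int     # indel quality the read would get if it matched well


def classify_old(reads, max_mismatch):
    """Behaviour described for the original code: badly matching reads
    get indelQ 0, so they effectively disappear from calling."""
    return [(r.best_type, r.indel_q) for r in reads if r.mismatch <= max_mismatch]


def classify_new(reads, max_mismatch, low_q=1):
    """Behaviour described for this commit: badly matching reads are kept
    as low-quality reference reads (low_q is an assumed placeholder value)."""
    out = []
    for r in reads:
        if r.mismatch > max_mismatch:
            out.append((REF, low_q))
        else:
            out.append((r.best_type, r.indel_q))
    return out


def count_support(calls):
    """Return (number of reference reads, number of indel reads)."""
    n_ref = sum(1 for t, _ in calls if t == REF)
    return n_ref, len(calls) - n_ref


# Two well-matching indel reads among ten noisy soft-clipped reads.
reads = [Read(1, 0.1, 40), Read(1, 0.2, 40)] + [Read(1, 5.0, 40) for _ in range(10)]
old_ref, old_indel = count_support(classify_old(reads, max_mismatch=1.0))
new_ref, new_indel = count_support(classify_new(reads, max_mismatch=1.0))
```

In the old scheme the caller sees only the two indel reads and no reference support; in the new scheme the ten noisy reads count as weak reference evidence, which is what would pull the call quality down.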
…phred-scaled product of sample genotype likelihoods