Indel revamp #1824

pd3 · 2022-11-22T14:20:34Z

No description provided.

…by counting badly matching reads as low quality reference reads. In the original code, reads matching badly to any indel type or reference had indelQ set to 0 and were thus effectively removed from calling. This leads to problems when there are many soft-clipped reads and only a few well-matching indel reads (see noisy-softclips.bam in mpileup-tests): only the few good quality indel reads are visible to the caller, and the indel is called with high quality. This commit changes the logic so that badly matching reads become low quality reference reads instead of being removed completely. The threshold was set so that the test case is still called as an indel, but with very low quality.
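The change described above can be illustrated with a toy model. This is only a sketch of the idea, not the bcftools code: `classify_old`, `classify_new`, `REF_TYPE`, `MIN_MATCH_Q`, `LOW_REF_Q` and all numeric values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass

REF_TYPE = 0  # hypothetical index of the reference (no indel) type


@dataclass
class ReadCall:
    indel_type: int  # type the read is assigned to
    indel_q: int     # quality the caller sees; 0 means the read is ignored


def classify_old(best_type, match_q, min_match_q):
    """Old behaviour as described: a badly matching read gets indelQ 0."""
    if match_q < min_match_q:
        return ReadCall(best_type, 0)
    return ReadCall(best_type, match_q)


def classify_new(best_type, match_q, min_match_q, low_ref_q):
    """New behaviour as described: a badly matching read becomes a
    low quality reference read instead of being dropped."""
    if match_q < min_match_q:
        return ReadCall(REF_TYPE, low_ref_q)
    return ReadCall(best_type, match_q)


# Toy pileup: 20 noisy soft-clipped reads and 3 well-matching indel reads,
# given as (best matching type, match quality). All values are made up.
reads = [(1, 2)] * 20 + [(1, 40)] * 3
MIN_MATCH_Q = 10  # assumed threshold
LOW_REF_Q = 5     # assumed quality for demoted reads

old = [classify_old(t, q, MIN_MATCH_Q) for t, q in reads]
new = [classify_new(t, q, MIN_MATCH_Q, LOW_REF_Q) for t, q in reads]

# Reads that remain visible to the caller
old_visible = [r for r in old if r.indel_q > 0]
new_visible = [r for r in new if r.indel_q > 0]
```

In this toy pileup the old logic leaves only the three indel reads visible, so the site looks like a clean indel. With the new logic the twenty noisy reads stay visible as weak reference evidence, which is what lowers the quality of the call.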

pd3 added 24 commits June 29, 2022 10:53

Skeleton for new mpileup indels, for now accessible with --indels-2.0

cbd5470

This is just an initial commit so that we can make incremental changes

Pileup client data and indel type finding

a828884

- proper struct for pileup client data
- flesh out the indel type finding routine. This is identical to the original code, the only difference is caching qlen in pileup client data
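As a rough illustration of caching qlen per read, the sketch below computes the query length from a CIGAR once and reuses it on later accesses. The class and field names (`PileupClientData`, `_qlen`) are hypothetical and do not reflect the actual struct in the commit. The only fact relied on is that the query length is the sum of the CIGAR operations that consume query bases (M, I, S, =, X).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# CIGAR operations that consume bases of the query (read) sequence
QUERY_CONSUMING = {"M", "I", "S", "=", "X"}


def cigar_qlen(cigar: List[Tuple[int, str]]) -> int:
    """Query length implied by a CIGAR given as (length, op) pairs."""
    return sum(n for n, op in cigar if op in QUERY_CONSUMING)


@dataclass
class PileupClientData:
    """Hypothetical per-read client data that caches qlen."""
    cigar: List[Tuple[int, str]]
    _qlen: Optional[int] = None

    @property
    def qlen(self) -> int:
        # Compute on first access only, reuse the cached value afterwards
        if self._qlen is None:
            self._qlen = cigar_qlen(self.cigar)
        return self._qlen


cd = PileupClientData([(5, "S"), (50, "M"), (2, "I"), (3, "D"), (40, "M")])
read_len = cd.qlen  # the deletion (D) does not consume query bases
```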

Outline for read consensus creation

6b8eb04

First draft of read consensus creation functional

1e06536

Functional --indels-2.0 prototype

2240ff3

This version is able to make simple calls and correctly creates up to two reference consensus templates. The code is based on the original indel calling but does not contain the heuristics introduced later by jkb.

Long reads with --indels-2.0

cef69be

It is now possible to call from long reads. To do: consider allowing a window smaller than 2*110bp for trimming ref and qry sequences in realignment. Currently it is possible to increase --indel-size, but not to decrease it.

Fix ref/qry sequence trimming for indel realignment

311486e

Remove the heuristics introduced by e4e1610, which appear unnecessary and even harmful on documented test cases.

Fix a bug: float arithmetic must be used, not int

61691ab

New --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel

4d18e8a
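The distinction between an allele that violates Mendelian inheritance and one that is actually novel can be shown on a trio. The sketch below only illustrates that distinction. It does not show how --strictly-novel weights such alleles, which the commit message does not state, and the function names are hypothetical.

```python
from itertools import product


def is_mendelian(child, father, mother):
    """True if the child's diploid genotype can be formed by taking
    one allele from the father and one from the mother."""
    a, b = child
    return any(
        (f == a and m == b) or (f == b and m == a)
        for f, m in product(father, mother)
    )


def novel_alleles(child, father, mother):
    """Alleles seen in the child but in neither parent."""
    return set(child) - set(father) - set(mother)


# Child 1/1 with parents 0/1 and 0/0: the mother cannot transmit allele 1,
# so the genotype violates Mendelian inheritance, yet allele 1 is present
# in the father and is therefore not novel.
violating_not_novel = ((1, 1), (0, 1), (0, 0))

# Child 0/2 with the same parents: allele 2 is absent from both parents,
# so it is both a violation and a novel allele.
violating_novel = ((0, 2), (0, 1), (0, 0))
```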

Set --pn/--pns separately for SNVs and indels, make the indel default strict (0)

c4afbed

Remove debugging asserts (using a temporary debug printout instead) and check return values when there are no indel types (-1 is returned)

2985837

Remove unused cns_seq_t.pos array. Fix two index errors

f3b9ae6

The errors were:
- pass correct position to cigar INS matching operator
- set the correct offset in the consensus template

Fix a bug in recognizing the need to end the error correction

86330ed

Prevent segfault when no indels encountered; Add to 86330ed, end on time when correcting haplotypes

bc3a521

Remove unused pos_seq array

ea15bcb

Consider all indel types once the site passed for indel evaluation

c46ca8a

Add an experimental INFO/NM annotation

e9d22b1

Split the new NM annotation (e9d22b1) into ref/alt counts, originally only alts were counted

fdaf9c9

Experimental filtering annotation MIN_PL_SUM, roughly corresponds to phred-scaled product of sample genotype likelihoods

bc635c0
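A phred-scaled product of likelihoods equals the sum of the individual phred-scaled values, which is presumably why the annotation name ends in SUM. The sketch below checks only that identity. Which per-sample likelihood enters the sum is not stated in the commit message, so the input values here are arbitrary examples.

```python
import math


def phred(p: float) -> float:
    """Phred-scale a probability: -10 * log10(p)."""
    return -10.0 * math.log10(p)


# Arbitrary example likelihoods, one per sample (not real data)
likelihoods = [0.1, 0.01, 0.001]

# Phred-scaling the product ...
product_phred = phred(math.prod(likelihoods))

# ... gives the same value as summing the per-sample phred values
sum_phred = sum(phred(p) for p in likelihoods)
```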

Make most of the mpileup -a output tags optional

9b5be4b

Remember read's realn status in a clean way, not by misusing bam->core.flag

47fe5b8

Resolved indel-revamp vs develop merge conflicts

4fbad53

Declare inline functions as static in the hope it fixes a compiler error in the -std=gnu99 test

321c0d0

pd3 merged commit 9d948cf into develop Nov 22, 2022

pd3 deleted the indel-revamp branch November 22, 2022 17:01

pd3 mentioned this pull request Nov 23, 2022

bcftools call ignores deletion with high coverage #1459

Closed

jkbonfield mentioned this pull request May 30, 2023

bcftools v1.17 no longer finds known variants from v1.8 #1930

Open
