Make mpileup's overlap removal choose a random sequence. #1273

Merged: 1 commit, Apr 27, 2021


jkbonfield
Contributor

Currently overlap removal always chooses the second sequence (except when the base calls differ). In most library strategies this amounts to a random strand and a random coordinate, but some targeted sequencing methods have a very strong strand bias (first is + strand, second is - strand) or positional bias (e.g. PCR amplicons). Given that SNPs near the ends of sequences can give rise to poor BAQ scores, both position and strand bias are detrimental.

This change makes it select either read 'a' or 'b' based on a hash of the read name. Unlike using a traditional random number generator, this gives it consistent behaviour regardless of how many sequences have gone before.
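
The idea of keying the choice on the read name can be illustrated with a minimal Python sketch. This is not the htslib code: the helper name `pick_overlap_read` is hypothetical, CRC32 is only a stand-in for whatever hash the commit actually uses, and the read names are made up. It shows only the property described above, namely that the choice for a given pair does not depend on which reads were processed before it.

```python
import zlib


def pick_overlap_read(read_name: str) -> str:
    """Return 'a' or 'b' for an overlapping pair, keyed on the read name.

    Hypothetical helper: CRC32 is an assumed stand-in hash, not htslib's.
    """
    # Both mates share one read name, so the decision is a pure function
    # of the pair and carries no state between calls.
    digest = zlib.crc32(read_name.encode("utf-8"))
    return "a" if digest & 1 == 0 else "b"


# Made-up read names, processed in two different orders.
names = [f"read{i}" for i in range(1000)]
forward = {n: pick_overlap_read(n) for n in names}
backward = {n: pick_overlap_read(n) for n in reversed(names)}

# Each pair gets the same choice regardless of processing order.
print(forward == backward)
```

A stateful random number generator would instead tie each decision to the number of draws made so far, so restricting the run to a sub-region could change which read is kept for the same pair.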

An example from SynDip region 1:185M-200M:

No overlap removal:
SNP          Q>0 /   Filtered
SNP   TP   18830 /     18803
SNP   FP     264 /       238
SNP   GT      56 /        53
SNP   FN     459 /       486

InDel TP    2788 /      2697
InDel FP    1022 /        86
InDel GT     353 /       345
InDel FN     596 /       687

Old removal strategy:
SNP          Q>0 /   Filtered
SNP   TP   18841 /     18813
SNP   FP     270 /       243
SNP   GT      56 /        54
SNP   FN     448 /       476

InDel TP    2754 /      2663
InDel FP     985 /        83
InDel GT     413 /       404
InDel FN     630 /       721

This PR:
SNP          Q>0 /   Filtered
SNP   TP   18841 /     18814
SNP   FP     272 /       242
SNP   GT      55 /        53
SNP   FN     448 /       475

InDel TP    2765 /      2679
InDel FP     996 /        85
InDel GT     382 /       375
InDel FN     619 /       705

The difference in CPU cost for bcftools mpileup | bcftools call between the latter two tests was 0.4% (which may just be random fluctuation). Compared with the old removal system, this is a marginal improvement for SNPs and, oddly, a significant improvement for indels. (It's still behind no overlap removal for indels, but I'm unsure of the veracity of small indels in that truth set.)
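
To make the three runs easier to compare, the Q>0 counts from the tables above can be turned into recall and precision. This is a rough sketch under stated assumptions: it uses the standard definitions recall = TP/(TP+FN) and precision = TP/(TP+FP), and it leaves out the GT row because this post does not say how those calls should be weighted. The labels `none`, `old` and `new` are just shorthand for the three runs.

```python
# Q>0 counts copied from the tables above, as (TP, FP, FN).
# The GT row is deliberately ignored (assumption, see lead-in).
counts = {
    "SNP": {
        "none": (18830, 264, 459),
        "old": (18841, 270, 448),
        "new": (18841, 272, 448),
    },
    "InDel": {
        "none": (2788, 1022, 596),
        "old": (2754, 985, 630),
        "new": (2765, 996, 619),
    },
}


def recall(tp: int, fp: int, fn: int) -> float:
    """Fraction of truth-set variants that were called."""
    return tp / (tp + fn)


def precision(tp: int, fp: int, fn: int) -> float:
    """Fraction of calls that match the truth set."""
    return tp / (tp + fp)


for kind, runs in counts.items():
    for label, c in runs.items():
        print(f"{kind:5s} {label:4s} "
              f"recall={recall(*c):.4f} precision={precision(*c):.4f}")
```

Within each variant type, TP + FN is the same in all three runs, so differences in recall come directly from the TP column.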

Fixes samtools/bcftools#1459

Note that this commit also includes some formatting changes, made at the same time because I was finding the old code hard to read. Those include removal of the debugging output.

@jkbonfield
Contributor Author

Also, don't read too much into the absolute numbers in the above reports. Comparing the same data against samtools/bcftools#1363 shows the effect here is dwarfed by other changes.

SNPs:

SNP          Q>0 /   Filtered
SNP   TP   18841 /     18814
SNP   FP     272 /       242
SNP   GT      55 /        53
SNP   FN     448 /       475

>>> to
SNP          Q>0 /   Filtered
SNP   TP   18978 /     18956
SNP   FP     279 /       253  +/+
SNP   GT      49 /        47  -/-
SNP   FN     311 /       333 --/--

Indels:

InDel TP    2765 /      2679
InDel FP     996 /        85
InDel GT     382 /       375
InDel FN     619 /       705

>>> to
InDel TP    2976 /      2950
InDel FP     117 /        95 ----/+
InDel GT      99 /        93  ---/---
InDel FN     408 /       434  ---/---

@jmarshall
Member

This PR and merge have rolled back the submodule.

clrpackages pushed a commit to clearlinux-pkgs/bcftools that referenced this pull request Jul 10, 2024
Andrew Whitwham (8):
      Summer 2021 release copyright updates.
      Updated the version year to 2022.
      CentOS, Alpine Linux and Windows additions.
      Added error checks.
      Error comments changed on review.
      Another error message alteration.
      Stop treating multiple CIGAR match operations as indels.
      Switched back to openssl for Alpine.

Colin Nolan (1):
      Fix bug where check-ploidy reports wrong chrom

Dr. K. D. Murray (1):
      concat: don't always error when setting --verbose

Dr. K.D. Murray (2):
      bcftools +fill-tag: include F_MISSING in --tag all
      better comment

Gert Hulselmans (1):
      Fix grammar in bcftools gtcheck help

James Bonfield (45):
      Improve mpileup speed by delaying BAQ calculation.
      Make a new calc_mwu_biasZ with s.d. normalised Z score.
      Add an indel-bias argument to mpileup.
      Add indel BQBZ, MQBZ, RPBZ and SCBZ INFO fields.
      Improve the indel INFO filtering metric, plus big fix to GT accuracy.
      Addition of STR finder.
      Refactor bcf_call_gap_prep.
      Tidy up STR finder and fix iscore calc.
      bam2bcf_indel tidyups.
      Rewrite get_position to not parse entire CIGAR.
      Improve calling for PacBio CCS reads.
      Remove the second probaln_glocal call from the indel caller.
      Adjust min/max qual for SNP caller.
      Fix for REF/ALT indel calculation.
      Add mpileup -U option for old MWU scale.
      Make partial-BAQ the default in mpileup.
      Correct the mpileup --config 1.12 parameter set.
      Add a --seed option to mpileup.
      Correct mpileup --ambig-reads documentation.
      Change the MWU test so MWU 1 or MWU-Z 0 is returned for variance=0.
      Recommend "icc -fp-model precise" if doing "make check".
      A minor speed up to bcftools merge.
      Further speed up vcf merge.
      Improve merge_vcf() error reporting, following review.
      Add --write-index to bcftools view
      Add error checking to gvcf_write_block
      Add a new indel caller.
      Add an mpileup --poly-mqual option for the "edlib" mode.
      - Enable the "edlib mode" for the more recent mpileup -X profiles.
      Recompute IDV and IMF if mpileup -a AD is set.
      Fix the indelQ assignment for multi-allelic indel sites.
      - Rename mpileup --edlib to --indels-cns
      Fix a clang16 warning on bit-field overflow.
      Add some scripts for evaluation bcftools mpileup.
      Fix a trivial memory leak in the mpileup usage (etc) code
      Don't change indel qual when indelQ == 0.
      Tweak the condition for calling a 2nd consensus sequence in a deletion.
      Fix REF indel calls for multi-sample calling.
      Make --write-index have an optional =FMT option.
      Make bcftools isec --write-index=FMT apply to isec directory output too
      Fix +scatter -n so it honours --write-index
      Fix a bug in expression parsing for `type=INDEL` with missing quotes.
      Add documentation on the optional =FMT bit of --write-index.
      Add short option -W for --write-index.
      Fix csq error message for now fasta file.

John Marshall (16):
      ax_with_htslib.m4: Support separate HTSlib build directory
      Include <strings.h> directly where needed [minor]
      Add `bcftools view --with-header` option
      Reformat command list so *.1 output is aligned
      Fix Makefile dependencies [minor]
      Add new bcftools head command
      Fix Makefile dependencies; correct VCF data files [minor]
      Add --regions/--targets-overlap pos|record|variant mnemonic options
      Preserve MISSING when copying filter values to INFO and FORMAT
      Remove unused <zlib.h> inclusions
      Add `bcftools index --stats --all` option to display all contigs
      Print unknown stats as '.' rather than 0
      Move includes needed by gff.c (not by gff.h) to gff.c
      Fix Makefile dependencies
      Print uint32_t values via PRIu32 rather than %d [minor]
      Remove -specs=...redhat-hardened... from Perl options

Michael Hall (1):
      add missing M option in getopt call

Nils Homer (1):
      bcftools sort has duplicate -O in the usage

Petr Danecek (344):
      New --QR-QA-to-QS option. Also, capitalize the other options as well, however keeping the lowercase functional
      Support for VCFs with more than 4 ALTs
      Improve diagnostics and be flexible in the use of GT/PL
      Remove a redundant condition (a typo) and rename variable for clarity
      Fix language
      New --use-NAIVE option for a naive DNM calling based solely on FORMAT/GT.
      Atomization of AD and QS updates correctly occurrences of duplicate alleles
      Remove annotations with incorrect number of fields
      Fix silly typo made in 2bfe824b
      Atomization should not discard ALT=. records
      Fix and update documentation, PBINOM should be uppercase
      Support for matching annotation line also by ID,
      Always generate sites.txt with isec -p. Resolves #1462
      Fix behavior to match the documentation,
      Switch from a2x to asciidoctor; much faster and less clumsy
      Fix Type=Flag output in `norm --atomize`. Resolves #1472
      Update mpileup differences caused by randomized overlap removal in samtools/htslib#1273
      Minor usage clarification
      Update tests and documentation
      Update README.md
      Fill AF from AC/AN when FMT/GT not present
      Fix arithmetic underflow, incorrect access to uninitialized memory.
      Fix bug in VAF calculation, hom DNMs were incorrectly set to 50%
      Replace "concat --rm-dup none" with "--rm-dup exact". Resolves samtools/bcftools#1089 (comment)
      Fix a typo. Thanks to Rick Wertenbroek, PR #1485
      Update REF len in BCF records when all INFO tags are removed with `annotate -x INFO`. Fixes #1483
      Pool soft-clipped reads together regardless of it being on red, indel, or snv reads
      Prevent segfaul, throw an error. See also #1477
      Add `reheader -T` option and unify temp file creation across all commands that use it. Resolves #1497
      Update documentation
      Possible fix of #1499
      Allow setting of custom genotypes by +setGT
      Parametrize brief-predictions parameter (-b) of bcftools csq
      Remove an inaccessible branch, remove optional getopt_long option as it is not supported by all platforms.
      Remove/add the "chr" prefix to chromosome name to match gff with faidx. Resolves #1507
      Generalize tags from custom functions
      Better resolution of ambiguous keys.
      Extend generalized functions in fill-tags more
      Add +split-vep -u option. Resolves samtools/bcftools#1508 (comment)
      Fix failing test
      Fix a minor memory leak to avoid failure of automated test
      Warn about depricating `csq -b`
      Update NEWS
      Minor usability bug fix
      Accept index file names as well as data file names
      Prevent out of bounds mate positions; remove unused code; new -O, --offset option to help creation of multisample test cases
      Consider only complete trios, do not crash on sample name typos. Fixes #1520
      Allow combining --pn and --pns
      Support for increasing ploidy with -n c:...; See also #1516
      Fix AD[0] generations for indels
      Test only increasing AD but leaving the default calling
      Add new --ar, --ambig-reads option
      Add support for atomization of Number=A,R string annotations. Resolves #1503
      Fix the --ar incAD0 mode, the promise was to increment the REF allele
      Add GT definition to the header if not already present.
      Append version information to the output VCF header
      Fix the --compression-level option. Resolves #1528
      Add a new --with-pAD option to allow processing of VCFs without FORMAT/QS. Change the existing --ppl option to an analogous --with-pPL, making the --ppl functional but undocumented
      Consider indels when deciding if two nearby variants are compound
      Switch to FMT/AD when --with-pPL is used, FMT/QS is not available, and more than 4 alleles are present
      Fix FORMAT/BCSQ bitmasks at multiallelic sites in multisample VCFs
      Fix forward strand of e0e3484e
      Add forgotten test output
      Add +checkploidy tests, see #1530
      Do not crash when trying to get usage page with `bcftools +check-ploidy -- -h`
      Make +check-ploidy optionally use missing genotypes. Replaces #1531
      Update documentation to prevent confusion, see #1538
      Inframe indels can introduce a stop codon and should be marked as such
      Add test for samtools/htslib#1321. This will be failing until the pull request is merged
      Exit gracefully instead of segfaulting
      Greedy assignment of alternate alleles to genotypes with -m -
      Make `query` format consistent
      Check if stop codon occurs before frameshift is restored,
      Add a new annotation mode to carry over missing values
      Minor usage clarification
      Add new `-c ~INFO/END` feature to match also by INFO/END tag
      Allow to change positions using `-c CHROM,POS,~POS`. Resolves #1545
      Complain if trying to use `concat -n` with a diferent output type. Resolves #1561
      Resolve a "fixme" case in `call -C alleles`
      Prevent segfault when -0 or -1 is omitted. Fixes #1562
      Allow to proceed with empty -0/-1 sets with --force-samples
      Don't bail on symbolic ALTs
      Add a missing warning about skipped sites. See also #1567
      New `--ligate-force` and `--ligate-warn` options
      Add new --indel-size option
      Finer control of --regions vs --targets overlap
      Fix a typo which leads to incorrect naming of output BCF files. Fixes #1404
      Apply 0d04159437d to the `+split` plugin as well
      Allow `index -s` on index file only w/o data file present
      [Minor] make tests conform to VCF specification
      Use `--output-type` to override the default compression level
      Bug fix in reporting upstream stops in multi-sample VCFs
      Prevent confusion such as #1580
      Remove unused calculation and prevent clang -Werror failure
      Fix clang-13 unused variable messages. Replaces #1587
      Make `sort --max-mem` more accurate
      Update NEWS
      Keep AN,AC values when merging VCFs with no samples
      Make the header compatibility check more strict. Fixes #1591
      Apply mask even if the VCF has no notion about the chromosome
      The --use-NAIVE mode should annotate with the de novo allele (FORMAT/VA) as well
      Clarification of --rf, --ff options
      --use-NAIVE mode should tolerate missing paternal GT at chrX in male probands
      Keep per-sample value count in expressions such as AD[:1] / sum(FMT/AD[*]), see #1604
      Renaming annotations from the command line
      Fix a bug in `-t q -e EXPR` logic
      Advertise plugins on the usage page
      [Minor] Declare fname to be const char
      Add gVCF consensus test
      Prevent compiler warning, use the correct directive
      Symbolic allele strings cannot be trimmed
      Review of --vcf-ids option with `--gensample`, `--hapsample` and `--haplegendsample`
      Allow GFFs with phase column unset. See #1628
      Clean HTML codes and clarify the use of `-s ^SAMPLE1,SAMPLE2`
      Make the `--samples` and `--samples-file` options work
      Prevent segfault on sites filtered with -i/-e in all files. Fixes #1632
      Add test for samtools/htslib#1370
      Check the return status of hts_readlist and warn about non-existent files
      Add support for ID~"regex" and ID!~"regex". Resolves #1640
      More flexible read filtering
      New `--mask`, `--mask-file` and `--mask-overlap` options
      Don't add a new Filter in the header when not needed
      Make use of TMPDIR environment variable. Resolves #1642
      Fix header line formatting
      Add new option --min-overlap to bcftools annotate
      Fix an API error, see #1647 for discussion
      Support for transfering ALT from VCF
      Document missing option
      Add test for #1598
      Update to 64bit integers, fixes #978
      New `-m flip-all` mode and support for arbitrary ids
      Use unsigned PRIu64 where appropriate
      Check for non-ACGTN REF allele when atomizing,
      Document supported consequence types. Resolves #1671
      Support for filling in FORMAT tags
      New `-H, --header-line` option
      Allow multiple custom functions in a single run
      Sanitize VEP field names that do not form valid INFO tags.
      Fix the loss of phasing in half-missing genotypes in variant atomization
      Prevent further segfaults with the combination of --atomize and --check-ref
      Fix endless loop or incorrect AF estimates
      Minor usage page update
      New plugin +variant-distance
      Strip column type prefix to avoid confusion. Resolves #1695
      Fix a bug introduced by c51199511c
      Prevent segfault, fix #1700
      Add pointer to extended consensus documentation
      Fix parsing minimal PED files. Resolves #1696
      Support both phased (1|0) and unphsed (1/0) genotypes. Resolves #1710
      Exit gracefully rather than segfaulting. See #1716
      Add new NMBZ annotation
      New `-H, --header-line` option
      New --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel
      Remove unnecessary assert. Resolves #1717
      Remove unnecessary -t/-T requirement and make the default behavior explicit when -n/-C given. Resolves #1718
      Set --pn/--pns separately for SNVs and indels, make the indel default strict (0)
      Add support for querying multiple filters
      Fix a bug introduced in 1.14
      Add a new option -f to run only desired tests
      Fix duplications of PG line in +scatter
      Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy. Resolves #1726
      Skeleton for new mpileup indels, for now accessible with --indels-2.0
      Pileup client data and indel type finding
      Add modulo operator to filtering expressions. Resolves #1744
      Outline for read consensus creation
      First draft of read consensus creation functional
      New `-m snp-ins-del` switch to merge SNVs, insertions and deletions separately
      Functional --indels-2.0 prototype
      Fix a documentation typo
      When complaining about missing tags, be explicit if it is FORMAT or INFO. Resolves #1770
      Make missing AD into a non-critical warning for VAF/VAF1 and make it more informative. Resolves #1769
      Long reads with --indels-2.0
      Fix ref/qry sequence trimming for indel realignment
      Fix a bug, float arithmetics must be used, not int
      New --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel
      Set --pn/--pns separately for SNVs and indels, make the indel default strict (0)
      Support sample reordering of annotation file. Resolves #1785
      Remove debugging asserts (using a temporary debug printout instead) and check return values when there are no indels types (-1 is returned)
      Remove unused cns_seq_t.pos array. Fix two index errors
      Fix a bug in recognizing the need to end the error correction
      Prevent segfault when no indels encountered; Add to 86330edaaa, end on time when correcting haplotypes
      Document explicitly output format of roh
      Remove unused pos_seq array
      Fix a rare bug where printing of SAMPLE field with `query` was incorrectly suppressed.
      Consider all indel types one the site passed for indel evaluation
      Add an experimental INFO/NM annotation
      Attempt to reduce indel quality in problematic regions
      Split the new NM annotation (e9d22b1f5d) into ref/alt counts, originally only alts were counted
      Add INFO/FIXREF annotation, add a new -m swap mode
      Fix a memory corruption bug with too many alleles passed to `-C alleles` via `-T`
      Sanitize VEP field names to enforce VCF conventions as in htslib/bcf_hrec_check. Fixes #1795
      Experimental filtering annotation MIN_PL_SUM, roughly corresponds to phred-scaled product of sample genotype likelihoods
      New -H option to print header with -f. Resolves #1798
      Check failed memory alloc in `sort`. Resolves #1801
      Add new `--new-gt X` option. Resolves #1800
      New `norm --multi-overlaps` option. Resolves #1802 and #1764
      Fix a silly bug in 87bf15961b0
      Prevent out of range indices and consequent segfault. Fixes #1805
      New options `-g, --gene-list` and `--gene-list-fields`
      [minor] Remove unused variable
      Fix gene restricting when combined with -s
      Make the --af-annots argument optional
      Make variantkey conversion work for ALT=. sites. Resolves #1806
      [DOCS] Add usage page warning for combining filtering with sample subsetting. Resolves #1807
      [MINOR] update NEWS
      New +mendelian2 plugin, deprecates the original +mendelian
      Fix a bug which under-reports MNV consequences
      Restore functionality of the --pair-logic option. Fixes #1808
      Add more tags with -t all. See also #916
      Fix a bug where indels constrained with `-C alleles -T` would sometimes be missed. Fixes #1706
      Fix a bug of not filling in missing FORMAT value and using the present vector_end instead. Fixes #1818
      Make most of the mpileup -a output tags optional
      Remember read's realn status in a clean way, not by misusing bam->core.flag
      Declare inline functions as static in the hope it fixes a compiler error in the -std=gnu99 test
      Update documentation and NEWS
      Add missing cigar case (hard clip) and remove a debugging warning
      Cache NM values, for speed. See #1826
      Remove NMBZ from default annotations, for perfomrance reasons. See #1826
      Change the behavior of the `-I, --iupac-codes` option
      Refer to the correct option --samples, rather than --sample
      New option `-d, --direction` to choose the directionality: fwd,rev,nearest,both
      Make the INFO/FS annotation functional
      Minor usage page clarification
      Split complex indels
      Add error checks to prevent incorrect use of vector arithmetics in -i/-e.
      Fix overflow for indels longer than 512bp. Fixes #1837
      Per-sample stats (PSC) would not be computed
      Make GFF file parsing more flexible. Add new misc/gff2gff script
      Revamp or +tag2tag
      Make the `-t ./x` mode select both phased and unphased genotypes,
      Update man page to prevent future confusion as in #1851
      Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values
      Add new `--target-gt r:FLOAT` option to randomly select a proportion of genotypes
      Arrays in Number=R tags can be now subscripted by alleles found in FORMAT/GT
      Exit when no matching sample is found
      Drop offending space character in `query -H` output
      Make +fill-tags recognise both `-t TAG` and `-t INFO/TAG`. Resolves #1857
      Replace assert with an error message to help with debugging. See #1044
      Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs
      The -c option can be omitted when a VEP subfield is used in filtering expressions
      [minor] NEWS update
      Clarify -R/-T format to prevent confusion such as #1862
      Collect and plot new VAF stats
      The `-m, --mark-sites` option can be used to mark all sites
      The `-m` function did not respect the `--min-overlap` option
      Prevent a segfault when -i/-e use a VEP subfield not included in `-f` or `-c`
      Modify the substitution graph
      Support auto indexing during writing BCF and VCF.gz via new `--write-index` option
      Removed deprecated mendelian tests
      Exit nicely if isec to read a stream on standard input
      Revert "Exit nicely if isec to read a stream on standard input"
      Give control over creation of vectors with mixed known and missing values
      Support higher-ploidy genotypes with `-H, --haplotype`
      The `-m, --multiallelics +` mode now preserves phasing
      Revamped line matching code to fix problems in gVCF merging
      Allow `--mark-ins` and `--mark-snv` with a character, similarly to `--mark-del`
      Fixes in +mendelian2 command line parsing
      Improvements related to newline characters in formatting expressions
      Add new -X, --keep-sites option
      More fixes in gVCF merging
      Improvements in GFF parsing code
      The option --drop-genotypes cannot be used with --naive or --ligate.
      NEWS update
      [MINOR] remove unused function
      Fix a memory leak in `concat -G`, follows 68497b22e5b and 56a440406
      Fix a off-by-1 error in csq
      Force newline character when not given explicitly
      Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs
      Include 'NMD_transcript' in the consequence part of the annotation
      Identical rows must be returned when `-s` is applied regardless of `-f` containing the `-a` VEP tag itself or not.
      Add tests to #1920
      Support normalizing of symbolic <DEL.*> alleles
      Warn about overlapping CDS/ribosomal slippage but do not require --force option
      Update documentation
      Fix a bug in --indels-v2.0
      Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes.
      Test case for htslib/1620 and htslib/1630
      Add expected failure test
      Clarify the XXXXXX template convention of mkdtemp
      Fix missed VCF regions
      The --gene-list option can be used for any field
      Make `reheader --fai` aware of long contigs
      Fix a bug when update of INFO/END results in assertion error
      Refactor csq code to provide a mini library for GFF parsing
      Add new `-g, --gff-annot` option
      Fix a memory leak
      Change debugging aln_win from 1 back to 100. Should not affect results, only performance
      Add --disable-automatic-newline option; Improve automatic behavior
      Filtering expressions can be given a file with list of strings to match
      [docs] add section on terminology. Resolves #1982
      Prevent multiple -w options, clarify documentation
      Fix `bcftools annotate --mark-sites`
      Add new `-F, --print-filtered` option and include sample names in the header
      Silly error left vcmp uninitialized. Fixes #1990
      Add the number of merged lines to the summary output
      Acknowledge functionality of -i'REF="N"'
      Allow combining -m and -a with --old-rec-tag
      Make --indels-2.0 work with BAM_CREF_SKIP operator
      Remove unused test
      Output MIN_DP instead of MinDP in gvcf mode
      New -*,--keep-unseen-allele option. Resolves #2015
      Don't expand REF when symbolic <DEL> allele is present
      Improvements and changes in gtcheck
      New `-s, --samples` option to include the #CHROM header line with samples.
      Modify the interpretation -E, --error-probability
      Add support for optional removal of the unseen allele
      Add flexibility to FILTER column transfers
      Replace semicolons with commas
      Remove forgotten debugging output
      Fix the lost ability to filter on subfield names
      Do not flag paternal genotyping errors as de novo mutations.
      Extend --strictly-novel
      Support for custom genotypes based on the allele with higher depth
      Clarify that no annotation is added in intergenic regions
      Exit nicely when non-existent field name requested
      Automatically select INFO/BCSQ when INFO/CSQ is not present.
      Transcript selection by MANE, PICK, and user-defined transcripts
      Change automatic type parsing of VEP fields DNA_position, CDS_position, and Protein_position
      [minor] Make fname const char*, as it should be
      Change of default order of -m,-a operations and fix a few bugs
      Fix the "Requested allele outside valid range" error.
      Complain if both --write-index and --naive options are given
      Fix `bcftools norm -m +indels`
      Add new --regions-overlap option
      Update documentation
      Add new `-l, --file-list` option. Resolves #2092
      Exit with an informative error message when wrong format given. Resolves #2095
      Fix two bugs in vep-split
      Fix a silly bug introduced by 12a6617a0b571c1a8b9903a9f75975a232c1257c
      Fix another silly bug
      Prevent segfault on invalid DP/AD values
      Add new option `--force-single` to support single-file edge case
      Fix a segfault on missing tags
      Update NEWS
      Fix GT indexing
      Minor documentation fix
      Update documentation
      Add usage case to demonstrate 78ed055a
      Support for conversion from tags using localized alleles (e.g. LPL, LAD)
      Support dynamic variables read from a tab-delimited annotation file
      Revert "Support dynamic variables read from a tab-delimited annotation file"
      Update NEWS
      Consider the possibility of strand=".". Resolves #2158
      Consider the possibility of strand="." and int signedness in comparisons. Resolves #2158

Pierre Lindenbaum (1):
      drop-genotypes for concat

Rob Davies (15):
      Fix missing autotools on Appveyor
      Replace CentOS test build with Rocky Linux
      Switch to https for htslib git clone
      Add NEWS updates
      Switch cirrus ubuntu image to ubuntu:latest
      Switch to rockylinux:9
      Adjust snp-ins-del code for the revised bcf_has_variant_type API
      Fix tsv convert bug
      Switch MacOS CI tests to an ARM-based image
      Happy New Year
      Stop make check from running tests twice
      Fix bus error in bcftools merge on armhf (32-bit hard-float)
      Portability improvements
      Happy New Year
      Allow bcftools reheader --fai to read its input file from a stream

Sebastian Schmeier (2):
      Fixes the python plotting routines.
      Fixes the python plotting routines.

SpikyClip (1):
      Improve docs on --soft-filter argument.

Tobias Rausch (3):
      GSL_LIBS
      ifndef
      ifndef

Valeriu Ohan (3):
      Add citation section.
      Try to work just with the index file when calculating record statistics.
      Use `bcf_hdr_id2name` directly, instead of `bcf_index_seqnames`.

freeseek (1):
      scatter variants based on VCF position

nicolaasuni (2):
      Update variantkey with upstream changes
      Fix license

Étienne Mollier (9):
      address multiple typos
      NEWS: improve "allows one to" wording.
      doc/bcftools.1: improve "allows one to" wording.
      doc/bcftools.html: improve "allows one to" wording.
      doc/bcftools.txt: improve "allows one to" wording.
      variantkey.h: small description improvement.
      plugins/*.c: improve init functions description.
      vcfplugin.c: improve init functions description.
      Apply Rob's suggestions from code review

## Release 1.20 (15th April 2024)

Changes affecting the whole of bcftools, or multiple commands:

* Add short option -W for --write-index. The option now accepts an optional parameter
  which allows to choose between TBI and CSI index format.

Changes affecting specific commands:

* bcftools consensus

    - Add new --regions-overlap option which allows to take into account overlapping deletions
      that start out of the fasta file target region.

(NEWS truncated at 15 lines)

Successfully merging this pull request may close these issues.

bcftools call ignores deletion with high coverage