Pair lines with symbolic alleles by END tag #1321
Merged
Non-symbolic variants are uniquely identified by POS, REF and ALT, but symbolic alleles starting at the same position were indistinguishable. This prevented correct matching of records that share a position and variant type but differ in length (INFO/END). A test case is added to bcftools in a separate commit (annotate24.*.vcf).
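A minimal Python sketch of the pairing idea, assuming a hypothetical single-ALT `Record` type and hypothetical `match_key`/`pair_records` helpers. It is not htslib's C implementation; it only shows why INFO/END has to join POS, REF and ALT in the matching key once the ALT is symbolic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal VCF record with a single ALT allele; not an htslib type.
    chrom: str
    pos: int                   # 1-based POS
    ref: str
    alt: str
    end: Optional[int] = None  # INFO/END, if present


def is_symbolic(allele: str) -> bool:
    # Symbolic alleles are written in angle brackets, e.g. <DEL>, <DUP>, <CNV>.
    return allele.startswith("<") and allele.endswith(">")


def match_key(rec: Record) -> tuple:
    """Key used to decide whether two records describe the same variant."""
    if is_symbolic(rec.alt):
        # CHROM,POS,REF,ALT cannot tell a short <DEL> from a long one at the same
        # position, so the end coordinate becomes part of the key. Without INFO/END,
        # fall back to the REF span (END = POS + len(REF) - 1).
        end = rec.end if rec.end is not None else rec.pos + len(rec.ref) - 1
        return (rec.chrom, rec.pos, rec.ref, rec.alt, end)
    # Sequence-resolved alleles are already unique by CHROM,POS,REF,ALT.
    return (rec.chrom, rec.pos, rec.ref, rec.alt)


def pair_records(a, b):
    """Pair each record in `a` with the record in `b` sharing its key, or None."""
    index = {match_key(r): r for r in b}
    return [(r, index.get(match_key(r))) for r in a]
```

With END in the key, a 100 bp `<DEL>` and a 500 bp `<DEL>` at the same position no longer pair with each other, while SNVs and indels are matched exactly as before.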
pd3 added a commit to samtools/bcftools that referenced this pull request on Aug 12, 2021: …l request is merged
clrpackages pushed a commit to clearlinux-pkgs/bcftools that referenced this pull request on Jul 10, 2024.
The bcftools test case is in commit samtools/bcftools@4578b76.