Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mpileup speedup #2

Open
wants to merge 242 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
242 commits
Select commit Hold shift + click to select a range
f07f0fb
Hidden switch for unscaled output in polysomy
pd3 Feb 11, 2015
278cb33
polysomy: new options, large rewrites at mostly unchanged performance
pd3 Mar 12, 2015
62764c0
Tuning the parameters and fitting procedure, all RA peaks included fo…
pd3 Mar 16, 2015
983bea4
polysomy: CN4 side peaks symmetry check now relative rather than abso…
pd3 Mar 17, 2015
57edcab
minor wording and whitespace changes to polysomy.c
mcshane Jun 29, 2015
9480a54
show polysomy command in help message
mcshane Jun 29, 2015
68b06b7
update copyright dates for peakfit
mcshane Jul 6, 2015
96e7858
Merge pull request #286 from samtools/feature/polysomy
mcshane Jul 6, 2015
baa24a8
Implements "bcftools view -G removes all format header lines"
winni2k Apr 28, 2015
8fdd498
The HMM structure becomes opaque for future flexibility
pd3 Mar 26, 2015
f506244
RoH bugfix: Transition probabilities must scale to 1
pd3 Jun 29, 2015
3ba5f38
New bcftools roh --AF-dflt option; changed -a,-H defaults; fixed a bu…
pd3 Jun 30, 2015
aea4cf9
Fix polysomy dependencies [minor]
jmarshall Jul 6, 2015
626527c
Merge pull request #287 from samtools/feature/roh
mcshane Jul 6, 2015
da26c06
plot-vcfstats: Fix edge case in tstv_by_qual plot, see #229
pd3 Jun 29, 2015
4eec7d5
concat: new --compact-PS switch
pd3 Mar 16, 2015
4abc0f2
concat -d option to remove duplicates based on one of the collapse sr…
pd3 Apr 29, 2015
566df3c
concat -l: Print warning and proceed if GT not present
pd3 Apr 29, 2015
0f15ea8
bcf_hdr_combine replaced by bcf_hdr_merge
mcshane Jul 6, 2015
7e7a7db
More control duplicate lines removal with norm (REF still not treated…
pd3 May 13, 2015
4bfe242
Output missing FMT fields as "." rather than negative integer when qu…
pd3 Jun 25, 2015
ddc0bef
Updated cnv command:
pd3 Feb 27, 2015
d2b012f
cnv -O: Increase robustness by smoothing HMM lks before updating the …
pd3 Mar 26, 2015
2b50604
cnv: Update to the modified HMM API
pd3 Mar 26, 2015
d3f9fe4
cnv: Prior on CN2; changed the default -P to be more strict (0.5)
pd3 Apr 27, 2015
733f257
Further support for case-insensitive REF,ALT allele merging
mcshane Jul 17, 2015
cecff77
more informative error message: annotate supports FMT fields only fro…
pd3 Jul 17, 2015
ae5663d
merge: More informative error message when FILTER is undefined
mcshane Jul 28, 2015
58a1cf3
Resolved assert when comparing long GT strings
pd3 Jul 11, 2015
0cf6efb
plugin: new --version switch
mcshane Jul 28, 2015
d8b82c3
vcfplugin: header boilerplate mentioned the wrong file
mcshane Jul 28, 2015
5653f1b
Include ctype for toupper declaration in vcmp
pd3 Jul 23, 2015
2962f5c
New fill-tags plugin to set INFO tags AN,AC,AC_Hom,AC_Het,AC_Hemi
pd3 Jul 23, 2015
653c26a
Fix bcftools_pluginCommand=plugin header annotation.
pd3 Jul 23, 2015
264745e
New isec -n~BITMASK option
pd3 May 29, 2015
6ccecd1
New annotate -m option to mark sites present/absent in a file
pd3 Jun 15, 2015
d0ff204
docs update
mcshane Jul 28, 2015
c4e8f08
Merge branch 'develop' of github.com:samtools/bcftools into feature/cnv
mcshane Jul 28, 2015
40ee806
Fix in -i/-e filtering in the convert command; query missing GT field…
pd3 Jul 17, 2015
b62ae45
bugfix in filtering expressions involving ALT=. sites
pd3 Mar 9, 2015
27f0b08
Merge pull request #298 from samtools/feature/cnv
mcshane Jul 28, 2015
a69cf61
Merge branch 'develop' of github.com:samtools/bcftools into develop
mcshane Jul 28, 2015
d3af7da
rbuf_expand0 macro made analogous to hts_expand0
pd3 May 3, 2015
33464ef
Moved hts_bcf_wmode to be accesible by plugins
pd3 May 3, 2015
0b8f0ea
norm: Replace IUPAC ambiguity codes in fasta ref with Ns
pd3 May 20, 2015
1b485ce
norm: Set freed alleles to NULL to prevent multiple free()s
pd3 Apr 24, 2015
5e1e930
stats: previously omitted ALT=. sites now count as SNPs in per-sample…
pd3 Mar 9, 2015
6fed5ac
stats: output also indel nHET and nAA in PSI stats
pd3 May 15, 2015
eee5780
gtcheck: switch automatically to -G99 at ALT=. sites in absence of PLs
pd3 Mar 10, 2015
7e15f83
gtcheck: more informative description of discordance per-site
pd3 Mar 10, 2015
255ee9a
Support for -i/-e filtering in vcfstats
pd3 May 13, 2015
0bde8e0
remove install dependency on asciidoc
mcshane Jul 17, 2015
a04ddb7
Merge pull request #295 from samtools/feature/fix.289
mcshane Jul 28, 2015
5d52a36
Merge pull request #299 from samtools/feature/stats
mcshane Jul 28, 2015
c4a53ac
update man docs
mcshane Jul 28, 2015
21e745a
call: new ploidy handling
pd3 May 28, 2015
117155c
Merge pull request #300 from samtools/feature/concat
mcshane Aug 4, 2015
942fbd2
vcfnorm: merge records even if tags are missing from some records
mcshane Aug 5, 2015
f47097b
Merge pull request #301 from samtools/feature/fix.282
mcshane Aug 5, 2015
6deaf37
update man docs with new concat options
mcshane Aug 5, 2015
21aca6d
cleanup GRCh38 alias description
mcshane Aug 5, 2015
ef11927
Three ways to add values in annotate: -c +ID, =ID, or ID
pd3 Jul 29, 2015
b3257c1
update man docs after ef11927ac5fc6feca75279d93b56061cf3ae6e96
mcshane Aug 5, 2015
90d0a6a
Proper ploidy initialization when the sexes are added explicitly.
pd3 Jul 30, 2015
3b0378f
call -m fix: skip sites with too many alleles (>32) instead of silent…
pd3 Jul 31, 2015
d4d42ca
Check that opening chain and output files succeeded
jmarshall Aug 7, 2015
63dd6d1
include missing call to set ploidy and add test
mcshane Sep 1, 2015
b1bf66a
Fix parsing of the default ploidy lines, both tab or spaces will work…
pd3 Sep 1, 2015
9a06670
default ploidy now ready from file after b1bf66a7287bfa67ca22cfb20633…
mcshane Sep 1, 2015
354b980
docs: clarify `--include-type` and `--exclude-types`
mcshane Sep 2, 2015
26778fa
fix FORMAT float filtering, resolves #312
pd3 Sep 2, 2015
fcd451b
Merge new handling of ploidy (PR #313)
jmarshall Sep 2, 2015
0dbe41e
Avoid using C99 NAN macro
jmarshall Sep 2, 2015
8e82f7d
update docs...
mcshane Sep 2, 2015
cabeeb2
plot-vcfstats: fix bug in merging multiple stats files
mcshane Sep 2, 2015
03cfc6a
add back in tests accidentally deleted in previous commit
mcshane Sep 2, 2015
590b697
Fix whitespace [trivial]
jmarshall Sep 2, 2015
5cf52c4
Add test for splitting and merging in one go
Sep 3, 2015
77d9c18
Fix in consensus, last position in the buffer was not updated. Resolv…
pd3 Sep 3, 2015
7465fc8
Merge norm -m- test (PR #315)
jmarshall Sep 14, 2015
5940dc0
plot-vcfstats: ensure merged SN fields are in a specific order
mcshane Sep 15, 2015
ae4c9b0
docs: mention that --fasta-ref is required for left-alignment and nor…
mcshane Sep 15, 2015
274c642
PL to determine sex with +vcf2sex
mcshane Sep 16, 2015
4dd02c3
new setGT plugin
mcshane Sep 16, 2015
f46aa3e
Plugins to use bcf_gt_is_missing instead of the old bcf_int32_missing
pd3 Sep 8, 2015
84aff62
fixploidy: Do not exit on lines without GT, leave unchanged
pd3 Apr 28, 2015
e63a37f
convert: Report the number of written records
pd3 Feb 17, 2015
f1129e6
Update test VCF headers to v4.2
Sep 17, 2015
ceb1524
convert -H: Fixes issue #274; adds support for haploid and missing al…
winni2k Sep 22, 2015
7b39452
Merge pull request #321 from humanlongevity/develop
mcshane Sep 24, 2015
c29c8c3
Fixed bug in trio calling introduced by bcc7b26e
pd3 Mar 16, 2015
5372280
vcfindex: make index -s work even if length attribute is not present
pd3 Sep 30, 2015
6d2faf2
Bringing "call --gvcf" from experimental to main repo.
pd3 Sep 24, 2015
2346a00
vcfcall: gvcf calling not working with -c calling mode yet
mcshane Oct 1, 2015
7d084e1
Merge branch 'develop' of https://github.com/wkretzsch/bcftools into …
mcshane Oct 1, 2015
1d3d6fd
vcfconvert: support for END tag when converting from oxford formats t…
mcshane Oct 1, 2015
e5a2919
Merge branch 'wkretzsch-develop' into develop
mcshane Oct 1, 2015
1c126ec
docs: clarify merge behaviour for INFO and QUAL fields
mcshane Oct 1, 2015
489657c
add info plugin to add information metrics from GP tags
mcshane Oct 2, 2015
c645822
mendelian plugin to check for mendelian inconsistencies
mcshane Oct 2, 2015
fe79a91
add plugin to color shared chromosomal segments
mcshane Oct 2, 2015
2f3bc55
tag2tag.c: minor fix to boilerplate header
mcshane Oct 2, 2015
2f3ab49
color-chrs.pl: add copyright boilerplate
mcshane Oct 2, 2015
4923227
Makefile: remove unnecessary "\"
mcshane Oct 2, 2015
7357020
rename plugin/info.c -> plugin/impute-info.c
mcshane Oct 2, 2015
818cfbe
add tests for trimming many alleles and dropping header+format
mcshane Oct 12, 2015
dacc705
genaralise mcall_trim_numberR beyond DPR to any Number=R/Type=Integer…
mcshane Oct 15, 2015
9fc3364
Small polysomy improvements
pd3 Sep 23, 2015
f0889c4
New hidden -n and -S options to polysomy
pd3 Sep 23, 2015
83c9593
change to filter.c:filters_init()
adamspargo Oct 14, 2015
8deae27
Merge pull request #337 from adamspargo/issue320
mcshane Oct 15, 2015
aa6bc17
add test for consensus on empty vcf file
mcshane Oct 22, 2015
f6831b5
norm: Expand round buffer if needed. Fixes #336
pd3 Oct 21, 2015
983067a
cnv: New -k option, fixed non-functional parsing of -dfloat,float
pd3 Oct 6, 2015
2fc54de
CNV output include number of sites/hets in regions
pd3 Oct 9, 2015
e325cd0
Reduce memory consumption in cnv plotting script
pd3 Oct 21, 2015
3c4ddc7
Added cnv and polysomy commands to the docs
pd3 Oct 22, 2015
649f1e2
Expanded docs to prevent errors reported in #346 and #239
pd3 Nov 2, 2015
bca7778
Support for --guess GL in vcf2sex
pd3 Nov 5, 2015
c30fb5f
Fixes in vcf2sex
pd3 Nov 5, 2015
b12ca08
More informative error msg in fill-tags plugin
pd3 Nov 2, 2015
edca051
Expanded plugin docs: explain "+pname" vs "plugin name"
pd3 Nov 2, 2015
7564362
vcfmerge: Fix error message, correct chr name must be printed
pd3 Oct 30, 2015
6d12391
gtcheck: Use also sites without DP for GT checking; Plot discordance …
pd3 Oct 31, 2015
6ff1c21
Distinguish between & and && in -i/-e FMT filtering
pd3 Oct 27, 2015
74cc328
Fix in filtering of missing values.
pd3 Oct 27, 2015
c43c235
Document requirement of compressed/indexed files when reading multipl…
mcshane Nov 6, 2015
0b9ad1a
document subset behaviour with regard to sample dependent tags
mcshane Nov 6, 2015
08cc3f2
add some missing documentation
mcshane Nov 6, 2015
06ce5a1
update man and html doc from the asciidoc
mcshane Nov 6, 2015
2e5fda7
Dynamically export bcftools symbols to plugins
jmarshall Nov 9, 2015
db3d87d
Fix error message segfault [minor]
jmarshall Nov 10, 2015
63cffa4
further fixes in vcf2sex
pd3 Nov 12, 2015
3eb86dc
plot-vcfstats: plotting of sample names fixed
dlaehnemann Nov 12, 2015
6470ba4
Merge pull request #352 from dlaehnemann/develop
mcshane Nov 13, 2015
6710191
Merge branch 'feature/cnv' into develop
mcshane Nov 16, 2015
459a7ab
update docs from asciidoc
mcshane Nov 16, 2015
37b0ddc
add *.dSYM to .gitignore
mcshane Nov 16, 2015
8e8679d
Prevent empty PL fields on BCF output with "call -C alleles"
pd3 Nov 18, 2015
03c60cf
add POS as a filter column
mcshane Nov 19, 2015
1a55e45
Merge pull request #354 from samtools/feature/filter_pos
mcshane Nov 20, 2015
3aa989e
Fix typo: annotate -x ^FORMAT/TAG did not work
pd3 Nov 30, 2015
5d90843
split out test input files for -c and -m calling
mcshane Dec 7, 2015
71921ca
call -mC alleles fix: Do not take X as last in ALT list for granted
pd3 Nov 30, 2015
6737c5c
add --threads option for output compression threads
mcshane Nov 9, 2015
5c0d811
update manpage docs after adding --threads options
mcshane Dec 15, 2015
d388dd1
Release 1.3: various new options, plugins, and bug fixes
jmarshall Dec 15, 2015
06435c0
Merge version number bump from master
jmarshall Dec 15, 2015
9e33c25
Happy New Year
mcshane Jan 11, 2016
781e049
annotate: print more informative error message, resolves #365
pd3 Jan 11, 2016
bbe1543
call: improvements in "bcftools call --ploidy" when all samples haploid
pd3 Jan 11, 2016
cba6e32
index: do not segfault when indexing non-existent files. Fixes #367
pd3 Jan 11, 2016
3138e74
norm: remove forced flush, forgotten when fixing #336. Fixes #368
pd3 Jan 11, 2016
611297e
reheader: support spaces in sample names
pd3 Jan 12, 2016
c0b49dc
annotate: add to FORMAT fields from a tab-delimited file
pd3 Oct 6, 2015
ba58f21
The fill-tags plugin can add AF. Resolves #345
pd3 Oct 29, 2015
820e1d6
roh: Viterbi training fixed, resolves #328
pd3 Dec 17, 2015
36036cf
concat: new --naive option for fast operations on large BCFs
pd3 Nov 20, 2015
2b4572a
Merge pull request #359 from samtools/feature/naive_concat
mcshane Jan 15, 2016
2cd6f08
Pass unchanged argc,argv to plugins with run() implemented.
pd3 Nov 18, 2015
9cf0d82
add --no-version option to suppress appending of version and command …
mcshane Jan 21, 2016
e0890a1
reheader: allow empty body with vcf.gz. Fixes #356
pd3 Jan 22, 2016
5d25295
Fix merging of Number=G format strings
jmarshall Feb 1, 2016
ef62fd0
Merge pull request #374 from samtools/feature/no_version
mcshane Feb 10, 2016
4e041a4
remove some chatter from test suite
mcshane Feb 10, 2016
e915fd6
fill-tags plugin: add NS tag
mcshane Feb 12, 2016
be92659
test plugins via travis too
mcshane Feb 12, 2016
681d80c
peakfit build with GSLv2, resolves #378
pd3 Feb 11, 2016
f876f53
Merge pull request #381 from samtools/feature/test_plugins
mcshane Feb 12, 2016
a121590
plot-vcfstats: update expected headers [minor]
mcshane Feb 19, 2016
ef4e8a9
vcfannotate: fix typo [minor]
mcshane Feb 19, 2016
9f5ee21
vcfcall: fix typo in error message [minor]
mcshane Feb 19, 2016
2f4a2b2
new plugin: GTisec to count genotype intersections across all possibl…
dlaehnemann Dec 4, 2015
f31e888
Merge branch 'dlaehnemann-develop' into develop
mcshane Feb 23, 2016
0bed387
docs: add some clarification for view filters. fix some typos. closes…
mcshane Mar 4, 2016
e38a10a
whitespace [minor]
mcshane Mar 4, 2016
bf38eac
norm: should not crash on deletions in telomere N regions which canno…
pd3 Mar 4, 2016
2211509
impute-info: do calculation with double; only convert to float on ins…
mcshane Mar 14, 2016
afecb4e
getopt_long() returns int, not char
jmarshall Mar 15, 2016
3eb4e8e
Mark error() as HTS_NORETURN
jmarshall Mar 16, 2016
4c93386
Check return codes from bgzf_read() / bgzf_write() calls
jmarshall Mar 16, 2016
21994b9
GLs are log10 scaled, not ln scaled. Fixes #394
pd3 Mar 17, 2016
47e811c
Fix docs grammar [minor]
jmarshall Mar 31, 2016
c45b93d
docs: clarify use of BED files in --regions-file and --targets-file o…
mcshane Apr 18, 2016
ee99052
docs: update man and html docs
mcshane Apr 18, 2016
0f1bc9d
Release 1.3.1: bug fixes, new GTisec plugin
jmarshall Apr 22, 2016
4d44e83
Merge version number bump from master
jmarshall Apr 22, 2016
77e76e5
filter: fix bug with uninitialised key
mcshane Apr 29, 2016
679c6e6
norm: informative error messages on fields with wrong Number=R,A values
pd3 Mar 23, 2016
8eb3804
concat: `--naive` option now works for both bcf and vcf.gz
pd3 Mar 25, 2016
e3ef8da
reheader: allow multispace delimiters in `reheader -s` files
pd3 Apr 13, 2016
0c88e1a
mcall: trim all `Number=R` tags
pd3 Feb 2, 2016
0a5e8f8
call: new `-F, --prior-freqs` option
pd3 Apr 25, 2016
4b16ea3
docs: fix up alphabetic order of commands in docs cnv>call
mcshane May 3, 2016
e8b7c5b
Use bcf_hdr_format() rather than deprecated bcf_hdr_fmt_text()
jmarshall May 11, 2016
f7f36d9
Avoid angle brackets around option arguments
jmarshall May 12, 2016
84bbe86
Merge mcall trim alleles and population priors (PR #417)
jmarshall May 16, 2016
b9e0105
annotate: add `-m` support for VCF, not just tab-delimited annot files
pd3 Feb 15, 2016
ee8148d
Refactorings and warning preventions [minor]
jmarshall May 17, 2016
170198d
Fix test_cmd(exp=>"expected") when test fails
jmarshall May 24, 2016
7c7c7a2
Merge concat --naive for vcf.gz file (PR #418)
jmarshall May 24, 2016
16414e6
Avoid segfault when QUAL < 0
jmarshall May 27, 2016
df3ca41
consensus: handle variants which overlap region boundaries
pd3 May 6, 2016
25d042b
norm: strict checking of non-ACGTN bases in `norm -cwx`
pd3 Jun 20, 2016
d083c5b
roh: fix nonfunctional --AF-dflt option
pd3 May 20, 2016
2035ddd
call: re-enable numeric ploidy definitions via `--samples-file`
pd3 Jun 13, 2016
be8b378
cnv: add params estimated by -O to the summary cnv output
pd3 May 2, 2016
101061c
call: add some extra info to the ploidy help message [minor]
mcshane Jun 21, 2016
e29f2f5
plugin: decrease verbosity of plugin -l, old output via -lv and -lvv
pd3 Jun 3, 2016
4bf6c48
tag2tag: new `--gp-to-gt` option
pd3 Mar 31, 2016
206c4ee
tag2tag: add hard-call `--threshold` option; add `--gp-to-gt` test
mcshane Jun 28, 2016
7c56ce4
Merge branch 'feature/tag2tag' into develop
mcshane Jun 28, 2016
3c7df1d
plugin: new trio-switch-rate plugin
pd3 Feb 3, 2016
a0107db
trio-switch-rate: add test and clean a few edge cases
mcshane Jun 27, 2016
ead9086
Merge branch 'feature/trio' into develop
mcshane Jun 28, 2016
c78a6f5
plugin: new plugin to detect sites with statistically significant dif…
pd3 Mar 17, 2016
dd11889
ad-bias: do not skip first line of samples file; added test
mcshane Jun 28, 2016
da5e8a0
Merge branch 'feature/ad-bias' into develop
mcshane Jun 28, 2016
e43ca8c
norm: avoid regression when ALT is a spanning deletion `*`
mcshane Jul 1, 2016
fe43a83
copy over mpileup files from samtools
mcshane Apr 25, 2016
e46a2cf
rename bam_plcmd.c -> mpileup.c
mcshane Apr 25, 2016
edc2bc6
add mpileup command to bcftools
mcshane Apr 25, 2016
b5c2093
remove mpileup text output
mcshane Apr 26, 2016
303cc36
import mpileup tests from samtools
mcshane Apr 26, 2016
b20d1e7
bring over mpileup documentation
mcshane Apr 26, 2016
6546afd
mpileup: remove `printw()` which was only used in the text output
mcshane May 9, 2016
acb1b9b
remove deprecated options
mcshane May 26, 2016
d1b8e88
mpileup: add `--gvcf` output mode
mcshane May 26, 2016
8091e19
mpileup: new `-s` and `-S` options to include only selected samples
mcshane May 27, 2016
d2f3068
mpileup: switch `--output-tags` option to `--annotate`
mcshane May 31, 2016
c153e8b
regidx: import regidx api from htslib into bcftools
mcshane Jun 13, 2016
d766590
regidx: extend regidx API
mcshane Jun 17, 2016
d4a7db3
mpileup: support multiple regions
mcshane Jun 20, 2016
ceb0a20
mpileup: in read_file_list(), check for URL syntax or extant files
mcshane Jun 22, 2016
46479b6
mpileup: add `--no-version` and `--threads` options as for other bcft…
mcshane Jun 22, 2016
c552060
mpileup: enable `-g` short `--gvcf` option that was missed
mcshane Jun 23, 2016
2607f11
mpileup: exclude samples when prefixed with ^
pd3 Jun 23, 2016
51e31af
mpileup: -l replaced with -t,-T to match other bcftools commands
pd3 Jun 23, 2016
7f4929c
rename sample.* to bam_sample.*
mcshane Jun 30, 2016
5562f05
mpileup: changes -G/--exclude-RG to -G/--read-groups
pd3 Jun 24, 2016
251fe2a
Make use of the pileup constructor/destructor interface.
jkbonfield Jul 12, 2016
c333e8b
Speed up of mpileup funcs.
jkbonfield Jul 13, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ bcftools
*.o
/version.h
plugins/*.so
plugins/*.dSYM
plugins/*.P

/test/test-rbuf
Expand Down
23 changes: 13 additions & 10 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -1,25 +1,28 @@
# Control file for continuous integration testing at http://travis-ci.org/

language: c
compiler:
- clang
- gcc

env:
global:
- HTSDIR=./htslib
- LD_LIBRARY_PATH=$TRAVIS_BUILD_DIR/htslib:$LD_LIBRARY_PATH

matrix:
include:
- os: linux
compiler: clang
- os: linux
compiler: gcc
# An unoptimised C99 build, for detecting non-static inline functions
- compiler: gcc
- os: linux
compiler: gcc
env: CFLAGS="-std=gnu99 -O0"
- os: osx
compiler: clang

env:
global:
- HTSDIR=./htslib

before_script:
# Clone samtools/htslib (or another repository, as specified by a Travis CI
# repository $HTSREPO setting) and check out a corresponding branch with the
# same name, if any, or otherwise the default branch.
- .travis/clone ${HTSREPO:-git://github.com/samtools/htslib.git} $HTSDIR $TRAVIS_BRANCH

script: make -e && make -e test
script: make plugindir=$TRAVIS_BUILD_DIR/plugins -e && make -e test-plugins
81 changes: 70 additions & 11 deletions HMM.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
/* The MIT License

Copyright (c) 2014 Genome Research Ltd.
Copyright (c) 2014-2015 Genome Research Ltd.

Author: Petr Danecek <pd3@sanger.ac.uk>

Expand Down Expand Up @@ -31,6 +31,33 @@
#include <htslib/hts.h>
#include "HMM.h"

struct _hmm_t
{
int nstates; // number of states

double *vprob, *vprob_tmp; // viterbi probs [nstates]
uint8_t *vpath; // viterbi path [nstates*nvpath]
double *bwd, *bwd_tmp; // bwd probs [nstates]
double *fwd; // fwd probs [nstates*(nfwd+1)]
int nvpath, nfwd;

int ntprob_arr; // number of pre-calculated tprob matrices
double *curr_tprob, *tmp; // Temporary arrays; curr_tprob is short lived, valid only for
// one site (that is, one step of Viterbi algorithm)
double *tprob_arr; // Array of transition matrices, precalculated to ntprob_arr
// positions. The first matrix is the initial tprob matrix
// set by hmm_init() or hmm_set_tprob()
set_tprob_f set_tprob; // Optional user function to set / modify transition probabilities
// at each site (one step of Viterbi algorithm)
void *set_tprob_data;
double *init_probs; // Initial state probabilities, NULL for uniform probs
};

uint8_t *hmm_get_viterbi_path(hmm_t *hmm) { return hmm->vpath; }
double *hmm_get_tprob(hmm_t *hmm) { return hmm->tprob_arr; }
int hmm_get_nstates(hmm_t *hmm) { return hmm->nstates; }
double *hmm_get_fwd_bwd_prob(hmm_t *hmm) { return hmm->fwd; }

static inline void multiply_matrix(int n, double *a, double *b, double *dst, double *tmp)
{
double *out = dst;
Expand Down Expand Up @@ -63,6 +90,18 @@ hmm_t *hmm_init(int nstates, double *tprob, int ntprob)
return hmm;
}

void hmm_init_states(hmm_t *hmm, double *probs)
{
if ( !probs )
{
free(hmm->init_probs);
hmm->init_probs = NULL;
}

if ( !hmm->init_probs ) hmm->init_probs = (double*) malloc(sizeof(double)*hmm->nstates);
memcpy(hmm->init_probs,probs,sizeof(double)*hmm->nstates);
}

void hmm_set_tprob(hmm_t *hmm, double *tprob, int ntprob)
{
hmm->ntprob_arr = ntprob;
Expand Down Expand Up @@ -118,7 +157,10 @@ void hmm_run_viterbi(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)

// Init all states with equal likelihood
int i,j, nstates = hmm->nstates;
for (i=0; i<nstates; i++) hmm->vprob[i] = 1./nstates;
if ( hmm->init_probs )
for (i=0; i<nstates; i++) hmm->vprob[i] = hmm->init_probs[i];
else
for (i=0; i<nstates; i++) hmm->vprob[i] = 1./nstates;

// Run Viterbi
uint32_t prev_pos = sites[0];
Expand All @@ -130,7 +172,7 @@ void hmm_run_viterbi(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)
int pos_diff = sites[i] == prev_pos ? 0 : sites[i] - prev_pos - 1;

_set_tprob(hmm, pos_diff);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data, hmm->curr_tprob);
prev_pos = sites[i];

double vnorm = 0;
Expand Down Expand Up @@ -182,8 +224,16 @@ void hmm_run_fwd_bwd(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)

// Init all states with equal likelihood
int i,j,k, nstates = hmm->nstates;
for (i=0; i<nstates; i++) hmm->fwd[i] = 1./hmm->nstates;
for (i=0; i<nstates; i++) hmm->bwd[i] = 1./hmm->nstates;
if ( hmm->init_probs )
{
for (i=0; i<nstates; i++) hmm->fwd[i] = hmm->init_probs[i];
for (i=0; i<nstates; i++) hmm->bwd[i] = hmm->init_probs[i];
}
else
{
for (i=0; i<nstates; i++) hmm->fwd[i] = 1./hmm->nstates;
for (i=0; i<nstates; i++) hmm->bwd[i] = 1./hmm->nstates;
}

// Run fwd
uint32_t prev_pos = sites[0];
Expand All @@ -196,7 +246,7 @@ void hmm_run_fwd_bwd(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)
int pos_diff = sites[i] == prev_pos ? 0 : sites[i] - prev_pos - 1;

_set_tprob(hmm, pos_diff);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data, hmm->curr_tprob);
prev_pos = sites[i];

double norm = 0;
Expand All @@ -222,7 +272,7 @@ void hmm_run_fwd_bwd(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)
int pos_diff = sites[n-i-1] == prev_pos ? 0 : prev_pos - sites[n-i-1] - 1;

_set_tprob(hmm, pos_diff);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, sites[n-i-1], prev_pos, hmm->set_tprob_data);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, sites[n-i-1], prev_pos, hmm->set_tprob_data, hmm->curr_tprob);
prev_pos = sites[n-i-1];

double bwd_norm = 0;
Expand Down Expand Up @@ -262,8 +312,16 @@ void hmm_run_baum_welch(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)

// Init all states with equal likelihood
int i,j,k, nstates = hmm->nstates;
for (i=0; i<nstates; i++) hmm->fwd[i] = 1./hmm->nstates;
for (i=0; i<nstates; i++) hmm->bwd[i] = 1./hmm->nstates;
if ( hmm->init_probs )
{
for (i=0; i<nstates; i++) hmm->fwd[i] = hmm->init_probs[i];
for (i=0; i<nstates; i++) hmm->bwd[i] = hmm->init_probs[i];
}
else
{
for (i=0; i<nstates; i++) hmm->fwd[i] = 1./hmm->nstates;
for (i=0; i<nstates; i++) hmm->bwd[i] = 1./hmm->nstates;
}

// New transition matrix: temporary values
double *tmp_xi = (double*) calloc(nstates*nstates,sizeof(double));
Expand All @@ -281,7 +339,7 @@ void hmm_run_baum_welch(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)
int pos_diff = sites[i] == prev_pos ? 0 : sites[i] - prev_pos - 1;

_set_tprob(hmm, pos_diff);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, prev_pos, sites[i], hmm->set_tprob_data, hmm->curr_tprob);
prev_pos = sites[i];

double norm = 0;
Expand All @@ -307,7 +365,7 @@ void hmm_run_baum_welch(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)
int pos_diff = sites[n-i-1] == prev_pos ? 0 : prev_pos - sites[n-i-1] - 1;

_set_tprob(hmm, pos_diff);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, sites[n-i-1], prev_pos, hmm->set_tprob_data);
if ( hmm->set_tprob ) hmm->set_tprob(hmm, sites[n-i-1], prev_pos, hmm->set_tprob_data, hmm->curr_tprob);
prev_pos = sites[n-i-1];

double bwd_norm = 0;
Expand Down Expand Up @@ -362,6 +420,7 @@ void hmm_run_baum_welch(hmm_t *hmm, int n, double *eprobs, uint32_t *sites)

void hmm_destroy(hmm_t *hmm)
{
free(hmm->init_probs);
free(hmm->vprob);
free(hmm->vprob_tmp);
free(hmm->vpath);
Expand Down
57 changes: 31 additions & 26 deletions HMM.h
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
/* The MIT License

Copyright (c) 2014 Genome Research Ltd.
Copyright (c) 2014-2015 Genome Research Ltd.

Author: Petr Danecek <pd3@sanger.ac.uk>

Expand Down Expand Up @@ -31,28 +31,7 @@

typedef struct _hmm_t hmm_t;

typedef void (*set_tprob_f) (hmm_t *hmm, uint32_t prev_pos, uint32_t pos, void *data);

struct _hmm_t
{
int nstates; // number of states

double *vprob, *vprob_tmp; // viterbi probs [nstates]
uint8_t *vpath; // viterbi path [nstates*nvpath]
double *bwd, *bwd_tmp; // bwd probs [nstates]
double *fwd; // fwd probs [nstates*(nfwd+1)]
int nvpath, nfwd;

int ntprob_arr; // number of pre-calculated tprob matrices
double *curr_tprob, *tmp; // Temporary arrays; curr_tprob is short lived, valid only for
// one site (that is, one step of Viterbi algorithm)
double *tprob_arr; // Array of transition matrices, precalculated to ntprob_arr
// positions. The first matrix is the initial tprob matrix
// set by hmm_init() or hmm_set_tprob()
set_tprob_f set_tprob; // Optional user function to set / modify transition probabilities
// at each site (one step of Viterbi algorithm)
void *set_tprob_data;
};
typedef void (*set_tprob_f) (hmm_t *hmm, uint32_t prev_pos, uint32_t pos, void *data, double *tprob);

/**
* hmm_init() - initialize HMM
Expand All @@ -65,6 +44,22 @@ struct _hmm_t
hmm_t *hmm_init(int nstates, double *tprob, int ntprob);
void hmm_set_tprob(hmm_t *hmm, double *tprob, int ntprob);

/**
* hmm_init_states() - initial state probabilities
* @probs: initial state probabilities or NULL to reset to default
*
* If uncalled, all states are initialized with the same likelihood
*/
void hmm_init_states(hmm_t *hmm, double *probs);

/**
* hmm_get_tprob() - return the array of transition matrices, precalculated
* to ntprob positions. The first matrix is the initial tprob matrix
* set by hmm_init() or hmm_set_tprob()
*/
double *hmm_get_tprob(hmm_t *hmm);
int hmm_get_nstates(hmm_t *hmm);

/**
* hmm_set_tprob_func() - custom setter of transition probabilities
*/
Expand All @@ -82,17 +77,26 @@ void hmm_set_tprob_func(hmm_t *hmm, set_tprob_f set_tprob, void *data);
*/
void hmm_run_viterbi(hmm_t *hmm, int nsites, double *eprob, uint32_t *sites);

/**
* hmm_get_viterbi_path() - the viterbi path: state at ith site is the
* (nstates*isite)-th element
*/
uint8_t *hmm_get_viterbi_path(hmm_t *hmm);

/**
* hmm_run_fwd_bwd() - run the forward-backward algorithm
* @nsites: number of sites
* @eprob: emission probabilities for each site and state (nsites x nstates)
* @sites: list of positions
*
* When done, hmm->fwd[] contains the calculated fwd*bwd probabilities. The
* probability of i-th state at j-th site can be accessed as fwd[j*nstates+i].
*/
void hmm_run_fwd_bwd(hmm_t *hmm, int nsites, double *eprob, uint32_t *sites);

/**
* hmm_get_fwd_bwd_prob() - the probability of i-th state at j-th site can
* be accessed as fwd_bwd[j*nstates+i].
*/
double *hmm_get_fwd_bwd_prob(hmm_t *hmm);

/**
* hmm_run_baum_welch() - run one iteration of Baum-Welch algorithm
* @nsites: number of sites
Expand All @@ -104,6 +108,7 @@ void hmm_run_fwd_bwd(hmm_t *hmm, int nsites, double *eprob, uint32_t *sites);
* are not updated.
*/
void hmm_run_baum_welch(hmm_t *hmm, int nsites, double *eprob, uint32_t *sites);

void hmm_destroy(hmm_t *hmm);

#endif
Expand Down
Loading