Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoFreq 2.4 indelqual filter error #89

Closed
sleyn opened this issue Feb 19, 2020 · 65 comments
Closed

LoFreq 2.4 indelqual filter error #89

sleyn opened this issue Feb 19, 2020 · 65 comments

Comments

@sleyn
Copy link

sleyn commented Feb 19, 2020

Hi,

It is the continuation of my previous BI/DB issue.

I've tried to do lofreq indelqual with the new 2.4 version on my BAM file. The subsequent lofreq call-parallel failed saying something about error on the filtering command (sorry - I did not copy the exact error message). With --no-default-filter parameter I found that in VCF files indels had only HRUN field, but no DP,AF,SB and DP4.

I've tried to look in BAM file, but for me it looked good - BI/BD tags were added.

With the LoFreq 2.3.1 everything works perfect. So I'll stay with this version for now.

@andreas-wilm
Copy link
Contributor

That sounds serious. Can you reproduce this and share logging messages and possibly even a minimal working example / slice of the BAM file? Thanks

@sleyn
Copy link
Author

sleyn commented Feb 20, 2020

I'll try to reproduce this error and post an update.

@sleyn
Copy link
Author

sleyn commented Feb 21, 2020

I've re-installed LoFreq 2.1.4 to the separate conda environment.

The error message:

INFO [2020-02-21 12:08:09,710]: Using 16 threads with following basic args: lofreq call --call-indels -f ./Pseudomonas_aeruginosa_ATCC27853.fna Sample_r_bqsr.bam

INFO [2020-02-21 12:08:09,724]: Adding 35 commands to mp-pool
Number of substitution tests performed: 14967066
Number of indel tests performed: 59983
INFO [2020-02-21 12:49:46,098]: Executing lofreq filter -i /tmp/lofreq2_call_paralleltg03qywl/concat.vcf.gz -o variants.lofreq.vcf --snvqual-thresh 92 --indelqual-thresh 68

CRITICAL [2020-02-21 12:49:46,614]: Final filtering command failed. Commmand was lofreq filter -i /tmp/lofreq2_call_paralleltg03qywl/concat.vcf.gz -o variants.lofreq.vcf --snvqual-thresh 92 --indelqual-thresh 68

If I call variants without filtering then my VCF look like:

##fileformat=VCFv4.0
##fileDate=20200221
##source=lofreq call --call-indels -f ./Pseudomonas_aeruginosa_ATCC27853.fna --no-default-filter --no-default-filter -r CP011857:1-213537 -o /tmp/lofreq2_call_parallelyh0kjl50/0.vcf.gz Sample.bam 
##reference=/home/sleyn/Tasks/01_GEAR/Genomes/Pseudomonas_aeruginosa_ATCC27853/Pseudomonas_aeruginosa_ATCC27853.fna
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
##FILTER=<ID=min_snvqual_20,Description="Minimum SNV Quality (Phred) 20">
##FILTER=<ID=min_indelqual_20,Description="Minimum Indel Quality (Phred) 20">
##FILTER=<ID=min_snvqual_71,Description="Minimum SNV Quality (Phred) 71">
##FILTER=<ID=min_indelqual_47,Description="Minimum Indel Quality (Phred) 47">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
CP011857	1003167	.	TG	T	71	PASS	;HRUN=2
CP011857	1005136	.	TC	T	50	PASS	;HRUN=2
CP011857	1005682	.	TCG	T	74	PASS	;HRUN=1
CP011857	1010538	.	TG	T	50	PASS	;HRUN=4
CP011857	1012046	.	T	TG	50	PASS	;HRUN=2
CP011857	1012526	.	AGC	A	51	PASS	;HRUN=1
CP011857	1015526	.	TG	T	70	PASS	;HRUN=1
CP011857	1017223	.	TG	T	74	PASS	;HRUN=2
CP011857	1017935	.	TG	T	51	PASS	;HRUN=2
CP011857	1018601	.	TG	T	109	PASS	;HRUN=2
CP011857	1020331	.	TC	T	73	PASS	;HRUN=2
CP011857	1022545	.	GC	G	50	PASS	;HRUN=2
CP011857	1027603	.	TG	T	72	PASS	;HRUN=3
CP011857	1028842	.	TG	T	66	PASS	;HRUN=1
CP011857	1029838	.	T	G	76	PASS	DP=486;AF=0.043210;SB=132;DP4=362,100,0,21
CP011857	1029847	.	A	G	116	PASS	DP=507;AF=0.053254;SB=173;DP4=380,98,0,27
CP011857	1029853	.	T	G	183	PASS	DP=573;AF=0.085515;SB=176;DP4=392,130,6,43
CP011857	1032599	.	TG	T	76	PASS	;HRUN=2
CP011857	1034011	.	TG	T	75	PASS	;HRUN=3
CP011857	1037298	.	TG	T	61	PASS	;HRUN=4
CP011857	1038355	.	GC	G	70	PASS	;HRUN=2
CP011857	1038508	.	AC	A	51	PASS	;HRUN=3
CP011857	1038607	.	C	T	80	PASS	DP=892;AF=0.013453;SB=0;DP4=434,442,6,6
CP011857	1040063	.	TA	T	70	PASS	;HRUN=1
CP011857	1040182	.	GATC	G	118	PASS	;HRUN=1
CP011857	1041151	.	TG	T	53	PASS	;HRUN=2
CP011857	1041193	.	AC	A	97	PASS	;HRUN=2
CP011857	1042395	.	TG	T	49	PASS	;HRUN=4
CP011857	1042470	.	TG	T	53	PASS	;HRUN=2
CP011857	1042909	.	TG	T	72	PASS	;HRUN=2
CP011857	1044038	.	CT	C	73	PASS	;HRUN=1
CP011857	1045431	.	GC	G	51	PASS	;HRUN=2

The toy example - BAM file with 50000nt slice from the alignment:
https://16515-my.sharepoint.com/:f:/g/personal/sleyn_sbpdiscovery_org/EiyNJNV7b_9JgVnSdpgiDTQBhzEewOzFHmBOxtJCr5s3LQ?e=YudWhO

The VCF above was produced from it. The error was produced from the full alignment.

@andreas-wilm
Copy link
Contributor

Oh something is very wrong. I will be able to look at this end of next week

@andreas-wilm
Copy link
Contributor

Two quick questions:

  1. how did you install LoFreq? Via Conda or from source?
  2. what's the result if you run lofreq call --call-indels -f Pseudomonas_aeruginosa_ATCC27853.fna -r CP011857:1003167-1003167 Sample.bam?

And thanks for sharing those files!

@sleyn
Copy link
Author

sleyn commented Feb 27, 2020

  1. I've installed lofreq using Conda conda install -c bioconda lofreq
  2. The command immediately outputs an error:
ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.vtELNf -o - --snvqual-thresh 24 --indelqual-thresh 23

@andreas-wilm
Copy link
Contributor

Hm....I can't reproduce this neither using the compiled version nor the conda version. But there is another user with the same problem. Could you help me a bit more please?

What is the output of:

  • lofreq version
  • which lofreq
  • ls $(which lofreq)
  • lofreq call --call-indels -f Pseudomonas_aeruginosa_ATCC27853.fna -r CP011857:1003167-1003167 Sample.bam --debug
  • strace lofreq 2>&1 | grep -c openat

Many thanks!

@sleyn
Copy link
Author

sleyn commented Feb 27, 2020

Results:

lofreq version
version: 2.1.4
commit: unknown
build-date: Jan 15 2020
which lofreq
~/miniconda3/envs/lofreq24/bin/lofreq
ls $(which lofreq)
/home/sleyn/miniconda3/envs/lofreq24/bin/lofreq
lofreq call --call-indels -f Pseudomonas_aeruginosa_ATCC27853.fna -r CP011857:1003167-1003167 Sample.bam --debug
mplp options
  max_mq       = 255
  min_mq       = 0
  flag         = 432
  flag & MPLP_NO_ORPHAN  = 1
  flag & MPLP_BAQ        = 1
  flag & MPLP_REDO_BAQ   = 0
  flag & MPLP_EXT_BAQ    = 1
  flag & MPLP_IDAQ       = 1
  flag & MPLP_REDO_IDAQ = 0
  flag & MPLP_USE_SQ     = 0
  flag & MPLP_ILLUMINA13 = 0
  max_depth    = 1000000
  min_plp_bq   = 3
  min_plp_idq  = 0
  def_nm_q     = -1
  reg          = CP011857:1003167-1003167
  fa           = 0x565080930010
  bed          = (nil)
  cmdline      = lofreq call --call-indels -f Pseudomonas_aeruginosa_ATCC27853.fna -r CP011857:1003167-1003167 --debug Sample.bam 
snvcall options
  min_bq         = 6
  min_alt_bq     = 6
  def_alt_bq     = 0
  min_jq         = 0
  min_alt_jq     = 0
  def_alt_jq     = 0
  min_cov        = 1
  bonf_subst       = 1  (might get recalculated)
  bonf_indel     = 1  (might get recalculated)
  bonf_dynamic   = 1
  sig            = 0.010000
  flag & VARCALL_USE_BAQ     = 1
  flag & VARCALL_USE_MQ      = 1
  flag & VARCALL_USE_SQ      = 0
  flag & VARCALL_USE_IDAQ    = 1
  only_indels    = 0
  no_indels      = 0
DEBUG(plp.c|mpileup): BAM header initialized
DEBUG(plp.c|mpileup): BAM header target #0: name=CP011857 len=6833187
DEBUG(plp.c|mpileup): Starting pileup loop
DEBUG(plp.c|compile_plp_col): Processing CP011857:1003167
CP011857	1003167	T	T	counts:rv/fw A:0/0 C:0/2 G:0/0 T:416/440 N:0/0 heads:1 tails:6 ins:0 del:4 hrun=2
DEBUG(lofreq_call.c|call_alt_del): CP011857 1003167: passing down 858 quals with noncons_del_counts(4, 0, 0) to snpcaller()
DEBUG(lofreq_call.c|call_alt_del): Low freq deletion: CP011857 1003167 TG>T pv-prob:6.40018e-08;pv-qual:71
DEBUG(lofreq_call.c|call_snvs): CP011857 1003167: passing down 858 quals with noncons_counts (0, 2, 0) to snpcaller(num_snv_tests=3 conf->bonf=3, conf->sig=0.010000)
ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.V0eDSv -o - --snvqual-thresh 24 --indelqual-thresh 23
strace lofreq 2>&1 | grep -c openat
0

@andreas-wilm
Copy link
Contributor

Thanks! Could you post the contents of /tmp/lofreq2-call-dyn-bonf.V0eDSv please?

@sleyn
Copy link
Author

sleyn commented Feb 27, 2020

##fileformat=VCFv4.0
##fileDate=20200226
##source=lofreq call --call-indels -f Pseudomonas_aeruginosa_ATCC27853.fna -r CP011857:1003167-1003167 --debug Sample.bam
##reference=Pseudomonas_aeruginosa_ATCC27853.fna
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
CP011857        1003167 .       TG      T       71      .       ;HRUN=2

@andreas-wilm
Copy link
Contributor

Weird (but again: confirmed by one other user)...Which platform/distribution are you on?

@sleyn
Copy link
Author

sleyn commented Feb 28, 2020

I'm using CentOS 7.

@andreas-wilm
Copy link
Contributor

Sorry, I got to admit I'm still totally in the dark here. Is you setup unusual in any way? What does ldd $(which lofreq) return?

You don't have this running on a cloud instance to which you could grant me temporary access, do you?

Thanks!

@sleyn
Copy link
Author

sleyn commented Mar 22, 2020

I'm not sure if it is unusual. It is quite old lab server (started 4 years ago) with CentOS 7 updated to the last version. I think it has a lot of things historically happened here.

The result of the command is:

        linux-vdso.so.1 =>  (0x00007fff40f5f000)
        libhts.so.2 => /home/sleyn/miniconda3/envs/lofreq24/bin/../lib/libhts.so.2 (0x00007fc4aab4c000)
        libz.so.1 => /home/sleyn/miniconda3/envs/lofreq24/bin/../lib/libz.so.1 (0x00007fc4aafd5000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fc4aa84a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc4aa62e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fc4aa260000)
        libdeflate.so => /home/sleyn/miniconda3/envs/lofreq24/bin/../lib/./libdeflate.so (0x00007fc4aafa4000)
        liblzma.so.5 => /home/sleyn/miniconda3/envs/lofreq24/bin/../lib/./liblzma.so.5 (0x00007fc4aa03a000)
        libbz2.so.1.0 => /home/sleyn/miniconda3/envs/lofreq24/bin/../lib/./libbz2.so.1.0 (0x00007fc4aaf8f000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fc4a9e36000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc4aae10000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fc4a9c2e000)

And yes - I don't have a cloud version of the machine.

@sposadac
Copy link

Hi all,
I think I am having a very similar issue. Recently switched to the newer version and I am getting this error:
ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.VNe6vz -o samples/n8-uniform/20200228/variants/SNVs/snvs_1.vcf --snvqual-thresh 58 --indelqual-thresh 47

lofreq version
version: 2.1.4
commit: unknown
build-date: Apr  5 2020

sposadac added a commit to sposadac/V-pipe that referenced this issue Apr 19, 2020
@andreas-wilm
Copy link
Contributor

Thanks for reporting. We now have a handful of users with the same problem. Unfortunately I cannot reproduce this. Could you tell me as much as possible about your system, e.g. OS name and version, LoFreq installation method (source of conda) etc? Many thanks

@smrey
Copy link

smrey commented Apr 21, 2020

I also have this issue on v2.1.4. (Centos 7).
Here is the environment.yml file:

name: lofreq_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3
  - biopython=1.76
  - bwa=0.7.17=pl5.22.0_2
  - matplotlib=3.1.3
  - pandas=1.0.3
  - samtools=1.9
  - trim-galore=0.6.5
  - bcftools=1.9
  - bedtools=2.29.2
  - lofreq=2.1.4
  - pyvcf=0.6.8
  - pysam=0.15.0.1

Just following up on this to say that I created a Docker container building the same version (2.1.4) from source in a centos7 environment and this does not have the bug, so it seems that there may be something about the conda version that is not working correctly.

@andreas-wilm
Copy link
Contributor

Thanks for confirming! Did you by any chance try to install from source as well?

@sleyn, could you be so kind and post your conda env details here?

@sleyn
Copy link
Author

sleyn commented Apr 24, 2020

@andreas-wilm sure. My environment for lofreq24.

channels:
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2020.1.1=0
  - certifi=2019.11.28=py37_0
  - curl=7.68.0=hbc83047_0
  - htslib=1.9=h4da6232_3
  - krb5=1.17.1=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libcurl=7.68.0=h20c2e04_0
  - libdeflate=1.2=h516909a_1
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libssh2=1.8.2=h1ba5d50_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - lofreq=2.1.4=py37hc3dfafe_0
  - ncurses=6.1=he6710b0_1
  - openssl=1.1.1d=h7b6447c_4
  - pip=20.0.2=py37_1
  - python=3.7.6=h0371630_2
  - readline=7.0=h7b6447c_5
  - setuptools=45.2.0=py37_0
  - sqlite=3.31.1=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - wheel=0.34.2=py37_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3

@andreas-wilm
Copy link
Contributor

Thanks @sleyn. Could I ask you to do an experiment? Could update htslib in your LoFreq conda env to version 1.10 and see whether you still experience problems?

@tonbar, I noticed you don't have htslib in your LoFreq conda env. That's surprising. Anyway, could you try to install htslib version 1.10 in there and rerun LoFreq to see whether the problem persists?

conda install -n yourenvname htslib=1.10

Many thanks to both of you

@sleyn
Copy link
Author

sleyn commented Apr 24, 2020

For some reason I can't install 1.10 in conda...

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package libgcc-ng conflicts for:
htslib=1.10 -> libgcc-ng[version='>=7.3.0']
htslib=1.10 -> xz[version='>=5.2.4,<5.3.0a0'] -> libgcc-ng[version='>=7.2.0']

Package xz conflicts for:
python=3.7 -> xz[version='>=5.2.4,<6.0a0']
htslib=1.10 -> xz[version='>=5.2.4,<5.3.0a0']

Package openssl conflicts for:
htslib=1.10 -> libcurl[version='>=7.64.1,<8.0a0'] -> openssl[version='>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a']
htslib=1.10 -> openssl[version='>=1.1.1a,<1.1.2a']

Happens even in the clean environment. I did not find a solution except one advice to downgrade conda, that I don't like to do.

@sposadac
Copy link

I used conda to install LoFreq, and I am running it on a CentOS 7.
Here is the environement:

channels:
  - bioconda
  - conda-forge
  - defaults
  - r
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - bcftools=1.9=h68d8f2e_9
  - bzip2=1.0.8=h516909a_2
  - ca-certificates=2020.4.5.1=hecc5488_0
  - certifi=2020.4.5.1=py37hc8dfbb8_0
  - curl=7.69.1=h33f0ec9_0
  - gsl=2.5=h294904e_1
  - htslib=1.9=h4da6232_3
  - krb5=1.17.1=h2fd8d38_0
  - ld_impl_linux-64=2.34=h53a641e_0
  - libblas=3.8.0=16_openblas
  - libcblas=3.8.0=16_openblas
  - libcurl=7.69.1=hf7181ac_0
  - libdeflate=1.5=h516909a_0
  - libedit=3.1.20170329=hf8c457e_1001
  - libffi=3.2.1=he1b5a44_1007
  - libgcc-ng=9.2.0=h24d8f2e_2
  - libgfortran-ng=7.3.0=hdf63c60_5
  - libopenblas=0.3.9=h5ec1e0e_0
  - libssh2=1.8.2=h22169c7_2
  - libstdcxx-ng=9.2.0=hdf63c60_2
  - llvm-openmp=10.0.0=hc9558a2_0
  - lofreq=2.1.4=py37hc3dfafe_1
  - ncurses=6.1=hf484d3e_1002
  - openssl=1.1.1f=h516909a_0
  - perl=5.26.2=h516909a_1006
  - pip=20.0.2=py_2
  - python=3.7.6=h8356626_5_cpython
  - python_abi=3.7=1_cp37m
  - readline=8.0=hf8c457e_0
  - samtools=1.9=h10a08f8_12
  - setuptools=46.1.3=py37hc8dfbb8_0
  - sqlite=3.30.1=hcee41ef_0
  - tk=8.6.10=hed695b0_0
  - wheel=0.34.2=py_1
  - xz=5.2.5=h516909a_0
  - zlib=1.2.11=h516909a_1006

Trying to install htslib 1.10, I get the following error:

Package htslib conflicts for:
lofreq=2.1.4 -> htslib[version='>=1.9,<1.10.0a0']
samtools -> htslib[version='>=1.10,<1.11.0a0|>=1.10.2,<1.11.0a0|>=1.9,<1.10.0a0']
bcftools -> htslib[version='>=1.10,<1.11.0a0|>=1.10.1,<1.11.0a0|>=1.9,<1.10.0a0']
htslib=1.10

It seems to me the LoFreq bioconda recipe needs to be updated to allow a higher version of htslib. @andreas-wilm What do you think?

@smrey
Copy link

smrey commented Apr 24, 2020

Apologies, in my haste I shared the incorrect stripped down env that would allow installation. Here is what gets installed.

name: lofreq_env
channels:
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - bwa=0.7.17=hed695b0_7
  - bz2file=0.98=py37_1
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2020.1.1=0
  - certifi=2019.11.28=py37_0
  - curl=7.68.0=hbc83047_0
  - cutadapt=1.18=py37h14c3975_1
  - fastqc=0.11.9=0
  - font-ttf-dejavu-sans-mono=2.37=h6964260_0
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.9.1=h8a8886c_1
  - htslib=1.9=ha228f0b_7
  - icu=58.2=h9c2bf20_1
  - ivar=1.0=hddee448_1
  - krb5=1.17.1=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libcurl=7.68.0=h20c2e04_0
  - libdeflate=1.0=h14c3975_1
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libssh2=1.9.0=h1ba5d50_1
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxml2=2.9.9=hea5a465_1
  - ncurses=6.2=he6710b0_0
  - openjdk=8.0.152=h7b6447c_3
  - openssl=1.1.1e=h7b6447c_0
  - perl=5.26.2=h14c3975_0
  - pigz=2.4=h84994c4_0
  - pip=20.0.2=py37_1
  - pysam=0.15.3=py37hda2845c_1
  - python=3.7.6=h0371630_2
  - readline=7.0=h7b6447c_5
  - samtools=1.6=h244ad75_5
  - setuptools=46.0.0=py37_0
  - sqlite=3.31.1=h7b6447c_0
  - tk=8.6.8=hbc83047_0
  - trim-galore=0.6.5=0
  - wheel=0.34.2=py37_0
  - xopen=0.7.3=py_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3

@andreas-wilm
Copy link
Contributor

andreas-wilm commented Apr 24, 2020 via email

@smrey
Copy link

smrey commented Apr 24, 2020

When I was trying out a few things, I used both htslib 1.10 and 1.9 in the DockerFile I created and neither had this bug. (Dockerfile building from source).

@andreas-wilm
Copy link
Contributor

andreas-wilm commented Apr 25, 2020 via email

@andreas-wilm
Copy link
Contributor

Dear all, a new LoFreq recipe appeared on Bioconda. It seems that with this new recipe it's now possible to upgrade the htslib version. Please install LoFreq version 2.1.4 and upgrade htslib to 1.10 if it's not immediately pulled in (e.g. conda install -n lofreq-2.1.4 htslib=1.10, if your environment is called lofreq-2.1.4). I would hope that this might address your problem.

Andreas

@rspreafico-vir
Copy link

Same issue here. Funny thing is that running lofreq from a conda env on Mac with htslib v1.10 works fine. Running the docker container (Biocontainer) matching the latest lofreq recipe, which also installs samtools and htslib (1.10) in the same docker container, doesn't.

Biocontainer version: 2.1.4--py37ha77fa96_2 (from https://quay.io/repository/biocontainers/lofreq?tab=tags)
samtools/htslib version in Biocontainer: Version: 1.10 (using htslib 1.10.2)

@andreas-wilm
Copy link
Contributor

andreas-wilm commented May 31, 2020 via email

@bgruening
Copy link

The pain is real: with --enable-debug the OSXbuilds do not finish, so deploying the binary with this option needs more hacks and different compile arguments on different archs.

https://app.circleci.com/pipelines/github/bioconda/bioconda-recipes/27646/workflows/1e499766-8ce2-4d9e-baa2-c93fe7ca6fd4/jobs/112252/steps

@andreas-wilm
Copy link
Contributor

That's almost good news, because it gives me a handle. I'll look into this ASAP and will report back here

@andreas-wilm
Copy link
Contributor

The failing OSX build looks like a separate issues though:

Undefined symbols for architecture x86_64:
01:13:34 BIOCONDA INFO (ERR)   "_bam_calend", referenced from:
01:13:34 BIOCONDA INFO (ERR)       _count_cigar_ops in samutils.o
01:13:34 BIOCONDA INFO (ERR) ld: symbol(s) not found for architecture x86_64
01:13:34 BIOCONDA INFO (ERR) clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

@bgruening
Copy link

Yes, but please note that this only occurs with --enable-debug

@andreas-wilm
Copy link
Contributor

Okay, that compilation error was easily fixed. --enable-debug activated an assert that used outdated HTSlib API. Now on to the real problem...

@bgruening
Copy link

--enable-debug is activating -O0 -pedantic not -Og. Maybe -Og helps you to find the bufferoverflow?

@andreas-wilm
Copy link
Contributor

andreas-wilm commented Jun 12, 2020

Quick observation: as noted above the faulty version segfaults, because the intermediately produced vcf files has incomplete INFO columns, but only at indel positions, e.g.

CP011857        1000142 .       TG      T       33      .       ;HRUN=2
CP011857        1000192 .       T       G       59      .     DP=695;AF=0.028777;SB=39;DP4=322,351,1,19

Hence the subsequent filter command fails. The above was found with biocontainers/lofreq:2.1.4--py37h3a01142_3

@andreas-wilm
Copy link
Contributor

Running a locally compiled version (--enable-debug) with valgrind doesn't show any problems and produces the right output.

@bgruening, sorry beginner question: what's the easiest for me to build an image that contains the faulty setup and also some debug tools, like valgrind?

@bgruening
Copy link

Not that easy, those containers are super minimal and don't have anything included. What do you need? Maybe I can build you one? Maybe you can mount in Valgrind as binary?

@andreas-wilm
Copy link
Contributor

Getting valgrind as (compatible) binary is the problem. It would obviously be awesome, if you could build an image with valgrind.

I still don't understand why my conda installed LoFreq version simply works...

@bgruening
Copy link

If its a wired buffer-overflow this can be anything and really dependent on the system, no?

@andreas-wilm
Copy link
Contributor

Depends. But without valgrind and -g -O0 compiled code it's going to be hard to find out.

Do I understand correctly, that your committed changes fix the problem? One approach would be to publish it and find the bug afterwards (if it's a LoFreq bug)

@bgruening
Copy link

I don't think it fixes it. I think it hides it behind the optimization flag. This smells like memory allocation issues.

@andreas-wilm
Copy link
Contributor

The code that's responsible for the vcf formatting hasn't changed in five years...

@bgruening
Copy link

New container with valgrind is here:

curl "https://112272-42372094-gh.circle-artifacts.com/0/tmp/artifacts/images/lofreq:2.1.4--py37h3a01142_1000.tar.gz" | gzip -dc | docker load

@bgruening
Copy link

(lofreq_bioconda) bag@bag:~/projects/code/lofreq/dist$ docker run -i -t -v /home/bag/Downloads/:/data/ --rm quay.io/biocontainers/lofreq:2.1.4--py37h3a01142_1000 lofreq call --verbose --ref /data/_test/Pseudomonas_aeruginosa_ATCC27853.fna --call-indels  -r CP011857:0-1003167   /data/_test/Sample.bam
2.1.4--py37h3a01142_1000
(lofreq_bioconda) bag@bag:~/projects/code/lofreq/dist$ docker run -i -t -v /home/bag/Downloads/:/data/ --rm quay.io/biocontainers/lofreq:2.1.4--py37h3a01142_1000 lofreq call --verbose --ref /data/_test/Pseudomonas_aeruginosa_ATCC27853.fna --call-indels  -r CP011857:0-1003167   /data/_test/Sample.bam
Alive and happily crunching away on pos 999851 of CP011857...
Executing lofreq filter -i /tmp/lofreq2-call-dyn-bonf.D0KSo5 -o - --snvqual-thresh 58 --indelqual-thresh 34
Segmentation fault (core dumped)
ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.D0KSo5 -o - --snvqual-thresh 58 --indelqual-thresh 34

@bgruening
Copy link

New container build with:

CFLAGS="${CFLAGS} -Og -g -pedantic"
./configure --with-htslib=${PREFIX}/lib  --prefix=${PREFIX}
curl "https://112282-42372094-gh.circle-artifacts.com/0/tmp/artifacts/images/lofreq%3A2.1.4--py37h3a01142_1001.tar.gz" | gzip -dc | docker load

And it still fails:

(lofreq_bioconda) bag@bag:~/projects/code/lofreq/dist$ docker run -i -t -v /home/bag/Downloads/:/data/ --rm quay.io/biocontainers/lofreq:2.1.4--py37h3a01142_1001 lofreq call --verbose --ref /data/_test/Pseudomonas_aeruginosa_ATCC27853.fna --call-indels  -r CP011857:0-1003167   /data/_test/Sample.bam
Alive and happily crunching away on pos 999851 of CP011857...
Executing lofreq filter -i /tmp/lofreq2-call-dyn-bonf.MhDeuW -o - --snvqual-thresh 58 --indelqual-thresh 34
Segmentation fault (core dumped)
ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.MhDeuW -o - --snvqual-thresh 58 --indelqual-thresh 34

andreas-wilm pushed a commit that referenced this issue Jun 12, 2020
- Refer to #89
- Also fixed some outdated asserts
@andreas-wilm
Copy link
Contributor

Thanks for the hard work @bgruening. I think I found the cause: ages old and shoddily written code with undefined results. That explains why the problem sometimes magically appear. It's committed as #8206db2. Any chance you can try so that I can bump the version?

Andreas

@jmarshall
Copy link
Contributor

This crash was caused by bad effects of the ;HRUN=3 shoddily written 😄 VCF, which @andreas-wilm has just fixed, I guess by altering it so that the INFO field isn't overwritten and so the final INFO field in this case will no longer start with a ;, which is invalid VCF. PR #95 also fixes the VCF parser so that reading ;HRUN=3 no longer causes a crash (and stops AD=1;SBNOT=2;HRUN=3 from being incorrectly recognised as SB=2).

@andreas-wilm
Copy link
Contributor

andreas-wilm commented Jun 12, 2020

Thanks again John. Your changes would have prevented the execution from failing. Info fields for indels in the vcf would still be truncated though. My fix (8206db2 or more exactly 1e44677) hopefully addresses this. Will merge your PR once it's confirmed, that the vcf is no longer truncated. Never managed to reproduce the problem at my end, so can't tell whether the fix actually works and therefore rely on someone's help. Pretty please @bgruening

@bgruening
Copy link

On it ;)

@bgruening
Copy link

🥇 seems to work! If you release a new version I will update the conda package and container.

@andreas-wilm
Copy link
Contributor

Super cool. Thanks so much @bgruening. And many thanks to everyone else as well! God, I love this stuff

I should be able to push a new version over the weekend.

@andreas-wilm
Copy link
Contributor

Dear all, I just pushed 2.1.5. As far as we know this should address the issue. Thanks so much everyone for your help!

@bgruening
Copy link

Thanks! New bioconda package is out. You might want to update your blog :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants