illumina vartiant calling failed #6

Mahmoudbassuoni · 2022-04-26T10:23:22Z

Hi, @anands-repo
I am trying to run a variant calling using the illumina model but I am finally getting files with not even one call, I don't know what's wrong. I am using the same code I used before but for another bam file, every time it ends up with this message.
Computers / CPU cores / Max jobs to run 1:local / 16 / 1 ETA: 0s Left: 0 AVG: 0.00s 0 INFO:root:Completed runs, checking log files INFO:root:All commands completed correctly, preparing vcf Search string /tmp/vcftemp/*expert0.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert1.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert2.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs_tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*best.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.best.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*mean.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.mean.tmp.vcf'] sort -k1,1d -k2,2n Choice histogram = {0: 0, 1: 0, 2: 0}

The text was updated successfully, but these errors were encountered:

anands-repo · 2022-04-26T15:58:25Z

Received a similar comment from another user. The issue in that case was some of the paths weren't correctly accessible inside the container. Please check that all paths are accessible inside the container: including BAM file(s), reference fasta, and Neural Network model paths.

Beside this, suggest running on a single chromsome to make sure that case works.

Mahmoudbassuoni · 2022-04-27T10:46:51Z

Hi @anands-repo ,
I decided to go out of the container and ran it again with single chromosomes but still the same thing. I ran it before with different dataset and it was fine not sure what could be wrong.

anands-repo · 2022-04-28T19:53:27Z

I pushed a new repo under branch repo. It is under development, but I have tested it for Illumina reads for individual chromosomes and it works. It goes with the docker image oddjobs/hello_deps. This branch and docker image have the following features which may help with debugging issues of the type you are facing:

New docker image that is singularity friendly (directly convert to singularity without having to hack the image)
Better logging to convey exactly what is going on (no GNU parallel, use of progress bars, much more concise and readable messages)
Better error and exception handling - avoiding fails which do not reveal what exactly is the problem

Algorithms and models are the same as before

PS: workdir will need cleanup before each rerun

Mahmoudbassuoni · 2022-04-30T08:31:12Z

Thanks @anands-repo , will give it a try and see what happens....

maryawood · 2022-08-26T20:16:40Z

Hello,

I'm running into a similar problem of having VCF results without any calls in them. The program looks to be proceeding normally when I run it inside the docker container:

2022-08-26 19:41:50,599 INFO:Launching hotspot detection for chromosome chr16
Hotspots, chromosome chr16: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 501/501 [03:50<00:00,  2.17it/s]
2022-08-26 19:45:41,115 INFO:Combining all hotspots and sharding
2022-08-26 19:45:42,965 INFO:Completed sharding, creating caller commands for running NN
2022-08-26 19:45:42,966 INFO:Created call commands for chromosome chr16
2022-08-26 19:45:42,966 INFO:Finished hotspot generation for all chromosomes. Launching all caller commands.
Caller progress: 0it [00:00, ?it/s]
2022-08-26 19:45:42,967 INFO:Completed runs, checking log files
2022-08-26 19:45:42,967 INFO:Completed running all caller commands correctly, preparing vcf
VCF prep: 0it [00:00, ?it/s]
sort -k1,1d -k2,2n
2022-08-26 19:45:43,010 INFO:Completed runs. Results in local/results.output.vcf

But the VCF file does not contain any calls. I've checked that the BAM is readable inside the container, so I don't think that's the problem. I've also tried to install the software outside of the docker container to be sure, but I'm running into many install issues - would it be possible to make the Dockerfile available so I could see the necessary requirements for installation?

anands-repo · 2022-08-27T21:38:45Z

Hi,

Please find the information below

Dockerfile:

FROM ubuntu:bionic

# Initial installations
RUN apt-get -y update && apt-get -y install wget build-essential vim tmux vcftools bedtools

# Path where everything happens
WORKDIR /opt

# Install miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
RUN bash Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -p /opt/miniconda3
ENV PATH=/opt/miniconda3/bin:$PATH

# Install boost with boost::python
COPY user-config.jam /root/boost_1_69_0/tools/build/src/user-config.jam
RUN wget https://boostorg.jfrog.io/artifactory/main/release/1.69.0/source/boost_1_69_0.tar.gz && \
        tar xvfz boost_1_69_0.tar.gz && cd boost_1_69_0 && ./bootstrap.sh --prefix=/opt/boost && \
        ./b2 install && rm -rf boost_1_69_0*

# Install conda packages
RUN conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch && \
        conda install -y -c bioconda pysam pybedtools && \
        pip install pyvcf scikit-bio && \
        conda install -y biopython cmake h5py intervaltree more-itertools scikit-learn matplotlib tqdm

# Additional environment settings (weird that it doesn't work by itself)
RUN ldconfig
ENV LD_LIBRARY_PATH=/opt/boost/lib:$LD_LIBRARY_PATH

# Cleanup
RUN rm -rf /tmp/*

The Dockerfile uses the following user-config.jam file:

using python
    : 3.7
    : /opt/miniconda3/bin/python
    : /opt/miniconda3/include/python3.7m
;

anands-repo · 2022-08-27T21:58:15Z

Regarding the issue - from the logs, it looks like the initial heuristic that filters reads, and shortlists putative loci to run the Neural Network on, doesn't return anything. The following script can be run standalone to see what's happening:

python \
    python/HotspotDetectorDVFiltered.py \
        --bam BAMFILE \
        --ref REFERENCE \
        --region chr16 \
        --mapq_threshold 5 \
        --output /tmp/hotspots.json > /tmp/hotspots_logs.log 2>&1

maryawood · 2022-08-29T22:49:58Z

Thanks for your response! I'm still trying to get a local install working for further testing based on your Dockerfile, but I'm running into some issues with conda permissions based on the install location. In the meantime, I tried out the hotspot command you shared from within the available docker container. Even if I run it using the --debug flag, all I get in the log file is this:

2022-08-29 22:12:32,644 Started script
2022-08-29 22:16:40,987 Completed running the script

Should I be seeing additional messages?

anands-repo · 2022-08-30T04:53:26Z

Yes if --debug flag is used there will be a lot of text displayed (or in this case written to the log files). Did you notice any output in the /tmp/hotspots.json file, or is it still empty?

If it is empty, my best guess is that the reads are being filtered off at the quality control stage and not being used in any analysis. These are the read quality control filters:

    usable    = True;
    usable    = usable and not alignment.is_unmapped;  # Remove unmapped reads
    usable    = usable and not (alignment.is_secondary or alignment.is_supplementary);  # Remove secondary or supplementary alignments
    usable    = usable and not alignment.is_duplicate;  # Remove duplicate reads
    usable    = usable and (not alignment.is_paired or alignment.is_proper_pair);  # Remove improper pairs in the case of paired alignments
    usable    = usable and (alignment.mapping_quality > 0);  # Remove 0-mapping quality reads

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

illumina vartiant calling failed #6

illumina vartiant calling failed #6

Mahmoudbassuoni commented Apr 26, 2022 •

edited

Loading

anands-repo commented Apr 26, 2022

Mahmoudbassuoni commented Apr 27, 2022

anands-repo commented Apr 28, 2022 •

edited

Loading

Mahmoudbassuoni commented Apr 30, 2022

maryawood commented Aug 26, 2022

anands-repo commented Aug 27, 2022

anands-repo commented Aug 27, 2022

maryawood commented Aug 29, 2022

anands-repo commented Aug 30, 2022

illumina vartiant calling failed #6

illumina vartiant calling failed #6

Comments

Mahmoudbassuoni commented Apr 26, 2022 • edited Loading

anands-repo commented Apr 26, 2022

Mahmoudbassuoni commented Apr 27, 2022

anands-repo commented Apr 28, 2022 • edited Loading

Mahmoudbassuoni commented Apr 30, 2022

maryawood commented Aug 26, 2022

anands-repo commented Aug 27, 2022

anands-repo commented Aug 27, 2022

maryawood commented Aug 29, 2022

anands-repo commented Aug 30, 2022

Mahmoudbassuoni commented Apr 26, 2022 •

edited

Loading

anands-repo commented Apr 28, 2022 •

edited

Loading