Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

illumina vartiant calling failed #6

Open
Mahmoudbassuoni opened this issue Apr 26, 2022 · 9 comments
Open

illumina vartiant calling failed #6

Mahmoudbassuoni opened this issue Apr 26, 2022 · 9 comments

Comments

@Mahmoudbassuoni
Copy link

Mahmoudbassuoni commented Apr 26, 2022

Hi, @anands-repo
I am trying to run a variant calling using the illumina model but I am finally getting files with not even one call, I don't know what's wrong. I am using the same code I used before but for another bam file, every time it ends up with this message.
Computers / CPU cores / Max jobs to run 1:local / 16 / 1 ETA: 0s Left: 0 AVG: 0.00s 0 INFO:root:Completed runs, checking log files INFO:root:All commands completed correctly, preparing vcf Search string /tmp/vcftemp/*expert0.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert1.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert2.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs_tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*best.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.best.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*mean.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.mean.tmp.vcf'] sort -k1,1d -k2,2n Choice histogram = {0: 0, 1: 0, 2: 0}

@anands-repo
Copy link
Owner

Received a similar comment from another user. The issue in that case was some of the paths weren't correctly accessible inside the container. Please check that all paths are accessible inside the container: including BAM file(s), reference fasta, and Neural Network model paths.

Beside this, suggest running on a single chromsome to make sure that case works.

@Mahmoudbassuoni
Copy link
Author

Hi @anands-repo ,
I decided to go out of the container and ran it again with single chromosomes but still the same thing. I ran it before with different dataset and it was fine not sure what could be wrong.

@anands-repo
Copy link
Owner

anands-repo commented Apr 28, 2022

I pushed a new repo under branch repo. It is under development, but I have tested it for Illumina reads for individual chromosomes and it works. It goes with the docker image oddjobs/hello_deps. This branch and docker image have the following features which may help with debugging issues of the type you are facing:

  1. New docker image that is singularity friendly (directly convert to singularity without having to hack the image)
  2. Better logging to convey exactly what is going on (no GNU parallel, use of progress bars, much more concise and readable messages)
  3. Better error and exception handling - avoiding fails which do not reveal what exactly is the problem

Algorithms and models are the same as before

PS: workdir will need cleanup before each rerun

@Mahmoudbassuoni
Copy link
Author

Thanks @anands-repo , will give it a try and see what happens....

@maryawood
Copy link

Hello,

I'm running into a similar problem of having VCF results without any calls in them. The program looks to be proceeding normally when I run it inside the docker container:

2022-08-26 19:41:50,599 INFO:Launching hotspot detection for chromosome chr16
Hotspots, chromosome chr16: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 501/501 [03:50<00:00,  2.17it/s]
2022-08-26 19:45:41,115 INFO:Combining all hotspots and sharding
2022-08-26 19:45:42,965 INFO:Completed sharding, creating caller commands for running NN
2022-08-26 19:45:42,966 INFO:Created call commands for chromosome chr16
2022-08-26 19:45:42,966 INFO:Finished hotspot generation for all chromosomes. Launching all caller commands.
Caller progress: 0it [00:00, ?it/s]
2022-08-26 19:45:42,967 INFO:Completed runs, checking log files
2022-08-26 19:45:42,967 INFO:Completed running all caller commands correctly, preparing vcf
VCF prep: 0it [00:00, ?it/s]
sort -k1,1d -k2,2n
2022-08-26 19:45:43,010 INFO:Completed runs. Results in local/results.output.vcf

But the VCF file does not contain any calls. I've checked that the BAM is readable inside the container, so I don't think that's the problem. I've also tried to install the software outside of the docker container to be sure, but I'm running into many install issues - would it be possible to make the Dockerfile available so I could see the necessary requirements for installation?

@anands-repo
Copy link
Owner

Hi,

Please find the information below

Dockerfile:

FROM ubuntu:bionic

# Initial installations
RUN apt-get -y update && apt-get -y install wget build-essential vim tmux vcftools bedtools

# Path where everything happens
WORKDIR /opt

# Install miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
RUN bash Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -p /opt/miniconda3
ENV PATH=/opt/miniconda3/bin:$PATH

# Install boost with boost::python
COPY user-config.jam /root/boost_1_69_0/tools/build/src/user-config.jam
RUN wget https://boostorg.jfrog.io/artifactory/main/release/1.69.0/source/boost_1_69_0.tar.gz && \
        tar xvfz boost_1_69_0.tar.gz && cd boost_1_69_0 && ./bootstrap.sh --prefix=/opt/boost && \
        ./b2 install && rm -rf boost_1_69_0*

# Install conda packages
RUN conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch && \
        conda install -y -c bioconda pysam pybedtools && \
        pip install pyvcf scikit-bio && \
        conda install -y biopython cmake h5py intervaltree more-itertools scikit-learn matplotlib tqdm

# Additional environment settings (weird that it doesn't work by itself)
RUN ldconfig
ENV LD_LIBRARY_PATH=/opt/boost/lib:$LD_LIBRARY_PATH

# Cleanup
RUN rm -rf /tmp/*

The Dockerfile uses the following user-config.jam file:

using python
    : 3.7
    : /opt/miniconda3/bin/python
    : /opt/miniconda3/include/python3.7m
;

@anands-repo
Copy link
Owner

Regarding the issue - from the logs, it looks like the initial heuristic that filters reads, and shortlists putative loci to run the Neural Network on, doesn't return anything. The following script can be run standalone to see what's happening:

python \
    python/HotspotDetectorDVFiltered.py \
        --bam BAMFILE \
        --ref REFERENCE \
        --region chr16 \
        --mapq_threshold 5 \
        --output /tmp/hotspots.json > /tmp/hotspots_logs.log 2>&1

@maryawood
Copy link

Thanks for your response! I'm still trying to get a local install working for further testing based on your Dockerfile, but I'm running into some issues with conda permissions based on the install location. In the meantime, I tried out the hotspot command you shared from within the available docker container. Even if I run it using the --debug flag, all I get in the log file is this:

2022-08-29 22:12:32,644 Started script
2022-08-29 22:16:40,987 Completed running the script

Should I be seeing additional messages?

@anands-repo
Copy link
Owner

Yes if --debug flag is used there will be a lot of text displayed (or in this case written to the log files). Did you notice any output in the /tmp/hotspots.json file, or is it still empty?

If it is empty, my best guess is that the reads are being filtered off at the quality control stage and not being used in any analysis. These are the read quality control filters:

    usable    = True;
    usable    = usable and not alignment.is_unmapped;  # Remove unmapped reads
    usable    = usable and not (alignment.is_secondary or alignment.is_supplementary);  # Remove secondary or supplementary alignments
    usable    = usable and not alignment.is_duplicate;  # Remove duplicate reads
    usable    = usable and (not alignment.is_paired or alignment.is_proper_pair);  # Remove improper pairs in the case of paired alignments
    usable    = usable and (alignment.mapping_quality > 0);  # Remove 0-mapping quality reads

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants