Exome Copy Number Variation Polisher via Deep Learning

ciceklab/DECoNT

 
 


Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning


DECoNT is a deep learning-based tool that corrects CNV predictions made on exome sequencing data, using read depth sequences.

Deep Learning, Copy Number Variation, Whole Exome Sequencing


Authors

Furkan Ozden, Can Alkan, A. Ercument Cicek


Questions & comments

[firstauthorname].[firstauthorsurname]@bilkent.edu.tr


Reproducing the results given in the manuscript and toy example

To reproduce results given in the manuscript, Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning, please refer to https://zenodo.org/record/3865380#.XtVRchMzZds. This repository also includes the toy example tutorial.


Warning: DECoNT is completely free for academic use; commercial use, however, requires a license. Please refer to the License section for more information.


Installation

  • DECoNT does not require installation in the conventional sense. It is a Python 3 script that is ready to run once the required packages are installed.

Requirements

  • Python 3.7.6
  • NumPy 1.16.2
  • Pandas 1.0.0
  • TensorFlow 1.14.0
  • Keras 2.2.4
  • Scikit-Learn 0.22.1
  • keras-metrics 1.1.0
  • cuDNN 7.6.5 (optional, for GPU support only; required by keras-gpu 2.2.4)

To handle the requirements easily, you can use the DECoNT_linux.yml or DECoNT_mac.yml file to create an appropriate conda environment:

$ conda env create -f DECoNT_linux.yml
$ conda activate DECoNT_linux

or

$ conda env create -f DECoNT_mac.yml
$ conda activate DECoNT_mac

Features

  • DECoNT optionally supports GPUs. See the GPU Support section.
  • DECoNT reports an ETA for the analysis, together with a progress bar.
  • Upcoming version: custom training, custom call polishing.

Instructions Manual

Important notice: Please call the DECoNT_polish.py script from the scripts directory.

Required Arguments

-m, --model

  • For version 0.1, DECoNT provides pretrained weights for polishing CNV calls from the following WES-based CNV callers: (i) XHMM; (ii) CoNIFER; (iii) CODEX2; (iv) Control-FREEC.
  • To use the pretrained DECoNT weights for polishing, set this argument to pretrained.
  • To use custom model weights obtained with the DECoNT_train.py script, provide the path to the model weights file (with .h5 extension) instead.

-cn, --callername

  • For version 0.1, DECoNT supports only XHMM, CoNIFER, CODEX2 and Control-FREEC. In future versions, DECoNT will be able to polish any CNV output that follows a required CNV output template.
  • Set this to one of the WES-based CNV caller names above so that DECoNT can select the matching weights for the polishing process.

-i, --input

  • Relative or absolute path to the output file of the selected WES-based CNV caller.

-o, --output

  • Relative or absolute path to the output directory where DECoNT writes its output file.

-s, --samples

  • Relative or absolute path to the directory containing the read depth files of the samples in the analysis (i.e. the samples used in WES CNV calling). All read depth files must be in the format specified in the Usage Examples section below, and the directory must not contain any other files. Read depth files generated by the Sambamba tool are accepted directly, with no reformatting required.
  • Read depth file names must have the following format: SAMPLENAME.read_depths.txt (e.g. HG00096.read_depths.txt)
  • Sample names must be consistent between the WES CNV caller output and the read depth file names.
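The constraints on the -s directory can be checked before a run. The following is a minimal sketch, not part of DECoNT: the function name check_read_depth_dir and the regular expression are assumptions made for illustration, and the pattern follows the strict SAMPLENAME.read_depths.txt naming stated above. Loosen it if your files only share the SAMPLENAME. prefix.

```python
import re
from pathlib import Path

# Naming convention taken from the -s description: SAMPLENAME.read_depths.txt
READ_DEPTH_PATTERN = re.compile(r"^(?P<sample>.+)\.read_depths\.txt$")


def check_read_depth_dir(directory, expected_samples):
    """Return (missing_samples, unexpected_entries) for a read depth directory.

    Hypothetical helper, not part of DECoNT. `expected_samples` are the
    sample names that appear in the CNV caller output.
    """
    found = set()
    unexpected = []
    for path in sorted(Path(directory).iterdir()):
        match = READ_DEPTH_PATTERN.match(path.name)
        if path.is_file() and match:
            found.add(match.group("sample"))
        else:
            # The directory must not include any other files.
            unexpected.append(path.name)
    missing = sorted(set(expected_samples) - found)
    return missing, unexpected
```

Both returned lists should be empty before DECoNT is started; a non-empty list points at a sample without a read depth file or at a stray entry in the directory.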

Optional Arguments

-g, --gpu

  • Set to the PCI bus ID of the GPU you want to use.
  • You can look up the PCI bus IDs of the GPUs in your system in various ways, for example with the gpustat tool:
$ gpustat

-v, --version

  • Check the version of DECoNT.

-h, --help

  • See the help page.

Usage Examples

Using DECoNT is simple, and it comes with ETA and progress bar features.

Step-1: Use your preferred WES-based CNV caller to call CNVs on your WES dataset.

Step-2: Obtain read depth files for samples used in WES CNV calling.

  • Read depth counts produced by the Sambamba tool are accepted by DECoNT directly. Use sambamba's -w option with the value 1000, which sets the window size (i.e. the resolution) to 1000 bp. You can run sambamba on your inputs as follows:
$ sambamba depth window -w 1000 HG00096.wes.bam > /home/user/sambamba_read_depths/HG00096.wes.bam_read_depths.txt
  • Note that all read depth file names must start with the SAMPLENAME. prefix.
  • You can use any read depth generator you like; however, so that DECoNT has a unified input format, read depth files must follow the format below:

  • Note that DECoNT does not use the mean coverage column shown in the file format figure above. You can fill that column with zeros.

  • For the purposes of this tutorial, let's call the directory containing all of the read depth files described above: /home/user/sambamba_read_depths/
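When there are many BAM files, the sambamba call above can be scripted. The sketch below is an illustration under stated assumptions, not part of DECoNT: the helper names are hypothetical, the sample name is assumed to be the part of the BAM file name before the first dot, and the output follows the SAMPLENAME.read_depths.txt naming from the -s argument description.

```python
import subprocess
from pathlib import Path


def sambamba_depth_command(bam_path, out_dir, window=1000):
    """Build the sambamba argv list and the output path for one BAM file."""
    bam = Path(bam_path)
    # Assumption: "HG00096.wes.bam" belongs to sample "HG00096".
    sample = bam.name.split(".")[0]
    out_path = Path(out_dir) / f"{sample}.read_depths.txt"
    argv = ["sambamba", "depth", "window", "-w", str(window), str(bam)]
    return argv, out_path


def run_sambamba(bam_path, out_dir):
    """Run sambamba and write its stdout to the read depth file.

    Equivalent to the shell redirection ('>') used in the command above.
    """
    argv, out_path = sambamba_depth_command(bam_path, out_dir)
    with open(out_path, "w") as handle:
        subprocess.run(argv, stdout=handle, check=True)
    return out_path
```

Calling run_sambamba once per BAM file fills the read depth directory; sambamba must be on the PATH for the call to succeed.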

Step-3: Run DECoNT on data obtained in Step-1 and Step-2

  • The requirements of DECoNT must be satisfied. To handle them easily, you can download the DECoNT_mac.yml or DECoNT_linux.yml file and create the DECoNT environment as follows (optional).
$ conda env create -f DECoNT_mac.yml
$ conda activate DECoNT_mac
  • Note: in this tutorial, we assume that the WES CNV calls were obtained using XHMM. If you obtained the calls with another supported caller, change the -cn argument to the name of that caller.
  • After initializing the environment, run DECoNT as follows:
$ python ./DECoNT_polish.py -m pretrained -cn XHMM -i /home/user/analysis.txt -o /home/user/ -s /home/user/sambamba_read_depths/
  • Optionally, if you have GPUs available, you can set the -g argument to the PCI bus ID of the GPU you want to use. Please refer to the Optional Arguments section. By default, the script uses the CPU.
$ python ./DECoNT_polish.py -g 5 -m pretrained -cn XHMM -i /home/user/analysis.txt -o /home/user/ -s /home/user/sambamba_read_depths/ 
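The same invocation can be assembled from Python, for example when polishing the output of several callers in a loop. This is a minimal sketch: build_polish_command is a hypothetical helper, and the caller names are copied from the list in the Required Arguments section. The exact strings accepted by -cn are an assumption here and should be confirmed with the -h option.

```python
import subprocess

# Caller names as listed in this README; exact accepted spellings are an assumption.
SUPPORTED_CALLERS = ("XHMM", "CoNIFER", "CODEX2", "Control-FREEC")


def build_polish_command(caller, calls_file, output_dir, samples_dir,
                         model="pretrained", gpu=None):
    """Return the argv list for DECoNT_polish.py (hypothetical helper)."""
    if caller not in SUPPORTED_CALLERS:
        raise ValueError(f"unsupported caller: {caller!r}")
    argv = ["python", "./DECoNT_polish.py"]
    if gpu is not None:
        # Optional: PCI bus ID of the GPU; the script uses the CPU by default.
        argv += ["-g", str(gpu)]
    argv += ["-m", model, "-cn", caller,
             "-i", calls_file, "-o", output_dir, "-s", samples_dir]
    return argv


def run_polish(argv, scripts_dir):
    """Run the command from the scripts directory, as the README requires."""
    subprocess.run(argv, cwd=scripts_dir, check=True)
```

Keeping command construction separate from execution makes it easy to print and inspect the command before starting a long run.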

Output file of DECoNT

  • At the end of the polishing procedure, DECoNT writes its output file to the directory given with the -o option. In this tutorial it is /home/user/
  • The output file of DECoNT is in a tab-delimited, .bed-like format.
  • The columns of the output file are, in order: 1. Sample Name, 2. Chromosome, 3. CNV Start Index, 4. CNV End Index, 5. XHMM Prediction (the caller name changes according to the -cn argument), 6. DECoNT Polished Prediction
  • The following figure is an example of a DECoNT output file.
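Given the six columns listed above, the output can be loaded with the standard csv module to see which calls DECoNT changed. The sketch below rests on assumptions: the column keys are made up for illustration, and lines whose start coordinate is not numeric (such as a possible header line) are skipped, since the README does not state whether a header is present.

```python
import csv

# Column order from the README; the key names themselves are hypothetical.
COLUMNS = ["sample", "chromosome", "start", "end", "caller_call", "decont_call"]


def read_polished_calls(handle):
    """Parse tab-delimited DECoNT output lines into dictionaries."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # blank or malformed line
        row = dict(zip(COLUMNS, fields))
        if not row["start"].isdigit():
            continue  # assumed header line, if one exists
        rows.append(row)
    return rows


def changed_calls(rows):
    """Return the calls where the polished prediction differs from the caller's."""
    return [row for row in rows if row["caller_call"] != row["decont_call"]]
```

Open the output file with open(path, newline="") and pass the handle to read_polished_calls; the prediction labels are treated as opaque strings, so no particular label vocabulary is assumed.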


Running quick experiment with DECoNT:

Follow the steps above, but use the provided DATA_chaisson_hg00733.xcnv file instead of analysis.txt, and use the directory in this link instead of the /sambamba_read_depths/ directory.

Citations


License

  • CC BY-NC-SA 2.0
  • Copyright 2020 © DECoNT.
  • For commercial usage, please contact.
