DeSeq-Free (Whole genome Deep Sequencing analysis of Cell Free tumor DNA) is a Snakemake workflow for analyzing whole-genome sequencing (WGS) of circulating cell-free DNA (cfDNA) from the plasma of cancer patients, together with their matched germline and tumour samples, in a reproducible, automated, and partially contained manner. It is implemented so that alternative or similar analyses can be added or removed.
We assume that conda is already installed. If it is not, follow the installation guide:
https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
Input:
A metafile (.xlsx or .csv) listing the raw fastq.gz files, formatted as follows:
sample, lane, fq1, fq2, type
Sample1, lane1, S1_L001_R1_001.fastq.gz, S1_L001_R2_001.fastq.gz, 0
Sample1, lane2, S1_L002_R1_001.fastq.gz, S1_L002_R2_001.fastq.gz, 0
Each row represents one sequencing lane of a sample, with its pair of fastq files given in fq1 and fq2. Rows with the same sample identifier are considered technical replicates and will be automatically merged.
type refers to the sample type (0 = buffy coat, 1 = plasma, 2 = tumor).
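To make the metafile layout concrete, here is a minimal Python sketch of how rows that share a sample identifier could be grouped and their type code decoded. It is not the workflow's own code: the function name `group_lanes_by_sample` and the inline example data are illustrative assumptions, and only the .csv form is handled.

```python
import csv
import io
from collections import defaultdict

# Sample-type codes as described in the text above.
SAMPLE_TYPES = {"0": "buffy coat", "1": "plasma", "2": "tumor"}

# Inline copy of the example metafile; a real run would open the .csv file instead.
EXAMPLE_METAFILE = """\
sample, lane, fq1, fq2, type
Sample1, lane1, S1_L001_R1_001.fastq.gz, S1_L001_R2_001.fastq.gz, 0
Sample1, lane2, S1_L002_R1_001.fastq.gz, S1_L002_R2_001.fastq.gz, 0
"""


def group_lanes_by_sample(handle):
    """Return {sample: {"type": label, "lanes": [(lane, fq1, fq2), ...]}}.

    Hypothetical helper: it only illustrates the grouping of lanes per sample.
    """
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(handle, skipinitialspace=True)
    samples = defaultdict(lambda: {"type": None, "lanes": []})
    for row in reader:
        label = SAMPLE_TYPES[row["type"]]  # raises KeyError on an unknown code
        entry = samples[row["sample"]]
        if entry["type"] not in (None, label):
            raise ValueError(f"conflicting types for sample {row['sample']}")
        entry["type"] = label
        entry["lanes"].append((row["lane"], row["fq1"], row["fq2"]))
    return dict(samples)


if __name__ == "__main__":
    grouped = group_lanes_by_sample(io.StringIO(EXAMPLE_METAFILE))
    for name, info in grouped.items():
        print(name, info["type"], len(info["lanes"]), "lane(s)")
```

A check like this can catch a mistyped type code or a sample listed with two different types before a long run is started.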
Reference genome
Before starting, download a reference genome from NCBI, Ensembl, or another authoritative source, for example:
wget https://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Index reference genome for bwa-mem2
Build a bwa-mem2 index of the reference genome before running the workflow, so that the mapping step can use it. Refer to the bwa-mem2 documentation for details.
Example code:
./bwa-mem2 index <in.fasta>
where <in.fasta> is the path to the reference sequence fasta file.
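Before launching the workflow it can be useful to confirm that indexing finished. The sketch below lists index files that are missing next to the fasta file. The suffix list is an assumption based on the files bwa-mem2 version 2 typically writes, and `missing_index_files` is a hypothetical helper, so check both against your installed version.

```python
from pathlib import Path

# Assumed suffixes written by `bwa-mem2 index`; verify against your bwa-mem2 version.
ASSUMED_INDEX_SUFFIXES = (".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac")


def missing_index_files(fasta, suffixes=ASSUMED_INDEX_SUFFIXES):
    """Return the names of expected index files that do not exist next to `fasta`."""
    fasta = Path(fasta)
    return [
        fasta.name + suffix
        for suffix in suffixes
        if not Path(str(fasta) + suffix).exists()
    ]


if __name__ == "__main__":
    missing = missing_index_files("genome.fasta")
    if missing:
        print("Index incomplete, missing:", ", ".join(missing))
    else:
        print("All assumed index files are present.")
```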
Code:
git clone https://github.com/mdelcorvo/DeSeq-Free.git
cd DeSeq-Free && conda env create -f envs/workflow.yaml
conda activate DeSeq-Free_workflow
snakemake --use-conda \
    --config \
    input=inputfile.xlsx \
    output=output_directory \
    genome=genome.fasta
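When the workflow has to be started for several cohorts, the snakemake call above can be assembled from a script. The following is a minimal sketch rather than part of the workflow: `snakemake_command` is a hypothetical helper that only reproduces the three config keys shown above (input, output, genome) and adds no other options.

```python
import shlex


def snakemake_command(metafile, output_dir, genome):
    """Build the argument list for the snakemake call shown above."""
    return [
        "snakemake",
        "--use-conda",
        "--config",
        f"input={metafile}",
        f"output={output_dir}",
        f"genome={genome}",
    ]


if __name__ == "__main__":
    cmd = snakemake_command("inputfile.xlsx", "output_directory", "genome.fasta")
    # Print the shell-quoted command for review; to launch it from Python,
    # pass the list to subprocess.run(cmd, check=True) inside the activated environment.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.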
- Somatic variant analysis
- Variant allele frequency
- Annotation of somatic variants
- Somatic signatures
- Analysis of somatic CNAs
- Fragment size analysis