DeSeq-Free

DeSeq-Free (Whole genome Deep Sequencing analysis of Cell Free tumor DNA) is a Snakemake workflow for analyzing whole-genome sequencing (WGS) data of circulating cell-free DNA (cfDNA) from the plasma of cancer patients in a reproducible, automated, and partially contained manner. It is implemented so that alternative or similar analyses can be added or removed.

Using the DeSeq-Free workflow

We assume that conda is already installed. If it is not, follow the installation guide: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html

To simplify setup, we provide a conda environment file (envs/workflow.yaml) with all required tools, including Snakemake.

To use DeSeq-Free:

git clone https://github.com/mdelcorvo/DeSeq-Free.git
cd DeSeq-Free && conda env create -f envs/workflow.yaml
conda activate DeSeq-Free_workflow

# edit the config and prepare an input file (CSV or Excel)

snakemake --use-conda \
--config \
input=inputfile.xlsx \
output=../output_directory \
genome=genome.fasta
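
Before launching a full run, it can help to preview what Snakemake plans to do. The sketch below is only an illustration: `deseq_free_cmd` is a hypothetical helper, not part of DeSeq-Free, that prints the command shown above and optionally adds `-n`, Snakemake's dry-run flag. The file names are the placeholder values from this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeSeq-Free): prints the snakemake call
# from this section, optionally with -n so Snakemake only lists planned jobs.
deseq_free_cmd() {
    # $1 = input sheet, $2 = output directory, $3 = genome FASTA, $4 = "dry" (optional)
    dry=""
    if [ "${4:-}" = "dry" ]; then
        dry=" -n"   # -n is Snakemake's dry-run flag
    fi
    echo "snakemake --use-conda${dry} --config input=$1 output=$2 genome=$3"
}

# Print the dry-run command for the example values used above
deseq_free_cmd inputfile.xlsx ../output_directory genome.fasta dry
```

Running the printed command with `-n` lists the jobs without executing them; dropping the fourth argument prints the regular command.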

Prepare reference genome

  • Reference genome
    Before starting, the user needs to download a reference genome.

    Download it from NCBI, Ensembl, or any other authority:

    wget https://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
    
  • Index reference genome for bwa-mem2
    Index the genome for bwa-mem2 to speed up mapping. Refer to the bwa-mem2 instructions.

    • Example code:
      ./bwa-mem2 index <in.fasta>
      where <in.fasta> is the path to the reference sequence FASTA file.
      
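
After indexing, a quick check that the index files sit next to the FASTA can save a failed run later. This is a minimal sketch, assuming the file suffixes written by recent bwa-mem2 releases (`.0123`, `.amb`, `.ann`, `.bwt.2bit.64`, `.pac`); verify them against your installed version. `check_bwa_mem2_index` is a hypothetical helper, not part of DeSeq-Free or bwa-mem2.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if all expected bwa-mem2 index files exist
# and are non-empty next to the given FASTA, 1 otherwise.
# The suffix list is an assumption based on recent bwa-mem2 releases.
check_bwa_mem2_index() {
    fasta="$1"
    missing=0
    for ext in 0123 amb ann bwt.2bit.64 pac; do
        if [ ! -s "${fasta}.${ext}" ]; then
            echo "missing index file: ${fasta}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (path is a placeholder):
#   check_bwa_mem2_index genome.fasta && echo "index looks complete"
```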

Documentation

The pipeline uses several tools to QC the DeSeq-Free library, create statistics and an interactive report, and calculate and annotate interaction matrices at different bin sizes: bwa mem, pairtools, juicer, cooler, pairix, Macs2 and FitHiChIP.

You will need to specify the location of the reference genome (hg38) in FASTA (.fasta/.fa) format, together with its bwa index. Use the genome_data parameter in the config file to set it.

Input files

Users are required to provide a metadata file for running the DeSeq-Free workflow:

  • metadata file – a tab-delimited text file listing the sample names, the sequencing platform and the paths to the raw paired FASTQ files:
    sample   platform  fq1                 fq2
    Sample1  ILLUMINA  data/S1_1.fastq.gz  data/S1_2.fastq.gz
    Sample2  ILLUMINA  data/S2_1.fastq.gz  data/S2_2.fastq.gz
    Sample3  ILLUMINA  data/S3_1.fastq.gz  data/S3_2.fastq.gz
  • configuration file

The configuration file (config.yaml) contains all the paths to input, output and reference files and additional parameters to customize the pipeline and the performed tests. All of these need to be carefully specified in accordance with the specific experiment.

Important: ALL relative paths will be interpreted relative to the directory where the Snakefile is located. Alternatively, you can use absolute paths.

  • reference genome in FASTA format, e.g. hg38 with a bwa index
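
A malformed metadata file is an easy mistake to make, so a quick sanity check before running the workflow may be useful. The sketch below applies only to the tab-delimited form described above, and it assumes the header is exactly `sample`, `platform`, `fq1`, `fq2` as in the example. `validate_metadata` is a hypothetical helper, not part of DeSeq-Free.

```shell
#!/bin/sh
# Hypothetical helper: checks that a tab-delimited metadata file has the
# header shown above and four fields on every non-empty row.
validate_metadata() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NR == 1 {
            if ($1 != "sample" || $2 != "platform" || $3 != "fq1" || $4 != "fq2") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF == 0 { next }   # ignore blank lines
        NF != 4 {
            print "line " NR ": expected 4 tab-separated fields, found " NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Build a one-sample example in a temporary file and validate it
meta=$(mktemp)
printf 'sample\tplatform\tfq1\tfq2\n' > "$meta"
printf 'Sample1\tILLUMINA\tdata/S1_1.fastq.gz\tdata/S1_2.fastq.gz\n' >> "$meta"
validate_metadata "$meta" && echo "metadata OK"
rm -f "$meta"
```

The check does not confirm that the FASTQ paths exist; relative paths are resolved from the Snakefile directory, so that test is best run from there.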

Output files

  • Somatic variant analysis
  • Variant allele frequency
  • Annotation of somatic variants
  • Somatic signatures
  • Analysis of somatic CNAs
  • Fragment size analysis
