Home
This repository contains all code for the Guttman Lab SPRITE pipeline.
The main steps of this pipeline are:
- Identify barcodes in your sequenced reads.
- Align these reads to your genome of interest.
- Discard alignments that don't meet certain criteria.
- Group alignments into clusters.
- Create heatmaps from clusters.
Each of these steps is explained in more detail on its corresponding documentation page.
All of these steps have been automated using the Snakemake workflow management system. For detailed instructions on how to set up and use the pipeline, please see the Snakemake-Pipeline page.
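As a rough illustration of the "group alignments into clusters" step listed above, here is a minimal Python sketch. It is not the pipeline's actual code: it assumes that each alignment has already been reduced to a (barcode, chromosome, position) tuple and that reads sharing the same barcode string belong to the same cluster. The function name `group_into_clusters` and the barcode labels `BC_A` and `BC_B` are hypothetical.

```python
from collections import defaultdict


def group_into_clusters(alignments):
    """Group (barcode, chrom, pos) tuples into clusters keyed by barcode.

    Assumption for this sketch: alignments with an identical barcode
    string belong to the same cluster. Duplicate positions within a
    cluster are collapsed.
    """
    clusters = defaultdict(set)
    for barcode, chrom, pos in alignments:
        clusters[barcode].add((chrom, pos))
    # Return plain, sorted lists so the result is easy to inspect.
    return {barcode: sorted(positions) for barcode, positions in clusters.items()}


# Hypothetical input: barcode labels and coordinates are made up.
example_alignments = [
    ("BC_A", "chr1", 1000),
    ("BC_B", "chr3", 250),
    ("BC_A", "chr2", 500),
    ("BC_A", "chr1", 1000),  # duplicate of the first alignment
]

clusters = group_into_clusters(example_alignments)
for barcode, positions in sorted(clusters.items()):
    print(barcode, positions)
```

The real pipeline works on aligned BAM files and writes its own cluster format; see the corresponding documentation page for the details.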
To install, simply clone this repository. The Java code has been packaged as a JAR file which seems to run without problem on Mac and Linux systems.
The SPRITE pipeline has been tested on a high-performance computing cluster running CentOS 7 and on a local machine running Ubuntu 18.04.3 LTS with 30 GB of RAM and an i7-8750H CPU. Local runtime for FASTQ files with around 45 million reads was approximately 7 hours.
The pipeline requires the following software:
- Snakemake pipeline software (https://snakemake.readthedocs.io/en/stable/)
- Conda package (https://docs.conda.io/projects/conda/en/latest/) or miniconda package (https://docs.conda.io/en/latest/miniconda.html)
- Python 3.7.3 (https://www.python.org/)
- Java 8 (https://www.java.com/en/download/)
- R software v3.6.1 (https://www.r-project.org/)
- Hi-Corrector v1.2 software (https://github.com/jasminezhoulab/Hi-Corrector) (also included in this repository)
- Bowtie2 v2.3.5
- Bedtools v2.29.0
- Multiqc v1.6
- Samtools v1.9
- Trim Galore! v0.6.2
- Cutadapt v2.5
- Pigz v2.3.4
- Fastqc v0.11.8
- Python packages:
  - Pysam v0.15.0.1
  - Numpy v1.17.2
- R packages:
  - Ggplot2 v3.1.1
  - Gplots v3.0.1.1
  - Readr v1.3.1
  - Optparse v1.6.2
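Before running the pipeline, it can help to confirm that the command-line tools from the list above are on your `PATH`. The following Python sketch is not part of the repository. The executable names in `REQUIRED_TOOLS` are the usual ones for these tools but are an assumption about your installation, and the helper `find_missing_tools` is hypothetical. It checks only for presence, not for versions, and it does not cover Hi-Corrector or the Python and R packages.

```python
import shutil

# Assumed executable names for the tools listed above; adjust them
# if your installation uses different names.
REQUIRED_TOOLS = [
    "snakemake",
    "java",
    "Rscript",
    "bowtie2",
    "bedtools",
    "multiqc",
    "samtools",
    "trim_galore",
    "cutadapt",
    "pigz",
    "fastqc",
]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

If you install the dependencies through conda, run the check from inside the activated environment so that the environment's executables are the ones being found.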
If you need to create a new JAR file rather than use the provided one, you'll need:
- Java 8
- the Apache Commons CLI library
- our lab's bio library.
The bio library also uses the htsjdk library for parsing BAM files. This workflow doesn't parse BAM files with Java/bio, but your IDE may display a lot of ugly warnings if htsjdk isn't on your build path.