# kallisto-nf-reproduce

This repository contains the software, scripts and data to reproduce the RNA-Seq results described in the Nextflow publication.
The repository contains two versions of a traditional Bash-style pipeline, one for Mac and one for Linux (kallisto-mac and kallisto-linux), as well as the cross-platform Nextflow version of the pipeline (kallisto-nf).
kallisto-nf exists as a git submodule within this repository. To clone the repository together with the submodule, include the --recursive flag:
git clone --recursive https://github.com/cbcrg/kallisto-nf-reproduce.git
cd kallisto-nf-reproduce
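If the repository was cloned without --recursive, the kallisto-nf directory will be empty. The sketch below shows one way to detect that and fetch the submodule afterwards with `git submodule update --init --recursive`. The function names are hypothetical helpers, not part of this repository, and the default directory name assumes the submodule is checked out at kallisto-nf.

```shell
# Hypothetical helper: succeed if the directory exists and has at least one entry.
submodule_populated() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Hypothetical helper: fetch the submodule contents only when they are missing.
# Must be run from the root of the cloned repository.
ensure_submodule() {
  local dir="${1:-kallisto-nf}"
  if submodule_populated "$dir"; then
    echo "$dir is already populated"
  else
    git submodule update --init --recursive
  fi
}
```

Calling `ensure_submodule` from the repository root leaves an existing checkout untouched and otherwise populates the submodule.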
All data is available from the original sources, as well as in a single compressed tarball (~22 GB).
To download and uncompress the data use the following command:
mkdir data
wget -O- http://genome.crg.es/~cnotredame/data/supp/nextflow/kallisto_data.tar.gz | tar xz -C data
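After the download it can be worth confirming that the inputs are in place before starting a long run. The sketch below checks the three paths that the pipeline commands in this README refer to. It assumes the tarball unpacks into that layout under data, and `check_data_layout` is a hypothetical name.

```shell
# Hypothetical helper: verify that the inputs used by the pipeline commands exist.
# The paths are the ones passed to the pipeline scripts in this README.
check_data_layout() {
  local data_dir="${1:-data}"
  local status=0
  local path
  for path in \
    "$data_dir/raw_reads" \
    "$data_dir/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa" \
    "$data_dir/exp_info/hiseq_info.txt"
  do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2   # report every missing input, not just the first
      status=1
    fi
  done
  return "$status"
}
```

`check_data_layout data` returns a non-zero status and lists the missing paths on standard error if anything is absent.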
If you wish to retrieve the data from the original sources, you can find it here:
- Reads: All Illumina HiSeq2000 read data can be downloaded from the NCBI SRA under GEO accession GSE37703.
- Transcriptome: The transcriptome GRCh38 release 79 (cDNA all) is available from the kallisto website here.
On Linux, install kallisto version 0.42.4 and Sleuth, then launch the kallisto Bash pipeline script with the following command:
./kallisto-linux/kallisto-std.sh \
data/raw_reads \
data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
data/exp_info/hiseq_info.txt \
results-linux
On Mac, install kallisto version 0.42.4 and Sleuth, then launch the kallisto Bash pipeline script with the following command:
./kallisto-mac/kallisto-std.sh \
../data/raw_reads \
../data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
../data/exp_info/hiseq_info.txt \
results-mac
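Because the Bash pipeline lives in a different directory per platform, a wrapper has to pick the right one. A minimal sketch, assuming the platform is detected with `uname -s` (which reports `Linux` on Linux and `Darwin` on macOS); `pipeline_dir_for_os` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: map the kernel name from `uname -s` to the pipeline directory.
pipeline_dir_for_os() {
  case "$1" in
    Linux)  echo "kallisto-linux" ;;
    Darwin) echo "kallisto-mac" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}
```

For example, `pipeline_dir_for_os "$(uname -s)"` prints the directory that holds kallisto-std.sh for the current machine.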
Install Nextflow with the following command:
curl -fsSL get.nextflow.io | bash
Install Docker following the instructions at this page.
Pull the Docker images used for this experiment (optional):
docker pull cbcrg/kallisto-nf:1.1
Once the read data has been downloaded from SRA, the Nextflow version of the pipeline can be run from the kallisto-nf directory with the following command:
nextflow run kallisto.nf \
--reads 'data/raw_reads/SRR4933*_{1,2}.fastq' \
--transcriptome data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
--experiment data/exp_info/hiseq_info.txt \
--output kallisto-nf-results \
-with-docker
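The --reads pattern is single-quoted so that the shell passes it to Nextflow unexpanded, and it matches files ending in _1.fastq and _2.fastq. The sketch below lists any read file whose mate is missing, on the assumption that each run should contribute both files; `unpaired_reads` is a hypothetical helper, not part of the pipeline.

```shell
# Hypothetical helper: print every read file that has no matching mate.
# File naming (<run>_1.fastq / <run>_2.fastq) follows the --reads pattern above.
unpaired_reads() {
  local dir="${1:-data/raw_reads}"
  local f mate
  for f in "$dir"/SRR4933*_1.fastq "$dir"/SRR4933*_2.fastq; do
    [ -e "$f" ] || continue        # skip a glob that matched nothing
    case "$f" in
      *_1.fastq) mate="${f%_1.fastq}_2.fastq" ;;
      *_2.fastq) mate="${f%_2.fastq}_1.fastq" ;;
    esac
    [ -e "$mate" ] || echo "$f"    # report the file whose partner is absent
  done
}
```

An empty result from `unpaired_reads data/raw_reads` means every matched file has its partner.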