Steps to start running the RNA-seq pipeline on the O2 cluster

Log in to the O2 cluster with your HMS username and password using PuTTY or ....
Begin an interactive session by running:

srun --pty -p interactive -t 0-2:0:0 --mem 150G -c 15 /bin/bash

You can request more memory or more cores (up to 20) by changing the --mem and -c values. More information is found here.
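As a rough illustration of how the resource flags fit together, the sketch below assembles the same srun command from variables and prints it for review. The variable names and the smaller values (32G, 4 cores) are hypothetical; pick values that match your own job.

```shell
# Hypothetical resource settings; adjust to what your job actually needs.
PARTITION="interactive"   # -p: partition (queue) to submit to
WALLTIME="0-2:0:0"        # -t: time limit, days-hours:minutes:seconds
MEMORY="32G"              # --mem: memory requested for the session
CORES=4                   # -c: CPU cores requested for the session

# Assemble the command and print it so it can be checked before use.
SRUN_CMD="srun --pty -p $PARTITION -t $WALLTIME --mem $MEMORY -c $CORES /bin/bash"
echo "$SRUN_CMD"
```

Copy the printed line into your terminal on the login node to start the session.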
Load the required modules by running:

module load conda2/4.2.13
module load rcbio/1.0
module load cellranger/2.2.0
module load bcl2fastq/2.20.0.422
module load gcc/6.2.0
module load star/2.5.4a
module load samtools/1.9
module load python/2.7.12
module load htseq/0.9.1
module load fastx/0.0.13
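After loading the modules, it can help to confirm that the tools are actually on your PATH. The sketch below is one way to do that; the helper name missing_tools is hypothetical, and the executable names are assumed to be the ones these modules provide.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names assumed to be provided by the modules loaded above.
missing=$(missing_tools STAR samtools htseq-count fastx_trimmer bcl2fastq cellranger)

if [ -z "$missing" ]; then
    echo "All tools found."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

If a tool is reported as missing, re-run the matching module load line and check for error messages.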
Create a new directory named RNA-seq to hold all your files.

mkdir RNA-seq
Generating a genome index using STAR:

This step can be skipped if you will use an index from a shared directory, which has the latest version of the mouse genome. Otherwise, create a new one as shown below.
- Go to the RNA-seq folder you created above and make a new folder named Index:
  
  cd RNA-seq
  
  mkdir Index
- Go to the Index folder you just created and generate a new genome index:
  
  cd Index
  
  STAR --runMode genomeGenerate --genomeDir /home/kb246/RNA-seq/Index/ --genomeFastaFiles /home/kb246/genome/Mus_musculus.GRCm38.dna.primary_assembly.fa --sjdbGTFfile /home/kb246/genome/Mus_musculus.GRCm38.97.gtf --sjdbOverhang 50
  
  Replace kb246 in the paths above with your own username (for example, /home/username/RNA-seq/Index).
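To avoid editing the username by hand, the paths can be built from $HOME. The sketch below assumes your home directory follows the /home/username layout shown above and that the genome files sit in a genome folder there. The variable names are hypothetical, and --runThreadN is an addition that was not in the original command.

```shell
# Assumed locations; override any of these before running if yours differ.
GENOME_DIR="${GENOME_DIR:-$HOME/RNA-seq/Index}"
FASTA="${FASTA:-$HOME/genome/Mus_musculus.GRCm38.dna.primary_assembly.fa}"
GTF="${GTF:-$HOME/genome/Mus_musculus.GRCm38.97.gtf}"
THREADS="${THREADS:-1}"   # can be raised to the number of cores you requested

# Run STAR only when it is available and both input files exist.
if command -v STAR >/dev/null 2>&1 && [ -f "$FASTA" ] && [ -f "$GTF" ]; then
    mkdir -p "$GENOME_DIR"
    STAR --runMode genomeGenerate --runThreadN "$THREADS" \
         --genomeDir "$GENOME_DIR" \
         --genomeFastaFiles "$FASTA" \
         --sjdbGTFfile "$GTF" \
         --sjdbOverhang 50
    INDEX_STATUS="ran"
else
    echo "Skipping: STAR is not loaded or the genome files were not found." >&2
    INDEX_STATUS="skipped"
fi
```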
Go back to the RNA-seq folder and create a new folder named fastqFiles:

cd ..

mkdir fastqFiles
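If you prefer, the whole folder layout can be created in one go instead of moving between directories. This is a minimal sketch; PROJECT_DIR is a hypothetical variable that defaults to RNA-seq in the current directory.

```shell
# Top-level project folder; defaults to RNA-seq in the current directory.
PROJECT_DIR="${PROJECT_DIR:-RNA-seq}"

# -p creates missing parent folders and does not fail if they already exist.
mkdir -p "$PROJECT_DIR/Index" "$PROJECT_DIR/fastqFiles"

# List the result to confirm both subfolders are there.
ls "$PROJECT_DIR"
```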

Copy all your FASTQ files into this folder using WinSCP (for Windows) or ... (for Linux).
Download pipeline.sh from GitHub and copy it into the RNA-seq folder using WinSCP.
Run the commands in the file pipeline.sh by running:

runAsPipeline pipeline.sh "sbatch -p short -t 20:0 -n 1" noTmp run

Note: Make sure the file uses Linux line endings, not Windows ones. If you see the error $'\r': command not found, Windows line endings are the cause.

Note: To run the command above, pipeline.sh must be inside the RNA-seq folder; otherwise you will get an error.
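Both notes can be checked before submitting. The sketch below looks for pipeline.sh in the current folder and, if it contains Windows carriage returns, removes them in place. The helper names has_crlf and strip_cr are hypothetical.

```shell
# Succeeds if the file contains carriage returns (Windows line endings).
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Remove every carriage return from the file, keeping its permissions.
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

SCRIPT="pipeline.sh"
if [ ! -f "$SCRIPT" ]; then
    echo "$SCRIPT not found in $(pwd); change into the RNA-seq folder first."
elif has_crlf "$SCRIPT"; then
    strip_cr "$SCRIPT"
    echo "Converted $SCRIPT to Linux line endings."
else
    echo "$SCRIPT already uses Linux line endings."
fi
```

Run this from inside the RNA-seq folder, then run the runAsPipeline command.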

Extra: Make sure you look at the FastQC results and trim the reads if needed.

fastx_trimmer -l N

[-l N] = Last base to keep. Default is entire read.

fastqc -t 6 *.fq #note the extra parameter we specified for 6 threads
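If trimming is needed, something like the loop below could apply fastx_trimmer to every .fq file. It is only a sketch: the folder names, the helper trimmed_path, and the cut-off of 50 bases are hypothetical, and -Q33 assumes your reads use Phred+33 quality encoding.

```shell
IN_DIR="${IN_DIR:-fastqFiles}"   # folder holding the untrimmed reads
OUT_DIR="${OUT_DIR:-trimmed}"    # hypothetical folder for trimmed reads
LAST_BASE="${LAST_BASE:-50}"     # hypothetical value for -l (last base to keep)

# Build the output path for an input file: <out_dir>/<name>.trimmed.fq
trimmed_path() {
    printf '%s/%s.trimmed.fq\n' "$2" "$(basename "$1" .fq)"
}

if command -v fastx_trimmer >/dev/null 2>&1; then
    mkdir -p "$OUT_DIR"
    for fq in "$IN_DIR"/*.fq; do
        [ -e "$fq" ] || continue   # skip when no .fq files are present
        # -Q33 assumes Phred+33 qualities; change it if your data differ.
        fastx_trimmer -Q33 -l "$LAST_BASE" -i "$fq" -o "$(trimmed_path "$fq" "$OUT_DIR")"
    done
else
    echo "fastx_trimmer not found; load the fastx module first." >&2
fi
```

Writing to a separate folder keeps the original reads untouched, so you can re-run fastqc on both sets and compare.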
