-
Log in to the O2 cluster with your HMS username and password using PuTTY or ....
-
Begin an interactive session by running:
srun --pty -p interactive -t 0-2:0:0 --mem 150G -c 15 /bin/bash
You can request extra memory or multiple cores (up to 20). More information is found here.
-
Load the required modules by running:
module load conda2/4.2.13
module load rcbio/1.0
module load cellranger/2.2.0
module load bcl2fastq/2.20.0.422
module load gcc/6.2.0
module load star/2.5.4a
module load samtools/1.9
module load python/2.7.12
module load htseq/0.9.1
module load fastx/0.0.13
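After loading the modules, it can help to confirm that the main executables are actually visible in the session. The following is a minimal sketch; the helper name check_tools is hypothetical, and the executable names are assumptions about what each module provides on your system.

```shell
# Report which of the expected executables are visible on PATH.
# check_tools is a hypothetical helper; the tool names passed to it are
# assumptions about what the modules above install.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools STAR samtools htseq-count fastx_trimmer bcl2fastq cellranger runAsPipeline \
    || echo "Reload the module for any tool reported missing."
```

If a tool is reported missing, `module list` shows which modules are currently loaded.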
-
Create a new directory to hold all your files and name it RNA-seq:
mkdir RNA-seq
-
Step 5 can be skipped if you use an index from a shared directory that has the latest version of the mouse genome. Otherwise, you will have to create a new one as shown below.
- Go to the RNA-seq folder you created in step 4 and create a new folder named Index:
cd RNA-seq
mkdir Index
- Go to the Index folder you just created and generate a new genome index:
cd Index
STAR --runMode genomeGenerate --genomeDir /home/kb246/RNA-seq/Index/ --genomeFastaFiles /home/kb246/genome/Mus_musculus.GRCm38.dna.primary_assembly.fa --sjdbGTFfile /home/kb246/genome/Mus_musculus.GRCm38.97.gtf --sjdbOverhang 50
Replace the username in the paths above with your own (for example, /home/username/RNA-seq/Index).
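One way to avoid editing the username by hand is to build the paths from the $USER variable. This is only a sketch: it assumes your home directory is /home/<your login name> and that the genome files sit in a genome folder with the same file names as in the command above. The variable names are made up for illustration.

```shell
# Build the index paths from the login name instead of hard-coding /home/kb246.
# Assumption: the home directory is /home/$USER and the genome files use the
# same layout and names as in the command above.
USER_HOME="/home/${USER:-username}"
INDEX_DIR="$USER_HOME/RNA-seq/Index"
GENOME_DIR="$USER_HOME/genome"

STAR_CMD="STAR --runMode genomeGenerate --genomeDir $INDEX_DIR/ --genomeFastaFiles $GENOME_DIR/Mus_musculus.GRCm38.dna.primary_assembly.fa --sjdbGTFfile $GENOME_DIR/Mus_musculus.GRCm38.97.gtf --sjdbOverhang 50"

# Print the command for review; run it once the paths look right.
echo "$STAR_CMD"
```

STAR also accepts --runThreadN, which could be added to use the cores requested for the interactive session.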
-
Go back to the RNA-seq folder and create a new folder named fastqFiles:
cd ..
mkdir fastqFiles
Copy all your fastq files into this folder using WinSCP (for Windows) & (...for Linux).
-
Download pipeline.sh from GitHub and copy it into the RNA-seq folder using WinSCP.
-
Run the commands in the file pipeline.sh by running:
runAsPipeline pipeline.sh "sbatch -p short -t 20:0 -n 1" noTmp run
Note: Make sure the file is in Linux format (LF line endings) and not Windows format (CRLF). If you see "$'\r': command not found", that is the issue.
Note: pipeline.sh must be inside the RNA-seq folder for the command above to work; otherwise you will get an error.
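The line-ending problem from the note can be checked and fixed directly on the cluster. The sketch below uses only grep and tr; the function name fix_line_endings is hypothetical. Where it is installed, dos2unix does the same job.

```shell
# Strip Windows carriage returns from a script, but only if any are present.
# fix_line_endings is a hypothetical helper name.
fix_line_endings() {
    file="$1"
    cr=$(printf '\r')
    if grep -q "$cr" "$file"; then
        # Write the cleaned copy back into the original file so its
        # permissions are preserved, then remove the temporary copy.
        tr -d '\r' < "$file" > "$file.unix" && cat "$file.unix" > "$file" && rm "$file.unix"
        echo "converted $file to Unix line endings"
    else
        echo "$file already uses Unix line endings"
    fi
}

# Run from inside the RNA-seq folder, where pipeline.sh is expected to be.
if [ -f pipeline.sh ]; then
    fix_line_endings pipeline.sh
fi
```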
Extra: Make sure you look at the FastQC results and trim the reads if needed.
fastx_trimmer -l N
[-l N] = Last base to keep. Default is entire read.
fastqc -t 6 *.fq # -t 6 runs FastQC with 6 threads
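A fuller trimming command needs input and output files as well as the -l value. The sketch below prints one fastx_trimmer command per .fq file in fastqFiles rather than running it. The cut-off of 50 and the trimmed_ prefix are placeholders, not pipeline defaults, and -Q33 assumes Phred+33 quality scores. File names are assumed to contain no spaces.

```shell
# Keep bases 1..LAST_BASE of every read in each .fq file under fastqFiles/.
# LAST_BASE=50 and the "trimmed_" prefix are placeholder choices; pick the
# cut-off from your own FastQC report.
LAST_BASE=50

trim_command() {
    # -Q33 assumes Phred+33 quality scores (Illumina 1.8 and later).
    echo "fastx_trimmer -Q33 -l $LAST_BASE -i $1 -o $(dirname "$1")/trimmed_$(basename "$1")"
}

for fq in fastqFiles/*.fq; do
    [ -e "$fq" ] || continue      # skip if the glob matched nothing
    trim_command "$fq"            # prints the command only; nothing is executed
done
```

Once the printed commands look right, they can be run by pasting them into the session or by piping the output to sh.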