Steps to start running the RNA-seq pipeline on the O2 cluster

  1. Log in to the O2 cluster with your HMS username and password using PuTTY or ....

  2. Begin an interactive session by running:

    srun --pty -p interactive -t 0-2:0:0 --mem 150G -c 15 /bin/bash

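    This requests a 2-hour interactive session with 150G of memory and 15 cores. You can request a different amount of memory (--mem) or number of cores (-c, up to 20). More information is found here.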

  3. Load the required modules by running:

    module load conda2/4.2.13
    module load rcbio/1.0
    module load cellranger/2.2.0
    module load bcl2fastq/2.20.0.422
    module load gcc/6.2.0
    module load star/2.5.4a
    module load samtools/1.9
    module load python/2.7.12
    module load htseq/0.9.1
    module load fastx/0.0.13

  4. Create a new directory named RNA-seq to hold all your files:

    mkdir RNA-seq

  5. Generate a genome index using STAR:

    You can skip step 5 if you use an index from a shared directory, which has the latest version of the mouse genome. Otherwise, create a new one as shown below.

    • Go to the RNA-seq folder you created in step 4 and create a new folder named Index:

      cd RNA-seq

      mkdir Index

    • Go to the Index folder you just created and generate a new genome index:

      cd Index

      STAR --runMode genomeGenerate --genomeDir /home/kb246/RNA-seq/Index/ --genomeFastaFiles /home/kb246/genome/Mus_musculus.GRCm38.dna.primary_assembly.fa --sjdbGTFfile /home/kb246/genome/Mus_musculus.GRCm38.97.gtf --sjdbOverhang 50

      Replace kb246 in the paths above with your own username (for example, /home/username/RNA-seq/Index).
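
    As a sketch of the path substitution described above, the command can be assembled from a username instead of hard-coded paths. The function name build_star_index_cmd is hypothetical, and the genome files are assumed to sit in /home/<username>/genome/ as in the example above; adjust the paths to where your files actually live. The function only prints the command so it can be inspected before running.

    ```shell
    # Hypothetical helper: print the STAR genomeGenerate command for a given user.
    # Assumes the genome FASTA and GTF are in /home/<user>/genome/, mirroring the
    # example paths above; change them if your files are elsewhere.
    build_star_index_cmd() {
        # Use the first argument as the username, falling back to $USER.
        user="${1:-${USER:-username}}"
        home_dir="/home/${user}"
        printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --sjdbOverhang 50\n' \
            "${home_dir}/RNA-seq/Index/" \
            "${home_dir}/genome/Mus_musculus.GRCm38.dna.primary_assembly.fa" \
            "${home_dir}/genome/Mus_musculus.GRCm38.97.gtf"
    }
    ```

    For example, build_star_index_cmd "$USER" prints the command for the current user. The value 50 for --sjdbOverhang is kept from the command above; the STAR manual suggests read length minus 1 as the ideal value, so it may need changing for other read lengths.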

  6. Go back to the RNA-seq folder you created in step 4 and create a new folder named fastqFiles:

    cd ..

    mkdir fastqFiles

    Copy all your FASTQ files into this folder using WinSCP (for Windows) or (...for Linux).
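
    The folder layout from steps 4 to 6 can also be created in one go. This is a minimal sketch: the function name make_rnaseq_layout is hypothetical, and it assumes RNA-seq should live under a base directory you pass in (your home directory by default). The Index folder is only needed if you build your own genome index.

    ```shell
    # Hypothetical helper: create RNA-seq/ with its Index/ and fastqFiles/ subfolders.
    # mkdir -p creates missing parent folders and does not fail if a folder exists.
    make_rnaseq_layout() {
        base="${1:-$HOME}"
        mkdir -p "${base}/RNA-seq/Index" "${base}/RNA-seq/fastqFiles"
    }
    ```

    Calling make_rnaseq_layout with no argument creates the folders under your home directory; running it a second time is harmless.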

  7. Download pipeline.sh from GitHub and copy it into the RNA-seq folder (for example, using WinSCP).

  8. Run the commands in the file pipeline.sh by running:

    runAsPipeline pipeline.sh "sbatch -p short -t 20:0 -n 1" noTmp run

    Note: Make sure the file is in Linux format (LF line endings), not Windows format (CRLF). If you see the error "$'\r': command not found", this is the cause.

    Note: The file pipeline.sh must be inside the RNA-seq folder when you run the command above; otherwise you will get an error.
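
    The line-ending issue described in the note above can be checked and fixed with standard tools. The following is a minimal sketch, not part of the pipeline itself; the function names has_crlf and to_unix are hypothetical.

    ```shell
    # Hypothetical helpers for checking pipeline.sh before running it.

    # Succeeds (exit status 0) if the file contains at least one carriage return.
    has_crlf() {
        cr="$(printf '\r')"
        grep -q "$cr" "$1"
    }

    # Rewrite the file in place with every carriage return removed,
    # which turns Windows (CRLF) line endings into Linux (LF) ones.
    to_unix() {
        tmp_file="$(mktemp)" || return 1
        # Write the cleaned copy to a temporary file, then copy it back so the
        # original file keeps its permissions.
        tr -d '\r' < "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
        status=$?
        rm -f "$tmp_file"
        return "$status"
    }

    # Usage (run from inside the RNA-seq folder):
    #   if has_crlf pipeline.sh; then to_unix pipeline.sh; fi
    ```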

Extra: Make sure you look at the FastQC results and trim the reads if needed.

    fastx_trimmer -l N    # -l N = last base to keep; the default is the entire read

    fastqc -t 6 *.fq    # note the extra parameter -t 6, which requests 6 threads
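
The trimming step can be sketched as a loop over the FASTQ files. This is a dry run that only prints one fastx_trimmer command per file, so the commands can be reviewed before anything is executed. The function name print_trim_cmds, the .fq extension, the trimmed/ output folder, and the example length of 45 are all assumptions for illustration; -i and -o are the fastx_trimmer input-file and output-file options.

```shell
# Hypothetical helper: print a fastx_trimmer command for every .fq file in a folder.
# $1 = folder with FASTQ files, $2 = last base to keep (the N in -l N).
print_trim_cmds() {
    in_dir="$1"
    last_base="$2"
    for fq in "$in_dir"/*.fq; do
        # Skip the unexpanded pattern when the folder has no .fq files.
        [ -e "$fq" ] || continue
        name="$(basename "$fq" .fq)"
        echo "fastx_trimmer -l ${last_base} -i ${fq} -o ${in_dir}/trimmed/${name}.trimmed.fq"
    done
}

# Usage: print the commands that would keep bases 1 to 45 of every read.
#   print_trim_cmds fastqFiles 45
```

After reviewing the output, create the trimmed folder and run the printed commands, then re-run fastqc on the trimmed files to confirm the result.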