GitHub - DrRadan/RNAseq-processing-pipeline

OUTLINE

This analysis pipeline takes de-multiplexed, single- or paired-end, directional RNA-seq reads, tagged with the PCR duplicates’ marker, and produces a map of where these reads originated from in the genome.

WHAT IT DOES

STEP1 Fastq files, produced from an Illumina run with the our custom-made adaptors, demoultiplexed are read in, and any residual rRNA contained in this digital sample is removed. The report of this alignment can be found at /scratch/NetID/NNNNNNN.dRNAseq1end.joberr (or /scratch/NetID/NNNNNNN.dRNAseq2end.joberr for paired-end)

STEP2 The cleaned-up reads are now aligned with optimized parameters for yeast RNA-seq experiments. Spliced reads are not taken into account in this step. The report of this alignment can be found at /scratch/NetID/NNNNNNN.dRNAseq1end.joberr (or /scratch/NetID/NNNNNNN.dRNAseq2end.joberr for paired-end)

STEP3 The few (comparatively) reads that failed to align in the previous step is now tested for splice intron-exon junctions and aligned taking splicing into account. Whatever read still fails to align after this step, is reported in {name}_unaligned_1.fastq and {name}_unaligned_2.fastq.

STEP4 The aligned reads from the two processes are concatenated into a single alignment file wich is further trimmed by removing reads with poor mapping quality (MAPQ<15, can not be changed by the user at the moment. If you have concerns you could slightly modify the relevant samtools script within the RNAseq_1end_looseRibo.pbs or RNAseq_2end_looseRibo.pbs and save it in a private folder.

STEP5 The last step is the removal of possible PCR duplicates from the report and has three outputs: a. Possible PCR duplicates are not removed. These alignment report are named {name}_MAPQ15.sam (the .bam version is also available) and can imported into cufflinks and/or Rsubread/EdgeR for further analysis. b. Possible PCR duplicates removed with the long-published “coordinates” method. These alignment files are {name}_MAPQ15.noDUP.sam (the .bam version is also available) c. [Recommended] Possible PCR duplicates removed with our lab’s new method with custom-made adaptors incorporating PCR-indexes. These alignment files are {name}.rmdup.sam (the .bam version is also available) .If the “sorted” version is required for certain downstream analysis software use the {name}.rmdup.sort sam (the .bam version is also available) file.

DIRECTIONS

The read files must be de-multiplexed (DGseq demultiplexing) and have the .fastq suffix. Everything that precedes the fastq suffix is the name that will identify all the output files produced.
Log in your NYU HPC account and create a folder containing all the demultiplexed read files you want to analyze. This folder will be the /source/folder.
(Optional) Make a second folder that will become the /destination/folder. Alternatively, you can use the same directory every time you are asked for either the /source/folder or /destination/folder.
Go to the folder /data/cgsb/gresham/LABSHARE/SHARED/CODE/. You need to be listed as a member of the Gresham lab, registered with the NYU HPC team managing /data/cgsb/, to be able to read, write or execute the programs in this folder.
To submit the job and start the analysis of your results type:

Single-end reads

qsub –M {your NetID}@nyu.edu RNAseq_1end.pbs –v source={source/folder},destination={ destination/folder}, ref={reference genome}, name={name}

-M    [Optional] Enables automatic emails, upon initiation, termination or if the run is aborted (an exit status=0 indicates the job was terminated without problems). 

-v    There should not be any spaces between   n the commas
      {source/folder}	This is the folder you created in step 2).
      {destination/folder}	[Optional] This is the folder you created in step 3).
      {reference genome}	Currently only yeast is supported. The possible values are Scer3 (w/o spike-ins), ScERCC (w/ ERCC spike-ins).
      {name}	This is everything that precedes the .fastq suffix. This name will follow and identify all the results concerning this sample.

Paired-end reads

qsub –M {your NetID}@nyu.edu RNAseq_2end.pbs –v source={source/folder},destination={ destination/folder}, ref={reference genome}, name={name}

-M    [Optional] Enables automatic emails, upon initiation, termination or if the run is aborted (an exit status=0 indicates the job was terminated without problems). 

-v    There should not be any spaces between the commas
      {source/folder}	This is the folder you created in step 2).
      {destination/folder}	[Optional] This is the folder you created in step 3).
      {reference genome}	Currently only yeast is supported. The possible values are Scer3 (w/o spike-ins), ScERCC (w/ ERCC spike-ins).
      {name}	This is everything that precedes the _R1.fastq or _R2.fastq suffix. This name will follow and identify all the results concerning this sample.

OUTPUT

Two folders within the /scratch/NetID of the logged-in user, in the following format:NNNNNNN.dRNAseq_1end.jobout and NNNNNNN.dRNAseq_1end.joberr for single-end reads or NNNNNNN.dRNAseq_2end.jobout and NNNNNNN.dRNAseq_2end.joberr for paired-end reads. In the previous example, NNNNNNN is the unique job ID allocated automatically by the scheduler. These files contain some useful records of the analysis process and possible alerts.
At the assigned /destination/folder/:

The reads that failed to align in {name}_unaligned_1.fastq and {name}_unaligned_2.fastq for each read direction.
The alignment files containing any PCR duplicate cases that may have occurred in the process. Files {name}_MAPQ15.bam and {name}_MAPQ15.sam
The alignment files after removing PCR duplicates in the “coordinate method” alone {name}_MAPQ15.noDUP.bam and {name}_MAPQ15.noDUP.sam
[Recommended for further use] The alignment files after removing PCR duplicates in the “index method” {name}.rmdup.bam and {name}.rmdup.sam
The subfolder metrics/ that contains useful information about the various stages of the process you should expect tho see the folders {name}.rmdp_stat, {name}.ALN_Metrics and {name}.RNA_metrics

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Paired end		Paired end
Single end		Single end
Alignment_paired_end.pbs		Alignment_paired_end.pbs
Alignment_single_end.pbs		Alignment_single_end.pbs
README.md		README.md
check_dup_v2-1.pl		check_dup_v2-1.pl
check_pcr.pbs		check_pcr.pbs
check_pcr_singlEnd.pbs		check_pcr_singlEnd.pbs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OUTLINE

WHAT IT DOES

DIRECTIONS

OUTPUT

About

Releases

Packages

Languages

DrRadan/RNAseq-processing-pipeline

Folders and files

Latest commit

History

Repository files navigation

OUTLINE

WHAT IT DOES

DIRECTIONS

OUTPUT

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages