Home
This pipeline processes RNA-Seq data generated with a total RNA-Seq strategy. One advantage of total RNA-Seq is that it captures and sequences both linear and circular mRNA isoforms in a single run. Most existing gene expression quantification tools and pipelines focus on polyA-enriched RNA-Seq data and do not consider circRNA. To address this, we designed this total RNA-Seq analysis pipeline. It first identifies circRNAs from the total RNA-Seq data, then distributes the RNA-Seq reads among the linear and circular mRNA isoforms to quantify their expression.
The pipeline performs the following steps:
1. Build the BWA index.
2. Map the RNA-Seq reads to the genome using BWA. The resulting SAM file is used for circRNA calling.
3. Call circRNAs using CIRI.
4. Add gene names to the CIRI output using an in-house script.
5. Add the circRNAs to the gene annotation in GTF format using an in-house script.
6. Extract both linear and circRNA transcripts using RSEM.
7. Convert the linear transcript of each circRNA to a pseudo-linear transcript. This step also removes transcripts shorter than the read length and any transcripts containing "N".
8. Generate the transcript-to-gene mapping table for the RSEM index.
9. Build the RSEM index using the transcripts from step 7 and the mapping from step 8.
10. Run RSEM to quantify genes and transcripts.
11. Summarize the RSEM output.
12. Combine and summarize the results from all samples.
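
The transcript filtering in the pseudo-linear conversion step can be illustrated with a short sketch. This is not the pipeline's in-house script. The function names and the `read_length` parameter are hypothetical, and the conversion shown (appending the first `read_length - 1` bases to the end of the circRNA sequence so that reads spanning the back-splice junction can align) is one common scheme assumed here, not necessarily the one the pipeline uses. Only the two filters (length below the read length, presence of "N") come from the description above.

```python
from __future__ import annotations


def keep_transcript(seq: str, read_length: int) -> bool:
    """Keep a transcript only if it is at least read_length long and has no 'N'."""
    return len(seq) >= read_length and "N" not in seq.upper()


def to_pseudo_linear(seq: str, read_length: int) -> str:
    """ASSUMED scheme: copy the first read_length - 1 bases to the 3' end
    so that a read crossing the back-splice junction has somewhere to map."""
    return seq + seq[: read_length - 1]


def prepare_transcripts(
    transcripts: dict[str, str], circ_ids: set[str], read_length: int
) -> dict[str, str]:
    """Filter all transcripts, then convert the circRNA ones to pseudo-linear form."""
    prepared = {}
    for transcript_id, seq in transcripts.items():
        if not keep_transcript(seq, read_length):
            continue  # too short, or contains an ambiguous base
        if transcript_id in circ_ids:
            seq = to_pseudo_linear(seq, read_length)
        prepared[transcript_id] = seq
    return prepared


# Toy sequences and IDs, for illustration only.
example = {
    "lin1": "ACGTACGTAC",
    "circ1": "ACGTTGCA",
    "short": "ACG",
    "withN": "ACGTNACGT",
}
result = prepare_transcripts(example, circ_ids={"circ1"}, read_length=5)
```

In this sketch the filters are applied to the original sequence before the conversion; whether the pipeline filters before or after conversion is not stated above.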
The pipeline can process FASTQ files stored in GDC, in the local file system, or in S3.
- Input files are stored in GDC.
  - Sample catalog file. If this file is not provided, the pipeline uses the default catalog file for CPTAC3 samples; otherwise, the user should prepare a catalog file in a similar format.
  - Case ID file. A list of the case IDs to be processed, with each ID on a separate row. The IDs must match the `case` column in the catalog file.
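
Before launching a run, it may help to check that every case ID appears in the catalog. The sketch below is a hypothetical helper, not part of the pipeline. It assumes the catalog is tab-separated with a header row; only the `case` column name comes from the description above, and the other column and the IDs are placeholders.

```python
import csv
import io


def read_case_ids(handle):
    """Read one case ID per row, skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]


def missing_cases(case_ids, catalog_handle):
    """Return the case IDs that do not appear in the catalog's `case` column.

    ASSUMPTION: the catalog is a tab-separated file with a header row.
    """
    reader = csv.DictReader(catalog_handle, delimiter="\t")
    known = {row["case"] for row in reader}
    return [case_id for case_id in case_ids if case_id not in known]


# Placeholder data; real files would be opened with open(path, newline="").
catalog_text = "sample\tcase\nS1\tCASE-001\nS2\tCASE-002\n"
case_id_text = "CASE-001\nCASE-003\n\n"

case_ids = read_case_ids(io.StringIO(case_id_text))
not_found = missing_cases(case_ids, io.StringIO(catalog_text))
```

Any ID returned by `missing_cases` would need to be corrected in the case ID file or added to the catalog.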
- Input files are stored in the local file system or S3.
  - Sample catalog file in the following TSV format:

    | sample_name | RNAseq_R1 | RNAseq_R2 |
    | --- | --- | --- |
    | sample1 | /path/to/sample1_r1.fastq.gz | /path/to/sample1_r2.fastq.gz |
    | sample2 | /path/to/sample2_r1.fastq.gz | /path/to/sample2_r2.fastq.gz |
    | ... | ... | ... |

    Note: each path can be an absolute path in the local file system or an S3 path.
  - No case ID file is needed.
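
A catalog for local or S3 input can be checked with a few lines of Python before it is handed to the pipeline. This is a hedged sketch, not the pipeline's own parser. The three column names come from the format described above; the function names are hypothetical, and treating only POSIX absolute paths (leading `/`) and `s3://` URIs as acceptable is an assumption made for the example.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_name", "RNAseq_R1", "RNAseq_R2"]


def is_supported_path(path):
    """ASSUMPTION: accept absolute POSIX paths and s3:// URIs only."""
    return path.startswith("/") or path.startswith("s3://")


def load_catalog(handle):
    """Parse the tab-separated catalog and validate its columns and paths."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")
    samples = []
    for row in reader:
        for column in ("RNAseq_R1", "RNAseq_R2"):
            if not is_supported_path(row[column]):
                raise ValueError(
                    f"{row['sample_name']}: unsupported path in {column}: {row[column]}"
                )
        samples.append(row)
    return samples


# Placeholder catalog; the bucket name is made up for illustration.
catalog_text = (
    "sample_name\tRNAseq_R1\tRNAseq_R2\n"
    "sample1\t/path/to/sample1_r1.fastq.gz\t/path/to/sample1_r2.fastq.gz\n"
    "sample2\ts3://example-bucket/sample2_r1.fastq.gz\ts3://example-bucket/sample2_r2.fastq.gz\n"
)
samples = load_catalog(io.StringIO(catalog_text))
```

The check only inspects the path strings; it does not confirm that the files exist or are readable.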