Home

Introduction

This pipeline processes RNA-Seq data generated with a total RNA-Seq strategy. One advantage of total RNA-Seq is that it captures and sequences both linear and circular mRNA isoforms in a single run. Most existing gene expression quantification tools and pipelines focus on polyA-enriched RNA-Seq data and do not consider circRNA. To address this gap, we designed this total RNA-Seq analysis pipeline. It first identifies circRNAs from the total RNA-Seq data, then assigns the RNA-Seq reads to linear and circular mRNA isoforms to quantify their expression.

Pipeline summary

The pipeline performs the following steps:

1. Build the BWA index.
2. Map the RNA-Seq reads to the genome using BWA. The resulting SAM file is used for circRNA calling.
3. Call circRNAs using CIRI.
4. Add gene names to the CIRI output with an in-house script.
5. Add the circRNAs to the gene annotation in GTF format with an in-house script.
6. Extract both linear and circRNA transcripts using RSEM.
7. Convert the linear transcript of each circRNA to a pseudo-linear transcript. This step also removes transcripts shorter than the read length and any transcripts containing “N”.
8. Generate the transcript-to-gene mapping table for the RSEM index.
9. Build the RSEM index using the transcripts from step 7 and the mapping from step 8.
10. Run RSEM to quantify genes and transcripts.
11. Summarize the RSEM output.
12. Combine and summarize the results from all samples.
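
The transcript clean-up in step 7 can be illustrated with a minimal Python sketch. It is not the in-house script. The function name, the example sequences, and in particular the way a circRNA is made pseudo-linear (appending the first `read_length - 1` bases to the end so that reads spanning the back-splice junction can align) are assumptions made for illustration; the pipeline may do this differently. Applying the length filter before the extension is also an assumption.

```python
from typing import Dict, Set


def filter_and_pseudo_linearise(
    transcripts: Dict[str, str],
    circ_ids: Set[str],
    read_length: int,
) -> Dict[str, str]:
    """Drop unusable transcripts and extend circRNA sequences.

    transcripts: transcript ID -> nucleotide sequence (hypothetical input).
    circ_ids:    IDs of the transcripts that are circRNAs.
    read_length: sequencing read length in bases.
    """
    kept = {}
    for tid, seq in transcripts.items():
        seq = seq.upper()
        # Remove transcripts shorter than the read length.
        if len(seq) < read_length:
            continue
        # Remove transcripts containing ambiguous bases.
        if "N" in seq:
            continue
        # ASSUMPTION: pseudo-linearise by repeating the start of the
        # sequence at its end, covering the back-splice junction.
        if tid in circ_ids:
            seq = seq + seq[: read_length - 1]
        kept[tid] = seq
    return kept


# Hypothetical toy sequences, far shorter than real transcripts.
example = {
    "linear_tx": "ACGTACGTAC",
    "circ_tx": "AACCGGTTAC",
    "short_tx": "ACG",
    "gappy_tx": "ACGTNACGTA",
}
result = filter_and_pseudo_linearise(example, {"circ_tx"}, read_length=5)
for tid, seq in result.items():
    print(tid, seq)
```

In this toy run, `short_tx` and `gappy_tx` are dropped, `linear_tx` is kept unchanged, and `circ_tx` gains four bases copied from its start.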

Inputs

The pipeline can process FASTQ files stored in GDC, on the local file system, or in S3.

Input files stored in GDC:
- Sample catalog file. If this file is not provided, the pipeline uses the default catalog file for CPTAC3 samples; otherwise, the user should prepare a catalog file in a similar format.
- Case ID file. A list of the case IDs to process, one ID per row. The IDs must match the case column in the catalog file.
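
Before launching a run, it may help to check that every requested case ID appears in the catalog. The sketch below assumes the catalog is a tab-separated file with a header row containing a column named `case`. The other column names, the case IDs, and the function name are hypothetical and are not taken from the CPTAC3 catalog.

```python
import csv
import io
from typing import List


def missing_case_ids(
    catalog_text: str, case_id_text: str, case_column: str = "case"
) -> List[str]:
    """Return the requested case IDs that are absent from the catalog."""
    reader = csv.DictReader(io.StringIO(catalog_text), delimiter="\t")
    if reader.fieldnames is None or case_column not in reader.fieldnames:
        raise ValueError(f"catalog has no '{case_column}' column")
    known = {row[case_column] for row in reader}
    # One case ID per row; blank rows are ignored.
    wanted = [line.strip() for line in case_id_text.splitlines() if line.strip()]
    return [case_id for case_id in wanted if case_id not in known]


# Hypothetical catalog and case ID file contents.
catalog = "\n".join(
    [
        "\t".join(["sample_name", "case", "data_path"]),
        "\t".join(["s1", "C0001", "/data/s1.fastq.gz"]),
        "\t".join(["s2", "C0002", "/data/s2.fastq.gz"]),
    ]
)
case_ids = "C0001\nC0003\n"

missing = missing_case_ids(catalog, case_ids)
print(missing)
```

An empty list means every case ID in the file can be matched to a catalog entry.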
Input files stored in the local file system or S3:
- Sample catalog file in the following TSV format:

  sample_name	RNAseq_R1	RNAseq_R2
  sample1	/path/to/sample1_r1.fastq.gz	/path/to/sample1_r2.fastq.gz
  sample2	/path/to/sample2_r1.fastq.gz	/path/to/sample2_r2.fastq.gz
  ...	...	...

  Note: each path can be an absolute path in the local file system or an S3 path.
- No case ID file is needed.
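
A catalog in this layout can be sanity-checked before a run. The sketch below is not part of the pipeline. It assumes the three column names shown above appear as a header row, treats a path as valid when it starts with `/` or `s3://`, and uses a hypothetical bucket name in the example.

```python
import csv
import io
from typing import Dict, Tuple

EXPECTED_COLUMNS = ["sample_name", "RNAseq_R1", "RNAseq_R2"]


def read_sample_catalog(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse a sample catalog and return sample_name -> (R1 path, R2 path)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"expected header {EXPECTED_COLUMNS}, got {reader.fieldnames}")
    samples = {}
    for row in reader:
        for column in ("RNAseq_R1", "RNAseq_R2"):
            path = row[column] or ""
            # Accept absolute local paths or S3 paths only.
            if not (path.startswith("/") or path.startswith("s3://")):
                raise ValueError(f"{row['sample_name']}: invalid {column} path '{path}'")
        samples[row["sample_name"]] = (row["RNAseq_R1"], row["RNAseq_R2"])
    return samples


# Example catalog; "example-bucket" is a hypothetical S3 bucket name.
catalog_text = "\n".join(
    [
        "\t".join(EXPECTED_COLUMNS),
        "\t".join(["sample1", "/path/to/sample1_r1.fastq.gz", "/path/to/sample1_r2.fastq.gz"]),
        "\t".join(["sample2", "s3://example-bucket/sample2_r1.fastq.gz", "s3://example-bucket/sample2_r2.fastq.gz"]),
    ]
)

samples = read_sample_catalog(catalog_text)
for name, (r1, r2) in samples.items():
    print(name, r1, r2)
```

A relative path or a missing column raises `ValueError`, so a malformed catalog is caught before any alignment starts.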

