Data processing steps

Zhiao Shi edited this page Jul 20, 2021 · 8 revisions

circRNA identification

  1. Map the RNA-seq reads to the genome with bwa. The resulting SAM file is the input for circRNA calling.

    • bwa mem -T 19 -o bwa-pe-for-CIRI.sam GRCh38.p13.genome.fa RNAseq_R1.fastq RNAseq_R2.fastq
  2. Call circRNAs with CIRI2.

    • perl CIRI2.pl -T 16 -I bwa-pe-for-CIRI.sam -O results_CIRI.txt -F GRCh38.p13.genome.fa -A gencode.v34.basic.annotation.gtf
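
The two identification steps above can be chained in a small wrapper. This is a minimal sketch, not the authors' script: the variable names, the `run` helper, and the `DRY_RUN` switch are hypothetical, and `-T 16` assumes the `16` in the CIRI2 command is meant as a thread count. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run bwa and CIRI2 for one sample.
# Hypothetical conveniences (not from the original page): variable names,
# run(), and DRY_RUN. File names are the ones used in the steps above.
set -eu

GENOME="GRCh38.p13.genome.fa"
GTF="gencode.v34.basic.annotation.gtf"
R1="RNAseq_R1.fastq"
R2="RNAseq_R2.fastq"
SAM="bwa-pe-for-CIRI.sam"
OUT="results_CIRI.txt"

# Safe default: print commands only. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command, then execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

identify_circrna() {
    # Step 1: align paired-end reads; -T 19 is the minimum alignment score to output.
    run bwa mem -T 19 -o "$SAM" "$GENOME" "$R1" "$R2"
    # Step 2: call circRNAs from the SAM file (assumed: -T 16 is the thread count).
    run perl CIRI2.pl -T 16 -I "$SAM" -O "$OUT" -F "$GENOME" -A "$GTF"
}

identify_circrna
```

Running the script as is lists both commands; running it with `DRY_RUN=0` in the environment executes them, and `set -e` stops the run if bwa fails before CIRI2 starts.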

Prepare index for linear and circRNA quantification

  1. Add gene names to the CIRI output with the in-house script.

    • perl 1_add_gene_name_to_CIRI_results.pl
  2. Add the circRNAs to the gene annotation in GTF format with the in-house script.

    • perl 2_add_linear_circular_isoform_to_gtf.pl
  3. Extract both linear and circRNA transcripts using RSEM.

    • rsem-extract-reference-transcripts GRCh38.d1.vd1.linear.and.circrna.isoforms 0 gencode.v34.basic.annotation.gtf None 0 GRCh38.p13.genome.fa
  4. Convert the linear sequence of each circRNA into a pseudo-linear transcript.

    This step also removes transcripts shorter than the read length and any transcripts containing “N”.

    • perl 3_circular_linear_to_psedo_linear.pl
  5. Generate the transcript-to-gene mapping table for the RSEM index.

    • perl 4_gene_isoform_mapping_with_circular_rna.pl
  6. Build the RSEM index with Bowtie2 as the aligner, using the transcripts from step 4 and the mapping table from step 5.

    • rsem-prepare-reference -p 16 --transcript-to-gene-map gene_isoform_mapping_with_circular_rna_for_RSEM.txt --bowtie2 --bowtie2-path bowtie2-2.3.3/ GRCh38.d1.vd1.linear.and.circrna.as.pseudo.linear.transcripts.fa RSEM_index/hg38
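
The filtering rule in step 4 (drop transcripts shorter than the read length or containing “N”) can be illustrated with a short awk filter. This is only a sketch of that one rule under stated assumptions: the function name and the default read length of 150 are hypothetical, lowercase `n` is also treated as “N”, and the in-house Perl script may do more than this, including the pseudo-linear conversion itself.

```shell
#!/bin/sh
# Sketch of the step 4 filter: keep a FASTA record only if its sequence is
# at least as long as the read length and contains no N.
# Hypothetical: filter_transcripts and the default read length of 150.
set -eu

filter_transcripts() {
    # $1: input FASTA (sequences may be wrapped over several lines)
    # $2: read length (assumed default: 150)
    # Output: surviving records on stdout, one sequence line per record.
    awk -v min_len="${2:-150}" '
        function emit() {
            if (header != "" && length(seq) >= min_len && seq !~ /[Nn]/) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
             { seq = seq $0 }                          # join wrapped lines
        END  { emit() }                                # flush last record
    ' "$1"
}
```

For example, `filter_transcripts transcripts.fa 150 > filtered.fa` (both file names hypothetical) would write only the records that pass both checks.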

Run RSEM to quantify both linear and circular transcripts

  • rsem-calculate-expression --bowtie2 --bowtie2-path bowtie2-2.3.3/ --paired-end RNAseq_R1.fastq RNAseq_R2.fastq RSEM_index/hg38 RSEM_results/RNAseq

Summarize RSEM output at the gene and circRNA level

  • perl 5-summary-gene-quantification.pl
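
As a rough illustration of what such a summary might involve, the sketch below reads an RSEM `*.isoforms.results` file, sums TPM over linear isoforms per gene, and reports each circRNA on its own row. It is not the in-house script. The function name is hypothetical, and so is the way circRNAs are recognized: the example assumes their transcript IDs follow CIRI's `chr:start|end` form, which may not match how the in-house scripts label them.

```shell
#!/bin/sh
# Sketch: summarize RSEM isoform-level TPM to gene level (linear isoforms)
# and circRNA level (one row per circRNA).
# Hypothetical: summarize_isoforms and the ID pattern used to spot circRNAs.
set -eu

summarize_isoforms() {
    # $1: RSEM *.isoforms.results (tab-separated; column 1 transcript_id,
    #     column 2 gene_id, column 6 TPM)
    # $2: regular expression matching circRNA transcript IDs (assumption)
    awk -F '\t' -v circ="$2" '
        NR == 1   { next }                                      # skip header
        $1 ~ circ { printf "circRNA\t%s\t%.2f\n", $1, $6; next } # per circRNA
                  { tpm[$2] += $6 }                             # per-gene sum
        END       { for (g in tpm) printf "gene\t%s\t%.2f\n", g, tpm[g] }
    ' "$1" | LC_ALL=C sort
}
```

A call such as `summarize_isoforms RSEM_results/RNAseq.isoforms.results '^chr[^:]+:[0-9]+[|][0-9]+$'` prints three tab-separated columns: the level (`circRNA` or `gene`), the ID, and the TPM.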