A pipeline to identify A-to-I RNA editing sites using RNA-seq data.

fachrulm/A-to-I-Catcher

A-to-I-Catcher

A pipeline to identify A-to-I RNA editing sites using RNA-seq data. The method is adapted from Ramaswami et al. (2013) and follows GATK's current best practices.

STEPS for variant calling (VC):

  1. Run 2-pass mapping using STAR (VCmapSTAR.sh, then 2pass_VCmapSTAR.sh).

NOTE: Validate each BAM file with Picard's ValidateSamFile (validbam.sh) as soon as it is generated.

  1. Add read groups using Picard's AddOrReplaceReadGroups (addReadGroup.sh).
  2. Identify and remove duplicate reads with Picard's MarkDuplicates (picardup.sh).
  3. Filter out reads with low MAPQ (< 20) with samtools (filtersam.sh).
  4. Index the BAM file from the previous step (index.sh).
  5. Split reads that contain N in their CIGAR string using GATK's SplitNCigarReads (splitncigar.sh).
  6. Run base quality score recalibration (BQSR) with GATK's BaseRecalibrator (base_recalibrator.sh).
  7. Apply the recalibration with GATK's ApplyBQSR (applybqst.sh), then run variant calling with GATK's HaplotypeCaller (gvcf_haplotypeCaller.sh).
  8. Merge the GVCF files into a single VCF file with GATK's GenotypeGVCFs (genotypegvcfs.sh).
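As an illustration of the MAPQ filter in the list above, here is a minimal sketch that works on SAM text. It is not the contents of filtersam.sh, which are not shown here; the function name `filter_mapq` is hypothetical. With samtools itself, the same cutoff is usually expressed as `samtools view -b -q 20`.

```shell
# Keep SAM header lines and alignments whose MAPQ (column 5) is at least
# the given threshold. Reads SAM text on stdin, writes SAM text on stdout.
# filter_mapq is a hypothetical helper, not part of the repository.
filter_mapq() {
    min_mapq="${1:-20}"   # default cutoff of 20, matching the step above
    # Header lines start with "@"; alignment lines never do.
    awk -F '\t' -v min="$min_mapq" '/^@/ || ($5 + 0) >= (min + 0)'
}
```

A possible use, with `in.bam` and `out.bam` as placeholder file names: `samtools view -h in.bam | filter_mapq 20 | samtools view -b -o out.bam -`.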

NOTE: Check VCF file with Picard's ValidateVCF (validvcf.sh).

  1. Run variant quality score recalibration (VQSR) with GATK's VariantRecalibrator (variantRecalib.sh), then ApplyVQSR (applyvqsr.sh) to generate a variant-recalibrated VCF.
  2. Select only SNPs from the VCF (snponly.sh), then filter variants against known SNPs [avsnp138] and splicing junctions [dbscsnv11] with ANNOVAR (inputannovar.sh, dbsnp_annovar.sh and spl_annovar.sh).
  3. Keep only A-to-I editing sites (AtoIFilter.sh).
  4. Separate variants in Alu and non-Alu regions (alufilter.sh).
  5. For Alu variants, annotate directly against UCSC's knownGene (knownGene.sh).
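The A-to-I filtering step above can be sketched as follows. This is an assumption about what such a filter typically does, not the actual logic of AtoIFilter.sh: inosine is read as guanosine, so an editing site shows up as A>G, or as T>C when the transcript comes from the reverse strand and the library is unstranded. The function name `filter_a_to_i` is hypothetical.

```shell
# Keep VCF header lines and records whose REF>ALT is A>G or T>C.
# Reads VCF text on stdin, writes VCF text on stdout.
# filter_a_to_i is a hypothetical helper, not part of the repository.
filter_a_to_i() {
    awk -F '\t' '
        # Pass all header lines through unchanged.
        /^#/ { print; next }
        # Column 4 is REF, column 5 is ALT. A>G is editing seen on the
        # forward strand; T>C is the same change seen from the reverse strand.
        ($4 == "A" && $5 == "G") || ($4 == "T" && $5 == "C")
    '
}
```

Note that this sketch drops multi-allelic records (for example an ALT of `G,T`), since it compares the ALT column as a whole; whether that is desirable depends on how the VCF was prepared upstream.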

The remaining steps apply only to non-Alu variants.

  1. Remove variants in simple repeats, using UCSC's RepeatMasker annotation (bedfilter.sh).
  2. Remove variants in homopolymer regions (homopolymer.sh).
  3. Ensure unique mapping using BLAT (BLAT.sh).
  4. Separate non-Alu variants into repetitive and non-repetitive sets (repeatorno.sh).
  5. Annotate against UCSC's knownGene (nonALU_knownGene.sh).
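To make the homopolymer step above concrete, here is a minimal sketch of a per-site check. It is not the contents of homopolymer.sh; the function name `in_homopolymer`, its arguments, and the default run length of 5 are all assumptions chosen for illustration. Given a stretch of reference sequence around a candidate site, it reports whether the site sits inside a run of identical bases.

```shell
# Succeed (exit status 0) if the base at a given position lies inside a
# run of identical bases at least MIN long; fail (status 1) otherwise.
#   $1 = reference sequence window around the candidate site
#   $2 = 1-based position of the site within that window
#   $3 = minimum run length (assumed default: 5)
# in_homopolymer is a hypothetical helper, not part of the repository.
in_homopolymer() {
    awk -v seq="$1" -v pos="$2" -v min="${3:-5}" 'BEGIN {
        seq = toupper(seq)
        pos = pos + 0
        base = substr(seq, pos, 1)
        # Extend the run to the left and right of the site.
        left = pos
        while (left > 1 && substr(seq, left - 1, 1) == base) left--
        right = pos
        while (right < length(seq) && substr(seq, right + 1, 1) == base) right++
        exit ((right - left + 1) >= (min + 0) ? 0 : 1)
    }'
}
```

For example, `in_homopolymer ACGTAAAAAGCT 7` succeeds because position 7 falls in a run of five A's, while `in_homopolymer ACGTACGT 4` fails. The window itself could be pulled from the reference with something like `samtools faidx ref.fa chr1:96-104`, where the file name and coordinates are placeholders.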
