
Pipeline for the detection of alternative polyadenylation


sudlab/pipeline_apa


Pipeline APA

Author: Ian Sudbery
Release: 0.01
Date: 21/11/16
Tags: Python

Overview

This pipeline aims to identify cases of alternative exon usage, and in particular alternative polyadenylation (APA), from RNA-seq data. It uses two different approaches. First, the DaPars program is applied; it builds models of read depth over final exons to identify cases of APA. Second, DEXSeq is used to identify cases of alternative exon usage where the exon in question is the last exon of a transcript.
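The idea behind the read-depth modelling can be illustrated with a much-simplified sketch. Given per-base coverage over a final exon's 3' UTR, it searches for the breakpoint that best splits the coverage into an upstream segment (covered by both short and long isoforms) and a lower downstream segment (long isoform only), then reports the ratio of downstream to upstream depth as a rough long-isoform fraction. This is not DaPars's own model, which is more elaborate (it works across multiple samples and reports a PDUI score); ``best_breakpoint``, ``long_isoform_fraction`` and the toy coverage are hypothetical.

```python
from typing import List, Tuple


def best_breakpoint(coverage: List[float], min_len: int = 2) -> Tuple[int, float]:
    """Return (breakpoint, sse) of the two-segment step model with the lowest error.

    Positions [0, bp) form the upstream segment and [bp, len) the downstream one.
    """
    if len(coverage) < 2 * min_len:
        raise ValueError("coverage too short for a two-segment fit")
    best_bp, best_sse = -1, float("inf")
    for bp in range(min_len, len(coverage) - min_len + 1):
        up, down = coverage[:bp], coverage[bp:]
        mu_up = sum(up) / len(up)
        mu_down = sum(down) / len(down)
        # Squared error when each segment is modelled by its own mean depth.
        sse = sum((x - mu_up) ** 2 for x in up) + sum((x - mu_down) ** 2 for x in down)
        if sse < best_sse:
            best_bp, best_sse = bp, sse
    return best_bp, best_sse


def long_isoform_fraction(coverage: List[float], bp: int) -> float:
    """Downstream (long-only) mean depth divided by upstream (short + long) mean depth."""
    mu_up = sum(coverage[:bp]) / bp
    mu_down = sum(coverage[bp:]) / (len(coverage) - bp)
    return mu_down / mu_up if mu_up > 0 else 0.0


# Toy 3' UTR coverage: ~40x before a proximal poly(A) site, ~10x after it.
toy = [40, 42, 39, 41, 40, 11, 10, 9, 10, 10]
bp, _ = best_breakpoint(toy)
print(bp, round(long_isoform_fraction(toy, bp), 2))  # prints: 5 0.25
```

A brute-force scan over breakpoints is quadratic in UTR length, which is fine for illustration; a real implementation would use cumulative sums.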

Usage

See :ref:`PipelineSettingUp` and :ref:`PipelineRunning` for general information on how to use CGAT pipelines.

Configuration

The pipeline requires a configured :file:`pipeline.ini` file.

Default configuration files can be generated by executing:

python <srcdir>/pipeline_apa.py config

By default the pipeline will try to guess the experimental design, but a different design can be specified in a file called :file:`design.tsv`. The file has three columns: the comparison name, and two regular expressions that match the files in condition 1 and condition 2 respectively, e.g.:

#name    pattern1           pattern2
tissue   heart-control.+    brain-control.+
kd       heart-kd.+         heart-control.+
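A minimal sketch of how a design file in this format could be read and its patterns applied to BAM file names. It assumes whitespace-separated columns (tabs or spaces), ``#`` comment lines, and ``re.match`` semantics (anchored at the start of the name); ``read_design`` and ``assign_files`` are hypothetical names, and the pipeline's own parser may differ.

```python
import re
from typing import Dict, List, Tuple


def read_design(text: str) -> List[Tuple[str, str, str]]:
    """Parse design.tsv content into (name, pattern1, pattern2) tuples."""
    design = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header/comment line
        name, pattern1, pattern2 = line.split()
        design.append((name, pattern1, pattern2))
    return design


def assign_files(
    design: List[Tuple[str, str, str]], bam_files: List[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each comparison, collect the files matching condition 1 and condition 2."""
    comparisons = {}
    for name, pattern1, pattern2 in design:
        cond1 = [f for f in bam_files if re.match(pattern1, f)]
        cond2 = [f for f in bam_files if re.match(pattern2, f)]
        comparisons[name] = (cond1, cond2)
    return comparisons


design_text = """#name    pattern1           pattern2
tissue   heart-control.+    brain-control.+
kd       heart-kd.+         heart-control.+
"""
bams = ["heart-control-R1.bam", "heart-kd-R1.bam", "brain-control-R1.bam"]
for name, (c1, c2) in assign_files(read_design(design_text), bams).items():
    print(name, c1, c2)
```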

If no design file is present, files with ``control`` in the second part of the file name are used as controls for files with the same first part but a different second part.

For example, if heart-control-r1 and heart-kd-r1 are present, the first will be used as the control for the second.
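The default pairing rule can be sketched as follows. This assumes three-part, dash-separated names, treats "``control`` in the second part" as a substring test, and groups replicates of the same condition together; that grouping is an assumption, since the rule above is only stated for individual files. ``guess_design`` is a hypothetical name, not the pipeline's function.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def guess_design(bam_files: List[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Pair each non-control condition with the controls sharing its first name part.

    Returns {"<part1>-<part2>": (control_files, treatment_files)}.
    """
    # part 1 -> part 2 -> files (one per replicate), in input order
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for path in bam_files:
        stem = os.path.basename(path)
        if stem.endswith(".bam"):
            stem = stem[: -len(".bam")]
        first, second, _replicate = stem.split("-")
        groups[first][second].append(path)

    design = {}
    for first, conditions in groups.items():
        controls = [f for second, files in conditions.items()
                    if "control" in second for f in files]
        if not controls:
            continue  # nothing to use as a control for this first part
        for second, files in conditions.items():
            if "control" not in second:
                design[f"{first}-{second}"] = (controls, files)
    return design


bams = ["heart-control-r1.bam", "heart-kd-r1.bam",
        "heart-control-r2.bam", "heart-kd-r2.bam", "brain-control-r1.bam"]
print(guess_design(bams))
```

Files whose first part has no control (here ``brain``) are simply left out of the guessed design.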

Input files

The input files are indexed BAM files with three-part names separated by dashes. By convention, part 1 is the tissue, cell type or experiment name, part 2 is the condition, and part 3 is the replicate, e.g.

heart-control-R1.bam

is the heart control sample from replicate 1.
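A small helper for splitting such names into their three parts might look like this. ``SampleName`` and ``parse_sample_name`` are hypothetical names, and the sketch simply rejects names that do not have exactly three dash-separated parts.

```python
import os
from typing import NamedTuple


class SampleName(NamedTuple):
    tissue: str     # part 1: tissue, cell type or experiment name
    condition: str  # part 2: condition, e.g. "control" or "kd"
    replicate: str  # part 3: replicate, e.g. "R1"


def parse_sample_name(path: str) -> SampleName:
    """Split e.g. 'data/heart-control-R1.bam' into its three name parts."""
    stem = os.path.basename(path)
    if stem.endswith(".bam"):
        stem = stem[: -len(".bam")]
    parts = stem.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three dash-separated parts, got {stem!r}")
    return SampleName(*parts)


print(parse_sample_name("heart-control-R1.bam"))
# prints: SampleName(tissue='heart', condition='control', replicate='R1')
```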

Requirements

The pipeline requires the results from :doc:`pipeline_annotations`. Set the configuration variables :py:data:`annotations_database` and :py:data:`annotations_dir` accordingly.
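CGAT pipelines read :file:`pipeline.ini` into a flat dictionary of parameters. A minimal sketch of that flattening with the standard ``configparser`` module, assuming the usual CGAT convention that key ``database`` in section ``[annotations]`` becomes ``annotations_database``; ``flatten_ini`` is hypothetical and the paths are placeholders, not defaults shipped with the pipeline.

```python
import configparser
from typing import Dict

# Placeholder values: point these at your pipeline_annotations output.
INI_TEXT = """
[annotations]
database=/path/to/annotations/csvdb
dir=/path/to/annotations
"""


def flatten_ini(text: str) -> Dict[str, str]:
    """Flatten each [section] key=value pair into section_key -> value."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    params = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            params[f"{section}_{key}"] = value
    return params


params = flatten_ini(INI_TEXT)
print(params["annotations_database"], params["annotations_dir"])
```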

On top of the default CGAT setup, the pipeline requires the following software to be in the PATH:

  • samtools >= 1.1
  • DaPars
  • R
  • DEXSeq
  • ExperimentR
  • bedtools
  • bgzip & tabix
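A quick pre-flight check along these lines can catch a missing tool before a long run. It is a hypothetical helper, not part of the pipeline. It assumes ``samtools --version`` prints the version on its first line, and it does not check DaPars, DEXSeq or ExperimentR: DEXSeq is an R/Bioconductor package rather than an executable, and how DaPars and ExperimentR are installed may vary.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

# Command-line tools from the list above that are expected as executables on the PATH.
REQUIRED_TOOLS = ["samtools", "R", "bedtools", "bgzip", "tabix"]


def missing_tools(tools: List[str]) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number, e.g. 'samtools 1.10' -> (1, 10)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    return tuple(int(x) for x in match.group(1).split(".")) if match else None


def samtools_ok(minimum: Tuple[int, ...] = (1, 1)) -> bool:
    """Check that samtools is installed and at least the given version."""
    if shutil.which("samtools") is None:
        return False
    result = subprocess.run(["samtools", "--version"], capture_output=True, text=True)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    version = parse_version(first_line)
    return version is not None and version >= minimum


if __name__ == "__main__":
    print("missing:", missing_tools(REQUIRED_TOOLS))
    print("samtools >= 1.1:", samtools_ok())
```

Versions are compared as integer tuples, so ``1.10`` correctly counts as newer than ``1.9``.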

Pipeline output

Most of the output is stored in the SQLite database associated with the pipeline (:file:`csvdb` by default). The last-exon chunks that DEXSeq finds to be differentially used are also exported to the export directory.
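To see what the pipeline has written, the tables in the database can be listed with Python's built-in ``sqlite3`` module. A minimal sketch, assuming the default database name ``csvdb`` in the working directory; ``list_tables`` is a hypothetical helper and no specific table names are assumed.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str = "csvdb") -> List[str]:
    """Return the names of all tables in the pipeline's SQLite database."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    for table in list_tables():
        print(table)
```

Note that ``sqlite3.connect`` silently creates an empty database if the path does not exist, so an empty result may simply mean the script was run from the wrong directory.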

Diagram

pipeline_diagram.png
