:Author: Ian Sudbery
:Release: 0.01
:Date: 21/11/16
:Tags: Python

This pipeline aims to identify cases of alternative exon usage from RNA-seq data. It uses two different approaches. First, the DaPars program is applied; it builds models of read depth over final exons to identify cases of alternative polyadenylation (APA). Second, DEXSeq is used to identify cases of alternative exon usage where the exon in question is the last exon of a transcript.
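As a rough illustration of the DaPars idea, the sketch below fits a two-segment step model to per-base coverage over a last exon. It picks the breakpoint that minimises the squared error. This is a simplified stand-in, not DaPars' own regression model. The function names and the ``min_len`` parameter are hypothetical. The distal/proximal coverage ratio is only loosely analogous to DaPars' PDUI (percentage of distal polyA site usage index). Coverage is assumed to be ordered 5' to 3' along the transcript.

```python
from typing import Sequence, Tuple


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_proximal_site(coverage: Sequence[float], min_len: int = 1) -> Tuple[int, float]:
    """Fit a two-step model to per-base coverage over a last exon (simplified, hypothetical).

    Assumes coverage is ordered 5' -> 3' along the transcript, so the
    proximal (higher-coverage) segment comes first.
    Returns (breakpoint, ratio): the index where the proximal segment ends,
    and mean distal coverage divided by mean proximal coverage.
    """
    n = len(coverage)
    if n < 2 * min_len:
        raise ValueError("coverage too short for the requested min_len")
    best_k, best_err = min_len, float("inf")
    # Try every breakpoint that leaves at least min_len bases on each side.
    for k in range(min_len, n - min_len + 1):
        err = _sse(coverage[:k]) + _sse(coverage[k:])
        if err < best_err:
            best_k, best_err = k, err
    proximal = sum(coverage[:best_k]) / best_k
    distal = sum(coverage[best_k:]) / (n - best_k)
    ratio = distal / proximal if proximal > 0 else float("nan")
    return best_k, ratio
```

For a profile that drops from 10 to 4 reads halfway along the exon, this returns the halfway index and a ratio of 0.4. That ratio suggests roughly 40% of transcripts use the distal site.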
See :ref:`PipelineSettingUp` and :ref:`PipelineRunning` for general information on how to use CGAT pipelines.
The pipeline requires a configured :file:`pipeline.ini` file.
Default configuration files can be generated by executing::

   python <srcdir>/pipeline_apa.py config

By default, the pipeline will try to guess the experimental design. A different design can be specified by providing a design file called :file:`design.tsv`. The file has three columns: the comparison name, and two regular expressions that match the files in condition 1 and condition 2, respectively. For example::

   #name   pattern1          pattern2
   tissue  heart-control.+   brain-control.+
   kd      heart-kd.+        heart-control.+

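The sketch below shows one way such a design file could be read and applied to a list of BAM files. It makes three assumptions:

- the columns are tab-separated;
- lines starting with ``#`` are headers;
- patterns are matched from the start of the file name with ``re.match``.

The helper names ``read_design`` and ``match_design`` are hypothetical and not part of the pipeline.

```python
import csv
import re
from typing import Dict, Iterable, List, Tuple


def read_design(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Parse a tab-separated design file into {name: (pattern1, pattern2)}."""
    design = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and '#'-prefixed header/comment lines.
        if not row or row[0].startswith("#"):
            continue
        name, pattern1, pattern2 = row[:3]
        design[name] = (pattern1, pattern2)
    return design


def match_design(
    design: Dict[str, Tuple[str, str]], bam_files: List[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {name: (condition1_files, condition2_files)} (assumes re.match semantics)."""
    result = {}
    for name, (p1, p2) in design.items():
        cond1 = [f for f in bam_files if re.match(p1, f)]
        cond2 = [f for f in bam_files if re.match(p2, f)]
        result[name] = (cond1, cond2)
    return result
```

With the example design above, the ``kd`` comparison would pair ``heart-kd-R1.bam`` (condition 1) with ``heart-control-R1.bam`` (condition 2).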
If no design file is present, files with ``control`` as the second part of the file name are used as controls. They are matched to files that share the same first part but have a different second part. For example, if ``heart-control-r1`` and ``heart-kd-r1`` are both present, the first is used as the control for the second.
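The default matching rule can be sketched as follows. The function ``guess_design`` and the ``<part1>-<condition>`` comparison names are hypothetical. The sketch assumes every file name splits into at least three dash-separated parts. Groups that have no control are simply skipped.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def guess_design(bam_files: List[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Pair each non-control condition with the control sharing its first part.

    Returns {"<part1>-<condition>": (condition_files, control_files)}.
    """
    # groups[part1][condition] -> list of files
    groups = defaultdict(lambda: defaultdict(list))
    for path in bam_files:
        stem = os.path.basename(path)
        if stem.endswith(".bam"):
            stem = stem[: -len(".bam")]
        part1, condition, _replicate = stem.split("-", 2)
        groups[part1][condition].append(path)

    design = {}
    for part1, conditions in groups.items():
        controls = conditions.get("control")
        if not controls:
            continue  # no control for this first part: nothing to compare against
        for condition, files in conditions.items():
            if condition != "control":
                design[f"{part1}-{condition}"] = (sorted(files), sorted(controls))
    return design
```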
The input files are indexed BAM files whose names have three parts separated by dashes. Conventionally, part 1 is the tissue, cell type or experiment name, part 2 is the condition, and part 3 is the replicate. For example, ``heart-control-R1.bam`` would be the heart control sample from replicate one.
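Below is a small, hypothetical parser for this naming scheme. It assumes that none of the three parts contains a dash itself.

```python
import os
import re
from typing import NamedTuple

# Assumed scheme: three dash-separated parts (no dashes inside a part) plus ".bam".
SAMPLE_RE = re.compile(r"^(?P<tissue>[^-]+)-(?P<condition>[^-]+)-(?P<replicate>[^-]+)\.bam$")


class Sample(NamedTuple):
    tissue: str
    condition: str
    replicate: str


def parse_sample(path: str) -> Sample:
    """Split a BAM file name into its tissue, condition and replicate parts."""
    match = SAMPLE_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"{path!r} does not follow the tissue-condition-replicate.bam scheme")
    return Sample(**match.groupdict())
```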
The pipeline requires the results from :doc:`pipeline_annotations`. Set the configuration variables :py:data:`annotations_database` and :py:data:`annotations_dir`.
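Suppose CGAT pipelines follow the usual convention of flattening ``[section]`` and ``option`` pairs in :file:`pipeline.ini` into ``section_option`` parameters. In that case, the variables above would be set in an ``[annotations]`` section. The sketch below mimics that flattening with the standard ``configparser`` module. It is an illustration under that assumption, not the CGAT loader itself, and the paths are placeholders.

```python
import configparser
from typing import Dict


def flatten_config(text: str) -> Dict[str, str]:
    """Flatten [section] option = value pairs into section_option keys (assumed convention)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    params = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            params[f"{section}_{option}"] = value
    return params


EXAMPLE_INI = """
[annotations]
# placeholder paths: point these at your pipeline_annotations output
database = /path/to/annotations/csvdb
dir = /path/to/annotations
"""
```

Calling ``flatten_config(EXAMPLE_INI)`` yields ``annotations_database`` and ``annotations_dir`` keys.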
Requirements:

In addition to the default CGAT setup, the pipeline requires the following software to be available:

- samtools >= 1.1
- DaPars
- R
- DEXSeq
- ExperimentR
- bedtools
- bgzip & tabix
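A quick pre-flight check for the command-line tools in this list could look like the sketch below. It tests executables with ``shutil.which`` and parses the first line of ``samtools --version``. The DaPars entry point and the R packages DEXSeq and ExperimentR are not checked. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Sequence, Tuple

# Executables assumed to correspond to the requirements list; DaPars and the
# R packages (DEXSeq, ExperimentR) are deliberately not checked here.
REQUIRED_TOOLS = ("samtools", "R", "bedtools", "bgzip", "tabix")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_samtools_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract e.g. (1, 10) from output starting 'samtools 1.10'."""
    match = re.match(r"samtools (\d+(?:\.\d+)*)", text)
    return tuple(int(x) for x in match.group(1).split(".")) if match else None


def samtools_is_recent(minimum: Tuple[int, ...] = (1, 1)) -> bool:
    """True if samtools is on PATH and reports a version >= minimum."""
    if shutil.which("samtools") is None:
        return False
    result = subprocess.run(["samtools", "--version"], capture_output=True, text=True)
    version = parse_samtools_version(result.stdout)
    return version is not None and version >= minimum
```

Comparing versions as integer tuples avoids the string-comparison trap where ``"1.10" < "1.9"``.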
Most of the output is stored in the SQLite database associated with the pipeline (:file:`csvdb` by default). The last-exon chunks that DEXSeq finds to be differentially used are also exported to the export directory.
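To see what the pipeline has written, you can list the tables in the database with Python's built-in ``sqlite3`` module. The sketch assumes the database file is called :file:`csvdb` in the working directory. It does not assume any particular table names.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str = "csvdb") -> List[str]:
    """Return the names of all tables in the pipeline's SQLite database."""
    # closing() ensures the connection is closed; sqlite3's own context manager only commits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```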